CLASSES EVALUATION – METHODS AND TOOLS

This study presents the method, tools, course and results of the evaluation of foreign language classes conducted in the summer semester of 2012/2013 at the Andrzej Frycz Modrzewski Krakow University. Because a new evaluation procedure was implemented at the University, the former method, based on paper forms filled in by the students, was abandoned. A free account was registered on the surveyanyplace.com website and the evaluation questionnaire form was set up there. The paper also presents the results of a taxonomic analysis aimed at checking the degree of mutual correspondence (correlation) between the evaluation criteria and at presenting the evaluation results graphically in a multidimensional perspective. To classify the grading criteria, Ward's agglomerative method was used, with the Euclidean metric as the measure of similarity between criteria. The calculations were made with the Statistica package. The results of the questionnaire show that foreign language teaching at the Andrzej Frycz Modrzewski Krakow University is conducted professionally and at a high level.


INTRODUCTION
This study presents the method, tools, course and results of the evaluation of foreign language classes conducted in the summer semester of 2012/2013 at the Andrzej Frycz Modrzewski Krakow University. Because a new evaluation procedure was implemented at the University, the former method, based on paper forms filled in by the students, was abandoned. It was essential to create a new evaluation system that would be an element of the education quality system.
The following variants of conducting the evaluation procedure were considered:
1. Internet evaluation, based on making the evaluation form available on the virtual dean's office website and calling on students to take part in the evaluation.
2. Internet evaluation, based on devoting a part of the last class (15-25 min) to filling in the evaluation form available on the Internet.
3. Evaluation with the use of mobile devices, more specifically text messages sent by the students from their mobile phones during the last class; the messages would include answers to questions projected on a slide.
Undoubtedly, the best solution is the first of the above-mentioned procedures (Patton, 1987). Unfortunately, for formal and legal reasons, it is impossible to make students participate in a questionnaire, e.g. by making passing a semester conditional on taking part in the evaluation. As a result, the so-called 'reflexivity' (response rate) of the questionnaire is quite low in this case, and when there are few answers their analysis is pointless, because the data are not representative (Millman & Darling-Hammond, 1990).
The other two solutions guarantee a response rate equal to the attendance level. Because these are the last classes of the semester, when teachers often give final grades, attendance at those classes is relatively high.
Each of the two procedures has minor shortcomings. In the second variant, Internet access is essential. If classes take place in a classroom without computers, it is necessary to book a lab, an Internet café or a room in the library.
In the third variant, students must be told about the evaluation during the penultimate class and asked to bring their mobile phones with them. It may be argued that additional costs (text messages) are imposed on students. This objection can be countered by pointing out that these costs are very small, much lower than, e.g., the costs of travelling to school, buying books, materials for notes, etc., which students must bear without expecting any compensation.
It is also possible to combine the above-mentioned variants, for example by joining the Internet procedure (variant 1) with one of the two remaining variants. The drawback of such combined procedures is the difficulty of identifying students who would submit two opinions on one teacher.
In the discussed case, the second variant of the evaluation procedure was chosen and treated as an experiment. It is planned to conduct the evaluation next semester using the third procedure. Only a comparative analysis of the two approaches will make it possible to judge which of these variants should be chosen as the standard one. This paper also presents the results of a taxonomic analysis aimed at checking the degree of mutual correspondence (correlation) between the evaluation criteria and at presenting the evaluation results graphically in a multidimensional perspective. To classify the grading criteria, Ward's agglomerative method was used, with the Euclidean metric as the measure of similarity between criteria. The calculations were made with the Statistica package.
The survey in 2013 lasted six weeks, starting in the middle of May. Altogether, 1390 students' opinions on 15 teachers were gathered (more than 90 questionnaire forms per teacher, on average). The number of opinions per teacher varies considerably, from 37 to 244. It must be mentioned that about 10% of the opinions were eliminated: these were opinions for which the questionnaire code did not match the password, as well as opinions on two teachers for whom too few opinions were gathered (fewer than 30).

EVALUATION TOOLS
As the opinion-gathering tool, an application available at https://surveyanyplace.com was chosen, mainly because it can be used on mobile devices. The website contains detailed instructions on creating forms and on monitoring the process of acquiring, saving and analyzing the answers given by the respondents.
The chosen tool fulfilled its purpose. The application did not cause any problems, either at the stage of creating the electronic form or while conducting the evaluation (logging in, saving and exporting data).
The second element of the evaluation was the questionnaire form. It consists of three parts:
- 10 questions evaluating the teacher's work, each with an identical 5-point grading scale (during the analysis, the letter answers A, B, C, D, E are converted into the numbers 5, 4, 3, 2, 1),
- 2 questions concerning the usefulness of the e-workbook (if it was used during classes),
- a part containing general descriptive comments.
Apart from the answers to the above-mentioned 12 questions, students entered into the form a code identifying the teacher and the student group (year, field of studies, form and degree of studies). The code has the form of a 6-element sequence of lowercase letters, e.g. 'abcdef', and is a synthetic identifier of a class. In other words, it is an equivalent of demographic ('certificate') questions. The advantage of using codes instead of such questions is that the correctness of the data is easier to control. The system allows the form to be filled in only after the code entered by the student has been verified against the access code provided by the person who conducts the evaluation. The password is removed by the person conducting the evaluation after all the students have completed the evaluation process.
The successive symbols of the code have the following meaning: a1: language, e.g. a = angielski (English), n = niemiecki (German), etc. In the second part of the data record, there are the answers to the individual questions covered by the form, as well as the arithmetic mean of the grades given in the first nine questions of the evaluation questionnaire. The mean does not take into account the last two questions, which concern only some of the classes, or question 10, which concerns the evaluation of the teacher's work to a lesser degree.
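The conversion of letter answers into grades and the mean over the first nine questions can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names and the example record are hypothetical, and the original data were processed with the survey application and Statistica, not with this code.

```python
# Mapping taken from the text: letter answers A..E become grades 5..1.
LETTER_TO_GRADE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def grades_from_letters(letters):
    """Convert a sequence of letter answers (A-E) into numeric grades (5-1)."""
    return [LETTER_TO_GRADE[letter.upper()] for letter in letters]


def mean_of_first_nine(letters):
    """Arithmetic mean of the grades given in questions 1-9 only.

    Question 10 and the two e-workbook questions are left out,
    as described in the text.
    """
    grades = grades_from_letters(letters[:9])
    return sum(grades) / len(grades)


# Hypothetical record: answers to the ten teacher-related questions.
example_answers = ["A", "A", "B", "A", "C", "B", "A", "B", "A", "E"]
example_mean = mean_of_first_nine(example_answers)
```

In this hypothetical record the answer to question 10 ("E") does not influence the mean, which is computed from the first nine grades only.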
The questionnaire form was saved as an electronic document, in accordance with the requirements of the Internet application. Figure 1 shows, in the form of screenshots, the welcome page of the e-form as well as a few example questions from the questionnaire.
The person filling in the form may change their answers until they click the FORM COMPLETE icon, situated on the last screen of the e-form.

THE FORM OF PRESENTING EVALUATION RESULTS
Evaluation results are presented in four forms: 1. Individual opinions.
Advanced forms of opinion presentation.

INDIVIDUAL OPINIONS
Detailed evaluation results for each question are presented in tables containing the distribution of students' answers for every teacher, the total number of answers given [N] and the arithmetic mean of the grades given. The means are expressed on a 'school' scale, i.e. the highest grade (5) is the best one. Teachers are arranged in the tables according to declining values of average grades. Moreover, a separate line in each table quotes the total averages (calculated on the basis of all 1390 students' opinions), so it is easy to see which teachers are below or above the average level of students' grades.
Table 2 shows an example compilation of evaluation results for question 1, concerning punctuality in conducting lessons. If necessary, data from the tables may be presented in the form of graphs.
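The layout of such a table (teachers ordered by declining average grade, with a separate line for the total average) can be sketched as follows. The teacher codes and grades are hypothetical; the sketch only illustrates the arrangement described above.

```python
from statistics import mean


def rank_teachers(grades_by_teacher):
    """Return rows (teacher, N, mean) sorted by declining mean, plus the total mean."""
    rows = [
        (teacher, len(grades), mean(grades))
        for teacher, grades in grades_by_teacher.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    # The total average is computed from all opinions, not from the teachers' means.
    all_grades = [g for grades in grades_by_teacher.values() for g in grades]
    return rows, mean(all_grades)


# Hypothetical grades given to three teachers for one question.
grades_by_teacher = {
    "A": [5, 4, 5, 5],
    "B": [3, 4, 4],
    "C": [5, 5, 5],
}
rows, total_mean = rank_teachers(grades_by_teacher)

for teacher, n, avg in rows:
    position = "above" if avg > total_mean else "below or equal to"
    print(f"{teacher}: N={n}, mean={avg:.2f} ({position} the total average)")
```

Computing the total average from all individual opinions, as the text states, gives more weight to teachers with many opinions than averaging the teachers' means would.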

OPINIONS IN DISAGGREGATED CROSS-SECTIONS
The information included in the questionnaire code makes it possible to calculate parameters that characterize opinions in more disaggregated student groups, divided according to:
- form of studies (full-time, part-time),
- degree (undergraduate and graduate),
- field of studies, student group, language being taught, etc.
For example, tables 4 and 5 give this information for the first two breakdowns.
If a teacher had classes with students from only one form or one degree of studies, his/her name appears only once. Therefore, the number of lines in the tables varies. To make interpretation easier, parameters for those teachers who have classes with full-time and first-degree students are marked yellow. The last line of the tables presents the differences between the average grades of the teachers who teach full-time and part-time students, as well as between those who have classes with students of first-degree and second-degree studies.
The quoted tables show that, from the point of view of all the criteria, full-time students are more demanding than part-time students, and graduate students are more demanding than undergraduate students.
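The disaggregation described above amounts to grouping the opinions by one attribute decoded from the questionnaire code and comparing the group averages. A minimal sketch follows; the field names ("form", "grade") and the data are assumptions made for illustration and do not reproduce the original record layout.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(opinions, key):
    """Average grade for every value of the chosen attribute (e.g. form of studies)."""
    grouped = defaultdict(list)
    for opinion in opinions:
        grouped[opinion[key]].append(opinion["grade"])
    return {group: mean(grades) for group, grades in grouped.items()}


# Hypothetical opinions; the field names are assumptions, not the original layout.
opinions = [
    {"form": "full-time", "grade": 4},
    {"form": "full-time", "grade": 4},
    {"form": "full-time", "grade": 5},
    {"form": "part-time", "grade": 5},
    {"form": "part-time", "grade": 5},
    {"form": "part-time", "grade": 4},
]
by_form = mean_by_group(opinions, "form")

# Analogue of the last line of the tables: difference between the group averages.
difference = by_form["full-time"] - by_form["part-time"]
```

A negative difference in this sketch corresponds to the situation reported in the text, in which full-time students give lower (more demanding) grades than part-time students.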

TAXONOMIC ANALYSIS OF THE EVALUATION RESULTS
The data from the evaluation questionnaires can be analyzed taxonomically, which makes it possible to check the extent of correlation among the evaluation criteria and to present the evaluation results graphically in a multidimensional approach.
The starting point of the analysis is the matrix of average ratings, as determined from the particular criteria (questions) in the questionnaire (table 6). In the following tables and illustrations, the symbols X1 to X13 have been introduced to mark the successive evaluation criteria, and the symbol X to mark the average rating of the first 9 criteria. Moreover, the number of opinions referring to particular teachers, N, has been adopted as an additional feature in the taxonomic analysis.
All the information quoted above leads to the following conclusions:
1. The opinions formed in questions X1 and X2-X10 are mutually highly positively correlated.
2. The criterion X2 (punctuality) is positively correlated with only two other evaluation criteria: X1 (involvement) and X6 (the pace of the lesson).
3. There is no correlation between the criteria X1-X10 and the opinions on the usefulness of the e-platform (X12 and X13).
4. There is no correlation, either, between the number of opinions N and any other criterion analyzed, with the exception of a negative correlation between the number of answers and the rating of the e-platform's attractiveness. It means that the larger the number of people evaluating, the worse the opinions on the attractiveness of the e-platform.
5. A negative correlation has been found between the use of the e-platform (X11) and the opinions included in questions X6-X10. It means that the students using the e-platform usually form worse opinions in the criteria X6-X10 than the students who do not have any classes on the e-platform.
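The conclusions above rest on pairwise correlation coefficients between columns of the matrix of average ratings. The sketch below shows how one such coefficient and the corresponding t statistic can be computed; it is not the Statistica procedure used in the study, and the two rating vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical values of two criteria for four teachers.
criterion_a = [1, 2, 3, 4]
criterion_b = [2, 4, 5, 9]
r = pearson(criterion_a, criterion_b)
t = t_statistic(r, len(criterion_a))
```

With N = 15 teachers, as in the study, the statistic has 13 degrees of freedom, and a coefficient is significant at p < 0.05 (two-sided) when the absolute value of t exceeds approximately 2.16.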
The next part of the analysis encompasses a classification of the evaluation criteria in view of their mutual correlation. The objective was to determine whether the evaluation could be limited to a smaller number of questions, given their high mutual correlation. It is a problem of redundant information and unnecessary multiplication of the same information in the successive criteria.
Ward's agglomerative method, with the Euclidean metric as the measure of similarity between criteria, has been applied in the classification of the rating criteria. The calculations have been made using the Statistica package. The results thus acquired are shown in illustration 2, a dendrogram showing the associations of all the 11 criteria. The set of criteria analyzed here can be divided into 3 groups:
1. The first group encompasses 5 criteria (X6-X10). The students' opinions formed in these questions are usually more critical than in the other points of the questionnaire.
2. The second group includes 3 criteria (X3-X5), in which the opinions of the students are the closest to the results of the average aggregate (X).
3. The last group includes two criteria, X1 and X2, which the students rated the most favorably when compared to the other criteria.
Generally speaking, the classification of the criteria was based on the division into the criteria with the highest ratings for the teachers (X1 and X2), the average ratings (X3, X4, X5) and the relatively lower ratings (X6, X7, X8, X9, X10).
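The grouping described above can be illustrated with a minimal pure-Python sketch of Ward's criterion: at every step the two clusters whose merger gives the smallest increase in the within-cluster sum of squared Euclidean distances are joined. This is a simplified stand-in for the Statistica procedure used in the study, and the rating profiles below are hypothetical (each criterion is described by its average ratings for three teachers).

```python
def centroid(points):
    """Coordinate-wise mean of a list of equally long vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def ward_cost(a, b):
    """Increase in the within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(profiles, n_clusters):
    """Merge named profiles step by step until n_clusters groups remain."""
    clusters = [([name], [vector]) for name, vector in profiles.items()]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i][1], clusters[j][1])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(names) for names, _ in clusters)


# Hypothetical average ratings of six criteria for three teachers.
profiles = {
    "X1": [4.9, 4.8, 4.9],
    "X2": [4.8, 4.9, 4.8],
    "X3": [4.5, 4.4, 4.6],
    "X4": [4.4, 4.5, 4.5],
    "X6": [4.0, 4.1, 3.9],
    "X7": [4.1, 4.0, 4.0],
}
groups = ward_clusters(profiles, 3)
```

Because the Euclidean metric is applied to the rating levels themselves, criteria are grouped by how high their ratings are, which is consistent with the division into the highest, average and relatively lower ratings reported above.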

Correlation coefficients between the evaluation criteria included in the evaluation questionnaire
The marked correlation coefficients are significant at p < 0.05; N = 15 (cases with missing data were removed).

ANALYSIS OF THE TEXT OPINIONS
Text opinions, which the students include in the open question (appendix A), are an interesting element of the evaluation. Altogether there were 281 texts: 210 texts with different wording and 71 identical ones, e.g. 'no comments'. The texts range from a single word to a few sentences. There is a reference to the particular teacher that is being evaluated next to every remark.
- NONE: a text meaning that the student has no remarks to make (9)
- MET = METHODS: the usefulness of the methods of teaching used in class (17)
- CLIP: a comment on the usefulness of the e-workbook (15)
- ORG: suggestions on how to modify the lesson (31)
On the basis of the numbers of remarks, quoted in brackets, the following conclusions can be drawn:
1. Very good and good remarks prevail: they constitute 47% of all remarks (131 out of 281), or 60% (129 out of 210) when identical remarks are eliminated from the list.
2. Negative opinions are scarce (only 2) and they constitute less than 1% of the remarks.
3. The remarks on the organization of the lesson constitute a large portion of all remarks (over 30, i.e. 12/15%).
4. Relatively few remarks (7) were made in the section PODR (coursebooks).

FINAL REMARKS
The results of the questionnaire lead to the conclusion that language courses at the Andrzej Frycz Modrzewski Krakow University are taught in a professional way and at a high level. The teachers are involved, they explain the tasks set and the issues presented in a clear way, the classes are run in a friendly atmosphere, at the right pace and in an interesting style, and, most importantly, they are effective. The teachers try to motivate the students as well as they can and they evaluate the students' workload and achievements fairly. On a scale from 1 to 5, the average rating of the language courses varies from 4 to 5, which is a very high result, especially when seen in the context of the evaluation of other classes, both the basic and the professional ones.
Apart from the ten basic questions evaluating the teacher's work, the questionnaire contained two additional questions that referred to teaching with the electronic form of the workbook and one open question that allowed the students to express their general views on the language courses and the teachers running these classes.
In the last part of the questionnaire, special attention should be given not only to the very positive and often repeated opinions about the teachers, but also to the remarks about the teaching process, which would allow the management of the Foreign Languages Center at the Andrzej Frycz Modrzewski Krakow University to improve it.
Several critical opinions have been noted as well. They referred to the excessive number of students in one group, placing students at various language levels in the same group, late hours of the language classes, the obligatory learning materials required by the Foreign Languages Center (SJO) that must be purchased by the students, and some problems in using the educational platform. Many of the problems listed must be discussed with the University Authorities, who could give their permission to change the organization of the language courses, especially in the master's degree studies.
When interpreting the results, the following issues must be considered:
1) It is the first evaluation using the new formula. Therefore, it has been assumed that its results cannot be compared with the former evaluations. For this reason, this paper is an analysis of one survey only, with no reference to the former evaluations.
2) It follows from this assumption that the teachers' ratings should be interpreted carefully. It will be possible to consider their diagnostic value only if these ratings turn out to be similar in the next 2-3 evaluations.
3) The number of answers given must be considered, too. It varies from 37 to 244. Therefore, the average results have different diagnostic values.
4) Because of the limited volume of this report, the standard deviations of the ratings have been omitted. Hence, the confidence intervals for the average ratings, which could improve their diagnostic value, have not been determined.
5) Another important problem to consider is the objectivity of the evaluation ratings in the context of the lecturers' requirements for their subjects. In order to determine if there is such a link, a correlation analysis between the lecturers' ratings and the final grades they give should be carried out. Such an analysis is planned after the next edition of the evaluation.
6) Every language teacher receives a complete report of the evaluation in which the surnames are substituted with the teachers' codes (A, B, C, ...). The teacher is informed about his/her code but does not know the codes of the others. This way every teacher has all the information about his/her evaluation results and knows his/her place in the ranking, which is determined on the basis of the average ratings given by the students, but at the same time full anonymity of the evaluation results is ensured. If the rating falls below the teacher's expectations, he/she can identify the faculty, the year and the form of studies (full-time or part-time) and then consider the possible reasons for the low results, his/her mistakes and the possible solutions and ways to avoid them in the future.
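Points 3) and 4) above concern the precision of the average ratings, which depends on the number of opinions. A minimal sketch of how a confidence interval for a teacher's average rating could be obtained is given below. It uses the normal approximation, which is an assumption of this sketch (reasonable for samples of 37 or more opinions), and the grades are hypothetical; no such intervals were computed in the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(grades, confidence=0.95):
    """Normal-approximation confidence interval for the mean of the grades."""
    n = len(grades)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(grades) / sqrt(n)
    centre = mean(grades)
    return centre - half_width, centre + half_width


# Hypothetical sets of opinions with the same distribution of grades:
# 37 opinions and four times as many (148).
few_opinions = [5] * 20 + [4] * 12 + [3] * 5
many_opinions = few_opinions * 4

low_few, high_few = mean_confidence_interval(few_opinions)
low_many, high_many = mean_confidence_interval(many_opinions)
```

With the same distribution of grades, the interval based on 148 opinions is about half as wide as the one based on 37 opinions, which illustrates why averages computed from different numbers of answers have different diagnostic values.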
[!!!] The amount of material and hours is catastrophic, too small per one semester. The same applies to small tests that would motivate students to learn English systematically and would allow them to reach a higher level of proficiency. The lack of the division into groups is a complete failure. How is it possible that the people who have never learned English are in one group with the people who went to an extended English course in high school? The prices of the coursebooks and the access cards are much too high, the cards should be given free of charge because the activity on the platform is a requirement. The platform itself includes many mistakes.
[!!!] The teacher can't explain grammar. It seems to me she doesn't know the basic notions. I haven't learned anything new yet.
MET We should have more conversations.
MET The pace of the information passed in class could be higher. The time of the lesson is used to a satisfactory extent.
MET In the case of classes conducted in English, new notions and issues should be introduced in Polish to give students more psychological comfort. The pace of the lesson and the level of the class are too high since the beginning of the semester, at this stage of learning.
MET More Polish translation.
MET More focus on conversation.
MET More interesting vocabulary items, less grammar!
MET More listening: films, "real life" situations, to minimize the element of surprise that a student gets abroad when he encounters a different English, spoken faster.
MET More legal vocabulary.
MET More classes connected with business English and management.
MET More focus on colloquial speech!
MET The teacher should focus on the pronunciation of English words and should check the students, otherwise the students learn to mispronounce. I think that the overall grade cannot be lowered because of the absences. Learning a language is about the level of knowledge and not the number of absences. We can't have English classes on Friday evenings till 8.45 p.m. or on Saturdays because we are on full-time studies.
MET Too much grammar and too little CLIP in the overall grade.
BOOK Give us a possibility to buy used books, this is thievery!
BOOK Using books that cost 100zl is irrational. Moreover, the content is not worth such a price. At least the one that we are using now.
BOOK The learning materials required by the university are, to put it mildly, worthless and uninteresting, making teaching difficult. Not dividing the students into groups according to their level of language is a big mistake. The system of teaching English is too expensive and troublesome in the formal sense. In this sense, we don't have time for that! And your keyboard is broken, there is no "a with a diacritic mark" key. Please change something, and make it quick!
BOOK In my opinion the books are too expensive. They should be much cheaper.
or completely. It isn't a problem of the English course only, the same applies to all other classes, too. The teachers are unhappy with this situation, as well, and getting from Lipowa Street to Building A or B in a 5-minute break IS IMPOSSIBLE!
ORG In general, English is a cool subject but scheduling the classes for Friday, late in the evening, or for Saturday, for full-time students, who spend every day at the university, from dawn to dusk, is just inappropriate. It is insane, I hope that the Department of Education will understand their mistake.
ORG First thing: there should be a division into groups according to the students' level of language, we should have more practical preparation for our profession, discuss more.
ORG My first remark about teaching English: the way of correcting the exercises sent by the clip system is not very reliable because even if the whole task is grammatically correct, points are still deducted for not putting a full stop or using too many spaces between words. There is no possibility to correct the task afterwards. I think that the clip access cards should be given free of charge because doing the tasks is obligatory. Another point is about the way of teaching. I have no objections here but I think that the lessons should motivate students more to learn individually.
ORG A division into groups according to the students' level of English. Slower pace of the lesson that would allow a deeper analysis of the material and the tasks discussed; giving the key words / a vocabulary list after every lesson to be learned by the students; introducing revisions before tests and exams that facilitate passing them.
ORG I would like to see the end of the monopoly on the English books. I think that there should be a possibility to pass the English course in a form of an additional final exam or something like that for the students on a high language level. Attending an English course on a low level discourages students and is a waste of their precious time which they could otherwise devote to attending a language course on a higher level or just use more effectively.

Figure 1. Screenshots of chosen pages of the questionnaire form.

Table 1
Record structure with source data

Table 2
Example table with evaluation results for the question concerning punctuality
Note: Compiled by author.
On the basis of table 6, two elements have been determined: a matrix of correlation coefficients among the particular evaluation criteria (table 7) and a matrix of probabilities that a given correlation coefficient is insignificantly different from zero (table 8). The correlation coefficients significantly different from zero, showing associated features (positively or negatively), have been marked in red in table 7. In table 8, zero values correspond to these coefficients.

Table 3
Average grades according to partial criteria and aggregated measures (general average, min-max values, spread)

Table 4
Average grades according to partial criteria and aggregated measures (general average, min-max values, spread) calculated on the basis of full-time students (yellow colour) and part-time students

Table 5
Average grades according to partial criteria and aggregated measures (general average, min-max values, spread) calculated on the basis of students of first-degree studies (yellow colour) and students of second-degree studies