Peer assessment of teacher performance

Peer assessment is increasingly used in schools and higher education, especially in health education. However, there remains insufficient evidence that peer assessment conditions are beneficial for teacher education. In this article, the empirical research literature on peer assessment of pre-service teaching performance is reviewed. The articles were drawn from the ERIC and Scopus databases and were published from 2002 to 2020. Only fifteen studies met the selection criteria described herein. The studies differed in the type of assessment used but converged toward the conclusion that incorporating peer assessment into different stages of teacher education was appropriate and worthwhile. We discuss the theoretical perspectives on why peer assessment might work in teacher education, pointing out practical implications for decision-makers in this field. Finally, recommendations and constraints for researching and implementing peer assessment are discussed from the perspective of innovation within pre-service teacher education.


Introduction
In the last decade, interest in peer learning and peer assessment (PA) in higher education has grown, especially in the health professions (Arnold et al., 2007). However, current evidence does not necessarily support its worthiness (McNulty, 2019). In the field of teacher education, teaching in front of peers is quite a common practice for diverse purposes (Abdulwahed, 2011; Amobi and Irwin, 2009), with positive effects on self-efficacy beliefs (d'Alessio, 2018). Despite the relevance of an authentic assessment process to monitor the development of pre-service teacher competences, assessments that rely on peers as a source of information and collaborative learning are still scarce (Charalambous, Hill and Ball, 2011; Ratminingsih, Artini and Padmadewi, 2017). Indeed, the evidence supporting PA is less robust than that for other types of assessment in the educational field (Li et al., 2020), which constitutes a research problem, not only for teacher educators but also for decision-makers in this field.
The studies that form the subject of this review focus on pre-service teachers' (PST) teaching performance rather than on more typical peer-assessed tasks such as written assignments (e.g., Tsai, Lin and Yuan, 2002). Although prior reviews of PA exist (e.g., Gielen, Dochy, and Onghena, 2010; Li et al., 2020; van Zundert et al., 2010), these neither focused on teaching performance nor were developed for teacher education. Hence, the present review analyzes a more specific area. Its purpose is to orient stakeholders to critical aspects of the design of PA in teacher education which are empirically based and focused on issues that require further research in order to gain sufficient understanding.
We start by defining PA and its benefits and disadvantages, followed by its essential organizational features and implementation. After this, the reviewed studies are discussed in terms of overall trends, principles for PA with pre-service teachers, and practical implications for using PA in teacher education.

Definition and Types of Peer Assessment
Peer assessment (PA) is a form of evaluation that is designed for enhancing learning (van Gennip, Segers, and Tillema, 2009). Thus, apart from serving an evaluative function, it offers a learning opportunity (Bunch, Aguirre, and Tellez, 2009). PA is understood as "an arrangement for learners to consider and specify the level, value, or quality of a product or performance of other equal-status learners" (Topping, 2009, p. 20). PA by itself or as a complement to other types of assessments has broad areas of application (Gielen, Dochy and Onghena, 2010; Li et al., 2020; van Zundert et al., 2010). The purpose of PA on performance is to help learners make judgments about structured tasks and provide their impressions to peers. PA processes include judging whether specific actions are performed, their quality and suitability for a purpose (Norcini, 2003).
By its nature, PA is a social process in which one of the essential components is the feedback given to and received from others (Sluijsmans and Prins, 2006; van Gennip et al., 2009). Peer feedback is usually reciprocal between the assessor and the individual assessed. It can be delivered face-to-face or remotely, verbally or in written form, immediately or delayed. It can have an affirmative, corrective or suggestive orientation, and can reduce errors if received thoughtfully. Useful feedback requires understanding the assessment goals and criteria, and the ability to judge the relationship of the specific performance to these goals (Topping, 2010). PA as an assessment method can be summative, formative or both (Gielen et al., 2010). Formative assessment involves participants helping each other to identify their strengths, weaknesses, and target areas for remedial action, aiming to develop metacognitive skills for future performance (van Gennip et al., 2009). By contrast, summative assessment often gives feedback when it is too late to affect the production of the present task, although it may affect the production of future tasks (Topping, 2010). Figure 1 shows the essential organizational features of PA in education.

Figure 1. Typical PA implementation in education
In this process, working in small groups and trying to avoid close friends or adversaries has been suggested to facilitate group involvement in the assessment (Topping, 2010). The person in charge helps the group to agree on the assessment criteria that will be used in the PA. This is followed by an exemplary application of the criteria to past cases or representative examples of the task. Using anonymous examples is recommended to avoid anxiety (Sluijsmans et al., 2003). The application of the assessment criteria is then discussed again and clarified. The participants are encouraged to prepare the performance of a task knowing that it will be peer-assessed. Next, the PA is conducted using the agreed assessment criteria. Kilic and Cakan (2007) recommended between three and five assessors in a set, so that participants (assessees, i.e., those being assessed) can balance feedback from different peers, which helps enhance PA reliability. Peer feedback is given to offer the assessee the possibility of improving the performance. The quality of subsequent performances is therefore relevant. The person in charge should evaluate the process to encourage accuracy in applying the criteria and giving peer feedback. Finally, reworking the task in the light of peer feedback is essential to promote a sense of agency in the assessee.
The cycle may be repeated with the same or different groups to complement the feedback. Wen and Tsai (2008) suggested three rounds of PA. Nevertheless, it is essential not to overload the participants with too many loops, because the benefit of gaining more feedback has a cost in terms of time, which might become a disadvantage. van Zundert et al. (2010) indicated that most studies on PA had shown benefits, but disadvantages can also be identified. Nevertheless, most of these disadvantages were based on researchers' opinions rather than on evidence. This points to the need for reviewing empirical studies on the topic.
Furthermore, PA can save teaching time because of the more immediate and individualized feedback from peers (Sun et al., 2019). Nonetheless, this saving is not often achieved in the short run because the implementation of good quality PA requires a period for organization, training, and monitoring (Falchikov, 2001).

Disadvantages of PA in education
One of the reported difficulties of PA is the amount of time required from the organizers and the participants (Okhremtchouk et al., 2009). To help with this problem, PA should be integrated into the curriculum (Kilic and Cakan, 2007; Strijbos and Sluijsmans, 2010). Likewise, initial reluctance and anxiety about participating are quite frequent (Arnold et al., 2005). Assessors beginning with positive feedback to the assessee could reduce this and improve subsequent acceptance of more critical feedback (Topping, 2010). Also, discussion, negotiation, and joint construction of assessment criteria with concrete exemplary material before PA might be worthwhile (MacArthur, Schwartz, and Graham, 1991). Indeed, performing in front of peers might be less stressful than doing it for the first time in front of teachers (Britton and Anderson, 2010).
Another issue is the reliability of PA, which can be affected by friendships, popularity, enmity, the perception of criticism as socially uncomfortable, or the tendency to assign average scores. Nonetheless, using performance checklists or rubrics, extensive exemplification and careful monitoring of the PA process can increase reliability (Topping, 2009). Ensuring validity, reliability, and fairness of the measures is a more critical issue when assessments are used to make high-stakes decisions (Sandholtz and Shea, 2012). Hence, having clear assessment criteria (Sluijsmans and Prins, 2006), more than one assessor and anonymity between assessors and assessee might help in both summative and formative assessment (Kilic and Cakan, 2007; Vickerman, 2009). This is especially relevant in professional learning contexts. According to Gielen et al. (2011), PA between teachers can be an excellent way to assess teaching skills and also to improve them, but only if all the different peer opinions enrich the assessment and if the assessment criteria are clear to all teachers. Although the benefits and disadvantages of PA are known, there is still a gap in knowledge with regard to which characteristics of PA are supported by evidence within the context of teacher performance. Thus, this review is required.

Peer Assessment of Performance in Teacher Education
One of the goals of pre-service teacher education is to prepare student teachers to critically reflect on their conceptions about teaching, their practice and their peers' practice (Amobi and Irwin, 2009; Feiman-Nemser, 2008). PA in teacher education is a strategy for helping learners to examine their progress in teaching (Sluijsmans and Prins, 2006) and become familiar with it before teaching in classrooms (Wen and Tsai, 2008). Teaching practice is an exercise used to expose student teachers to the practical aspects of teaching (Oluwatayo and Adebule, 2012). Although teaching practice is essential in teacher education (Jian, Odell, and Schwille, 2008; Oluwatayo and Adebule, 2012), attempts to improve it through early assessment are not widely reported (Charalambous, Hill and Ball, 2011).
Moreover, if there is a lack of performance assessment during pre-service education, student teachers do not know whether they possess the required classroom teaching skills, what quality criteria are used to measure their performance, or how to improve it (Oluwatayo and Adebule, 2012). Likewise, considering the difficult shift from the student role to the teacher role without adequate support in the transition process (Jian, Odell, and Schwille, 2008), the lack of clear and agreed recognition of teaching competences is a problem. However, early experiences in the classroom (for example, student teachers observing an expert teacher and slowly taking on more teaching actions) might help in this (Hume, 2012), while also developing skills to critically observe teaching practice (Sonmez and Can, 2010). Furthermore, PA in early teaching stages enhances student teachers' ability to analyze and reflect on their practice (Harford and MacRuairc, 2008), i.e., it makes them more analytical when appraising teaching performance (Sluijsmans et al., 2004), and more able to bridge the gap between their conceptions and practice (Ostrosky et al., 2013). Despite this potential of PA, there is still uncertainty about which specific benefits are supported by recent evidence and what kinds of interventions work for teacher education. These particular questions constitute the problem the present review will help to solve. PA studies in other areas of higher education have reached contradictory conclusions. Thus, there is a gap in empirical research oriented to the aspects of PA that might be more efficient in teacher education and that ensure its benefits outweigh the respective difficulties (Li et al., 2020).

Materials and methods
This article reviews empirical studies from 2002 to 2020 on PA of teaching performance in pre-service teacher education to inform researchers and decision-makers, answering research question 1 (RQ1): What evidence is there that PA of teaching performance in pre-service teacher education has worthwhile effects?
Moreover, with a broader scope, and drawing on both the antecedents and the systematic review, this article seeks to answer research question 2 (RQ2): What are the theoretical underpinnings of PA of teaching performance in pre-service teacher education?

Review of Studies
A systematic review was conducted following the steps recommended by Cook and West (2012): 1. Defining the focused question. 2. Identifying information sources: we decided to use two main educational databases, ERIC and Scopus. 3. Searching for eligible studies with defined search terms: we used 'peer assessment' + teach* + performance*. 4. Defining inclusion criteria: English language, open-access articles only. 5. Defining exclusion criteria: articles not based on empirical research. 6. Defining data abstraction elements: we removed duplicates based on the title. 7. Analyzing and synthesizing: we screened the articles and excluded those not centered on teaching performance in pre-service teachers. Following these steps, fifteen articles were included (Figure 2).
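As an illustration only, steps 4 to 7 above can be sketched as a small screening routine. This is not the authors' actual procedure or tooling: the `Record` fields, the `screen` function and the case-insensitive title matching are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; the field names are assumptions."""
    title: str
    language: str
    open_access: bool
    empirical: bool
    preservice_teaching_performance: bool


def screen(records):
    """Apply the inclusion, exclusion, de-duplication and screening steps in order."""
    # Steps 4 and 5: keep English, open-access articles based on empirical research.
    eligible = [
        r for r in records
        if r.language == "English" and r.open_access and r.empirical
    ]
    # Step 6: remove duplicates based on the title
    # (normalizing case and whitespace is an assumption of this sketch).
    seen = set()
    unique = []
    for r in eligible:
        key = r.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Step 7: keep only studies centered on teaching performance in pre-service teachers.
    return [r for r in unique if r.preservice_teaching_performance]
```

Calling `screen` on the merged ERIC and Scopus search results would return the retained records. In practice the final screening step relies on human judgement of each article rather than on a stored flag.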

Figure 2. Schema of steps in the review process
The fifteen studies included in the review were conducted in different places, with diverse conditions and varied aims, as summarized in Table 1. The nature of the studies allowed them to be grouped into categories, which are described in the following section. Studies focused on PA of teaching performance for its improvement are reviewed first (A), followed by studies centered on pre-service teachers' development of assessment skills (B).
Harford and MacRuairc (2008) underlined the relevance of formative PA and feedback. In their study, the student teachers gradually moved their focus of analysis towards more meaningful reflection. They deconstructed the practice of their peers more critically and analytically. The researchers used focus groups, the results of which suggested that student teachers had developed their reflective skills. Moreover, pre-service teachers felt able to transfer the good practice they observed to their own work, evaluating the project as a powerful mechanism for conducting self-review. They valued informal and formative feedback, remarking that formal assessment would have reduced their engagement with the process and the quality of the reflective dialogue. In this study, intervention and research were conducted by the same researchers. The results were based on a qualitative analysis of self-reports, so their reliability is open to challenge.
International Journal of Cognitive Research in Science, Engineering and Education (IJCRSEE), 8(2), 121-132, www.ijcrsee.com
In the study of Kilic and Cakan (2007), the peer assessors and instructor evaluated pre-service science teachers' content and teaching knowledge, the teaching and learning process, class management and communication, as presented in a microteaching episode. They used a form with a 5-point Likert-type scale ranging from very good to very poor. Peer scores were considerably higher than the instructor's, but the two sets of scores were significantly correlated. Improved correlations were then found in the second attempt at PA. The authors showed that around five peer assessors were needed to sustain acceptable reliability. This study did not determine whether the assessed performance improved. Oren (2012) studied the influence of participant gender on scores from peers, self and instructor on communication skills for teaching science during a 10-week semester. In this study, female participants obtained significantly higher mean scores than males in all score types. However, as this study did not include a baseline measurement, we do not know whether female pre-service teachers had better skills or benefited more from the course. Similarly, Kiliç (2016) conducted multiple assessments by peers, self and instructor. The results showed that PA scores were significantly higher than those from the other types of evaluation. PA was perceived among student teachers as an enhancer of successful performance and of confidence in presenting lessons. Although this assertion is based on participants' perceptions and not supported by a more objective indicator, it is a positive signal for strengthening PSTs' self-confidence.
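To illustrate why pooling several peer assessors can raise reliability, as in the finding of Kilic and Cakan (2007) above, the following minimal sketch uses the Spearman-Brown prophecy formula. This is a swapped-in, standard psychometric formula; the source does not state which method those authors used, and the single-rater reliability of 0.4 is a hypothetical value chosen only for the example.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Predicted reliability of the mean score of n_raters parallel raters."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-rater reliability, not taken from any reviewed study.
for k in (1, 3, 5):
    print(k, round(spearman_brown(0.4, k), 3))
```

Under this assumption the predicted reliability rises as assessors are added, with diminishing returns, which is consistent with recommending a small set of assessors rather than a single peer.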
Charalambous, Hill and Ball (2011) investigated mathematics pre-service teachers' thinking and performance when delivering instructional explanations. The 20 participants co-constructed a list of criteria for determining the quality of instructional explanations and gave a performed example in a simulation of a lesson. The peers completed reflection cards and shared their comments with the performers. Through a case analysis of four participants, the researchers concluded that their performance could grow to varying degrees after PA and reflection.
The study of Cabello and Topping (2018) had a within-subject repeated-measures design. The 20 student teachers received training in PA and constructed assessment criteria. They peer-assessed microteaching in two rounds using a rubric. The PSTs significantly increased their microteaching performance after PA, with a large effect size (d = 1.4). This improvement was maintained and transferred into real-life teaching when followed up. Nonetheless, the participants' self-selection and the limited sample size restrict the generalizability of the results.
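For readers unfamiliar with the statistic, the sketch below shows one common way to compute Cohen's d for a repeated-measures design: the mean of the paired differences divided by their standard deviation. The source does not state which variant of d Cabello and Topping (2018) reported, so this formula is an assumption, and the scores are invented purely for illustration.

```python
from statistics import mean, stdev


def cohens_d_paired(pre, post):
    """Cohen's d for paired scores: mean difference over SD of the differences."""
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


# Invented rubric scores for four hypothetical participants.
pre_scores = [2, 3, 4, 3]
post_scores = [3, 5, 5, 5]
print(round(cohens_d_paired(pre_scores, post_scores), 2))
```

Other definitions, such as dividing by the pooled standard deviation of the two rounds, give different values for the same data, which is one reason effect sizes should be reported together with the formula used.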
In the experimental study of Lin (2018), online PA was carried out in a Facebook-based learning application, looking for effects of anonymity on affective, cognitive and metacognitive peer feedback. The student teachers were randomly assigned to write feedback on five peers' videoed microteaching performances in an anonymous or identifiable condition. The author applied a 6-point Likert scale for perceived learning, fairness of peer feedback and attitude toward the online PA system. The anonymous group gave more cognitive feedback and the identifiable group gave more affective and metacognitive feedback. In the role of assessees, the group in the anonymous condition had a better attitude towards the system. However, they perceived peer comments as less fair than did the participants in the identifiable group. The study argues that the cognitive and pedagogical benefits of anonymity in online PA are well demonstrated, although the data analysis only allows inferences about participants' perceptions.
d'Alessio (2018) conducted a study to help student teachers build self-efficacy beliefs. The student teachers carried out microteaching and self- and peer assessment of the sessions, which were analyzed using a rubric. The quality of microteaching based on PA and mastery of the content had the most significant influence on participants' self-efficacy beliefs in this study.
Al-Barakat and Al-Hassan (2009) explored early childhood student teachers' preparation in various applied contexts. The participants received extensive training in PA by reviewing videos. Then each PST was observed once a week by 4-5 peers over ten weeks in a 45-minute lesson. The classroom observations were discussed in a group, and feedback was given. Using interviews, the researchers found that PA developed participants' classroom performance, especially on competencies such as designing objectives, activities, teaching strategies, interaction, students' assessment and classroom management. Moreover, they found that PA helped student teachers to form a set of criteria for judging classroom performance and improved their self-confidence within the assessment.
Similarly, the use of peer dialogue assessment was researched by Eather et al. (2017) with physical education PSTs. They found significant improvements in perceived teaching confidence and competence, and in teaching self-efficacy, based on self-reports. In both studies, the same researchers conducted the interviews and the course, making the validity more open to challenge.
In the study of Sluijsmans et al. (2004), the participants were randomly assigned to similar-sized control (n=47) and experimental (n=46) groups. The experimental group was trained in PA while the control group was not. The student teachers defined a set of assessment criteria for designing a creative lesson plan. The study used PA forms, questionnaires and interviews. The researchers found that the experimental group was more capable of applying the criteria than the control group. The experimental group also used the criteria more often and felt more able to assess after PA than before PA. Still, there was no significant effect of the training in PA skills on performance. The researchers concluded that PA skills could be successfully trained, but the value of the training is uncertain given that no impact was found on the resultant performance.
Kim (2009) used a metacognitive awareness questionnaire and a motivation survey. PSTs' performance was measured in an assignment to create a concept map on instructional design. All the participants submitted their tasks for peer feedback and tutor marks. After receiving feedback, the participants were randomly assigned to an experimental condition that received a back-feedback opportunity (n=40) or a control condition without back-feedback (n=42). Back-feedback consisted of giving their opinion on the feedback received, enabling reflection on peer feedback. After revision, the PSTs resubmitted their concept maps and completed the metacognitive awareness questionnaire and survey. The experimental group subsequently showed higher metacognitive awareness, better performance and better attitudes towards PA than the control group. The researcher did not report the effect size of the improvement or the correlations between peer and tutor marks. Thus, the study as published lacks specificity.
The study of Sluijsmans et al. (2003) had a within-subject repeated-measures design. Questionnaires, PA forms and the student teachers' reflections were used to assess written reflection papers. The PSTs received several training sessions on assessment skills, how to give feedback and how to write assessment reports. They agreed on nineteen assessment criteria (e.g., self-criticism, work field experiences, personal expectations), marked their peers' tasks and wrote their reflections in a virtual learning environment. The instructor also marked the reflection papers. The researchers concluded that there was significant progress for most variables studied: the participants used the assessment criteria better, their feedback was better, and their assessment reports were more structured. Likewise, they wrote better reflection papers, based on the instructor's marks; however, the effect size of the advance was not reported. Moreover, the design of the study does not allow PA to be related causally to better performance because a comparison group was not incorporated.
Seifert and Feliks (2019) studied attitudes concerning self-assessment and anonymous PA to improve PSTs' assessment skills. The participants assessed several products and noted that they benefitted from the process and developed good attitudes to PA. Mercader, Ion and Díaz-Vicario (2020) used different instructional designs to guide PA. The PSTs perceived that long-term mediations, two rounds of PA and giving feedback were the most useful. Both studies used quantitative analyses and linked them with PSTs' perceptions.

Results and Discussions
Here the research findings of this review are summarized, then interpreted. After this, they are discussed in terms of practical implications for teaching and future research.
The analysis of the studies showed some similarities and several differences. Firstly, they were conducted from the early to the final years of teacher education, although they tended to cluster towards the later years. The participants were from a wide range of subjects, with a slight tendency towards science and mathematics. The sample size varied from 16 to 556. The performances assessed were based on teaching skills (Al-Barakat and Al-Hassan, 2009; Charalambous, Hill and Ball, 2011; Kilic and Cakan, 2007; Lin, 2018), assessment skills (Kim, 2009; Mercader et al., 2020; Seifert and Feliks, 2019; Sluijsmans et al., 2004) and a combination of teaching practice with the development of peer assessment as a skill (Cabello and Topping, 2018; Sluijsmans et al., 2004). The purposes of PA were summative (Kilic and Cakan, 2007; Sluijsmans et al., 2004), formative (Al-Barakat and Al-Hassan, 2009; Cabello and Topping, 2018; Charalambous, Hill and Ball, 2011; Harford and MacRuairc, 2008; Kim, 2009; Lin, 2018) or both (Sluijsmans et al., 2003). Feedback was face-to-face in most of the studies, except for d'Alessio (2018), Lin (2018), Mercader et al. (2020) and Sluijsmans et al. (2003), who used an online platform.
The studies also differed depending on the assessment criteria used for PA. Kilic and Cakan (2007), Lin (2018) and Al-Barakat and Al-Hassan (2009) used criteria defined by the staff member, while in Cabello and Topping (2018), Charalambous, Hill and Ball (2011), Sluijsmans et al. (2004) and Sluijsmans et al. (2003) the criteria were agreed between the participants. Harford and MacRuairc (2008) and Kim (2009) did not use structured criteria to guide the PA processes, although they found an improvement in awareness about teaching practice. Only some studies reported guided training received by the participants (Al-Barakat and Al-Hassan, 2009; Cabello and Topping, 2018; Kiliç, 2016; Mercader et al., 2020; Sluijsmans et al., 2003).
Most of the studies reported benefits of PA on the performance assessed. However, the study of Sluijsmans et al. (2004) did not show an effect, and Kilic and Cakan (2007), Lin (2018) and d'Alessio (2018) did not report on it. Nonetheless, two studies used a repeated-measures design and reported a measurable improvement in performance (Cabello and Topping, 2018; Sluijsmans et al., 2003). They found a significant and robust advance in participants' performance after PA, despite the vast difference in their sample sizes (20 vs 110), the type of performance assessed and the length of PA training. Even so, without a comparison group, establishing the cause of progress is open to challenge.
Lastly, only three studies were found with an experimental design, which allows the results obtained through PA to be linked to outcomes in a causal relation. Sluijsmans et al. (2004) did not find statistical differences that supported an effect on the assessed performance. Kim (2009) reported significant differences, but not the effect size of the improvement. Lin (2018) related one condition of PA (anonymity) to peer feedback, perceived learning, fairness and PA attitudes, not to effects on teaching performance.
Most of the studies had quantitative measurements, but some of these were questionable in their reliability or accuracy, perhaps due to the limiting conditions of researching in higher education contexts, such as the lack of possibilities for randomizing groups or for having external evaluations of student teachers. This finding supports the view that research in PA needs more systematic work (Li et al., 2020).
Moreover, the short-term scope of the studies reviewed here must be considered. Similarly, only two of them dealt with the transfer of teaching practice to real contexts (Cabello and Topping, 2018; Al-Barakat and Al-Hassan, 2009). Thus, generalization and maintenance of the effects of PA in teacher education into work contexts are not robustly supported by evidence so far.

Theoretical Basis of Peer Assessment in Teacher Education
Some authors have presented theories or ideas to help understand the role of PA with respect to performance in teacher education. For instance, negotiation of meaning is a construct that might explain the possible success of formative PA in teacher education, primarily when the student teachers design the assessment criteria (Al-Barakat and Al-Hassan, 2009; Stiggins, 1991), which has been empirically tested by Sluijsmans et al. (2004). We strongly believe the crucial benefit of this is the construction of a third space in between common knowledge and teaching knowledge, where student teachers can jointly redefine what elements constitute good teaching. This construction is triggered by the interaction with peers, through assessment, reflection and discussion, as discussed by Ratminingsih, Artini and Padmadewi (2017).
Additionally, PA provides students with skills to form judgments about what constitutes high-quality work (Cabello and Topping, 2018), and this challenges their conceptions about good teaching (d'Alessio, 2018). Moreover, PA within teacher education could enhance self-regulated learning by giving student teachers the opportunity to talk about their decisions, beliefs and practices (Vermunt and Endedijk, 2011). This might lead to them gradually becoming the owners of their learning processes when they actively construct new ideas in interaction with peers (Ratminingsih, Artini and Padmadewi, 2017). However, self-regulation of learning is a necessary but not entirely sufficient condition (in and of itself) to develop pre-service teachers' conceptions and skills (Vermunt and Endedijk, 2011). Cabello (2017) argued that changes in student teachers' conceptions and practice during PA might occur on the basis of two cognitive mechanisms: projection and reflection. The assessee's performance reflects what student teachers in the assessor role would do in a similar situation. The assessors identify themselves with this practice because a peer, who shares their experiences, performs it directly in front of them. Thus, the assessors project their own possible decisions and practice onto the assessee's performance. Furthermore, discussing the assessment criteria and using them to analyze their practice gives a shared space for critical reflection on typical teaching performance, possibilities and understanding, which might enhance changes in their conceptions with consequences for their practice.
Critical reflection itself is required for making reliable judgments about peers' work because a comparison of peers' performance against teaching performance criteria is required (Ratminingsih, Artini and Padmadewi, 2017; Sluijsmans and Prins, 2006). Likewise, critical reflection can also develop self-assessment skills and help student teachers improve their practice. As Stiggins (1991, p. 38) stated, "once students internalize performance criteria and see how those criteria come into play in their own and each other's performance, students often become better performers". This might be one of the reasons for the positive results obtained by Kiliç (2016). Thus, PA is understood as a cognitive and social activity to enhance student teacher professionalism. In relation to this point, Lin (2018) states that PA in teacher education is an integral part of the learning process. We extend the argument to PSTs, based on the idea that discussing assessment forms or rubrics with peer assessors is crucial (Kilic and Cakan, 2007).
Considering that PA is an activity that all professionals may expect to experience at different times of their professional life, implementing PA at university seems to reflect demands for transferable skills. PA can help student teachers face changing teaching environments by giving them an active role not only in the detection and remediation of their weaknesses (Inoue, 2009) but also in the development of their communication and collaboration skills (Sluijsmans and Prins, 2006). Even so, collaboration is not the only explanation of why PA might be efficient in teacher education. The internalization of assessment criteria for enacting good quality performance (Stiggins, 1991) is an underlying principle, which might function as an enhancer of self-regulation in teaching practices (Vermunt and Endedijk, 2011), from a cognitive perspective (Kollar and Fischer, 2010).

Practical implications
Although most of the studies reviewed supported the feasibility of PA in teacher education, some studies were questionable in their methods. Disadvantages of PA can appear during its implementation. Most of these, such as anxiety, can be avoided, but some are unavoidable, such as the time required for conducting good quality PA. It is vital that stakeholders who decide to embed PA in teacher education take careful actions to minimize the disadvantages. For instance, arranging the PA groups to avoid adversarial instances, assuring anonymity and applying the assessment criteria to others' performance first have all been suggested (Lin, 2018).
The incorporation of PA in pre-service teacher education helps with the diagnosis of competences (Sluijsmans and Prins, 2006) and with understanding how effective the teacher education program is or has been (Pecheone and Chung, 2006). For instance, PA can reveal certain shortcomings of the students within the program, such as lacking the skills to analyze teaching practice or to give feedback (Sonmez and Can, 2010), or it can inform about areas of strength and weakness (Darling-Hammond, 2006). This is a form of accountability in teacher education programs. Thus, PA could bridge the gap between instruction and assessment, monitoring student teachers' progression and helping them to improve. This point might be interpreted as the potential of PA in initial teacher education as a learning, teaching, diagnosis and intervention tool at the same time, and consequently, as a cost-effective innovation in teacher education. However, more evidence is needed to support this argument.
Furthermore, it would be of interest and benefit to investigate whether establishing a continuum of PA can have an impact on professional teachers' skills. For instance, programs that systematically use PA from the early years of teacher education could be compared with others that use it only in the final years or even only in teaching placements. This is under the assumption that it might be advantageous to include PA when student teachers already have some practical experience, so they can use it to further strengthen their teaching competences. Preparing student teachers in PA skills from the early stages of teacher education could be an option for creating a culture of formative assessment in teacher education. Nonetheless, more robust evidence than is currently available is needed to support claims of advances in their teaching due to PA.
The possible effect of PA on the professional identity of PST is also an exciting field to explore. It is known that the first years of teaching are essential to model teaching practice, and in this period, new teachers receive several influences from colleagues and mentors (Day, 2008). Thus, exploring the movements of identity when future teachers take on the judgement of the practice of others (and of themselves) could offer a complementary approach to preparing teachers for self-regulated learning and lowering their dependence on others' judgements of the quality of their practices. Further research may also explore the influence of PA between peers on the self-concept of early student teachers.
Van Zundert et al. (2010) and Ratminingsih, Artini and Padmadewi (2017) indicated that there is a lack of research into PA as an integral part of pre-service teacher education. Thus, the extent to which widespread PA would carry different benefits in embedded versus focal interventions is a question emerging from this review.
Of course, this review has some limitations. For instance, only articles written in English were analyzed. This might unintentionally have restricted the broad view of PA. However, the reports mentioned covered several countries and were not exclusively from English-speaking contexts. Thus, this review is still broad in its scope in terms of its research locations. Nevertheless, this work serves to fill the gap with respect to diverse PA implementation that exemplifies what works for PST in various contexts, which is a novel contribution to decision-making in teacher education.

Conclusions
The main contribution of this review is to illustrate the current trends in PA of teaching performance in teacher education. The studies presented here showed that PA has been applied in all the years of teacher education with small and large groups, with summative and formative objectives, face-to-face and online. The assessment criteria of the studies mentioned differed in their nature and design, with most of the studies reporting benefits of PA. Nevertheless, in some cases, the study design could only associate outcomes with PA, rather than showing plausible evidence of a causative link. Thus, these results do not form conclusive evidence. This review suggests that significant effects of PA, found within experimental studies which incorporate PST as a specific application, have not yet been widely demonstrated. However, the few studies with a measurable impact of PA on teaching performance found a notable increase. This gives stakeholders ideas about the expected results if well-designed PA is implemented.
In summary, student teachers need to learn how to teach and how to assess the performance of peers during their professional life. PA provides an assessment and learning scenario in which to critically reflect on and judge teaching performance, which might give them the tools to monitor their own practice as well as that of their peers, based on the internalization of criteria for teaching. In addition, the extent to which PA impacts teaching practices needs further empirical support.