Teacher Evaluation: A Comprehensive Study


How do you know if you’re good at your job? Whatever you do – butcher, baker, software maker – the standards of success are probably clearly understood by you, your superiors and your clients. Sales figures, mortality rates, Michelin stars and investment returns are easily quantifiable ways to evaluate the performance of those involved. There are, of course, certain intangibles that come into play in work evaluations, too. Even professional athletes, for example, who live and die by their statistics, know that their performance reviews also weigh clubhouse skills like leadership, consistency and calm.

Now, how do you know if someone else is good at her job?

It seems like it shouldn’t be tough to figure out. You have no trouble assessing the job performance of, say, your waitress, hairdresser, mechanic or dermatologist. The measures are straightforward in all these cases. Did your meal arrive in a timely and hygienic manner? Do you look good? Is your car still making that rattling noise at low speeds? How’s that rash doing?

Final questions, then: How does a teacher know if she’s good at her job? And how can we assess her job performance?

Teacher performance assessment tools

The answers would seem to hinge on the assumption that if kids are learning, then they’re being taught well. The equation might look like great teacher = successful students. But unlike sales figures, post-surgical results or investment yields, measuring student learning and progress is a murky endeavor. If we accept that a teacher’s job performance can be accurately determined by her students’ academic performance, then how do we measure the students’ (and thereby the teacher’s) achievement? There’s a constellation of quantitative data we could throw in the hopper: standardized test results, attendance rates, parent surveys. But as with all raw data, the question remains: how should we use it?

Setting aside the number crunching, there’s also the matter of qualitative data: things like student perceptions of the classroom, the faculty’s cooperative climate and community support for the school. Next, we have to consider how to adjust for factors that can complicate a teaching assignment: special needs students, challenging school settings and other problems too large to be tackled by an individual teacher’s efforts. To begin to evaluate teaching performance, we first have to assign relative values to all these moving parts. We also have to agree on the premise that good teaching matters.

We can put the problem into algebraic terms:

If quality teaching = the strongest predictor of student learning,

and a teacher evaluation system = a method to ensure the highest-quality teaching,

then developing a rigorous, valid and reliable teacher evaluation system is the single most important bureaucratic process in the service of student learning.

Easily understood in those terms. So now the real work begins: define good teaching; find a way to calibrate it; and build a process that evaluates teaching performance and ensures excellence.

A professional interest in teacher evaluation

Rob Weil is a former high school math teacher with more than twenty years of classroom experience under his belt. In his current position as Director of Field Programs for the American Federation of Teachers (AFT), he is devoting his energies to developing the teaching profession in the United States. Ask where his passion for professional development comes from, and he’ll tell you about his rookie teaching assignment. He had a course load that included pre-calculus and college-level math. Although confident in his mastery of the subject matter, he was new to teaching and hoped to learn a lot from his assigned mentor, a veteran teacher. A veteran home economics teacher. Decades later, Weil still recalls the pairing as well-intentioned but fruitless, peer support in name only. He described the district’s teacher evaluation system, of which this mentorship was a part, as “useless and a waste of resources.”

Weil’s experience as a young teacher informs his current work as a leader in shaping the AFT’s policy on teacher evaluation. The AFT represents hundreds of thousands of K-12 teachers nationwide, and what Weil is calling for is nothing less than a paradigm shift in the way we think about teacher development, retention and assessment.

The idea is to consider “assessment as part of the ongoing professional growth of teachers. This means changing the culture and the mindset of our educational institutions to make teaching a continual growth process” rather than a self-contained or tacked-on protocol. Citing work in other countries (Finland, Canada and South Korea are among Weil’s model systems), the AFT envisions a development and evaluation system that is woven into the fabric of daily school life, part and parcel of a teacher’s work. It rejects what Weil calls “a silo model,” an evaluation process that stands apart in its own space and time.

As a student of international education, Weil notes that countries with high-performing school systems have a radically different understanding of the purpose of teacher assessment than we do in the U.S. “High-performing countries don’t look to [teacher evaluation] as a way of sorting teachers into ‘bad’ and ‘good,’ because they start from the premise of ‘Why would we allow untrained teachers to be teaching at all?’ These countries are doing front-end work in teacher development, so that they enter the profession already sorted.”

Teacher evaluation in other countries

In Finland, for example, where no teacher evaluation system exists, every single teacher is required to hold a master’s degree. In Singapore, Weil admires the culture’s “holistic approach to growing a whole teacher,” which features “Lesson Study,” daily collaborative time among colleagues that fosters more effective classroom instruction. In South Korea, entrance into teacher preparation programs is highly competitive and yields a selective and accomplished workforce, so teacher evaluation is largely a moot point: if you already know you have the right people in place, it makes the most sense to ensure their continuous growth and collaboration rather than devote time and resources to assessing their performance.

In short, these countries do so much on the “front end” that there’s no need for a sorting process. It’s worth noting that these countries enjoy markedly lower teacher turnover rates than the U.S.

Tim Bollin shares this concern about quality control in the pool of teaching candidates. In addition to marking a quarter century in the science classrooms of inner-city Toledo, Bollin is the Chair of the Ohio Educator Standards Board, a committee that reports directly to the Governor. Bollin asserts that “application into teacher prep programs is part of the problem. Traditionally, teaching doesn’t necessarily attract the top-tier students, because those candidates are expected to go into ‘prestige’ careers. Unfortunately, our society equates prestige with money, and so these cream-of-the-crop types won’t gravitate toward teaching when they’ll find lucrative careers elsewhere.” To skim the cream for teacher prep programs, then, would require nothing less than upending society’s notion of status.

Define effective teaching

But we’re getting ahead of ourselves. The fundamental problem is this: the U.S. educational system has not adequately studied what good teaching is. From educators to students to parents to interested lay observers, we may not be able to define good teaching, but (with apologies to the Supreme Court) we know it when we see it. The National Board for Professional Teaching Standards is a non-profit, non-partisan, non-governmental body that seeks to define rigorous standards for what effective teachers should know and be able to do. Since its founding in 1987, the group has built a system for awarding National Board Certification to candidates.

In the ultimate example of peer review, qualifications for National Board Certification are, according to the organization’s mission statement, “developed by teachers for teachers, with teachers heavily involved in each step of the process, from writing standards, designing assessments and evaluating candidates.” To date, the group has granted certification to nearly 100,000 educators, and notes that most states and many school districts offer financial incentives to teachers seeking certification.

The private sector is also recognizing the problem of defining effective teaching. The Bill and Melinda Gates Foundation (www.gatesfoundation.org) is investing millions of dollars in the effort to identify what makes good teaching, and in the process has made waves with its Measures of Effective Teaching Project. The mission of the Gates Foundation’s efforts in this area is “to rethink the way we recruit, retain and evaluate teachers in our schools in order to improve student outcomes.”

Evaluating teaching effectiveness

This sounds suspiciously like the paradigm shift that Weil sketches. Not surprisingly, Weil is keen to see more of the Gates Foundation’s research into professional development and evaluation. When asked if there is a place for creative statistical thinking (think baseball’s “sabermetrics”) in this area, the AFT point man was enthusiastic. “Yes, by all means, we need to look at student learning in every way we possibly can. Test scores taken alone are not proof of effective or ineffective teaching. Peer review is extremely promising but cannot stand alone. There is room for more metrics. We’re looking for more metrics.”

The Gates Foundation’s Measures of Effective Teaching (MET) Project is, in fact, all about metrics. Although the MET report has not yet been formally released, preliminary results published on the project’s website give considerable attention to quantifiable data of many different stripes. In a nod to the student and parent input that Bollin favors, the report gives special consideration to surveys of student perceptions of classroom environment and learning.

The National Education Association (NEA), which counts teachers and administrators among its members, is also making its collective voice heard on the matter of teacher evaluation. Both the NEA and the AFT agree that there is no one-size-fits-all model that can be applied across the board in every school district, but each organization’s platform outlines a professional development and assessment framework that districts can adapt to their own circumstances.

Both stress a judicious and appropriate use of standardized test results as a tool of limited use in the rubric, and both take pains to articulate the multiple measures of student learning that must also be part of any assessment algorithm. The NEA’s position paper is also careful to spell out the many complications of using test results as a tool for evaluating teachers.

Both the NEA and the AFT, as unions representing educators, call for teaching assessment systems that not only recognize and reward top performers, but also identify and remediate professionals who need help to develop their teaching skills. In fact, the unions’ top thinkers on this issue resist the use of “evaluation” without its partner, “development.” On message boards and in blog comments, member teachers consistently indicate that continuing education and professional growth need to be a part of any sort of faculty-supported system.

Of course, the ideal system would be good for kids, fair to teachers and conducive to the continuous development and growth of all parties. Weil, for one, would like to see an American educational system that, taking inspiration from its international counterparts, is structured so that best practices can be shared among the teaching corps, with an infrastructure that allows for continuous collaboration among colleagues. The AFT points to promising reforms in urban districts including New Haven and Pittsburgh, where collaboration among teachers, administrators and community members is fostering a new model of teacher development and evaluation. The bottom line, according to Weil, is that assessing the work of teachers is still a strange alchemy of art and science, and will remain a murky process until we find new ways to measure teaching itself.

Teacher mentors

It’s not every day you meet a Tim Bollin. At least, not in real life. But you’ve encountered someone like him in a schmaltzy movie: parochial school kid studies hard, makes good grades.

After graduating from the local campus, he stays in his rough-around-the-edges hometown to make a go of it as a schoolteacher. No easy street for him, no comfortable suburban school with comfortable suburban students. Instead, he finds his home in the inner-city schools, where his skills and passion can make the proverbial difference.

As a rookie teacher, he struggles to apply the instructional theory he learned to his challenging teaching assignment. He discovers it’s one thing to learn about teaching, but it’s another thing altogether to teach. He’s surprised to learn that classroom management is half the battle. With the invaluable guidance of a seasoned mentor, he finds his teaching voice, connects with the kids, opens their minds to science, technology, the future.

But Tim is not a two-dimensional character. He’s a hardworking, overcommitted professional who says in all earnestness, even after nearly twenty-five years in the classroom, “I still believe there’s no more important profession in our society.” After an upbringing in a relatively rural neighborhood of Toledo, he has chosen to make his life’s work in the city’s urban neighborhoods. He is by any measure a leader in his field: a teacher in Toledo’s competitive TRACS program; Chair of the Science Department at Toledo Early College High School (TECHS), a six-year-old magnet school; a member of TECHS’s design team; and current Chair of the Ohio Educator Standards Board. Decades of classroom experience, along with his continuing involvement in professional leadership roles outside the classroom, have given Tim unique and varied perspectives on some fundamental questions: How can we define good teaching? How can we develop good teaching practice? And how should we most effectively assess teaching performance?

In his attempt to answer these questions, Tim looks back to his early years in the classroom. He largely credits his development as an instructor to the teacher evaluation system then in place and, more specifically, to the mentoring it prescribed. “My students were only four or five years younger than I was, and the experience gap between learning and doing [teaching] was only overcome by working with an invaluable peer mentor.” Although Tim, like most novice teachers, had arrived with strong content-area mastery, he quickly realized that “in a classroom, the content is 20%, but everything else you have to build through relationships, classroom leadership and management, mentoring.” Of his assigned mentor, veteran science teacher Dick Fisher, Tim says, “I couldn’t have asked for a better individual to work with me. He filled me in on how to approach lessons to be relevant to my students, how to evaluate student data.”

At its core, the evaluation system was a collaborative one. Whenever possible, “intern” teachers were matched with mentors by subject matter and worked under their attentive guidance. Per protocol, each pair followed a set format for submitting lesson plans. Frequent classroom observations by the mentor and, less often, by the principal were followed by post-visit debriefings and notes. The mentor shared his opinions with the principal. At the end of the two-year internship, all of the mentor’s and administrator’s evaluations were compiled into a narrative report for an Intern Board of Review (a mix of teachers and administrators from the district), which then voted on renewing the intern’s license for a five-year term. After that, teachers were re-evaluated at three-year intervals. The re-evaluation requirement, Bollin argues, is imperative. “There are too many districts that, for all intents and purposes, don’t re-evaluate teachers with tenure, and that’s where problems creep in.”

What is the relationship between student achievement and teacher achievement?

Tim describes the feedback generated by his early evaluation process as “invaluable. My confidence and skills developed incrementally, day-to-day, over lots of visits. We had conversations about classroom management, how to deal with behavior issues, respect, getting to know the kids, talking to them, listening to them, establishing relationships with students and parents. These are the soft skills you don’t get in college, that don’t come from a teacher prep program. Dick offered course correction when my classroom management skills needed help. He looked at me and said, ‘This is your classroom. You are in control. What you do determines how the kids react, how they learn, the outcome.’ As a veteran teacher, I now know that reactive classroom management is a mistake; anticipating what’s going to happen is the key. Talking to a kid before he enters the room, calling parents. Relationships were important to Dick, and I’ve taken that lesson from him.”

As a department chair at TECHS, Tim says that he no longer plays an active role in teacher evaluations. Instead, he sees his small department as a collective that meets regularly to collaborate. But in his role as Chair of the Ohio Educator Standards Board (OESB), he does have strong opinions about effective measures of teaching proficiency. The group, which includes teachers, principals, superintendents, legislators, business representatives and parents, has spent the last eighteen months producing teacher evaluation standards to send to the governor.

Although Tim is proud of Ohio’s record as a leader in producing teacher and principal standards, he is concerned that the incoming administration, which has shown hostility to the major teachers’ unions, will be unwilling to heed the Ohio Educator Standards Board’s research-based findings. The climate of the times and the enormity of the issue, Bollin argues, may leave the board’s report dead in the water. “The economy, the family, politics, funding; all have hands in the problem of education. Talking about measuring a teacher’s performance: we can teach to any test you want us to, but is that what our society really wants? Is that going to make a child college-ready? Understand technology? Know how to use data?”

As a science specialist, Bollin aims to teach his students analytical thought, evaluation and application, not just science facts to be regurgitated on a standardized test. But Bollin is also a parent, a taxpayer and a rational thinker. He understands the urge to use standardized test data as an instrument to measure teaching effectiveness. In fact, he’s all for it, so long as the tests make sense for that purpose. As a scientist, he knows that reliable numbers come from large samples. In this case, he argues, those numbers can come from national standards, and evaluation tools can be built from those standards. The SAT and ACT, Bollin says, are acceptable instruments in a kit of teaching evaluation tools because they have been shown to be closely predictive of college success. In contrast, student scores on the Ohio Graduation Test (OGT) would not, in his view, be a meaningful measure of teaching effectiveness. Tim sits on the content review committee for the OGT and says there has never been any discussion of using the test as a teaching evaluation tool. “The OGT is not a measure of much. It doesn’t correlate with college-readiness or building 21st century skills.”

So what’s Bollin’s dream scenario for evaluating teaching effectiveness? A system that looks not only at the content area but also at parent and student input (he’s a fan of student and parent surveys); a standards-based report card that would look at multiple criteria in multiple ways; peer evaluation; and whole-school data (taking into account the collaborative climate of the entire school). He emphasizes that the ideal evaluation system is a continuum grounded in ongoing professional development and collaboration, and one that allows for course corrections. Teacher evaluation, he says, “should not be a ‘gotcha!’ thing. Remediation should be developed locally. If someone’s just ‘not getting it,’ they need to be pulled back and given an opportunity to visit classrooms that are working effectively, like a second student teaching stint.”