Act 1: Repair and Recovery

In the previous blog, Act 1: Damage Done, I attempted to illustrate how and why the teacher evaluation plan outlined in Act 1 is invalid and damaging to the teaching profession. The main reasons were:

  • The design of the Value-Added Model (VAM) is not being adhered to.
  • The cost of administering VAM as it is designed would be prohibitive.
  • The understanding and implementation of the design across the state is so varied that the results are completely contradictory to its purpose.

The premise behind the design of VAM is truly admirable, but the idea that a teacher’s effectiveness can be measured by how much they grow the achievement of individual students is misguided. There are too many influences outside the boundaries of the classroom that affect a student’s performance on a single test.

The details of Act 1 specify that 50% of a teacher’s evaluation should be determined by the achievement of their students. The act also states that multiple measures should be used to determine that achievement, meaning that a teacher’s effectiveness cannot be measured by a test alone. In most cases, however, it is determined by a single test and an observation, which hardly qualifies as “multiple.”

In October of 2013, the Center for Public Education released its study, Trends in Teacher Evaluation: How states are measuring teacher performance. The idea was to provide a summary of the various ways in which states qualified for Race To The Top (RTTT). The study found that of the 50 states, 13 were “highly involved” in the design of the teacher evaluation. Seventeen were moderately involved, giving districts the autonomy to create their own evaluations as long as they met certain criteria. In the remaining 21, state involvement in the design was very low and was limited mostly to giving final approval. Louisiana, of course, was “highly involved” at the state level.

Forty-one of the 50 states require that evaluations include multiple measures. In Louisiana, we use two measures: student achievement and classroom observation. With a 50/50 formula such as this, it is quite possible for a teacher to get a low overall rating despite doing everything correctly in the classroom observation. If the teacher’s student growth target is not achieved, the overall rating is reduced significantly. There are 23 states that base 50% of teacher evaluations on student achievement.
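To make the arithmetic behind the 50/50 formula concrete, here is a minimal sketch. It assumes both components are scored on the same 1.0 to 4.0 scale and combined as a simple weighted average; the scale, the function name, and the example scores are my own assumptions for illustration, not the official calculation.

```python
def composite_rating(observation: float, student_growth: float,
                     growth_weight: float = 0.5) -> float:
    """Weighted average of two component scores on the same scale.

    Hypothetical helper: assumes a simple weighted average, which may
    differ from how any particular state combines the two components.
    """
    if not 0.0 <= growth_weight <= 1.0:
        raise ValueError("growth_weight must be between 0 and 1")
    return growth_weight * student_growth + (1.0 - growth_weight) * observation


# Assumed 1.0 (lowest) to 4.0 (highest) scale for both components.
perfect_observation = 4.0
missed_growth_target = 1.0

# A flawless observation is pulled down to the middle of the scale.
print(composite_rating(perfect_observation, missed_growth_target))  # 2.5
```

Under these assumptions, a teacher with a perfect observation score ends up at 2.5 out of 4.0 when the growth target is missed. Lowering `growth_weight` shows how much the weighting alone drives the outcome.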

Of the 23 states using 50%, 12 are using the Value-Added Model. Here is where it gets interesting. If you cross-reference this information with the information provided on the Stand For Children website, you find that SFC has an established presence in 7 of these states and, in fact, was heavily involved in the legislation and development of teacher evaluations. In short, they pushed for the use of a model that they explicitly warn legislators about using when tying compensation to evaluation. What this means is that using the highly complex VAM to determine student growth as it relates to a “specific” set of academic standards, and then using the outcomes to determine compensation, will likely be challenged in court, most likely with an unfavorable result for the model. Why this hasn’t been done yet is a mystery.

At this point, I’m certain that most are asking, “Well, what model do the other states use?” Funny you should ask. What many of the other states are using is something that has been around for a very long time. Let me start with a brief explanation of assessments. Keep in mind, I’m providing a “quick and dirty” explanation to the extent that I understand it and the events that took place.

Since the birth of No Child Left Behind, many states, including Louisiana, have been using what is known as a “criterion-referenced test,” or CRT. A criterion-referenced test measures a student’s achievement against a predetermined set of standards. The intent was to implement a method for holding states accountable for educational outcomes and to compare states. The problem was that the standards varied so greatly from state to state that no accurate measure for comparing states was available. This led to the federal government trying to influence state standards, which still resulted in inconsistency. Then came the idea that if a common set of standards were provided, any inconsistencies in the data could be tied to the teacher; hence, the Common Core State Standards and the concept of tying compensation to performance. That, in a nutshell, is how we got to where we are.

You will likely think the following comparison unusual, but the concept that I’ve just described is very similar to the design used to judge dogs in a dog show. Each dog, classified by breed, is judged against a predetermined standard for the breed. The breeders who consistently produce winners are rewarded greatly because they can charge other breeders high fees to breed with their winners. Much like teachers, there are only so many variables that a winning breeder can control. Even in the best scenarios, there are some dogs in a litter that do not have the potential to be champions. The key difference here is that breeders are only judged by their winners. Teachers are judged by the entire litter.

How exactly did we measure student achievement prior to NCLB? The primary method was the Norm-Referenced Test (NRT), which administers the same test across the board and then compares each student’s achievement to that of other students in their demographic. One of the most popular NRTs available is the Iowa Assessments, formerly known as the Iowa Tests of Basic Skills (ITBS). The Iowa Assessments have been used for more than four decades and are among the highest rated for reliability and validity. The student receives a percentile score that indicates where they sit in the grand scheme. For example, if a student is at the 74th percentile, they have performed better than 74% of test-takers in their demographic. The scores are extremely easy to calculate and understand. Students, parents and teachers get useful feedback about where their students are performing. The cost of administering an NRT is far less than that of a CRT, and the turnaround for scoring is quick. There’s no waiting for, or deciding on, what the cut scores should be. They are scored and compared. Period.
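To show how simple the percentile calculation is, here is a minimal sketch. It assumes the comparison group’s scores are available as a plain list and counts the share of scores strictly below the student’s. The function name and the sample data are hypothetical, and published tests use norming tables built from a national sample rather than a raw list like this.

```python
def percentile_rank(score: float, norm_group_scores: list[float]) -> float:
    """Percent of the comparison group that scored strictly below `score`."""
    if not norm_group_scores:
        raise ValueError("norm_group_scores must not be empty")
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)


# Hypothetical comparison group: 100 students with scores 1 through 100.
norm_group = list(range(1, 101))

# A student who scores 75 outperformed the 74 students who scored 1 to 74.
print(percentile_rank(75, norm_group))  # 74.0
```

No cut scores are involved at any point: the result depends only on how the student compares with the group.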

How can we tie this information to the evaluation of teachers? Student Growth Percentiles (SGP) are currently being used in 16 states. SGPs provide feedback not only about where a student is performing, but also about how their growth ranks against that of other students in their demographic. Of the 16, only 3 weight student achievement at 50% of a teacher’s evaluation. As I am sure you’ve already concluded, Stand For Children is established in those three states.
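The idea behind a growth percentile can be sketched in a few lines. This is a deliberately simplified illustration and not the method states actually use: real SGP calculations rely on statistical modeling of each student’s score history, while this sketch just ranks each student’s current score among peers who had the same prior score. The function name and the records are hypothetical.

```python
from collections import defaultdict


def simple_growth_percentiles(records):
    """Rank each student's current score among peers with the same prior score.

    `records` is a list of (student_id, prior_score, current_score) tuples.
    Simplified stand-in for SGP: peers are students with an identical
    prior score, which is an assumption made for illustration only.
    """
    peers = defaultdict(list)
    for _, prior, current in records:
        peers[prior].append(current)

    growth = {}
    for student, prior, current in records:
        group = peers[prior]
        below = sum(1 for c in group if c < current)
        growth[student] = 100.0 * below / len(group)
    return growth


# Hypothetical data: four students started at 50, one started at 80.
records = [
    ("a", 50, 60),
    ("b", 50, 70),
    ("c", 50, 80),
    ("d", 50, 90),
    ("e", 80, 85),
]
print(simple_growth_percentiles(records))
```

Student "d" finishes highest among the students who started at 50, so "d" gets the highest growth percentile in that group, even though student "e" started from a higher score. Growth is judged against similar starting points rather than against a fixed standard.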

In short, the use of an NRT to evaluate student achievement and teacher effectiveness is infinitely more accurate than the incorrectly utilized current method. In addition, an adjustment in the weight of student achievement towards a teacher’s evaluation is warranted. In the next blog, I’ll discuss options for teacher evaluation and the use of a common set of standards.

