Mitigating MOSL Madness

Linking student test scores to teacher evaluations has always been a mistake. Good teachers can end up under legalized micromanagement known as a Teacher Improvement Plan (TIP) because of a quirk in a formula, input issues, systematic inequities in the system, or simply a bad test day. A couple of years of problems in a row and even a tenured teacher can be on the chopping block.

Still, in recent years, UFT Leadership has stood by the practice, largely citing MOSL as a check on weaponized observations. It’s true that in our system, test scores and observations are ‘averaged’ together to produce a teacher’s final rating. That means if a vindictive principal contrives a Developing MOTP rating for a teacher they’ve targeted, that teacher’s MOSL score can often bump them up to Effective overall. Unfortunately, this argument conveniently omits the opposite possibility: a teacher with Highly Effective observations can be bumped down to Developing by a fluke MOSL score.
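To make the two-way street concrete, here is a minimal sketch of how a matrix-style combination of the two ratings can cut both ways. The cells below are assumptions chosen to mirror the two scenarios described, not the official NYC DOE/NYS rating matrix:

```python
# Illustrative sketch only -- NOT the official rating matrix.
# The real overall rating comes from a state-prescribed matrix; these
# cells are hypothetical, picked to mirror the two cases in the text.

RATINGS = ["Ineffective", "Developing", "Effective", "Highly Effective"]

# (MOTP, MOSL) -> overall rating (hypothetical cells)
MATRIX = {
    ("Developing", "Effective"): "Effective",          # MOSL bumps a targeted teacher up
    ("Highly Effective", "Ineffective"): "Developing", # fluke MOSL drags a strong teacher down
}

def overall(motp: str, mosl: str) -> str:
    """Look up the combined rating; default to the lower of the two."""
    if (motp, mosl) in MATRIX:
        return MATRIX[(motp, mosl)]
    return min(motp, mosl, key=RATINGS.index)

print(overall("Developing", "Effective"))          # the "check and balance" case
print(overall("Highly Effective", "Ineffective"))  # the opposite case
```

The same mechanism that rescues a teacher from a contrived observation score is exactly what lets one bad MOSL number sink a strong one.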

Why MOSL Doesn’t Work (even in a good year)

One of the big flukes in MOSL calculations has to do with bad inputs. You’d be surprised how many of us get tied to phantom classes that don’t really exist (or that are closed down after a few weeks or months). One year, I received a 3.50 (Effective, .01 short of Highly Effective) on MOTP, then was surprised to return to school with an overall rating of Developing. It turned out I’d gotten an Ineffective on the MOSL, which trumps even Highly Effective MOTP scores. That prompted me to dig through the raw data, where I found the problem: my scores were linked to a phantom class we had cancelled, so they were based on the test results of students I never actually taught. Because I was incorrectly listed as their teacher of record, on paper it looked like I had taught them nothing. In truth, I did teach them nothing, because I wasn’t their teacher (indeed, these were students retaking an old Regents from years ago).

It took me a few months, but my principal signed the paperwork acknowledging the mistake, and I got the DOE to fix the problem. With scores recalculated based on students I actually taught, I had an Effective overall. But the onus was on me to prove it. The same thing has happened to other teachers I know, many of whom opt to keep the Developing rating just to avoid the paperwork.

Are your students ‘Present’ enough to count?

Another problem I’ve seen come up many times is students on a teacher’s roster counting against them even though they never actually attend class. Technically, if a student doesn’t show up to your class, they’re supposed to be removed from your calculation, and that does happen when a student doesn’t show up to school at all during the year. But say you teach a student first period, and they never come to your class, yet attend third period every day. That student is officially present. So if they take your Regents and fail it, a bad growth score gets tied to you, even though you might have done all the outreach in the world and still never met them. I’ve seen this many times, and unlike the previous situation, there’s currently no way to petition the DOE to correct a MOSL score that resulted from faulty attendance assumptions.
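The quirk boils down to which attendance flag gets checked. This sketch (all names and fields hypothetical) shows how filtering on school-level presence instead of class-level presence keeps a student you never met in your scoring pool:

```python
# Hypothetical illustration of the attendance quirk described above.
# A student is "present" if they attend ANY period that day -- so a student
# who skips your class but attends another one still counts in your pool.

from dataclasses import dataclass

@dataclass
class Student:
    name: str
    attended_school: bool    # showed up to the building at all
    attended_my_class: bool  # actually came to my period

roster = [
    Student("A", attended_school=True, attended_my_class=True),
    Student("B", attended_school=True, attended_my_class=False),   # goes to 3rd period, never 1st
    Student("C", attended_school=False, attended_my_class=False),  # never attends school at all
]

# What teachers would expect: only students I actually taught count.
expected_pool = [s.name for s in roster if s.attended_my_class]

# What the system does, as described above: school-level presence is enough.
actual_pool = [s.name for s in roster if s.attended_school]

print(expected_pool)  # ['A']
print(actual_pool)    # ['A', 'B']  <- B's Regents score still ties to me
```

Student C is correctly dropped from both pools, but Student B, who never set foot in the class, stays in the official one.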

Which schools are ‘comparable’?

Currently, teacher growth scores are calculated by comparing a teacher’s results with those of teachers at schools with similar student populations. What counts as ‘similar’ is vague to most of us, but even if two schools’ students come in at similar levels, it’s absurd to assume equal conditions.

Here is an example of why. I used to teach at a high school that gave freshmen two hours a day of instruction in Living Environment. Now I teach at a school where students get only one 45-minute period of Living Environment. Differences like these are not accounted for in the system: there is no weighting for the fact that students at the first school get more than twice as much instruction as students at the second. Inevitably, because of differences in school policy over which teachers have no control, the second teacher starts at a significant disadvantage in any comparison with the first. If the first school’s schedule were prevalent enough, a good LE teacher who only saw their students for 45 minutes a day could arbitrarily be deemed Developing or Ineffective.
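The back-of-envelope arithmetic behind “more than twice as much” can be sketched out; the 180-day year here is an assumption, while the schedules are the ones described above:

```python
# Rough comparison of annual instructional time under the two schedules.
DAYS = 180  # assumed length of a NYS school year, for this sketch only

school_1 = 120 * DAYS  # two hours (120 min) of Living Environment per day
school_2 = 45 * DAYS   # one 45-minute period per day

print(school_1, school_2)             # 21600 8100 minutes per year
print(round(school_1 / school_2, 2))  # 2.67 -- over 2.5x as much instruction
```

A growth model that treats 21,600 minutes of instruction and 8,100 minutes of instruction as the same input is not measuring the teacher.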

Similarly, there is currently no adjustment for students attached to you who are not correctly serviced. Wrongly assigned a bilingual section even though you don’t speak your students’ language? Have an ICT class without a co-teacher? Teaching a class with 40 students? Too bad: you’ll be judged under the assumption that your classes are in compliance and compared with teachers at schools where students actually are fully serviced. Again, teachers doing the best with what they have are penalized for the failings of their administration, or of the system more broadly.

What about COVID?

Finally, we come to COVID. This is the first time in years that students will be taking Regents exams. Many will be taking the culminating Regents in a discipline, like Algebra 2, even though they were arbitrarily passed through the introductory subjects, like Algebra and Geometry, with waivers (perhaps without actually understanding them well enough to pass an exam). Students have been through tons of trauma, all while getting accustomed to not doing state testing. In that context, how can we really project what student growth will look like with so little recent data?

We are kidding ourselves if we think we have any idea how to calculate growth in the context of a pandemic where students haven’t been tested in so long. I’m not a fan of standardized testing to begin with – I’m a special education teacher and think it’s absurd to subject our students to the process. But if we are going to have standardized tests this year, we at the very least shouldn’t use them as part of APPR. Instead, we could treat this as our ‘benchmark year,’ so that when we resume MOSL in 2022-2023 we at least have some recent data with which to calculate growth. In the meantime, we can take our cue from NYS and dispense with APPR this year entirely, or at least use a city-wide MOSL that defaults to Effective, so that no one risks an Ineffective or Developing because of a likely testing fluke. And maybe, just maybe, before we subject teachers to individualized MOSL again, we should find a way to fix the flaws discussed above, too.
