By KEITH SHARON, SARAH TULLY TAPIA and RONALD CAMPBELL The Orange County Register
MISSING OUT ON AWARDS, Stoddard Elementary Principal Pat Hart no longer can use this room for tutoring. The school's score was within the margin of error. Source: Michael Kitada / The Register
California's $1 billion school testing system is so riddled with flaws that the state has no idea whether one-third of the schools receiving cash awards actually earned the money or whether hundreds of other deserving schools were wrongly left out, an Orange County Register investigation found.
The problems stem in part from the hundreds of thousands of kids who slip through loopholes in the testing system. About one in five children simply aren't counted in a system created with the intention of measuring the progress of every child.
Newly elected Gov. Gray Davis pushed the Academic Performance Index through the Legislature in 1999, tying cash to test scores despite warnings that such a system was unreliable. Experts who then designed the API stayed silent on what became an average 20-point margin of error in the score - a startling swing in a system where a single point can be the difference in receiving thousands of extra dollars to upgrade computers, buy library books or hire tutors.
The state did not publish the error rate until July, and only after months of questioning by the Register. But by then, $744 million had been awarded to schools based on improvements in scores that had little or no statistical significance. An additional $288 million went to help failing schools. And nothing has been done to fix the problem.
"This is insanity. It's indicative of what's wrong," said Assemblywoman Jackie Goldberg, D-Los Angeles, who was elected in 2000 and sits on the Assembly Education Committee.
The state has assigned an API score to every school based on the Stanford 9 test, an exam that compares students with their peers across the country. Each school gets a target score to meet, out of a possible 1,000 points. A school needing to score 650 to win money, for instance, would miss out with a 649. But a 20-point margin of error means that the system can't reliably measure such fine changes in school performance, and a 649 indicates only that the true score is somewhere between 629 and 669.
And the fewer students counted, the larger that margin of error.
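That relationship between school size and the margin of error follows the basic statistics of averages. The sketch below is an illustration only, not the state's actual API error model: the spread of individual student scores (`student_sd`) is an assumed value chosen for demonstration, and the function name is hypothetical.

```python
import math

def margin_of_error(n_students, student_sd=200.0, confidence_z=1.96):
    """Approximate 95 percent margin of error for a score built from
    an average over n_students test-takers.

    The standard error of an average shrinks with the square root of
    the number of students counted, so smaller schools get wider
    margins. student_sd is an assumed, illustrative spread of
    individual student scores, not a published state figure.
    """
    standard_error = student_sd / math.sqrt(n_students)
    return confidence_z * standard_error

# A school where 100 students are counted has four times the margin
# of error of a school where 1,600 students are counted.
small = margin_of_error(100)    # about 39 points
large = margin_of_error(1600)   # about 10 points
```

Under these assumptions, cutting the number of counted students by a factor of 16 doubles the margin twice over, which is why exclusions and small enrollments widen the range so sharply.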
That error rate means that of the $67.3 million the state plans to give to schools as rewards this year, 35 percent is slated for schools where improvement had little or no statistical significance. The Register found that of 284 Orange County winners, about $2 million is ticketed for 104 schools that might not deserve awards.
API scores ripple through communities: Families buy homes based on the scores of the neighborhood school. Principals and teachers can be reassigned for poor scores. Some schools this fall could be punished if their scores drop as little as one point for the second consecutive year.
Scores have political ramifications as well. Davis, who pushed for the system to scrape California off the bottom of national rankings, is campaigning for re-election in part on the rise of API scores. The governor declined to be interviewed.
The testing flaws hit hardest in the classrooms of schools such as Rio Vista Elementary in Anaheim. It needed an API of 537 last year to qualify for $35,884 in rewards. The school missed by five points, and now the state is threatening to take it over if scores don't improve. But given the margin of error for a school its size, the state has no way of knowing whether Rio Vista's 532 is accurate. The testing system can only determine with 95 percent certainty that Rio Vista's actual API falls somewhere between 517 and 547.
"I don't think they had fairness in mind when they created the API," said Kjell Taylor, the school's principal last year, who has since taken another job within the Placentia-Yorba Linda Unified School District.
State officials stress that all tests have some error, and they believe the API is stable enough to be the basis for rewards and punishments.
"I have to trust the experts who designed the system that they have taken into account the statistical issues, and they've adjusted for that. And they've done the best they can to put together the system," said Kerry Mazzoni, Davis' Secretary for Education and a former assemblywoman who was the education committee chairwoman when the API law was written.
But the Register found that those statistical pitfalls were ignored, and even those closest to the classroom have questioned the reliability of API scores.
"I wouldn't buy a house based on the API," said Austin Buffum, deputy superintendent of Capistrano Unified School District.
Finding the error rate
The Register analyzed three years of test scores and student participation rates at the state's 7,300 public schools. The newspaper found the error rate after first investigating the large numbers of students whose scores were dropped.
API loopholes exclude about 828,000 students of the 4.5 million second- through 11th-graders statewide. Many scores are removed after students take the exams. Last year, more than 58,000 Orange County students - a group almost the size of Santa Ana Unified School District - weren't counted. While the state routinely reports 98 percent or 99 percent of students are tested at individual schools, only 82 percent of the test scores are counted on average.
The fewer students included, the easier it is for a school to change its score, giving small schools an advantage in winning awards. And students who traditionally score lower - blacks and special education students - are excluded at a higher rate than white and Asian students. The Register found no evidence that schools deliberately were excluding children to boost scores.
Such exclusions, researchers point out, mean that the API scores are unreliable, and they contribute to the error.
The margin of error, outlined in a report by Richard Hill, director of the nonprofit National Center for the Improvement of Educational Assessment, an organization founded to help states improve education accountability systems, is as great as 50 points for some small schools - meaning scores actually could fall anywhere within a 100-point range - and as narrow as eight points for the largest.
State officials say such errors exist in all tests, and point especially to the SAT, the college admissions test. But unlike the API, the SAT publicly reports its 30-point margin of error so colleges know how much weight to assign any score. Colleges then look at other factors - grades and extracurricular activities - before deciding whether to admit or give scholarships to a student.
"You don't make a high-stakes decision based on a small difference," said Rosemary Reshetar, a group leader at Educational Testing Service, which oversees the SAT.
The API is not without merit. A school that scored 800, for example, obviously performed better than a school that scored 500. Money awarded to schools whose gains fall outside the margin of error is more likely to be deserved, and schools that drop outside the margin of error might need intervention from the state. Also, schools can track gains or drops over time as a gauge of whether they are on the right track.
How errors occur
The margin of error occurs for two main reasons, testing experts say.
A student's score could be influenced by distractions such as an illness, the weather or a barking dog. This "measurement error" happens in any test.
But in the API, other factors, including what experts call "sampling error," affect the score.
Sampling error arises for two reasons: school size and population changes.
Take Westpark Elementary School in Irvine, for example. At that school, 384 students were tested in 2000. The margin of error was 17. But in 2001, just 368 students were tested when the school scored an 856. The population changed so much that it was impossible for the state to get a precise measurement of students' progress: sixth-graders graduated, new second-graders started taking tests, and students moved in and out. That made the margin of error 21, meaning the API ranged from 835 to 877. On top of that, the test results of 60 children were excluded from the API, adding to the uncertainty.
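The arithmetic behind such a range is simple once the margin of error itself is known, as it was in the Hill report. A minimal sketch, with a hypothetical helper name, using the figures reported above:

```python
def api_interval(reported_score, margin):
    """Given a reported API and its margin of error, return the range
    within which the true score falls with 95 percent certainty."""
    return reported_score - margin, reported_score + margin

# Westpark Elementary, 2001: a reported 856 with a 21-point margin.
low, high = api_interval(856, 21)   # (835, 877)

# Rio Vista Elementary: reported 532, needed 537 to win awards. With
# a 15-point margin, the target sits inside the range, so the state
# cannot say whether the school actually missed.
low, high = api_interval(532, 15)   # (517, 547)
```

Whenever a school's award threshold falls inside that range, the reported score alone cannot show whether the school earned or missed the money.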
National testing standards say those who administer tests should tell the public about the margin of error so that awards and other decisions can be weighed. State officials initially said they didn't disclose the error rate for three years because it would be too confusing.
"That's condescending," said Amy Schmidt, director of higher education research at the College Board, publisher of the SAT. "It's part of our professional duty to give as much information as possible."
That secrecy has upset some lawmakers.
"You have to be kidding me. A 20-point margin of error? That's outrageous," said Sen. Ray Haynes, R-Riverside, a member of the Senate Education Committee who voted against the API bill. "These experts who created this system cannot be trusted to come up with a new system to measure excellence."
Edward Haertel, chairman of a group that designed the API, was one of the authors of the national standards. But the margin of error was not reported in the Public Schools Accountability Act that launched the API nor in documents sent to schools explaining it. Even now, the disclosure, posted only last month, is buried on the state's Web site under the heading "research reports."
"It certainly is a good idea to get the information out to the public in an understandable form," Haertel said. "There are a lot of more pressing things that had to be dealt with. ... I would agree that it would have been better if it was done sooner."
Schools miss out
While the math behind the API can be hard to decipher, the effect the scores have on Orange County families isn't.
Jan Helmuth is a PTA president and mother of three in Saddleback Valley Unified, which is cutting some music, science and sports programs. Her children attend Valencia Elementary, La Paz Intermediate and Laguna Hills High schools - and only Valencia scored over its goal (by six points) and qualified for awards last year. La Paz missed by one point. Laguna Hills missed because the score of one of its ethnic groups fell seven points short.
But all three scores were within the margin of error, meaning the state can't say for sure whether any or all of them deserved awards.
Missing out on the money meant that La Paz Intermediate couldn't afford training for about 40 teachers on new programs to help students prepare for college, relying instead on parent donations to send 12 teachers to training this summer. Laguna Hills put plans to upgrade student computers on hold. Valencia, which won $22,165, is planning to buy playground equipment for its lower grades to replace 30-year-old swings, a slide and bars that are safety hazards.
"The API money is very significant when you're cutting things," Helmuth said. "This seemingly negates the whole premise of testing."
Close calls also occurred throughout Orange County. Nicolas Junior High in Fullerton lost out on $35,100 by five points. Santiago High in Garden Grove lost $57,865 by two. Jackson Elementary in Santa Ana missed $31,459 by one. Stoddard Elementary in Anaheim missed $26,479 and had to cancel an after-school program.
All with scores within their margins of error.
Students not counted
The calculation that produces the API begins every spring, when thousands of students across Orange County and the state enter what school officials call "the test window," the culmination of a year of preparation for exams that determine their schools' fate. Few of those involved - principals, teachers, parents and kids - are aware that about six students in every classroom of 30 will be dropped from the final score because of API loopholes.
Lawmakers decided not to count the scores of students new to a school district, saying it was unfair to hold a school accountable for children it hadn't had a chance to teach. Other loopholes excluded special education students, and parents could sign waivers to excuse their children.
Lawmakers and state officials said they had no way of knowing how many students would be excluded when they approved the exceptions.
"The real-world implementation sometimes doesn't turn out the way you'd hope," said Dede Alpert, D-San Diego, who wrote the API bill.
Because the number of students counted can change a school's API, exclusions can change the outcome. Topaz Elementary in Fullerton, which is set to win $17,223 in awards, has seen its API score rise to 582 from 510 in the past three years. At the same time, the school's participation rate has fallen to 69 percent from 93 percent.
The effect of these missing children also was spotted by Stanford University researchers, who had received $500,000 from the state to study ways to improve the API but concluded so many students were left out that the scores were unreliable.
"I just don't think (the API) is accurate," said Margaret Raymond, a co-author of the study, which was published by the state in April. The more controversial findings were left out of a news release by the state Office of the Secretary for Education.
"It's not an accounting of what they are doing with all students in the school."
When Davis took his oath of office in January 1999, momentum was building to improve schools after years of California sitting near the bottom of national rankings and an ever-changing system of tests that frustrated teachers and students.
One of his first acts was to call a special session on education, giving legislators three months to craft a system that would be in place by the time kids returned from summer vacation.
Policy-makers used parts of two plans.
One was called "Steering By Results," a report written in 1998 by a committee under then-Gov. Pete Wilson. It recommended that an accountability system begin in 1999 but be studied for five years to ensure reliability before handing out awards. The report said it would be "irresponsible" to do so earlier.
To calculate the scores, the state borrowed from a system in Kentucky. It was developed in the early 1990s but was replaced in 1998 because it was unreliable. Haertel, the chairman of the California group that developed the API, also had been a member of a technical group that critiqued Kentucky's testing program.
Haertel said California designers learned from Kentucky's mistakes. Among Kentucky's main problems: students were tested in only three or four grades, and the gains required to win an award were even smaller than in California. That made results less reliable. But Kentucky posted its error rate.
"We made improvements," Haertel said.
The Public Schools Accountability Act became law in April 1999 before its details were worked out. Once it was enacted, Davis appointed a committee to create the testing system and hand out cash based on test scores. The Alpert bill passed without any mention of the margin of error. The first record of any mention of it came five months later, when it was discussed briefly in a September 1999 meeting, minutes show.
"I don't remember it being discussed," said Geno Flores, testing coordinator at Long Beach Unified who attended the technical and advisory meetings. "All I recall is the frantic rush to finish."
Politics vs. academics
Members of the technical group said they put the law into action as it was presented to them, even though they knew awards could go to schools that might not have gained. They said it wasn't their place to lobby politicians for major changes.
"It's beyond the scope of my role," said Ted Bartell, a member of the technical group and Los Angeles Unified School District research and evaluation director.
The politicians say they relied on the testing experts to give them a solid system. Although representatives from the state Senate's Education Committee and the state Board of Education attended the meeting in which the margin of error was mentioned, no one raised concerns.
State officials say that the accountability system is "evolving." This year, more tests will be used along with the Stanford 9 - the latest results of which will be released Aug. 29. More students also will be included as part of a federal education push, making the API more reliable. While awards are suspended because of the budget deficit, state officials have not ruled them out in the future.
They also remain largely wedded to both the API and its system of awards. Even some district officials are reluctant to criticize. Jeff Bristow, Capistrano Unified's testing coordinator, sent a statewide e-mail to his fellow educators last week dismissing the Register's series before knowing the conclusions.
He reminded his colleagues of his own frustrations with the state and concerns over a system set up by politicians. Still, he urged all to show a united front in public.
"For those of you who know me, you also know that I have been guilty of expressing dissatisfaction with policies or practices of the (California Department of Education)," Bristow wrote. "However, that was 'in house.'
"I believe that the CDE was put in an untenable situation when the (API law) was implemented - short deadlines and horrendously unreasonable expectations. Should I receive a call, I will be most supportive of all the folks in the Standards and Assessment Division."
Some are less charitable.
Teacher Sherri Fankhauser of Portola Hills gasped when she learned that the API had an average 20-point margin of error during a meeting with Principal Judy Blankinship.
Portola Hills missed out on about $23,382 by four points.
"Should we write a letter telling the state where to send our check?" Blankinship asked.
Staff writer Kimberly Kindy contributed to this story.