Statistically Insignificant

Scientist: "Practical significance (importance) and statistical significance (detectability) have little to do with each other"

Educator: "If the words 'statistically insignificant' don't imply 'insignificant' to you, I cannot help you"

Our teachers now teach our children, and most of our children now accept, that if something is statistically insignificant, it is practically insignificant. TIMSS proved that nothing could be further from the truth--but TIMSS is what our educators HATE with a passion and will do anything, even LIE, to discredit. Their jobs depend on a LIE.

The multi-billion dollar corporations who build the casinos in Las Vegas count on such ignorance. Each poker hand, each roll of the dice, each ring of the slot machine, is statistically insignificant, but the corporations who invest billions of dollars in gambling know that not a single one is practically insignificant. They know that at the end of the day those educated in the US education system will lose and they will win, big time.

When a human poker player beat the computer poker player Polaris, his win was statistically insignificant. To maintain that it was practically insignificant is the height of ignorance.

### Statistical Significance and Practical Importance

If the null hypothesis is rejected, one says that the effect or test is "statistically significant at level___," where the significance level or the *P*-value goes in the blank. "At level___" often is omitted, which makes it impossible to know what the chance of a false alarm might be. All too often, the word "statistically" is dropped too, leading one to think that the effect is important, not merely detectable. The difference between importance and detectability is considerable. A small, unimportant effect can be detected if there are sufficiently many data of sufficiently high quality. Conversely, an effect can be both large and important, but not statistically significant if the data are few or of low quality. That can lead to peculiar locutions, such as "no other leading brand has been shown to surpass ZZZ." Aside from the ambiguity in the word "leading," one might not reject the null hypothesis that no brand is better than ZZZ because ZZZ really is at least as good as all other brands, or because the data are too few or of too low quality to allow one to detect that another brand actually is better than ZZZ.


Practical significance (importance) and statistical significance (detectability) have little to do with each other.

An effect can be important, but undetectable (statistically insignificant) because the data are few, irrelevant, or of poor quality.

An effect can be statistically significant (detectable) even if it is small and unimportant, if the data are many and of high quality.
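The detectability-versus-importance distinction above can be demonstrated with a quick simulation. This is a minimal sketch, not the TIMSS methodology: the means, standard deviations, and sample sizes are invented for illustration, and the test is a plain two-sample z-test.

```python
import math
import random
import statistics

random.seed(0)

def z_test(xs, ys):
    """Two-sample z-test: z statistic and two-sided p-value (normal approximation)."""
    se = math.sqrt(statistics.variance(xs) / len(xs) +
                   statistics.variance(ys) / len(ys))
    z = (statistics.mean(xs) - statistics.mean(ys)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p

# A trivial effect (half a point on an SD of 15) with huge samples: detectable.
big_a = [random.gauss(100.0, 15) for _ in range(50_000)]
big_b = [random.gauss(100.5, 15) for _ in range(50_000)]
_, p_tiny = z_test(big_b, big_a)

# A large effect (half an SD) with tiny samples: often not detectable.
small_a = [random.gauss(100.0, 15) for _ in range(10)]
small_b = [random.gauss(107.5, 15) for _ in range(10)]
_, p_large = z_test(small_b, small_a)

print(f"tiny effect, n=50,000 per group: p = {p_tiny:.6f}")  # typically far below 0.05
print(f"large effect, n=10 per group:   p = {p_large:.3f}")  # often above 0.05
```

The point of the sketch is only that the p-value mixes effect size with sample size: pile up enough data and an unimportant difference becomes "statistically significant," while an important one can vanish into the noise of a small sample.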

**Quote from: daniel_von_flanagan on December 16, 2008, 04:18:36 PM**

**Quote from: jacobisrael on December 16, 2008, 04:10:44 PM**

**Are you sure that you've read that TIMSS study about our 12th grade scores? The methodology for picking the cohorts was the same in both the 8th and 12th grades.**


They nevertheless are not the *same* cohort. The reason is that the 8th graders were in 8th grade that year, and the 12th graders were in 12th grade that year. In many cases, when the 12th graders were in middle school they had different curricula than the 8th graders did when *they* were in middle school.

This is not complicated stuff. Really. - DvF

**Now I understand your point. Thank you very much for clarifying it.**

Please point me to the evidence that there was a national, across-the-board change in the curricula between 1991 and 1995, if you believe this to be a possible explanation. Can the same be said for all of the other countries which took TIMSS?

If anything DID change (and this is not to even hint that anything changed) then would you not agree that our change was clearly for the worse and theirs was for the better?

Austria's scores were an exception in Europe, as they followed a similar pattern to the US, only more extreme. While our boys' scores decreased 56 points, theirs decreased 85 points. And while our girls' scores decreased 104 points, theirs decreased 137 points. So while the increase in the gender gap alone was 48 points in the US, it was 52 points in Austria. This is not an insignificant increase, since the standard deviation for US girls was 53, making this 0.91 S.D. Since the standard deviation for Austrian girls was larger, at 71, the increase in their gender gap was smaller in standardized terms, at 0.73 S.D.

But there was already an 8-point gender gap in Austrian 8th graders, making their total gender gap by 12th grade 0.85 S.D.
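To make the arithmetic above easy to check, here is a minimal Python sketch that recomputes the standardized gap figures from the point changes and standard deviations quoted in the text:

```python
# Standardizing the TIMSS gender-gap increases quoted above:
# raw point change divided by the girls' standard deviation.
us_gap_increase = 104 - 56        # girls' drop minus boys' drop = 48 points
austria_gap_increase = 137 - 85   # = 52 points

us_girls_sd = 53
austria_girls_sd = 71

print(round(us_gap_increase / us_girls_sd, 2))             # 0.91 SD
print(round(austria_gap_increase / austria_girls_sd, 2))   # 0.73 SD

# Austria already had an 8-point gap in 8th grade, so the total by 12th grade:
print(round((austria_gap_increase + 8) / austria_girls_sd, 2))  # 0.85 SD
```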

I'm not clear on how changes in the curricula could have affected any of this. I don't even know what can be changed to cause such huge race and sex gaps, or to make them bigger or smaller. So it would be greatly appreciated if you'd provide an example.

Actually, I can think of one small example. Not too long ago, Chinese educators were invited to visit the US to study our education system. They asked many great questions, and my input was that they should implement calculus in high school as Japan had. They did that, and now 95% of Chinese students complete calculus before they graduate from high school.

Pretty smart, eh? What have our educators done lately to top that?

**cgfunmathguy**

I've tried to stay out of this one, as DvF has done an admirable job of presenting the points I wanted to make. However, please allow me to add my two cents' worth. First, you are comparing different systems that do different things. Your comparisons are being made between countries where there are NATIONAL curricula, those where there are STATE curricula, and at least one where it is a hodgepodge of STATE and LOCAL curricula. So, we are comparing apples to oranges to pears.

**The entire PURPOSE of an international study IS to compare different education systems to each other, which is exactly what TIMSS does. Just like the entire PURPOSE of a national study like NAEP is to make state-to-state comparisons to see what works and what fails. It’s not BAD to make international and national comparisons, it’s GOOD.**

**cgfunmathguy**

Also, we need to address the differences in systemic student handling. In the US, we send the vast majority of our students to high school; other countries reverse this entirely. Thus, the 12th-grade cohorts aren't even comparable between countries, even though they are presented as such by the media (among many others). While the 4th-grade cohorts may be similar, some even question the comparability of the 8th-grade cohorts. For the two reasons above, I don't believe TIMSS is as valid an indicator of differences between national systems as its exhorters proclaim.

**This is patently false. Fortunately, it’s PROVABLY false. Our OWN data from NCES show that 74% of American 18-year-olds graduate from high school, compared to more than 90% in most industrialized nations:**


**http://nces.ed.gov/pubs2001/2001034.pdf **


**The reason nobody has ever posted a cite which disputes that is that there is no cite, AND TIMSS disputes it in a different direction, claiming that they found that only 63% of American students are in their “TCI”, compared to 82% in Switzerland, 84% in Norway, 75% in Germany, 88% in Slovenia, etc.**


**http://timss.bc.edu/timss1995i/TIMSSPDF/SRAppA.pdf **


**They found that 1,245,594 American children of high school graduation age, 67% of that population, weren’t even IN high school, and thus were never included in our already LOW TIMSS scores. If the worst students were the ones who weren’t in high school, can you even IMAGINE how low our scores would have been had they been INCLUDED? If this is the reason you don’t “believe TIMSS is as valid an indicator of differences between national systems as its exhorters proclaim”, you need to use your new-found knowledge to go back and rethink your position.**

**cgfunmathguy**

Finally, a word about why DvF keeps trying to get you to understand why comparing cohorts is important. Many states have been adjusting/rewriting their regulations (Pennsylvania), their state-mandated tests (Ohio), and their state-mandated curricula (Georgia) for the past decade or more. In mathematics, the National Council of Teachers of Mathematics (NCTM) issued its first set of standards on K-12 mathematics in 1989. This was the first step in the reform process, and several states began reforming their state curricula in the early 1990s. Others waited longer. However, the process is not an instantaneous one. As an example, Georgia instituted the Georgia Performance Standards (GPS) in 2003 or 2004. The standards still aren't fully implemented throughout the schools, and they won't be for two more years. So, yes, cohort matters, and we need to deal with the data that way. The only fair comparisons about gains and losses in the report's 12th-grade cohort would be to take the 2007 report's 12th-graders and compare that gap (assuming all the other confounding variables didn't exist) to the gap found in the 2003 report's 8th-graders and to the gap found in the 1999 report's 4th-graders. This assumes that the tests across that EIGHT-YEAR SPREAD are equivalent.

**None of which is relevant. The entire POINT of TIMSS is to make international comparisons, not state to state comparisons. Your idea that something in our education system was the “first step in the reform process” is the same thing educators have been mimicking for years, and none of it ever worked. Furthermore, all American parents I know believe that every single one of these so-called “reforms” only brought us back quicker to the stone age and improved nothing.**


**TIMSS also proves how SAT scores have been politicized, feminized, manipulated, and watered down to the point they’re no longer credible.**


**cgfunmathguy**

“For another view of it, let's look at your classroom. In a large lecture class, grades tend to be distributed "normally". This being the case, "curving" (with its true meaning) would assign Cs to the 68% of the students whose scores are within 1 SD of the mean. So, let's assume that the mean on Test 1 was 75 with a standard deviation of 8. So, any student with a score between 67 and 83, inclusive, should get a C. However, Susie with her 81 and Johnny with his 69 both got Cs! Is the difference significant? We don't know until we run tests on the scores. Even though the difference is 12 points (which is 1.5 SD), it is likely that this difference is NOT "statistically significant" at any appreciable level. To constantly quote raw numbers with no test results is worthless and misleading. Even those with an agenda don't do this because they know they will be accused of trying to bamboozle the people reading the report.”

**You complain about referring to different cohorts, then launch into a comparison between a large lecture room and an international study of hundreds of thousands of students.**

**You CANNOT compare these and make any sense out of it. You literally can’t adjust for guesses on multiple choice questions in the “large” lecture hall, but you CAN when there are hundreds of thousands of students taking the SAME test in their own languages. Do you know what TIMSS is? Before you invite anyone to “take a statistics class” again, you ought to invite yourself to examine their methodology. You are as wrong about this as you are about “In the US, we send the vast majority of our students to high school”.**

Note added Dec. 30, 2008: The standard error for a classroom of 25 students is the standard deviation of 100 divided by the square root of 25 (that is, 100/5), which equals 20. The SE for a "large lecture class" of 289 is 100/17, or 5.9. To adjust for guesses on a five-option multiple-choice question, we need to add 20%, since that is how many would get it correct if they knew nothing and just guessed. In a classroom of 25, the standard error of 20 is the same size as the adjustment for guesses itself, so the adjustment is only accurate within a range from 0 to 40. It gets better in a "large lecture class" with a standard error of 5.9, which means the adjustment is accurate within the range of 14.1 to 25.9. With the 9,000 students tested by TIMSS in the US, the SE is about 1, so the adjustment for guesses is accurate within the range of 19 to 21.
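The note's arithmetic can be reproduced directly. Here is a minimal Python sketch using the note's assumed standard deviation of 100 on a 0-100 scale and the 20% expected-guessing rate for a five-option item; `guess_band` is a hypothetical helper name, not anything from TIMSS:

```python
import math

def guess_band(sd, n, guess_rate=0.20, scale=100):
    """Standard error of the mean, and the +/- 1 SE band around the
    expected-guessing adjustment (20 points on a 0-100 scale)."""
    se = sd / math.sqrt(n)
    adj = guess_rate * scale
    return se, (adj - se, adj + se)

for n in (25, 289, 9000):
    se, (lo, hi) = guess_band(sd=100, n=n)
    print(f"n={n:5d}: SE = {se:.1f}, band = {lo:.1f} to {hi:.1f}")
```

For n = 25 this prints an SE of 20.0 and a band of 0.0 to 40.0, matching the note; the n = 289 and n = 9000 lines give SEs near 5.9 and 1, shrinking the band accordingly.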

Most likely cgfunmathguy knows this and didn't want to admit it to the other forum members. But given the very low SAT and GRE scores of education majors, it's not impossible that he doesn't know it. Either way, I'm sure my math teachers weren't fun guys like him; they took this subject a bit more seriously.