
The Error in Standardized Test Scores
The following data, reported in a New York Times article about a Harris poll, indicate that the answers people give can depend strongly on how a survey is conducted:
"Measuring sexuality through polls can be shaky" by Felicity Barringer. New York Times, 25 April 1993, 23I. This article describes a Harris poll that estimated that 4.4 percent of American men and 3.6 percent of women had sex with a same-sex partner in the five years before being interviewed. These results were part of previously unpublished research from a 1988 study of 739 men and 409 women aged 16 to 60. Taylor, president of Harris, comments on the difficulty of obtaining reliable figures and mentions some classic surveys to show the significance of measurement errors as compared to sampling errors. For example, a survey taken five years after Collier's magazine had ceased publication showed that a large number of people were still reading it. William Aquilino at Wisconsin has experimented with how answers differ depending on how the survey is done. For example, on an anonymous questionnaire 28 percent of the whites said that they had ever used cocaine, as compared with 25% with the same question in a one-to-one interview and 21% when asked over the phone. For blacks the corresponding figures were 23, 16, and 12. Corresponding percentages when asked about crack were 8 for confidential, 3 for interview, and 1 for phone.
With such a large variation, it's important to know which type of poll is the most accurate for measuring personal drug use. For this type of question it is much easier to give a misleading or dishonest answer over the telephone than in a personal interview, so let's assume for this discussion that the personal interview is more accurate than the telephone poll. For questions about personal matters that people don't want on public record, or that could lead to legal action, it seems obvious that an anonymous questionnaire would produce a more accurate result than a personal interview, in which embarrassing or illegal activity could be tied to the specific interviewee. That gives the ranking anonymous questionnaire, personal interview, telephone poll, which is also the order of the reported percentages from highest to lowest. A question about homosexuality, however, might produce a different ordering. The least accurate method would then be the personal interview, in which people who are embarrassed about their homosexual behavior might simply deny it; the most accurate would be the anonymous questionnaire, with the telephone poll in between.
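To see how large the variation in the Aquilino figures actually is, a minimal Python sketch can express each survey method's percentage as a fraction of the anonymous-questionnaire percentage. This is only an illustration: it assumes, as the discussion does, that the anonymous questionnaire comes closest to the truth, and the group labels are ours. The article does not say which group the crack figures describe, and it calls that method "confidential" rather than "anonymous".

```python
# Percent answering "yes" under each survey method, as quoted from Aquilino's
# experiments. The crack figures are not attributed to a group in the article,
# and its "confidential" method is filed here under "anonymous" (an assumption).
REPORTED = {
    "cocaine, whites": {"anonymous": 28, "interview": 25, "phone": 21},
    "cocaine, blacks": {"anonymous": 23, "interview": 16, "phone": 12},
    "crack":           {"anonymous": 8,  "interview": 3,  "phone": 1},
}

def relative_to_anonymous(rates):
    """Each method's rate as a fraction of the anonymous-questionnaire rate.

    Assumes the anonymous questionnaire is the most accurate method, so a
    smaller fraction suggests more underreporting under that method.
    """
    base = rates["anonymous"]
    return {method: rate / base for method, rate in rates.items()}

for question, rates in REPORTED.items():
    ratios = relative_to_anonymous(rates)
    print(question, {method: round(r, 2) for method, r in ratios.items()})
```

Under that assumption the telephone figure is three quarters of the anonymous one for whites, about half for blacks, and one eighth for crack.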
http://www.amstat.org/publications/jse/v1n1/resource.html
Teaching Bits: A Resource for Teachers of Statistics
Journal of Statistics Education v.1, n.1 (1993)
Joan B. Garfield
This column features "bits" of information sampled from a variety of sources that may be of interest to teachers of statistics. Joan will be abstracting information from the literature on teaching and learning statistics, while Laurie will be summarizing resources from the news and other media that may be used with students to provoke discussion or serve as a basis for classroom activities or student projects. Because we are limited in the literature we have access to and in the time we have to review it, we may overlook some relevant articles, so we encourage you to send us your reviews and suggestions for abstracts.

From the Literature on Teaching and Learning Statistics

"A Course Called Chance" by J. Laurie Snell and John Finn (1992). Chance, 5(3-4), 12-17. A new course inspired by Chance magazine has been developed and taught at several liberal arts colleges. The aim of the Chance course is to study important current news items whose understanding requires a knowledge of chance concepts. The course was not designed to replace an introductory course in probability and statistics; rather, it was developed to encourage students to think more rationally about chance events and to make them more informed readers of the daily press. The article describes many details of the course, such as the topics, the materials and resources used, the teaching approaches, and the successes and failures experienced in implementing it.

"What's Missing in Statistical Education?" by Ronald Snee (1993). The American Statistician, 47, 149-154. Taking the stand that traditional approaches to teaching statistics have not worked and that significant changes are needed, Snee argues that we can help students learn statistical thinking and methods better by focusing the content and delivery of statistical education on how people use statistical thinking in real-life situations, by deepening our understanding of how people learn, and by creating value for statistical thinking.
He offers recommendations for changing both the content and the delivery of statistical education.

"Getting More Data into Theoretical Statistics Courses" by Thomas L. Moore (1992). PRIMUS: Problems, Resources, and Issues in Mathematics Undergraduate Studies, 2, 348-356. Although it has been argued that statistics is not a subdiscipline of mathematics but rather a separate discipline, most introductory statistics courses are taught in mathematics departments by mathematicians. At many small colleges the only statistics course offered to mathematics majors is the standard sequence in probability and mathematical statistics, which does not teach students how to apply statistical methods to real data. Moore offers concrete suggestions for improving students' experience in this introductory math/stat course by infusing it with more data and applications. Not only do students gain hands-on experience with data, they are also more likely to leave the course with a better appreciation of statistics as a discipline separate from mathematics.

"Probability and Statistics: Connecting Research to Teaching" by J. Michael Shaughnessy (1993). The Mathematics Teacher, 86, 244-248. This paper summarizes some of the research literature on teaching and learning probability and statistics and makes recommendations on how to teach these topics better. The review covers psychological research on how people use and misuse heuristics in making judgments and decisions about chance events, educational research documenting attempts to change students' beliefs and misconceptions, and the ways probability and statistics are often misused in everyday situations.
Shaughnessy concludes that students' preconceived notions of chance, randomness, and probability often conflict with what they are taught in statistics courses, that simulations can help change some of students' misconceptions, and that examples of misuses and abuses of statistics should be included in courses so that students learn to detect and correct them.

"Statistical Thinking and Techniques: A General Education Requirement" by Mary H. Hudson (1992). In Quality Quest in the Academic Process, edited by J. W. Harris and J. M. Baggett, Birmingham, AL: Samford Press, 113-126. Hudson suggests that learning and applying the statistical thinking and techniques of the Deming management philosophy of Quality Improvement in an introductory statistics course can dramatically improve the overall quality of college graduates. She describes a statistics course emphasizing Statistical Process Control methodology, designed to help students develop statistical critical thinking, team problem-solving, and writing and communication skills that support lifelong learning, along with skills important in a competitive job market, namely the statistical interpretation and analysis of data.

"A Radically Different Approach to Introductory Statistics" by Robert L. Wardrop (March 1992). Technical Report No. 889, Department of Statistics, University of Wisconsin, 1210 West Dayton St., Madison, WI 53706. This paper describes a new approach to a one-semester introductory statistics course, designed to let students discover that statistics can be an important tool in daily life. Treating statistical work as a kind of scientific work, the course focuses on scientific questions and on how statistical thinking can shed light on their answers.
Data are preeminent, and methods gain importance through their ability to illuminate data sets, in contrast to the common practice of making methods the focal point and reducing data sets to illustrations of methods.

"Teaching Elementary Probability and Statistics: Some Applications in Epidemiology" by Hardeo Sahai and Michael R. Reesal (1992). School Science and Mathematics, 92(3), 145-149. Sahai and Reesal illustrate some common applications of probability and statistics in epidemiology as they might be presented to students in an undergraduate probability and statistics course. Because epidemiologists seek to discover associations between events, patterns, and causes of disease in human populations, they frequently work with rates, proportions, and other quantitative measures of the occurrence, prevalence, and causes of disease. The authors describe some common problems in epidemiological studies in which techniques of elementary probability and statistical inference yield results of considerable interest.

"Uncertainty" by David Moore (1990). In On the Shoulders of Giants: New Approaches to Numeracy, edited by Lynn Steen, Washington, DC: National Academy Press, 95-137. This must-read paper outlines the basic ideas of "data" and "chance." Acknowledging that these topics are increasingly being included in school mathematics curricula, Moore clarifies the overall themes and strategies within which individual topics naturally fit. He points out difficulties in teaching these topics as well as advantages of including them in the mathematics curriculum.

"A Cooperative Learning Activity on Methods of Selecting a Sample" by E. Jacquelin Dietz (1993). The American Statistician, 47, 104-108. Cooperative learning has received a lot of favorable press, and this paper describes the use of a cooperative learning activity in an introductory statistics class.
Students who had not yet studied sampling worked in small groups to devise three different methods for obtaining representative samples from a population of student data. After comparing sample statistics with population parameters, the groups evaluated the advantages and disadvantages of each method. Inventing sampling techniques, rather than simply learning about them, appears to be a powerful way to learn important ideas about sampling.

"Teaching Statistics" by George Cobb (1992). In Heeding the Call for Change: Suggestions for Curricular Action, edited by Lynn Steen, MAA Notes Series, Washington, DC: Mathematical Association of America, 3-43. This comprehensive report on teaching statistics summarizes an e-mail focus group discussion, organized by the MAA, on the introductory statistics course. Cobb divides the report into five sections: Recent Changes in the Field of Statistics, Some Differences between Mathematics and Statistics, What Research Tells Us (about teaching statistics), Examples (of some ways the recommendations are being implemented), and Making it Happen (broad recommendations about implementation).

Topics for Discussion from Current Newspapers and Journals

"Formula projects limits seen on human existence" by Malcolm W. Browne. New York Times, 1 June 1993, 1C; and "Implications of the Copernican Principle for Our Future Prospects" by Richard Gott III (1993). Nature, 363, 315-319. The New York Times article gives a popular account of the work of Richard Gott of Princeton University reported in the Nature article. Gott describes a method for obtaining 95% confidence limits on such things as how much longer a species (for example, ours) will survive. The Times article describes it informally as follows. The idea is that we find ourselves at a randomly chosen point in the lifetime of whatever we are observing, so with probability .95 we are somewhere between 1/40 and 39/40 of the way through it (the middle 95% of the interval from 0 to 1 runs from .025 = 1/40 to .975 = 39/40).
Say, for example, we estimate that the human race has been around for about 200,000 years. If we are only 1/40 of the way through the lifetime of the human race, then we have 39 times 200,000 years, or about 8 million years, left to go. If we are at the other end of the 95% confidence interval, that is, 39/40 of the way through, then we have only 1/39 times 200,000 years, or about 5,000 years, to go. In the Nature article Gott gives some indication of how he justifies his confidence intervals. For example, he assumes that intelligent species are being formed in the universe at a uniform rate and are subject to a constant but unknown extinction rate. Let PAST be the time from the present back to the beginning and FUTURE the time from the present to the end. Then PAST and FUTURE are independent random variables with a common exponential density, so their ratio FUTURE/PAST has distribution F(x) = x/(1+x). Since F(39) - F(1/39) = 39/40 - 1/40 = .95, the 95% confidence interval (1/39)*PAST < FUTURE < 39*PAST follows. I don't think this model is completely specified yet. The author connects his results with other results in cosmology, such as Carter's argument that the formation of intelligent life requires on the order of one improbable event. See Carter, B. (1983), Phil. Trans. R. Soc. A, 310, 347-363.

"Gender bias charged in National Merit Scholarship Test" by Elizabeth Shogren. Los Angeles Times, 26 May 1993, 1A. The organization FairTest reports that only about 35% of National Merit Scholarship winners are girls, even though studies suggest that girls earn grades as good as or better than boys' in high school and college. The selection of semifinalists for the National Merit Scholarship is based entirely on results of the Preliminary Scholastic Aptitude Test. The article refers to the study below, which documents that women do less well than men on SAT exams.
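Returning to the Gott entry above: both the interval and the exponential model can be checked numerically. The Python sketch below is an illustration under the assumptions stated in that entry, not Gott's own procedure. The hypothetical `gott_interval` turns an observed PAST into 95% bounds on FUTURE using the Copernican "uniform fraction elapsed" idea, `ratio_cdf` is the distribution F(x) = x/(1+x), and `simulated_coverage` draws PAST and FUTURE as independent exponentials with a common rate and counts how often FUTURE falls between PAST/39 and 39*PAST. The function names, seed, and number of trials are arbitrary choices.

```python
import random

def gott_interval(past, confidence=0.95):
    """Bounds on FUTURE given PAST, assuming the fraction of the lifetime
    already elapsed, PAST / (PAST + FUTURE), is uniform on (0, 1)."""
    tail = (1 - confidence) / 2        # 0.025 in each tail for 95%
    lower = past * tail / (1 - tail)   # 39/40 finished: FUTURE = PAST / 39
    upper = past * (1 - tail) / tail   # 1/40 finished: FUTURE = 39 * PAST
    return lower, upper

def ratio_cdf(x):
    """F(x) = x / (1 + x): distribution of FUTURE / PAST for i.i.d. exponentials."""
    return x / (1 + x)

def simulated_coverage(trials=100_000, rate=1.0, seed=0):
    """Fraction of trials with PAST/39 < FUTURE < 39*PAST when PAST and FUTURE
    are independent exponentials with a common rate. The rate cancels in the
    ratio, which is why it can be 'unknown' in the model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        past = rng.expovariate(rate)
        future = rng.expovariate(rate)
        if past / 39 < future < 39 * past:
            hits += 1
    return hits / trials

low, high = gott_interval(200_000)
print(f"{low:,.0f} to {high:,.0f} years")
print(ratio_cdf(39) - ratio_cdf(1 / 39))
print(simulated_coverage())
```

The first line should show roughly 5,128 to 7,800,000 years, matching the Times figures of about 5,000 and 8 million years; the second is .95 up to floating-point rounding; and the simulated coverage should land close to .95.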
"Sex Differences in Performance on the Mathematics Section of the Scholastic Aptitude Test: A Bidirectional Validity Study" by Howard Wainer and Linda S. Steinberg (1992). Harvard Educational Review, 62, 323-336. This study by Educational Testing Service researchers reviews the literature on differences between men and women on the SAT mathematics test and reports on the authors' own study. They compare SAT-M scores for men and women with the same grades, and also grades for men and women with the same SAT-M scores. They show that women consistently score about thirty points lower on this test than men with the same grades. Since women do as well as or better than men in college, this raises questions about the proper use of the SAT-M in college admissions and competitive scholarship programs. The authors discuss possible remedies, ranging from giving women extra points to doing nothing, and favor continuing to try to understand what is going on.

"A costlier heart drug also proves better" by Lawrence K. Altman. New York Times, 1 May 1993, 7I. This is an account of a large international clinical trial, involving 41,000 patients, to see which of two clot-busting drugs is more effective in preventing death immediately after a heart attack. The two drugs are tissue plasminogen activator (tPA), a genetically engineered substance that costs $2,400 a dose, and streptokinase, an older drug derived from bacteria that costs $240 a dose. The study was designed to compare four treatment strategies by their effect on patients up to 30 days after the heart attack. Participants were assigned at random to the four groups, and all received aspirin and a blood thinner, heparin.
Group 1: Rapid infusion of tPA plus intravenous heparin. Death rate 6.3%, stroke rate .6%.
Group 2: Somewhat slower infusion of tPA plus streptokinase and intravenous heparin. Death rate 7%, stroke rate .6%.
Group 3: Streptokinase plus subcutaneous (under-the-skin) injection of heparin. Death rate 7.2%, stroke rate .5%.
Group 4: Streptokinase plus intravenous heparin. Death rate 7.4%, stroke rate .5%.
Here death rate means deaths in the first 30 days, and stroke rate means the rate of disabling strokes. This study was released to the newspapers before being reviewed and accepted for publication. Previous studies had suggested that the two drugs were about equally effective. Most accounts reported that the study conclusively settles the issue, quoting Dr. Topol, director of the study, as saying that it shows a 14 percent reduction in risk with tPA. That figure is a relative reduction: the 6.3% death rate in Group 1 is about 14 percent lower than the roughly 7.3% in the two streptokinase groups, an absolute difference of only about one percentage point. This provides a good opportunity to discuss such misleading uses of percentages. It is also a good article for discussing ethical and economic issues in medicine. One student remarked that being offered the expensive drug was like playing Russian Roulette with 1000 slots and seven bullets and being offered to have one bullet taken out for $2000.

"Sex survey of American men finds 1% are gay" by Felicity Barringer. New York Times, 15 April 1993, 1A. A study released by the Alan Guttmacher Institute estimated that about 2 percent of men have engaged in homosexual sex and about 1 percent consider themselves exclusively homosexual. These estimates contrast with the conventional figure of ten percent said to have originated with the Kinsey studies, and they are consistent with other large recent studies done at the University of Chicago and in Europe. The study was carried out by researchers at the Battelle Human Affairs Research Center in Seattle. It was based on face-to-face interviews in which subjects were guaranteed anonymity, and it contained a great deal of other information about the sex habits of American men. For example, the median number of sexual partners was 7.3 for white men. No one seemed bothered by a median of 7.3 for integer-valued data, but some papers, such as the New York Times, decided to call it a mean instead.
(It was a median in the study, obtained by treating the number of partners as a continuous variable!) The study itself is reported in the March/April 1993 issue of Family Planning Perspectives.

"Measuring sexuality through polls can be shaky" by Felicity Barringer. New York Times, 25 April 1993, 23I. This article describes a Harris poll that estimated that 4.4 percent of American men and 3.6 percent of women had sex with a same-sex partner in the five years before being interviewed. These results were part of previously unpublished research from a 1988 study of 739 men and 409 women aged 16 to 60. Taylor, president of Harris, comments on the difficulty of obtaining reliable figures and mentions some classic surveys to show the significance of measurement errors as compared to sampling errors. For example, a survey taken five years after Collier's magazine had ceased publication showed that a large number of people were still reading it. William Aquilino at Wisconsin has experimented with how answers differ depending on how the survey is done. For example, on an anonymous questionnaire 28 percent of the whites said that they had ever used cocaine, as compared with 25% with the same question in a one-to-one interview and 21% when asked over the phone. For blacks the corresponding figures were 23, 16, and 12. Corresponding percentages when asked about crack were 8 for confidential, 3 for interview, and 1 for phone.

"China's crackdown on births: a stunning, and harsh, success" by Nicholas D. Kristof. New York Times, 25 April 1993, 1I. China's strict enforcement of its policy restricting families to one or two children has produced a dramatic drop in the fertility rate, from 2.5 children per family in 1988 to 1.9 in 1992. The problem of the missing girls is mentioned. Normally about 106 boys are born for every 100 girls; in 1989 in China there were 113.8 boys for every 100 girls.
This is explained in terms of underreporting, the use of ultrasound for selective abortion, and so on. The article also provides a good opportunity to try the ever-popular question: if families have children only until they have a boy, will there be, on average, more boys or girls?

"Ask Marilyn" by Marilyn vos Savant. Parade Magazine, 20 September 1993. Marilyn was asked to answer the following letter: "I am asked to select one of two envelopes and told only that one contains twice as much money as the other. I find $100 in the envelope I select. Should I switch to the other one to improve my worldly gains?" (Barney Blissinger, Hershey, PA). This is an old chestnut. If your envelope has x, you are tempted to think that the other envelope has .5x or 2x with equal probabilities. Then if you switch, your expected value is .5(.5x) + .5(2x) = 1.25x, which is an improvement. Marilyn argues that it makes no difference, since you gain no information from knowing the amount in your envelope. She is slightly wrong about this, as a second, very nice related paradox shows. Suppose you are told only that two distinct numbers have been put in two envelopes; you choose an envelope at random and look at your number. Can you give a strategy for telling whether you have the larger or the smaller of the two numbers that is correct with probability greater than 1/2? The answer is yes. This question received a lot of interesting discussion on the EdStat-L discussion group. You can easily find this discussion by searching for vos Savant in the folder "EdStat-L Highlights and Archives" in the Journal of Statistics Education Information Service. Of course, the discussion of this question cannot match that of the famous Monty Hall problem. The following article is perhaps the most complete summary of all that has been said about that problem.

"The Problem of the Car and Goats" by E. Barbeau (1993).
The College Mathematics Journal, in the column "Fallacies, Flaws, and Flimflam," 24, 149-154. The article puts Marilyn vos Savant's infamous discussion of the Monty Hall problem in context with a number of closely related paradoxes, such as the prisoner paradox, Bertrand's paradox, and the paradox of the second ace. An excellent historical account.

"Answering Questions About Baseball Using Statistics" by Bill James, Jim Albert, and Hal S. Stern (1993). Chance, 6, 9-14. If you have ever tried to discuss sports statistics in terms of concepts such as streaks or records, you will agree with the authors that sports fans do not view sports statistics the way a statistician does. One need only read the current newspaper articles about the fate of Anthony Young of the New York Mets as he set the all-time record for consecutive losses by a pitcher (23). No longer will 22 losses be significant! The authors give a marvelous discussion of how sports fans view statistics and contrast it with the way statisticians might use statistical modeling to answer such questions as "Is it possible that the weakest team in a division would end up winning the World Series because they are lucky?"

"What Happened to HIV Transmission Among Drug Injectors in New Haven" by Edward H. Kaplan and Robert Heimer (1993). Chance, 6, 9-14. In 1990 New Haven introduced a needle exchange program for injecting drug users. This article tells the story of how the program's success was evaluated. Previous evaluations of such programs had been based on self-reporting, which is subject to considerable response bias in this kind of situation. Without the exchange program, prescriptions were necessary to obtain needles legally, so it was natural that each needle would be used many times.
With the needle exchange program, it was hypothesized that each needle would be used fewer times, making it less likely that the needle itself would become infected. The New Haven group therefore built a simple model based on the rate at which needles are removed from circulation for reasons unrelated to the exchange program and the rate at which they are returned through the program. On the basis of this model they conjectured that (a) needle circulation times should decrease, (b) the fraction of distributed needles that are returned should increase, and (c) the level of infection in returned needles should decrease. They show how they used their data to verify these conjectures.

"Testing Color Proportion of M&M's" by Roger W. Johnson (1993). Teaching Statistics, 15, 2-4. According to Mars consumer affairs, the color ratio for the six colors of M&M's should be 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% tan. The author gives his students small bags of M&M's and asks them to collect data to test the hypothesis that the Mars claim is correct, using the standard chi-square goodness-of-fit statistic. He suggests that the distribution of this statistic be obtained by simulation using Minitab or a similar program. He reports that the hypothesis is consistently rejected when he does this experiment. This is a good example for illustrating real-world questions such as: How are the colors mixed? How are they put into bags?

"Coke or Pepsi?" by Maita Levine and Raymond H. Rolwing (1993). Teaching Statistics, 15, 4-5. The authors describe an experiment in which students test whether they can tell the difference between Pepsi and Coke, as an introduction to tests of hypotheses. Students are encouraged to think about how to design such an experiment. The class then carries out an experiment of 20 trials, requiring 12 or more successes to establish the claim that they can tell the difference.
The authors bet that the students cannot tell the difference and report that they have won only twice in many such experiments, suggesting that many students can tell the difference. However, they admit that they are being generous in accepting a criterion that gives a 25% chance of establishing the claim by just guessing. We have tried this, breaking the class up into groups and having them design and carry out an experiment. It is interesting to see what they come up with.
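The 25% figure in the Coke or Pepsi entry is a binomial tail probability. A minimal Python sketch, assuming the 20 trials are independent and that a student who is purely guessing is right with probability 1/2 on each, computes that tail exactly. It also finds the smallest cutoff that would keep a lucky guesser's chance at or below 5%. The 5% level and the helper names `upper_tail` and `smallest_cutoff` are illustrative choices, not the authors'.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more successes."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def smallest_cutoff(n, alpha, p=0.5):
    """Smallest k with P(X >= k) <= alpha when every trial is a guess."""
    for k in range(n + 1):
        if upper_tail(n, k, p) <= alpha:
            return k
    return n + 1  # only reached if alpha < p**n: even n out of n is too likely

print(f"P(12+ correct out of 20 by guessing) = {upper_tail(20, 12):.4f}")  # 0.2517
print("cutoff for a 5% test:", smallest_cutoff(20, 0.05))                   # 15
```

Requiring 15 or more correct would cut a guesser's chance to about 2% (21,700/1,048,576 is about .021), at the cost of making the test harder for students who really can tell the difference.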
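For the M&M's entry above, the simulation Johnson suggests doing in Minitab can also be sketched in Python. The percentages are the Mars figures quoted in the article, but the bag counts below are made up for illustration and are not data from the article. The sketch computes the chi-square goodness-of-fit statistic and estimates its null distribution by drawing many simulated bags of the same size from the claimed percentages; the function names, seed, and number of repetitions are assumptions.

```python
import random
from collections import Counter

# Color percentages claimed by Mars consumer affairs, as quoted in the article.
CLAIMED = {"brown": 0.30, "yellow": 0.20, "red": 0.20,
           "orange": 0.10, "green": 0.10, "tan": 0.10}

def chi_square(counts, probs):
    """Pearson goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    n = sum(counts.values())
    return sum((counts.get(c, 0) - n * p) ** 2 / (n * p) for c, p in probs.items())

def simulated_p_value(observed, probs, reps=10_000, seed=1):
    """Estimate P(statistic >= observed statistic) by simulating bags under the claim."""
    rng = random.Random(seed)
    colors = list(probs)
    weights = [probs[c] for c in colors]
    n = sum(observed.values())
    stat = chi_square(observed, probs)
    extreme = 0
    for _ in range(reps):
        simulated_bag = Counter(rng.choices(colors, weights=weights, k=n))
        if chi_square(simulated_bag, probs) >= stat:
            extreme += 1
    return stat, extreme / reps

# Hypothetical counts from one bag of 55 candies (illustration only).
bag = {"brown": 17, "yellow": 10, "red": 9, "orange": 6, "green": 5, "tan": 8}
stat, p = simulated_p_value(bag, CLAIMED)
print(f"chi-square = {stat:.2f}, simulated p-value = {p:.3f}")
```

With these made-up counts the statistic is about 1.70, which is small compared with typical simulated values, so the simulated p-value is large and this particular bag gives no evidence against the Mars percentages. Johnson's real classroom data, by contrast, consistently led to rejection.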