The SAT Math Equivalent (SATME)

Education in States and Nations: 1991

(ESN) Indicator 25: Note on mathematics proficiency

Notes on Figure and Tables

Canada

Nine of ten provinces.

England, Scotland

School or student response rate is below the 85 percent standard employed by INES.

Israel

Hebrew-speaking schools.

Italy, Spain

Ninety percent or less of the international target population was sampled.

Portugal, Switzerland

School or student response rate is below the 85 percent standard employed by INES. Ninety percent or less of the international target population was sampled.

Soviet Union

Fourteen of fifteen republics. Russian-speaking schools only.

Spain

All regions except Cataluña. Spanish-speaking schools only.

Switzerland

Fifteen of twenty-six cantons included.

United States

The U.S. sample for the International Assessment of Educational Progress (IAEP) consisted of both public and private schools. Only 13-year-olds were included. The state samples for the National Assessment of Educational Progress (NAEP), on the other hand, consisted of 8th-grade classrooms in public schools only. On average, students in the state samples were likely to be older than those in the U.S. IAEP sample.

Technical Notes

Description of levels of mathematics proficiency

Level 350: Multi-Step Problem Solving and Algebra
Students at this level can apply a range of reasoning skills to solve multi-step problems. They can solve routine problems involving fractions and percents, recognize properties of basic geometric figures, and work with exponents and square roots. They can solve a variety of two-step problems using variables, identify equivalent algebraic expressions, and solve linear equations and inequalities. They are developing an understanding of functions and coordinate systems.

Level 300: Moderately Complex Procedures and Reasoning
Students at this level are developing an understanding of number systems. They can compute with decimals, simple fractions, and commonly encountered percents. They can identify geometric figures, measure lengths and angles, and calculate areas of rectangles. These students are also able to interpret simple inequalities, evaluate formulas, and solve simple linear equations. They can find averages, make decisions on information drawn from graphs, and use logical reasoning to solve problems. They are developing the skills to operate with signed numbers, exponents, and square roots.

Level 250: Numerical Operations and Beginning Problem Solving
Students at this level have an initial understanding of the four basic operations. They are able to apply whole number addition and subtraction skills to one-step word problems and money situations. In multiplication, they can find the product of a two-digit and a one-digit number. They can also compare information from graphs and charts, and are developing an ability to analyze simple logical relations.

Level 200: Beginning Skills and Understandings
Students at this level have considerable understanding of two-digit numbers. They can add two-digit numbers, but are still developing an ability to regroup in subtraction. They know some basic multiplication and division facts, recognize relations among coins, can read information from charts and graphs, and use simple measurement instruments. They are developing some reasoning skills.

Level 150: Simple Arithmetic Facts
Students at this level know some basic addition and subtraction facts, and most can add two-digit numbers without regrouping. They recognize simple situations in which addition and subtraction apply. They also are developing rudimentary classification skills.

Issues in Linking Different Tests

Indicator 25 uses data drawn from two sources. The data for the countries included in Figure 25 and Table 25a were obtained from the 1991 International Assessment of Educational Progress (IAEP), which tested 13-year-olds in public and private schools in participating countries. The data for the states included in Figure 25 and Table 25b were obtained from the 1992 National Assessment of Educational Progress (NAEP) Trial State Assessment, which tested eighth graders in public schools. In order to compare the mathematics achievement of the countries, which were tested as part of the IAEP, and the states, which were tested as part of the NAEP, it is necessary to link scores on the two tests.

Several approaches to test linking are available, and the appropriate linking strategy depends on characteristics of the tests involved. Mislevy (1992) describes four main strategies: equating, calibration, projection, and moderation.

- Each three SATME-point increase equals a 1 percent increase in correct answers.

- The SATME implies that a score of 420 on SAT Math corresponds to zero math skills.

- IAEP/NAEP cross-link data.

The choice of an appropriate strategy to use in linking the IAEP and the NAEP depends on the degree to which the two tests measure the same constructs in the same ways. Overall, the IAEP and NAEP have a number of similarities and differences. The IAEP curriculum framework was adapted from the framework used for the NAEP, and the two tests contain similar (but not identical) items and were administered using similar procedures. In addition, both tests have been scaled using item response theory (IRT) methods. (4)

At the same time, the two tests also differ in a number of ways, most notably in that the IAEP was explicitly designed to be administered in countries that differ in language, curriculum and instructional practice, while the NAEP was not. In addition, the tests differ in length. In the IAEP mathematics assessment, one common form of the test was administered to all 13-year-olds. The form included 76 items and students were given 60 minutes to complete the assessment (not including time for background questions). In the NAEP mathematics assessment, 26 different test booklets were prepared, each containing a somewhat different number of items, and each sampled student completed one booklet. A typical NAEP booklet included about 60 items, and students were given 45 minutes to complete the assessment (not including time for background questions). Because the IAEP was somewhat longer than the NAEP, the IAEP may provide somewhat more reliable individual-level scores.

Given the similarities and differences among the tests, it would be plausible to consider linking the tests through a process of calibration, projection, or moderation. Because the IAEP and NAEP tests differ in the detailed curriculum frameworks employed as well as in reliability, we chose a form of projection to predict NAEP scores from IAEP scores.

The projected NAEP scores reported for Indicator 25 are based on analyses conducted by Pashley and Phillips (1993) and Pashley, Lewis, and Yan (1994). In developing their estimates, Pashley and Phillips relied on data collected in a "linking study," in which both the IAEP and NAEP instruments were administered to a sample of 1,609 U.S. students who were in eighth grade or thirteen years old in the spring of 1992. Pashley and Phillips used the linking study data to estimate a linear regression model predicting a student's NAEP score on the basis of his or her IAEP score. (5)   (See Table S21, row A, for the estimated coefficients.)  (6)   They then used the regression equation to develop predicted NAEP scores for the students in the IAEP sample in each participating country. (7)   Using the predicted scores, Pashley and Phillips obtained various statistics, including the means and percentile scores for the nations presented in Indicator 25. (Table S22, column A, provides the projected NAEP-scale means Pashley and Phillips obtained for each IAEP country.)
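Because the projection is a single linear equation, its mechanics are easy to sketch. The Python fragment below is an illustration only, not the Pashley and Phillips implementation (which ran separate regressions for each set of plausible values); the coefficients are backed out of Table S21, row A:

```python
# Coefficients implied by Table S21, row A: a projected NAEP score of
# 265 at IAEP = 500, gaining 0.44 NAEP points per IAEP point above 500.
SLOPE = 0.44
INTERCEPT = 265 - SLOPE * 500

def project_naep(iaep_score):
    """Predicted NAEP-scale score for a given IAEP score."""
    return INTERCEPT + SLOPE * iaep_score

# Because the transformation is linear, a country's mean IAEP score
# projects onto the NAEP scale the same way an individual score does.
print(round(project_naep(500), 1))  # 265.0
print(round(project_naep(550), 1))  # 287.0
```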

Table S21 Sensitivity of parameters used to link mean IAEP scores for countries to the NAEP scale to data source and method

  
--------------------------------------------------------------------------------------------
                                                    Projected NAEP score    Additional NAEP
Samples used                             Method        at (IAEP = 500)      points per IAEP
                                                                            point above 500
--------------------------------------------------------------------------------------------
 
 A  (IAEP cross-linking sample)         Projection         265                  0.44
 
 B  (IAEP cross-linking sample)         Moderation         263                  0.53
 
 C  (IAEP and 1990 NAEP Trial State     Moderation         264                  0.69
 
    Assessment in public schools)
 
 D  (IAEP and 1992 NAEP Trial State     Moderation         270                  0.72
 
    Assessment in public schools)
 
--------------------------------------------------------------------------------------------
 
    

NOTE and SOURCE: The IAEP scale range is from 0 to 1000; the NAEP scale range is from 0 to 500. Parameters in this table were calculated using information on the means and standard deviations of scores in each sample and, for line A, the correlation of the scores in the cross-linking sample. Pashley and Phillips (1993) used the sample and method of line A. Beaton and Gonzalez (1993) used the samples and method of line C.

Table S22 Alternative projections of country mean IAEP scores onto the NAEP scale, by country

  
----------------------------------------------------------------------
 
                        Samples and Method  |Difference in projections
 
----------------------------------------------------------------------
 
Country                 A     B     C     D  (B - A) (C - A) (D - C)
 
----------------------------------------------------------------------
 
Taiwan                 285   287   297   303    2      12       6
 
Korea                  283   286   294   301    3      11       7
 
Switzerland1           279   281   288   294    2       9       6
 
Soviet Union2          279   281   288   294    2       9       7
 
Hungary                277   279   285   291    2       8       6
 
France                 273   274   278   284    1       5       6
 
Emilia Romagna, Italy3 272   272   276   283    0       4       6
 
Israel4                272   272   277   283    0       5       6
 
Canada5                270   270   274   280    0       4       6
 
Scotland               269   270   272   279    1       3       6
 
Ireland                269   268   271   277   -1       2       6
 
Slovenia               266   265   267   273   -1       1       6
 
Spain6                 263   261   262   267   -2      -1       5
 
United States7         262   260   262   266   -2       0       4
 
Jordan                 246   241   236   240   -5     -10       4
 
----------------------------------------------------------------------
 
    

1 Fifteen out of 26 cantons.

2 Fourteen out of 15 republics; Russian-speaking schools only.

3 Combined school and student participation rate is below .80 but at least .70. Interpret with caution due to possible nonresponse bias.

4 Hebrew-speaking schools only.

5 Nine out of 10 provinces.

6 All regions except Cataluña; Spanish-speaking schools only.

7 Eighth-graders took the test and not all were 13 years old.

Samples and Method

A. Cross-linking sample and projection method

B. Cross-linking sample and moderation method

C. IAEP and NAEP 1990 public school samples and moderation method

D. IAEP and NAEP 1992 public school samples and moderation method

Difference in projections

(B - A) Moderation versus projection in same (cross-linking) sample

(C - A) Moderation and 1990 NAEP/IAEP samples versus projection and cross-linking sample

(D - C) 1992 NAEP/IAEP versus 1990 NAEP/IAEP, both using the moderation method

NOTE and SOURCE: Countries are sorted from high to low based on their mean scores using sample and method A -- Cross-linking sample and projection method. Columns B and D are from Pashley, Lewis, and Yan (1994) and Beaton and Gonzalez (1993), respectively. Both used student-weighted data. Columns A and C are based in part on tabulations produced by the IAEP Processing Centre in June 1992. It appears that these tabulations did not use student weights. For most countries, the use of weights made little difference for estimated country mean IAEP scores. Switzerland is an exception, due to the complex sample design used there. Therefore, an unpublished weighted mean IAEP score of 532.36 was used instead of the published unweighted mean of 538.75 for Switzerland.

The most widely discussed alternative to the projection method used by Pashley and Phillips is a moderation method carried out by Beaton and Gonzalez (1993). Beaton and Gonzalez based their analysis on the 1991 IAEP United States sample and the 1990 NAEP eighth grade winter public school sample. They translated IAEP scores into NAEP scores by aligning the means and standard deviations for the two tests. (8)   Using the techniques of linear equating, they estimated conversion constants to transform the U.S. IAEP scores into a distribution having the same mean and standard deviation as the 1990 NAEP scores. (The conversion constants are shown in Table S21, row C.) They then used these conversion constants to transform the IAEP scores for the students in the IAEP samples in each participating country into equivalent NAEP scores. (The moderated country NAEP-scale means produced by Beaton and Gonzalez are shown in Table S22, column C. Full state and nation results for Indicator 25 using the Beaton and Gonzalez method are displayed in Table S23.)
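The linear-equating step reduces to matching two moments. A minimal Python sketch follows; the summary statistics below are illustrative stand-ins chosen only to reproduce the row-C constants of Table S21, not the published IAEP or NAEP statistics:

```python
import statistics

def moderation_constants(iaep_scores, naep_scores):
    """Linear-equating constants that give transformed IAEP scores the
    same mean and standard deviation as the NAEP scores."""
    slope = statistics.pstdev(naep_scores) / statistics.pstdev(iaep_scores)
    intercept = statistics.mean(naep_scores) - slope * statistics.mean(iaep_scores)
    return intercept, slope

# Stand-in summary statistics (illustrative only): an SD ratio of 0.69
# and a transformed score of 264 at IAEP = 500, as in Table S21, row C.
slope = 0.69
intercept = 264 - slope * 500
print(round(intercept + slope * 500))  # 264
```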

The projection method used to develop Indicator 25 and the moderation method used by Beaton and Gonzalez produce somewhat different results, especially for countries with high average IAEP scores. (See Table S22.) For example, Korea is estimated to have a 1992 NAEP score of 283 using the projection method employed in Indicator 25 (see column A), while it has an estimated 1990 NAEP score of 294 using the Beaton and Gonzalez method (see column C).

The observed differences in transformed scores can be attributed in part to differences in the data sets on which Pashley and Phillips and Beaton and Gonzalez rely in developing their estimates. The students in the "linking study" sample used by Pashley and Phillips included both 13-year-olds and eighth graders in public and private schools. Beaton and Gonzalez used two samples to develop their estimates: the regular 1991 U.S. IAEP sample, and the regular winter eighth-grade 1990 NAEP administration. The 1991 United States IAEP sample on which they relied included 13-year-olds (but not other eighth graders) in public and private schools, while the 1990 NAEP sample included eighth graders (but not other 13-year-olds) in public schools only. (9)  Perhaps as a result of these differences, the estimation samples have somewhat different distributions. Both estimation methods are particularly sensitive to the ratio of the standard deviations for the NAEP and IAEP. (10)   In the linking sample used to develop the projection estimates, the ratio of the NAEP and IAEP standard deviations was about 0.53, while, for the samples used by Beaton and Gonzalez, the ratio of standard deviations was about 0.69. This difference in standard deviations generates predicted NAEP scores based on the projection method that are less distant from the mean than are the equivalent scores based on the Beaton and Gonzalez method.
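The consequence of the two standard-deviation ratios is easiest to see by applying both fitted lines, which nearly coincide at IAEP = 500, to the same above-average score (slopes and anchor points taken from Table S21, rows A and C):

```python
def projection_estimate(iaep):
    """Row A of Table S21: anchor 265 at IAEP = 500, slope 0.44."""
    return 265 + 0.44 * (iaep - 500)

def moderation_estimate(iaep):
    """Row C of Table S21: anchor 264 at IAEP = 500, slope 0.69."""
    return 264 + 0.69 * (iaep - 500)

# A country 40 IAEP points above 500 lands noticeably farther from the
# mean under moderation than under projection, mirroring the roughly
# 11-point gap reported for Korea.
print(round(projection_estimate(540), 1))  # 282.6
print(round(moderation_estimate(540), 1))  # 291.6
```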

To examine the sensitivity of the results to the samples used, we applied the Beaton and Gonzalez method to the data in the "linking sample" used by Pashley and Phillips. (11)   The conversion coefficient estimates are shown in Table S21, row B, and the estimated country NAEP means are shown in Table S22, column B. (12)   The estimated country means are much closer to the projection results obtained by Pashley and Phillips (column A) than are the Beaton and Gonzalez results obtained using the regular IAEP and 1990 winter public eighth grade samples. For example, the difference in the projection and moderation estimates for Korea drops from 11 to 3 points.

To explore this issue further, we applied the moderation method using one additional NAEP data set: the 1992 public eighth grade sample. (This sample corresponds to the sample used in the 1992 Trial State Assessment on which the state results in Indicator 25 are based.) The conversion coefficients are displayed in Table S21 (row D); and the moderated NAEP-scale country means are displayed in Table S22 (column D). This sample produces country results more extreme than do any of the other samples we tried.

These experiments clearly indicate that different samples produce different results. But the experiments do not indicate which sample is "best". One advantage of the linking sample used by Pashley and Phillips is that the same students took both the IAEP and the NAEP. Hence, the estimated conversion coefficients are not biased by possible differences between the IAEP and NAEP samples. But the fact that the IAEP standard deviation in the linking sample is substantially higher than the standard deviation in the regular U.S. administration of the IAEP, while the NAEP standard deviation in the linking sample is similar to the regular NAEP standard deviation, may at least in part counterbalance the other apparent advantages of the linking sample.

In addition to the effects of the sample on coefficient estimates, several conceptual issues should be considered in evaluating linking methods. We briefly review three of these issues below: the age or grade-level interpretation placed on predicted test scores; the effects on coefficient estimates of unreliability in the measures; and potential country-level contextual effects.

First, different linking approaches may produce results that differ in the age or grade-level for which the predicted scores are intended to apply. For example, since the data used by Pashley and Phillips to derive their coefficient estimates involved a sample of students who completed both the IAEP and the NAEP, the predicted NAEP scores based on their coefficients should be viewed as the NAEP scores that would be obtained by students of the same age or grade as the students whose IAEP scores are used as predictors. Since the regular country administration of the IAEP involved sampling 13-year-olds, the predicted NAEP scores using the Pashley and Phillips method should be viewed as predicted NAEP scores for 13-year-old students. The predicted NAEP scores obtained by Beaton and Gonzalez, on the other hand, should be interpreted as the scores 13-year-olds who took the IAEP would receive if they completed the NAEP in eighth grade. (13)   Since average NAEP scores for eighth-graders are generally somewhat higher than average scores for 13-year-olds, the approach to sample specification used by Beaton and Gonzalez is likely to produce somewhat higher scores than the approach used by Pashley and Phillips.

Linking methods may also differ in their sensitivity to unreliability in the predictor variable (in this case, the IAEP). In general, regression estimates of the effects of variables measured with error will be biased toward zero. Hence, projection coefficients estimated using unreliable measures are likely to be attenuated. (14)   The effects of unreliability on conversion coefficients obtained using moderation methods are more difficult to determine. In the special case in which the predictor and outcome variables are measured with the same reliability, the moderation coefficients should be roughly unbiased. (15)  
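The attenuation effect is easy to demonstrate by simulation. In this sketch (illustrative only; the reliability of 0.5 is arbitrary and far lower than that of the actual assessments), the true slope is 1, but noise in the predictor pulls the estimated regression slope toward zero:

```python
import random

random.seed(0)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

true_x = [random.gauss(0, 1) for _ in range(100_000)]
y = list(true_x)                                    # true slope is exactly 1
noisy_x = [x + random.gauss(0, 1) for x in true_x]  # predictor reliability 0.5

# The estimated slope comes out close to 0.5:
# the reliability times the true slope.
print(round(ols_slope(noisy_x, y), 2))
```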

Finally, linking methods that are based on data from a single country may not properly reflect country-level contextual effects. Suppose, for example, that individual NAEP and IAEP scores were obtained for a sample of students in each of n countries. (16)   Both the projection and moderation methods rest on an assumption that the relationship between IAEP and NAEP scores (pooling students across countries) can be expressed as a simple linear model of the form:

 
 
  estimated NAEP score = constant + slope * IAEP score
 
    

It is possible, however, that country-context effects exist. One simple specification might involve the addition of country dummies to the simple linear model above. If the country dummies differ significantly from zero, the within-country regression of NAEP scores on IAEP scores will not properly reproduce between-country relationships. Contextual effects of this sort might arise, for example, if the standardized test style used in the IAEP and NAEP is quite common in some countries, but rarely used in others. Unfortunately, without linked IAEP and NAEP data for a sample of countries, the possibility of contextual effects cannot be ruled out.
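If linked IAEP and NAEP data existed for a sample of countries, the country-dummy specification could be checked directly. The sketch below is purely hypothetical (no such multi-country linked data set exists); it fits a common slope with a separate intercept per country by ordinary least squares:

```python
import numpy as np

def fit_with_country_dummies(iaep, naep, country_idx, n_countries):
    """OLS of NAEP on IAEP with one intercept dummy per country.

    If the fitted intercepts differ materially across countries, a single
    pooled line misstates the between-country relationship."""
    n = len(iaep)
    X = np.zeros((n, 1 + n_countries))
    X[:, 0] = iaep                                      # common IAEP slope
    X[np.arange(n), 1 + np.asarray(country_idx)] = 1.0  # country dummies
    coefs, *_ = np.linalg.lstsq(X, np.asarray(naep, dtype=float), rcond=None)
    return coefs[0], coefs[1:]  # slope, per-country intercepts

# Synthetic check: two countries share slope 2 but have different
# intercepts (10 versus 20) -- a contextual effect a pooled line would miss.
slope, intercepts = fit_with_country_dummies(
    [1, 2, 3, 4, 1, 2, 3, 4],
    [12, 14, 16, 18, 22, 24, 26, 28],
    [0, 0, 0, 0, 1, 1, 1, 1],
    2,
)
print(round(slope, 6), np.round(intercepts, 6))  # 2.0 [10. 20.]
```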

This brief discussion clearly indicates that different methods of linking the IAEP and NAEP can produce different results, and further study is necessary to determine which method is best. For this reason, Indicator 25 is labeled "experimental."

For more information on cross-linking and on the specific approaches used in developing Indicator 25, see Peter J. Pashley and Gary W. Phillips, Toward World-Class Standards: A Research Study Linking International and National Assessments (Princeton, NJ: Educational Testing Service, June 1993); Peter J. Pashley, Charles Lewis, and Duanli Yan, "Statistical Linking Procedures for Deriving Point Estimates and Associated Standard Errors," paper presented at the National Council on Measurement in Education (Princeton, NJ: Educational Testing Service, April 1994); Albert E. Beaton and Eugenio J. Gonzalez, "Comparing the NAEP Trial State Assessment Results with the IAEP International Results," Setting Performance Standards for Student Achievement: Background Studies (Stanford, CA: National Academy of Education, 1993); Robert J. Mislevy, Albert E. Beaton, Bruce Kaplan, and Kathleen M. Sheehan, "Estimating Population Characteristics from Sparse Matrix Samples of Item Responses," Journal of Educational Measurement, Summer 1992, vol. 29, no. 2, pp. 133-161; and Robert J. Mislevy, Linking Educational Assessments: Concepts, Issues, Methods, and Prospects (Princeton, NJ: Educational Testing Service, December 1992).

Table S23 Mathematics proficiency scores for 13-year-olds in countries and public school 8th-grade students in states, calculated using the equi-percentile linking method, according to Beaton and Gonzalez, by country (1991) and state (1990)

  
-----------------------------------------------------------------------------------------
 
                                        |                 Percent of population
 
                                        |           in each proficiency score range
 
-----------------------------------------------------------------------------------------
 
COUNTRY/State          Mean       SE    | <200   200-250   250-300   300-350   >350
 
-----------------------------------------------------------------------------------------
 
TAIWAN                 296.7     1.5      3.2       13.4       33.9       36.6      12.9
 
KOREA                  294.1     1.3      1.9       10.3       41.8       39.3       6.7
 
SOVIET UNION           287.6     1.5      0.8       10.4       53.1       34.0       1.7
 
SWITZERLAND            287.5     1.9      0.2        8.8       57.9       32.2       0.9
 
HUNGARY                284.8     1.4      1.4       13.5       52.6       29.9       2.7
 
North Dakota           281.1     1.2      0.8       13.2       60.0       24.8       1.3
 
Montana                280.5     0.9      0.5       14.3       59.5       24.9       0.8
 
FRANCE                 278.1     1.3      1.4       16.8       57.5       23.4       1.0
 
Iowa                   278.0     1.1      0.6       18.3       57.0       23.3       0.7
 
ISRAEL                 276.8     1.3      1.5       15.6       61.6       20.7       0.6
 
ITALY                  276.3     1.4      1.6       18.1       57.7       22.0       0.5
 
Nebraska               275.7     1.0      2.0       18.6       56.2       22.4       0.9
 
Minnesota              275.4     0.9      1.6       19.2       57.0       21.2       1.1
 
Wisconsin              274.5     1.3      1.5       20.8       55.4       21.6       0.7
 
CANADA                 274.0     1.0      1.4       17.6       63.7       16.7       0.7
 
New Hampshire          273.2     0.9      1.4       21.2       58.1       18.9       0.5
 
SCOTLAND               272.4     1.5      1.6       20.6       59.7       17.7       0.4
 
Wyoming                272.2     0.7      1.1       20.9       60.3       17.4       0.2
 
Idaho                  271.5     0.8      1.2       22.1       59.7       16.8       0.2
 
IRELAND                271.4     1.4      3.1       21.0       57.1       18.0       0.8
 
Oregon                 271.4     1.0      2.2       23.8       54.2       19.2       0.6
 
Connecticut            269.9     1.0      3.2       25.3       50.7       20.1       0.7
 
New Jersey             269.7     1.1      2.4       26.9       50.2       19.7       0.8
 
Colorado (NAEP)        267.4     0.9      2.8       26.5       54.7       15.7       0.4
 
SLOVENIA               267.3     1.3      1.6       25.7       60.2       12.2       0.4
 
Indiana                267.3     1.2      2.0       28.2       53.9       15.4       0.5
 
Pennsylvania           266.4     1.6      3.2       27.5       53.0       15.8       0.5
 
Michigan               264.4     1.2      3.1       30.1       51.7       14.5       0.6
 
Virginia               264.3     1.5      3.3       32.8       47.3       15.4       1.3
 
Colorado (IAEP)        264.2     0.7      3.1       28.8       55.4       12.4       0.4
 
Ohio                   264.0     1.0      3.1       30.5       52.4       13.8       0.3
 
Oklahoma               263.2     1.3      2.8       30.8       53.8       12.5       0.2
 
SPAIN                  261.9     1.3      2.1       29.0       62.0        6.9       0.0
 
UNITED STATES(IAEP)    261.8     2.0      5.0       30.6       52.0       11.5       0.9
 
United States (NAEP)   261.8     1.4      5.0       31.5       49.0       14.0       0.5
 
New York               260.8     1.4      5.9       31.4       48.0       13.9       0.8
 
Maryland               260.8     1.4      5.7       33.1       45.3       15.3       0.6
 
Delaware               260.7     0.9      4.6       34.2       47.6       13.0       0.6
 
Illinois               260.6     1.7      5.7       31.4       49.1       13.4       0.5
 
Rhode Island           260.0     0.6      5.0       34.0       47.3       13.5       0.3
 
Arizona                259.6     1.3      4.5       33.8       49.7       11.7       0.4
 
Georgia                258.9     1.3      5.3       35.2       46.5       12.5       0.6
 
Texas                  258.2     1.4      4.8       36.4       46.7       11.7       0.4
 
Kentucky               257.1     1.2      3.9       38.2       47.9        9.8       0.2
 
New Mexico             256.4     0.7      4.3       38.2       47.7        9.6       0.3
 
California             256.3     1.3      6.9       35.9       45.2       11.5       0.4
 
Arkansas               256.2     0.9      4.6       37.3       49.4        8.6       0.1
 
West Virginia          255.9     1.0      4.3       38.7       48.4        8.5       0.2
 
Florida                255.3     1.3      6.6       37.7       44.3       11.2       0.2
 
Alabama                252.9     1.1      6.2       40.5       44.8        8.3       0.3
 
Hawaii                 251.0     0.8      9.9       39.2       39.8       10.6       0.5
 
North Carolina         250.4     1.1      7.9       41.2       42.6        8.1       0.0
 
Louisiana              246.4     1.2      8.2       46.1       40.6        4.9       0.2
 
JORDAN                 236.1     1.9     16.0       48.3       32.6        3.1       0.0
 
District of Columbia   231.4     0.9     16.7       56.9       23.6        2.5       0.3
 
-----------------------------------------------------------------------------------------
 
    

NOTE: Countries and states are sorted from high to low based on their mean proficiency scores. Colorado participated in both the NAEP Trial State Assessment and, separately, in the International Assessment of Educational Progress.

SOURCE: Albert E. Beaton and Eugenio J. Gonzalez, "Comparing the NAEP Trial State Assessment Results with the IAEP International Results," in Setting Performance Standards for Student Achievement: Background Studies (Stanford, CA: National Academy of Education, 1993).


Footnotes

(4)   For the NAEP and the IAEP IRT scales, conventional individual scale scores are not generated. Instead, the scaling process generates a set of five "plausible values" for each student. The five plausible values reported for each student can be viewed as draws from a distribution of potential scale scores consistent with the student's observed responses on the test and the student's measured background characteristics. In other words, the plausible values are constructed to have a mean and variance consistent with the underlying true population values. In this sense, the plausible values correct for unreliability. See Mislevy, Beaton, Kaplan, and Sheehan, 1992.

(5)    The actual procedure used by Pashley and Phillips was somewhat more complex than the method described in the text. Five regressions were estimated, one for each pair of IAEP and NAEP plausible values (see the previous footnote). Given the sample sizes involved, the regression parameters produced by the five regressions differ only marginally.

(6)   The regression parameters shown in the table are based on an approximate analysis using the reported correlation between the IAEP and the NAEP total mathematics score (r = .825), as well as the mean and the standard deviation of the IAEP and the NAEP in the linking sample, averaging across the five sets of plausible values. The results obtained by averaging in this way differ only slightly from the method used by Pashley and Phillips, based on separate regressions for each of the five plausible-value pairs. See the previous two footnotes.

(7)    In the method as implemented by Pashley and Phillips, the five regression equations were each used to obtain predicted NAEP scores at the individual level; and the results were averaged to produce country means. The results are very similar to those that are obtained using the somewhat simpler method discussed in the text.

(8)   Like Pashley and Phillips, Beaton and Gonzalez carried out their procedure separately for each of the five sets of plausible values; and they then averaged the results obtained for each set. The results differ only slightly when their procedure is carried out once using published estimates of means and standard deviations.

(9)   The 1990 NAEP mathematics results were rescaled in 1992, producing slightly different scale scores. Beaton and Gonzalez used the 1992 rescaling.

(10)   The simple regression coefficient required for the projection method can be expressed as rsy/sx, where r is the correlation between the IAEP and the NAEP, sy is the standard deviation of the NAEP, and sx is the standard deviation of the IAEP. The conversion coefficient required for the moderation method is simply sy/sx.

(11)  Given the data required, it is possible to develop moderation estimates similar to those developed by Beaton and Gonzalez for several different samples. But because the Pashley and Phillips projection method requires paired IAEP and NAEP data, the linking sample is the only data set in which it currently can be applied.

(12)  As discussed in footnotes 4-7 above, Beaton and Gonzalez based their estimates on the full set of individual-level plausible values for each country. We developed the estimates in Tables S21 and S22 based only on the reported country means and standard deviations based on the plausible values. These results differ only slightly from those that would be obtained using the full set of plausible values.

(13)  The interpretation of the predicted NAEP scores based on the moderation method is complicated by the fact that the IAEP sample used to develop the conversion constants included students in both public and private schools, while the NAEP sample included only public school students. Since the NAEP results for the full sample of eighth-graders in both public and private schools differ only modestly from the results for the public-school-only sample, this problem probably accounts for relatively little of the difference in predicted outcomes between the projection and moderation approaches.

(14)  The plausible values generated for the IAEP and NAEP are designed to reflect the true population mean and variance, but correlations among plausible values are attenuated by unreliability.
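
The attenuation mentioned here follows the classical correction-for-attenuation relationship: the observed correlation equals the true correlation scaled by the square roots of the two tests' reliabilities. A minimal sketch, with invented reliability values:

```python
import math

# Spearman's attenuation relationship: with measurement error, the
# observed correlation equals the "true" correlation scaled by the
# square roots of the two tests' reliabilities. All numbers below
# are invented for illustration only.
def attenuated(true_r, rel_x, rel_y):
    return true_r * math.sqrt(rel_x * rel_y)

# Even a strong true correlation of 0.90 is noticeably attenuated
# at plausible single-test reliabilities.
print(round(attenuated(0.90, 0.85, 0.88), 3))  # 0.778
```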

(15)  Since the IAEP and NAEP plausible values are designed to produce unbiased estimates of population variance, moderation methods that make use of the plausible values should not be sensitive to measurement error.

(16)  To obtain valid NAEP scores in countries outside the United States, language and other issues would, of course, need to be taken into account.


Combining the Beaton-Gonzalez crosslink study above with published NAEP mathematics scores for additional jurisdictions and subgroups produces the following more complete table:

 

                       Mean      SE    Percent of students scoring in each range:
                                       <200   200-250   250-300   300-350    350+
-----------------------------------------------------------------------------------------
Asians Maryland        306  
Whites Washington, DC  303  
Texas religious        301  
Washington religious   299  
TAIWAN                 296.7     1.5      3.2       13.4       33.9       36.6      12.9  
North Dakota religious 296  
KOREA                  294.1     1.3      1.9       10.3       41.8       39.3       6.7  
SOVIET UNION           287.6     1.5      0.8       10.4       53.1       34.0       1.7  
SWITZERLAND            287.5     1.9      0.2        8.8       57.9       32.2       0.9  
Montana Whites         287  
HUNGARY                284.8     1.4      1.4       13.5       52.6       29.9       2.7  
Whites in DOD schools  284  
California religious   284  
North Dakota           281.1     1.2      0.8       13.2       60.0       24.8       1.3  
Whites national        281  
Montana                280.5     0.9      0.5       14.3       59.5       24.9       0.8  
Virginia Whites        279  
FRANCE                 278.1     1.3      1.4       16.8       57.5       23.4       1.0  
Iowa                   278.0     1.1      0.6       18.3       57.0       23.3       0.7  
ISRAEL                 276.8     1.3      1.5       15.6       61.6       20.7       0.6  
ITALY                  276.3     1.4      1.6       18.1       57.7       22.0       0.5  
Nebraska               275.7     1.0      2.0       18.6       56.2       22.4       0.9  
Minnesota              275.4     0.9      1.6       19.2       57.0       21.2       1.1  
Wisconsin              274.5     1.3      1.5       20.8       55.4       21.6       0.7  
CANADA                 274.0     1.0      1.4       17.6       63.7       16.7       0.7  
New Hampshire          273.2     0.9      1.4       21.2       58.1       18.9       0.5  
SCOTLAND               272.4     1.5      1.6       20.6       59.7       17.7       0.4  
Wyoming                272.2     0.7      1.1       20.9       60.3       17.4       0.2  
Idaho                  271.5     0.8      1.2       22.1       59.7       16.8       0.2  
IRELAND                271.4     1.4      3.1       21.0       57.1       18.0       0.8  
Oregon                 271.4     1.0      2.2       23.8       54.2       19.2       0.6  
Connecticut            269.9     1.0      3.2       25.3       50.7       20.1       0.7  
New Jersey             269.7     1.1      2.4       26.9       50.2       19.7       0.8  
Colorado (NAEP)        267.4     0.9      2.8       26.5       54.7       15.7       0.4  
SLOVENIA               267.3     1.3      1.6       25.7       60.2       12.2       0.4  
Indiana                267.3     1.2      2.0       28.2       53.9       15.4       0.5  
Pennsylvania           266.4     1.6      3.2       27.5       53.0       15.8       0.5  
Michigan               264.4     1.2      3.1       30.1       51.7       14.5       0.6  
Virginia               264.3     1.5      3.3       32.8       47.3       15.4       1.3  
Colorado (IAEP)        264.2     0.7      3.1       28.8       55.4       12.4       0.4  
Ohio                   264.0     1.0      3.1       30.5       52.4       13.8       0.3  
Oklahoma               263.2     1.3      2.8       30.8       53.8       12.5       0.2  
SPAIN                  261.9     1.3      2.1       29.0       62.0        6.9       0.0  
UNITED STATES (IAEP)   261.8     2.0      5.0       30.6       52.0       11.5       0.9  
United States (NAEP)   261.8     1.4      5.0       31.5       49.0       14.0       0.5  
New York               260.8     1.4      5.9       31.4       48.0       13.9       0.8  
Maryland               260.8     1.4      5.7       33.1       45.3       15.3       0.6  
Delaware               260.7     0.9      4.6       34.2       47.6       13.0       0.6  
Illinois               260.6     1.7      5.7       31.4       49.1       13.4       0.5  
Rhode Island           260.0     0.6      5.0       34.0       47.3       13.5       0.3  
Arizona                259.6     1.3      4.5       33.8       49.7       11.7       0.4  
Georgia                258.9     1.3      5.3       35.2       46.5       12.5       0.6  
Texas                  258.2     1.4      4.8       36.4       46.7       11.7       0.4  
Kentucky               257.1     1.2      3.9       38.2       47.9        9.8       0.2  
New Mexico             256.4     0.7      4.3       38.2       47.7        9.6       0.3  
California             256.3     1.3      6.9       35.9       45.2       11.5       0.4  
Arkansas               256.2     0.9      4.6       37.3       49.4        8.6       0.1  
West Virginia          255.9     1.0      4.3       38.7       48.4        8.5       0.2  
Florida                255.3     1.3      6.6       37.7       44.3       11.2       0.2  
Alabama                252.9     1.1      6.2       40.5       44.8        8.3       0.3  
Hawaii                 251.0     0.8      9.9       39.2       39.8       10.6       0.5  
North Carolina         250.4     1.1      7.9       41.2       42.6        8.1       0.0  
Latinos national       250  
Louisiana              246.4     1.2      8.2       46.1       40.6        4.9       0.2  
JORDAN                 236.1     1.9     16.0       48.3       32.6        3.1       0.0  
District of Columbia   231.4     0.9     16.7       56.9       23.6        2.5       0.3  
U.S. blacks            209
Wash DC blacks         206 

*  Equating entails creating a common scale for two or more tests based on the same blueprint (such as tests employing common item specifications). Equating is appropriate when tests share the same underlying conception of achievement, employ similar items, and are equally reliable. When tests have been equated, they can be used interchangeably.

*  Calibration is a process of linking tests that measure the same dimensions of achievement but differ in reliability. When tests are calibrated, individuals receiving the same scores on the two tests have the same expected achievement; but because calibrated scores are based on tests that differ in reliability, they cannot be used interchangeably for all purposes. For example, differences in reliability need to be taken into account when using calibrated scores to estimate the population standard deviation.

*  Projection, which can be used when the assumptions underlying equating or calibration are not met, involves linking scores on tests that measure different dimensions of achievement. To the extent that performance on one test is correlated with performance on a second, scores on the first test can be used to predict scores on the second, even if the two tests measure relatively distinct competencies. Because projection requires an estimate of the correlation between scores on the two tests, it requires a sample of individuals who have taken both tests. The adequacy of the projection approach depends on the strength of the correlation between the tests, as well as on the extent to which the sample used to estimate the prediction equation contains individuals with characteristics similar to those for whom the predicted scores will be used. The linking sample needs to describe the relationship between the two tests well, but it need not be a strict random sample of the population.

*  Finally, moderation is a process in which scores from two or more tests that measure different things are aligned so that performance levels judged to be of comparable value or worth are given equal scores. One common moderation strategy rescales scores to produce a common mean and standard deviation on the two tests. This approach rests on the belief that individuals who score at the same distance from the mean on the two tests (measured in standard deviation units) have achieved similar levels of performance. Fundamentally, moderation is a method of placing tests that measure different constructs on a common metric. Moderation makes it possible to compare scores on two tests, but tests that have been moderated cannot be used interchangeably.
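
The contrast between projection and moderation can be sketched numerically on a small paired sample. The scores below are fabricated for illustration; they are not the Pashley-Phillips linking data or the Beaton-Gonzalez estimates:

```python
# Fabricated linking sample: five students with scores on both tests
# (not real IAEP/NAEP data).
iaep = [250.0, 260.0, 270.0, 280.0, 290.0]
naep = [255.0, 268.0, 272.0, 286.0, 294.0]

n = len(iaep)
mx, my = sum(iaep) / n, sum(naep) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(iaep, naep))
sxx = sum((x - mx) ** 2 for x in iaep)
syy = sum((y - my) ** 2 for y in naep)

# Projection: ordinary regression of the second test on the first;
# the slope equals r(sy/sx).
proj_slope = sxy / sxx
def project(x):
    return my + proj_slope * (x - mx)

# Moderation: align means and standard deviations; the slope is sy/sx.
mod_slope = (syy / sxx) ** 0.5
def moderate(x):
    return my + mod_slope * (x - mx)

print(round(project(290.0), 1))   # 294.2
print(round(moderate(290.0), 1))  # 294.4
```

Because the regression slope is the moderation slope times the correlation, projection pulls predictions toward the mean, while moderation preserves each student's relative standing; the gap between the two predictions grows as the correlation between the tests weakens.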