
Equating entails creating a common scale for two or more tests
that are based on the same blueprint (such as two or more tests employing
common item specifications). Equating is appropriate when tests share the
same underlying conception of achievement, employ similar items, and are
equally reliable. When tests have been equated, they can be used
interchangeably.

Calibration is a process of linking tests that measure the same
dimensions of achievement but differ in reliability. When tests are
calibrated, individuals receiving the same scores on the two tests have the
same expected achievement, but, since calibrated scores are based on tests
that differ in reliability, they cannot be used interchangeably for all purposes.
For example, differences in reliability need to be taken into account in
using calibrated scores to estimate the population standard deviation.
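To illustrate with a standard classical test theory identity (a general
result, not a computation taken from this report): observed score variance is
the sum of true-score variance and error variance, so the true-score standard
deviation equals the observed standard deviation times the square root of the
test's reliability. A less reliable test therefore shows a larger observed
standard deviation for the same underlying achievement distribution, which is
why calibrated scores from tests of unequal reliability cannot be pooled
naively to estimate population spread.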

Projection, which can be used when the assumptions underlying
equating or calibration are not met, involves linking scores on tests that
measure different dimensions of achievement. To the extent that performance
on one test is correlated with performance on a second, the scores on the
first test can be used to predict scores on the second, even if the two tests
measure relatively distinct competencies. Because the projection method
requires an estimate of the correlation between the scores on the two tests
involved, the method requires a sample of individuals who have been given both
tests. The adequacy of the projection approach to linking tests depends on
the strength of the correlation between the tests involved, as well as on the
extent to which the sample employed to estimate the prediction equation
contains individuals with characteristics similar to those for which the
predicted scores will be used. The linking sample needs to provide a good
description of the relationship between the two tests involved but does not
need to be a strict random sample of the population.
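As an illustration of the mechanics, the sketch below fits a linear
regression in a linking sample and uses it to predict scores on the second
test. This is a minimal sketch under assumed data: the arrays test_a and
test_b are hypothetical paired scores, and the studies discussed below
actually worked with sets of plausible values rather than single scores.

    import numpy as np

    # Hypothetical paired scores from a linking sample in which the same
    # examinees took both tests (projection requires such a sample).
    test_a = np.array([480.0, 510.0, 525.0, 560.0, 600.0])  # predictor test
    test_b = np.array([250.0, 262.0, 268.0, 280.0, 301.0])  # test to be predicted

    # Least-squares fit; the slope equals r * (sd_b / sd_a).
    slope, intercept = np.polyfit(test_a, test_b, deg=1)

    # Predict second-test scores for examinees who took only the first test.
    new_a = np.array([500.0, 550.0])
    projected_b = intercept + slope * new_a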

Finally, moderation is
a process in which scores from two or more tests that measure different
things are aligned so that performance levels that are judged to be of
comparable value or worth on the tests are given equal scores. One common
moderation strategy involves rescaling scores to produce a common mean and
standard deviation on the two tests. This approach rests on the belief that
individuals who score at the same distance from the mean on the two tests (as
measured in standard deviation units) have achieved similar levels of
performance. Fundamentally, moderation is a method of placing tests that
measure different constructs on a common metric. Moderation makes it possible
to compare scores on two tests, but tests that have been moderated cannot be
used interchangeably.
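A minimal sketch of the mean-and-standard-deviation form of moderation
described above, using hypothetical score arrays (scores_x, scores_y) drawn
from two separate samples; no common examinees are assumed:

    import numpy as np

    scores_x = np.array([480.0, 510.0, 525.0, 560.0, 600.0])  # sample taking test X
    scores_y = np.array([250.0, 262.0, 268.0, 280.0, 301.0])  # sample taking test Y

    # Rescale X so its mean and standard deviation match test Y's.
    slope = scores_y.std(ddof=1) / scores_x.std(ddof=1)
    intercept = scores_y.mean() - slope * scores_x.mean()
    x_on_y_scale = intercept + slope * scores_x  # X scores expressed on Y's metric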

The choice
of an appropriate strategy to use in linking the IAEP and the NAEP depends on
the degree to which the two tests measure the same constructs in the same ways.
Overall, the IAEP and NAEP have a number of similarities and differences. The
IAEP curriculum framework was adapted from the framework used for the NAEP, and
the two tests contain similar (but not identical) items and were administered
using similar procedures. In addition, both tests have been scaled using item
response theory (IRT) methods. (4)
At
the same time, the two tests also differ in a number of ways, most notably in
that the IAEP was explicitly designed to be administered in countries that
differ in language, curriculum and instructional practice, while the NAEP was
not. In addition, the tests differ in length. In the IAEP mathematics
assessment, one common form of the test was administered to all 13-year-olds.
The form included 76 items and students were given 60 minutes to complete the
assessment (not including time for background questions). In the NAEP
mathematics assessment, 26 different test booklets were prepared, each
containing a somewhat different number of items, and each sampled student completed
one booklet. A typical NAEP booklet included about 60 items, and students were
given 45 minutes to complete the assessment (not including time for background
questions). Because the IAEP was somewhat longer than the NAEP, the IAEP may
provide somewhat more reliable individual-level scores.
Given
the similarities and differences among the tests, it would be plausible to
consider linking the tests through a process of calibration, projection, or
moderation. Because the IAEP and NAEP tests differ in the detailed curriculum
frameworks employed as well as in reliability, we chose a form of projection to
predict NAEP scores from IAEP scores.
The
projected
NAEP scores reported for Indicator 25
are based on analyses conducted by Pashley and Phillips (1993) and Pashley,
Lewis, and Yan (1994). In developing their estimates, Pashley and Phillips
relied on data collected in a "linking study," in which both the IAEP
and NAEP instruments were administered to a sample of 1,609 U.S. students who
were in eighth grade or 13 years old in the spring of 1992. Pashley and
Phillips used the linking study data to estimate a linear regression model
predicting a student's NAEP score on the basis of his or her IAEP score. (5)
(See Table S21, row A, for the estimated
coefficients.) (6) They then used the regression
equation to develop predicted NAEP scores for the students in the IAEP sample
in each participating country. (7) Using
the predicted scores, Pashley and Phillips obtained various statistics,
including the means and percentile scores for the nations presented in
Indicator 25. (Table S22, column A, provides the projected NAEP-scale means
Pashley and Phillips obtained for each IAEP country.)
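Using the row A parameters in Table S21, the projection takes the form:
projected NAEP score = 265 + 0.44 * (IAEP score - 500). For example, a country
with a mean IAEP score of 550 (a purely illustrative value) would receive a
projected NAEP mean of about 265 + 0.44 * 50 = 287.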
Table S21 Sensitivity of parameters used to link mean IAEP
scores for countries to the NAEP scale, by data source and method
--------------------------------------------------------------------------------------------
                                                   Projected NAEP score   Additional NAEP points
Samples used                           Method      at (IAEP = 500)        per IAEP point above 500
--------------------------------------------------------------------------------------------
A (IAEP cross-linking sample)          Projection         265                     0.44
B (IAEP cross-linking sample)          Moderation         263                     0.53
C (IAEP and 1990 NAEP Trial State      Moderation         264                     0.69
   Assessment in public schools)
D (IAEP and 1992 NAEP Trial State      Moderation         270                     0.72
   Assessment in public schools)
--------------------------------------------------------------------------------------------
NOTE and SOURCE: The IAEP scale ranges from 0 to 1000; the NAEP scale ranges
from 0 to 500. Parameters in this table were calculated using information on
the means and standard deviations of scores in each sample and, for line A, the
correlation of the scores in the cross-linking sample. Pashley and Phillips
(1993) used the sample and method of line A. Beaton and Gonzalez (1993) used
the samples and method of line C.
Table S22 Alternative projections of country mean IAEP scores
onto the NAEP scale, by country
------------------------------------------------------------------------------
                                  Samples and Method  | Difference in projections
Country                            A    B    C    D   | (B - A)  (C - A)  (D - C)
------------------------------------------------------------------------------
Taiwan                            285  287  297  303  |    2       12        6
Korea                             283  286  294  301  |    3       11        7
Switzerland (1)                   279  281  288  294  |    2        9        6
Soviet Union (2)                  279  281  288  294  |    2        9        7
Hungary                           277  279  285  291  |    2        8        6
France                            273  274  278  284  |    1        5        6
Emilia Romagna, Italy (3)         272  272  276  283  |    0        4        6
Israel (4)                        272  272  277  283  |    0        5        6
Canada (5)                        270  270  274  280  |    0        4        6
Scotland                          269  270  272  279  |    1        3        6
Ireland                           269  268  271  277  |   -1        2        6
Slovenia                          266  265  267  273  |   -1        1        6
Spain (6)                         263  261  262  267  |   -2       -1        5
United States (7)                 262  260  262  266  |   -2        0        4
Jordan                            246  241  236  240  |   -5      -10        4
------------------------------------------------------------------------------
(1) Fifteen out of 26 cantons.
(2) Fourteen out of 15 republics; Russian-speaking schools only.
(3) Combined school and student participation rate is below .80 but at least
.70. Interpret with caution due to possible nonresponse bias.
(4) Hebrew-speaking schools only.
(5) Nine out of 10 provinces.
(6) All regions except Cataluña; Spanish-speaking schools only.
(7) Eighth-graders took the test and not all were 13 years old.
Samples and Method
A. Cross-linking sample and projection method
B. Cross-linking sample and moderation method
C. IAEP and NAEP 1990 public school samples and moderation method
D. IAEP and NAEP 1992 public school samples and moderation method

Difference in projections
(B - A) Moderation versus projection in the same (cross-linking) sample
(C - A) Moderation and 1990 NAEP/IAEP samples versus projection and
cross-linking sample
(D - C) 1992 NAEP/IAEP samples versus 1990 NAEP/IAEP samples, both using the
moderation method

NOTE and SOURCE: Countries are sorted from high to low based on their mean
scores using sample and method A (cross-linking sample and projection method).
Columns B and D are from Pashley, Lewis, and Yan (1994) and Beaton and Gonzalez
(1993), respectively. Both used student-weighted data. Columns A and C are
based in part on tabulations produced by the IAEP Processing Centre in June
1992. It appears that these tabulations did not use student weights. For most
countries, the use of weights made little difference for estimated country mean
IAEP scores. Switzerland is an exception, due to the complex sample design used
there. Therefore, an unpublished weighted mean IAEP score of 532.36 was used
instead of the published unweighted mean of 538.75 for Switzerland.
The
most widely
discussed alternative to the projection method used by Pashley and Phillips is
a moderation method carried out by Beaton and Gonzalez (1993). Beaton and
Gonzalez based their analysis on the 1991 IAEP United States sample and the
1990 NAEP eighth grade winter public school sample. They translated IAEP scores
into NAEP scores by aligning the means and standard deviations for the two
tests. (8) Using the techniques of linear
equating, they estimated conversion constants to transform the U.S. IAEP scores
into a distribution having the same mean and standard deviation as the 1990
NAEP scores. (The conversion constants are shown in Table S21, row C.) They
then used these conversion constants to transform the IAEP scores for the
students in the IAEP samples in each participating country into equivalent NAEP
scores. (The moderated country NAEP-scale means produced by Beaton and Gonzalez
are shown in Table S22, column C. Full state and nation results for Indicator
25 using the Beaton and Gonzalez method are displayed in Table S23.)
The
projection method used to develop Indicator 25 and the moderation method used
by Beaton and Gonzalez produce somewhat different results, especially for
countries with high average IAEP scores. (See Table S22.)
For example, Korea is estimated to have a 1992 NAEP score of 283 using the
projection method employed in Indicator 25 (see column A), while it has an
estimated 1990 NAEP score of 294 using the Beaton and Gonzalez method (see
column C).
The
observed
differences in transformed scores can be attributed in
part to differences in the data sets on which Pashley and Phillips and Beaton
and Gonzalez rely in developing their estimates. The students in the
"linking study" sample used by Pashley and Phillips included both
13-year-olds and eighth graders in public and private schools. Beaton and
Gonzalez used two samples to develop their estimates: the regular 1991 U.S.
IAEP sample, and the regular winter eighth-grade 1990 NAEP administration. The
1991 United States IAEP sample on which they relied included 13-year-olds (but
not other eighth graders) in public and private schools, while the 1990 NAEP
sample included eighth graders (but not other 13-year-olds) in public schools
only. (9) Perhaps as a result of these
differences, the estimation samples have somewhat different distributions. Both
estimation methods are particularly sensitive to the ratio of the standard
deviations for the NAEP and IAEP. (10)
In the linking sample used to develop the projection estimates, the ratio of
the NAEP and IAEP standard deviations was about 0.53, while, for the samples
used by Beaton and Gonzalez, the ratio of standard deviations was about 0.69.
This difference in standard deviations generates predicted NAEP scores based on
the projection method that are less distant from the mean than are the
equivalent scores based on the Beaton and Gonzalez method.
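These figures can be checked against the formulas in footnote 10. With the
linking-sample correlation of about .825 (see footnote 6) and a standard
deviation ratio of about 0.53, the projection slope is r * sy/sx = .825 * .53,
or roughly 0.44, the value shown in Table S21 (row A); the moderation slope in
the same sample is sy/sx, about 0.53 (row B). The samples used by Beaton and
Gonzalez give a moderation slope of about 0.69 (row C).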
To
examine
the sensitivity of the results to the samples used, we
applied the Beaton and Gonzalez method to the data in the "linking
sample" used by Pashley and Phillips. (11)
The conversion coefficient estimates are shown in Table S21, row B, and the
estimated country NAEP means are shown in Table S22, column B. (12) The estimated country means are much
closer to the projection results obtained by Pashley and Phillips (column A)
than are the Beaton and Gonzalez results obtained using the regular IAEP and
1990 winter public eighth grade samples. For example, the difference in the
projection and moderation estimates for Korea drops from 11 to 3 points.
To
explore this issue further, we applied the moderation method using one
additional NAEP data set: the 1992 public eighth grade sample. (This sample
corresponds to the sample used in the 1992 Trial State Assessment on which the
state results in Indicator 25 are based.) The conversion coefficients are
displayed in Table S21 (row D); and the moderated NAEP-scale country means are
displayed in Table S22 (column D). This sample produces country results more
extreme than do any of the other samples we tried.
These
experiments clearly indicate that different samples produce different results.
But the experiments do not indicate which sample is "best." One
advantage of the linking sample used by Pashley and Phillips is that the same
students took both the IAEP and the NAEP. Hence, the estimated conversion
coefficients are not biased by possible differences between the IAEP and NAEP
samples. But the fact that the IAEP standard deviation in the linking sample is
substantially higher than the standard deviation in the regular U.S.
administration of the IAEP, while the NAEP standard deviation in the linking
sample is similar to the regular NAEP standard deviation, may at least in part
counterbalance the other apparent advantages of the linking sample.
In
addition to the effects of the sample on coefficient estimates, several
conceptual issues should be considered in evaluating linking methods. We
briefly review three of these issues below: the age or grade-level
interpretation placed on predicted test scores; the effects on coefficient
estimates of unreliability in the measures; and potential country-level
contextual effects.
First,
different
linking approaches may produce results that differ in the age or grade-level
for which the predicted scores are intended to apply. For example, since the
data used by Pashley and Phillips to derive their coefficient estimates
involved a sample of students who completed both the IAEP and the NAEP, the
predicted NAEP scores based on their coefficients should be viewed as the NAEP
scores that would be obtained by students of the same age or grade as the
students whose IAEP scores are used as predictors. Since the regular country
administration of the IAEP involved sampling 13-year-olds, the predicted NAEP
scores using the Pashley and Phillips method should be viewed as predicted NAEP
scores for 13-year-old students. The predicted NAEP scores obtained by Beaton
and Gonzalez, on the other hand, should be interpreted as the scores
13-year-olds who took the IAEP would receive if they completed the NAEP in
eighth grade. (13) Since average NAEP scores for
eighth-graders are generally somewhat higher than average scores for
13-year-olds, the approach to sample specification used by Beaton and Gonzalez
is likely to produce somewhat higher scores than the approach used by Pashley
and Phillips.
Linking
methods
may also differ in their sensitivity to unreliability in
the predictor variable (in this case, the IAEP). In general, regression
estimates of the effects of variables measured with error will be biased toward
zero. Hence, projection coefficients estimated using unreliable measures are
likely to be attenuated. (14)
The effects of unreliability on conversion coefficients obtained using moderation
methods are more difficult to determine. In the special case in which the
predictor and outcome variables are measured with the same reliability, the
moderation coefficients should be roughly unbiased. (15)
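In standard errors-in-variables terms (a general statistical result, not a
finding of these studies): if the predictor is measured with reliability r_xx,
the expected regression slope is attenuated toward zero by roughly the factor
r_xx. For moderation, the observed ratio of standard deviations equals the
true ratio times the square root of (r_xx / r_yy), where r_yy is the
reliability of the outcome measure; when the two reliabilities are equal, this
factor is one and the moderation coefficient is roughly unbiased.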
Finally,
linking methods
that are based on data from a single country may not properly reflect
country-level contextual effects. Suppose, for example, that individual NAEP
and IAEP scores were obtained for a sample of students in each of n
countries. (16) Both the projection and moderation
methods rest on an assumption that the relationship between IAEP and NAEP
scores (pooling students across countries) can be expressed as a simple linear
model of the form:
estimated NAEP score = constant + slope * IAEP score
It is possible, however, that country-context effects exist. One simple
specification might involve the addition of country dummies to the simple
linear model above. If the country dummies differ significantly from zero, the
within-country regression of NAEP scores on IAEP scores will not properly
reproduce between-country relationships. Contextual effects of this sort might
arise, for example, if the standardized test style used in the IAEP and NAEP is
quite common in some countries, but rarely used in others. Unfortunately,
without linked IAEP and NAEP data for a sample of countries, the possibility of
contextual effects cannot be ruled out.
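In the notation of the model above, this specification can be written as (an
illustrative restatement of the paragraph, not an equation estimated in these
studies):

estimated NAEP score = constant + slope * IAEP score + country effect

where the country effects are the coefficients on the country dummies. A
single pooled line is adequate only if the country effects are jointly
indistinguishable from zero.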
This
brief discussion clearly indicates that different methods of linking the IAEP
and NAEP can produce different results, and further study is necessary to
determine which method is best. For this reason, Indicator 25 is labeled
"experimental."
For
more information on cross-linking and on the specific approaches used in
developing Indicator 25, see Peter J. Pashley and Gary W. Phillips, Toward
World-Class Standards: A Research Study Linking International and National
Assessments (Princeton, NJ: Educational Testing Service, June 1993); Peter J.
Pashley, Charles Lewis, and Duanli Yan, "Statistical Linking Procedures for
Deriving Point Estimates and Associated Standard Errors," paper presented at
the National Council on Measurement in Education (Princeton, NJ: Educational
Testing Service, April 1994); Albert E. Beaton and Eugenio J. Gonzalez,
"Comparing the NAEP Trial State Assessment Results with the IAEP International
Results," Setting Performance Standards for Student Achievement: Background
Studies (Stanford, CA: National Academy of Education, 1993); Robert J.
Mislevy, Albert E. Beaton, Bruce Kaplan, and Kathleen M. Sheehan, "Estimating
Population Characteristics from Sparse Matrix Samples of Item Responses,"
Journal of Educational Measurement, Summer 1992, vol. 29, no. 2, pp. 133-161;
and Robert J. Mislevy, Linking Educational Assessments: Concepts, Issues,
Methods, and Prospects (Princeton, NJ: Educational Testing Service, December
1992).
Table S23 Mathematics proficiency scores for 13-year-olds in countries and
public school 8th-grade students in states, calculated using the
equi-percentile linking method, according to Beaton and Gonzalez, by country
(1991) and state (1990)
-----------------------------------------------------------------------------------------
                                    |           Percent of population
                                    |       in each proficiency score range
-----------------------------------------------------------------------------------------
COUNTRY/State            Mean  SE   |  <200   200-250   250-300   300-350   >350
-----------------------------------------------------------------------------------------
TAIWAN 296.7 1.5 3.2 13.4 33.9 36.6 12.9
KOREA 294.1 1.3 1.9 10.3 41.8 39.3 6.7
SOVIET UNION 287.6 1.5 0.8 10.4 53.1 34.0 1.7
SWITZERLAND 287.5 1.9 0.2 8.8 57.9 32.2 0.9
HUNGARY 284.8 1.4 1.4 13.5 52.6 29.9 2.7
North Dakota 281.1 1.2 0.8 13.2 60.0 24.8 1.3
Montana 280.5 0.9 0.5 14.3 59.5 24.9 0.8
FRANCE 278.1 1.3 1.4 16.8 57.5 23.4 1.0
Iowa 278.0 1.1 0.6 18.3 57.0 23.3 0.7
ISRAEL 276.8 1.3 1.5 15.6 61.6 20.7 0.6
ITALY 276.3 1.4 1.6 18.1 57.7 22.0 0.5
Nebraska 275.7 1.0 2.0 18.6 56.2 22.4 0.9
Minnesota 275.4 0.9 1.6 19.2 57.0 21.2 1.1
Wisconsin 274.5 1.3 1.5 20.8 55.4 21.6 0.7
CANADA 274.0 1.0 1.4 17.6 63.7 16.7 0.7
New Hampshire 273.2 0.9 1.4 21.2 58.1 18.9 0.5
SCOTLAND 272.4 1.5 1.6 20.6 59.7 17.7 0.4
Wyoming 272.2 0.7 1.1 20.9 60.3 17.4 0.2
Idaho 271.5 0.8 1.2 22.1 59.7 16.8 0.2
IRELAND 271.4 1.4 3.1 21.0 57.1 18.0 0.8
Oregon 271.4 1.0 2.2 23.8 54.2 19.2 0.6
Connecticut 269.9 1.0 3.2 25.3 50.7 20.1 0.7
New Jersey 269.7 1.1 2.4 26.9 50.2 19.7 0.8
Colorado (NAEP) 267.4 0.9 2.8 26.5 54.7 15.7 0.4
SLOVENIA 267.3 1.3 1.6 25.7 60.2 12.2 0.4
Indiana 267.3 1.2 2.0 28.2 53.9 15.4 0.5
Pennsylvania 266.4 1.6 3.2 27.5 53.0 15.8 0.5
Michigan 264.4 1.2 3.1 30.1 51.7 14.5 0.6
Virginia 264.3 1.5 3.3 32.8 47.3 15.4 1.3
Colorado (IAEP) 264.2 0.7 3.1 28.8 55.4 12.4 0.4
Ohio 264.0 1.0 3.1 30.5 52.4 13.8 0.3
Oklahoma 263.2 1.3 2.8 30.8 53.8 12.5 0.2
SPAIN 261.9 1.3 2.1 29.0 62.0 6.9 0.0
UNITED STATES(IAEP) 261.8 2.0 5.0 30.6 52.0 11.5 0.9
United States (NAEP) 261.8 1.4 5.0 31.5 49.0 14.0 0.5
New York 260.8 1.4 5.9 31.4 48.0 13.9 0.8
Maryland 260.8 1.4 5.7 33.1 45.3 15.3 0.6
Delaware 260.7 0.9 4.6 34.2 47.6 13.0 0.6
Illinois 260.6 1.7 5.7 31.4 49.1 13.4 0.5
Rhode Island 260.0 0.6 5.0 34.0 47.3 13.5 0.3
Arizona 259.6 1.3 4.5 33.8 49.7 11.7 0.4
Georgia 258.9 1.3 5.3 35.2 46.5 12.5 0.6
Texas 258.2 1.4 4.8 36.4 46.7 11.7 0.4
Kentucky 257.1 1.2 3.9 38.2 47.9 9.8 0.2
New Mexico 256.4 0.7 4.3 38.2 47.7 9.6 0.3
California 256.3 1.3 6.9 35.9 45.2 11.5 0.4
Arkansas 256.2 0.9 4.6 37.3 49.4 8.6 0.1
West Virginia 255.9 1.0 4.3 38.7 48.4 8.5 0.2
Florida 255.3 1.3 6.6 37.7 44.3 11.2 0.2
Alabama 252.9 1.1 6.2 40.5 44.8 8.3 0.3
Hawaii 251.0 0.8 9.9 39.2 39.8 10.6 0.5
North Carolina 250.4 1.1 7.9 41.2 42.6 8.1 0.0
Louisiana 246.4 1.2 8.2 46.1 40.6 4.9 0.2
JORDAN 236.1 1.9 16.0 48.3 32.6 3.1 0.0
District of Columbia 231.4 0.9 16.7 56.9 23.6 2.5 0.3
-----------------------------------------------------------------------------------------
NOTE: Countries and states are sorted from high to low based on their mean
proficiency scores. Colorado participated in both the NAEP Trial State
Assessment and, separately, in the International Assessment of Educational
Progress.
SOURCE:
Albert E. Beaton and Eugenio J. Gonzalez, "Comparing the NAEP Trial State
Assessment Results with the IAEP International Results," in Setting Performance Standards
for Student Achievement: Background Studies (Stanford, CA:
National Academy of Education, 1993).

Footnotes
(4)
For the NAEP and
the IAEP IRT scales, conventional individual scale scores are not generated.
Instead, the scaling process generates a set of five "plausible
values" for each student. The five plausible values reported for each
student can be viewed as draws from a distribution of potential scale scores
consistent with the student's observed responses on the test and the student's
measured background characteristics. In other words, the plausible values are
constructed to have a mean and variance consistent with the underlying true
population values. In this sense, the plausible values correct for
unreliability. See Mislevy, Beaton, Kaplan, and Sheehan, 1992.
(5)
The actual
procedure used by Pashley and Phillips was somewhat more complex than the
method described in the text. Five regressions were estimated, one for each
pair of IAEP and NAEP plausible values (see the previous footnote). Given the
sample sizes involved, the regression parameters produced by the five
regressions differ only marginally.
(6)
The regression
parameters shown in the table are based on an approximate analysis using the
reported correlation between the IAEP and the NAEP total mathematics score (r =
.825), as well as the mean and the standard deviation of the IAEP and the NAEP
in the linking sample, averaging across the five sets of plausible values. The
results obtained by averaging in this way differ only slightly from the method
used by Pashley and Phillips, based on separate regressions for each of the
five plausible-value pairs. See the previous two footnotes.
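(In formula form, following footnote 10: slope = r * s_NAEP / s_IAEP and
intercept = mean_NAEP - slope * mean_IAEP, evaluated with the linking-sample
statistics.)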
(7)
In the method as
implemented by Pashley and Phillips, the five regression equations were each
used to obtain predicted NAEP scores at the individual level; and the results
were averaged to produce country means. The results are very similar to those
that are obtained using the somewhat simpler method discussed in the text.
(8)
Like Pashley and
Phillips, Beaton and Gonzalez carried out their procedure separately for each
of the five sets of plausible values; and they then averaged the results
obtained for each set. The results differ only slightly when their procedure is
carried out once using published estimates of means and standard deviations.
(9)
The 1990 NAEP
mathematics results were rescaled in 1992, producing slightly different scale
scores. Beaton and Gonzalez used the 1992 rescaling.
(10)
The simple
regression coefficient required for the projection method can be expressed as
rsy/sx, where r is the correlation between the IAEP and the NAEP, sy is the
standard deviation of the NAEP, and sx is the standard deviation of the IAEP.
The conversion coefficient required for the moderation method is simply sy/sx.
(11) Given the data required, it is
possible to develop moderation estimates similar to those developed by Beaton
and Gonzalez for several different samples. But because the Pashley and
Phillips projection method requires paired IAEP and NAEP data, the linking
sample is the only data set in which it currently can be applied.
(12) As discussed in footnotes 4-7 above,
Beaton and Gonzalez based their estimates on the full set of individual-level
plausible values for each country. We developed the estimates in Tables S21 and
S22 using only the reported country means and standard deviations derived from
the plausible values. These results differ only slightly from those that would
be obtained using the full set of plausible values.
(13) The interpretation of the predicted
NAEP scores based on the moderation method is complicated by the fact that the
IAEP sample used to develop the conversion constants included students in both
public and private schools, while the NAEP sample included only public school
students. Since the NAEP results for the full sample of eighth graders
including both public and private students differ only modestly from the
results for the sample including only public students, this problem probably
accounts for relatively little of the difference in predicted outcomes for the
projection and moderation approaches.
(14) The plausible values generated for the
IAEP and NAEP are designed to reflect the true population mean and variance;
but correlations among plausible values are attenuated due to unreliability.
(15) Since the IAEP and NAEP plausible
values are designed to produce unbiased estimates of population variance,
moderation methods that make use of the plausible values should not be
sensitive to measurement error.
(16) To obtain valid NAEP scores in
countries outside the United States, language and other issues would of course
need to be taken into account.
Using
the above Beaton-Gonzalez cross-linking study and published NAEP mathematics
scores for additional groups and school types produces the following more
complete table:
-----------------------------------------------------------------------------------------
                                    |           Percent of population
                                    |       in each proficiency score range
-----------------------------------------------------------------------------------------
COUNTRY/State/Group      Mean  SE   |  <200   200-250   250-300   300-350   >350
-----------------------------------------------------------------------------------------
Asians Maryland 306
Whites Washington, DC 303
Texas religious 301
Washington religious 299
TAIWAN 296.7 1.5 3.2 13.4 33.9 36.6 12.9
North Dakota religious 296
KOREA 294.1 1.3 1.9 10.3 41.8 39.3 6.7
SOVIET UNION 287.6 1.5 0.8 10.4 53.1 34.0 1.7
SWITZERLAND 287.5 1.9 0.2 8.8 57.9 32.2 0.9
Montana Whites 287
HUNGARY 284.8 1.4 1.4 13.5 52.6 29.9 2.7
Whites in DOD schools 284
California religious 284
North Dakota 281.1 1.2 0.8 13.2 60.0 24.8 1.3
Whites national 281
Montana 280.5 0.9 0.5 14.3 59.5 24.9 0.8
Virginia Whites 279
FRANCE 278.1 1.3 1.4 16.8 57.5 23.4 1.0
Iowa 278.0 1.1 0.6 18.3 57.0 23.3 0.7
ISRAEL 276.8 1.3 1.5 15.6 61.6 20.7 0.6
ITALY 276.3 1.4 1.6 18.1 57.7 22.0 0.5
Nebraska 275.7 1.0 2.0 18.6 56.2 22.4 0.9
Minnesota 275.4 0.9 1.6 19.2 57.0 21.2 1.1
Wisconsin 274.5 1.3 1.5 20.8 55.4 21.6 0.7
CANADA 274.0 1.0 1.4 17.6 63.7 16.7 0.7
New Hampshire 273.2 0.9 1.4 21.2 58.1 18.9 0.5
SCOTLAND 272.4 1.5 1.6 20.6 59.7 17.7 0.4
Wyoming 272.2 0.7 1.1 20.9 60.3 17.4 0.2
Idaho 271.5 0.8 1.2 22.1 59.7 16.8 0.2
IRELAND 271.4 1.4 3.1 21.0 57.1 18.0 0.8
Oregon 271.4 1.0 2.2 23.8 54.2 19.2 0.6
Connecticut 269.9 1.0 3.2 25.3 50.7 20.1 0.7
New Jersey 269.7 1.1 2.4 26.9 50.2 19.7 0.8
Colorado (NAEP) 267.4 0.9 2.8 26.5 54.7 15.7 0.4
SLOVENIA 267.3 1.3 1.6 25.7 60.2 12.2 0.4
Indiana 267.3 1.2 2.0 28.2 53.9 15.4 0.5
Pennsylvania 266.4 1.6 3.2 27.5 53.0 15.8 0.5
Michigan 264.4 1.2 3.1 30.1 51.7 14.5 0.6
Virginia 264.3 1.5 3.3 32.8 47.3 15.4 1.3
Colorado (IAEP) 264.2 0.7 3.1 28.8 55.4 12.4 0.4
Ohio 264.0 1.0 3.1 30.5 52.4 13.8 0.3
Oklahoma 263.2 1.3 2.8 30.8 53.8 12.5 0.2
SPAIN 261.9 1.3 2.1 29.0 62.0 6.9 0.0
UNITED STATES(IAEP) 261.8 2.0 5.0 30.6 52.0 11.5 0.9
United States (NAEP) 261.8 1.4 5.0 31.5 49.0 14.0 0.5
New York 260.8 1.4 5.9 31.4 48.0 13.9 0.8
Maryland 260.8 1.4 5.7 33.1 45.3 15.3 0.6
Delaware 260.7 0.9 4.6 34.2 47.6 13.0 0.6
Illinois 260.6 1.7 5.7 31.4 49.1 13.4 0.5
Rhode Island 260.0 0.6 5.0 34.0 47.3 13.5 0.3
Arizona 259.6 1.3 4.5 33.8 49.7 11.7 0.4
Georgia 258.9 1.3 5.3 35.2 46.5 12.5 0.6
Texas 258.2 1.4 4.8 36.4 46.7 11.7 0.4
Kentucky 257.1 1.2 3.9 38.2 47.9 9.8 0.2
New Mexico 256.4 0.7 4.3 38.2 47.7 9.6 0.3
California 256.3 1.3 6.9 35.9 45.2 11.5 0.4
Arkansas 256.2 0.9 4.6 37.3 49.4 8.6 0.1
West Virginia 255.9 1.0 4.3 38.7 48.4 8.5 0.2
Florida 255.3 1.3 6.6 37.7 44.3 11.2 0.2
Alabama 252.9 1.1 6.2 40.5 44.8 8.3 0.3
Hawaii 251.0 0.8 9.9 39.2 39.8 10.6 0.5
North Carolina 250.4 1.1 7.9 41.2 42.6 8.1 0.0
Latinos national 250
Louisiana 246.4 1.2 8.2 46.1 40.6 4.9 0.2
JORDAN 236.1 1.9 16.0 48.3 32.6 3.1 0.0
District of Columbia 231.4 0.9 16.7 56.9 23.6 2.5 0.3
U.S. blacks 209
Wash DC blacks 206