AIMSweb

Area: Oral Reading Fluency (R-CBM)

Cost

Technology, Human Resources, and Accommodations for Special Needs

Service and Support

Purpose and Other Implementation Information

Usage and Reporting

Reading-CBM is included in a subscription to AIMSweb Pro Reading, AIMSweb Pro Language Arts, or AIMSweb Pro Complete, which range from $4.00 to $6.00 per student per year.

Every AIMSweb subscription provides unlimited access to the AIMSweb online system, which includes:

  • AIMSweb assessments for universal screening and progress monitoring
  • Data management and reporting
  • Browser-based scoring
  • Training manuals
  • Administration and scoring manuals

Internet access is required for full use of product services.

Testers require 1-2 hours of training.

Paraprofessionals can administer the test.

Alternate forms are available in Spanish for benchmarking.

Pearson
19500 Bulverde Road
San Antonio, TX 78259
Phone: 866-313-6194
Visit AIMSweb.com

General Information:
866-313-6194, option 2
aimswebsales@pearson.com

Tech support:
866-313-6194, option 1
aimswebsupport@pearson.com

Field-tested training manuals, which provide administration, scoring, and implementation information, are included with AIMSweb subscriptions.
Ongoing technical support is provided.
Professional development opportunities are available.

As a reading screening tool, R-CBM is used to identify children at risk of reading failure and students performing significantly below grade-level expectations.
As a progress monitoring tool, standardized, equivalent, graded alternate forms are used to measure student progress frequently toward specific goals and to monitor the effects of instructional changes. R-CBM is a 1-minute, individually administered, standardized measure of oral reading of graded passages, with 30 alternate forms available for each of grades 2 through 8 and 20 alternate forms for grade 1.

Scores reported include the raw score (words read correctly per minute), national percentiles (grades 1-12) and normative performance levels by grade and season, individual student growth percentiles by grade and season (based on rates of improvement, ROI), success probability scores (cut scores that indicate a 50% or 80% probability of passing the state test), Lexile measures, and error analysis scores. Local norms are also available.

Instructional links to Reading Street and My Sidewalks, Prentice Hall Literature (grades 6-8), and a Common Core Lexile Report are also provided.

 

Reliability of the Performance Level Score: Convincing Evidence

| Type of Reliability | Age or Grade | n (range) | Coefficient (range) | Coefficient (median) | SEM | Information (including normative data) / Subjects |
|---|---|---|---|---|---|---|
| Alternate Form | 1 (W,S) | 1000 | 0.97-0.97 | 0.97 | 6.1 | Calculated separately at each benchmark period (fall, winter, spring). Reliability of the median of three probe scores, based on the average inter-form correlation (see explanation below). The sample for each grade was randomly drawn from the AIMSweb 2009-2010 user database. Sample demographics: 50% female, 50% male, 73% White, 10% African American, 10% Hispanic, 4% Asian, 3% Other. |
| Alternate Form | 2 | 1000 | 0.97-0.97 | 0.97 | 6.4 | See subject information above. |
| Alternate Form | 3 | 1000 | 0.96-0.97 | 0.97 | 7.2 | See subject information above. |
| Alternate Form | 4 | 1000 | 0.97-0.97 | 0.97 | 6.6 | See subject information above. |
| Alternate Form | 5 | 1000 | 0.97-0.97 | 0.97 | 7.3 | See subject information above. |
| Alternate Form | 6 | 1000 | 0.96-0.97 | 0.97 | 7.6 | See subject information above. |
| Alternate Form | 7 | 1000 | 0.96-0.97 | 0.97 | 7.8 | See subject information above. |
| Alternate Form | 8 | 1000 | 0.96-0.97 | 0.96 | 6.9 | See subject information above. |
| Inter-rater Reliability | 2 | 61 | | 0.99 | | Administrations of a single R-CBM probe were digitally recorded, and each recording was then independently scored by two different raters who did their own timing. The sample was obtained at two suburban Minnesota schools and three urban Texas schools. Demographic characteristics were similar at all four grades; for the total sample, they were: 51% female, 49% male, 65% Hispanic, 32% White, 2% African American, 1% Asian, 1% ELL, and 3% receiving special education services. |
| Inter-rater Reliability | 4 | 63 | | 0.99 | | See subject information above. |
| Inter-rater Reliability | 6 | 73 | | 0.99 | | See subject information above. |
| Inter-rater Reliability | 8 | 63 | | 0.99 | | See subject information above. |
| Split-half | 2 | 61 | | 0.97 | | Same sample as for the inter-rater study. Correlations between scores (WRC) on the first and second 30-second portions of each probe were adjusted by Spearman-Brown and then used to compute the reliability of the median of three probe scores. |
| Split-half | 4 | 63 | | 0.97 | | See subject information above. |
| Split-half | 6 | 73 | | 0.96 | | See subject information above. |
| Split-half | 8 | 63 | | 0.97 | | See subject information above. |
| Retest | 1 | 1000 | | 0.91 | | Correlations between median scores at adjacent benchmark periods (fall-winter or winter-spring; winter-spring only for grade 1). Same sample as for alternate-form reliability. |
| Retest | 2 | 1000 | 0.93-0.94 | 0.93 | | See subject information above. |
| Retest | 3 | 1000 | 0.93-0.94 | 0.93 | | See subject information above. |
| Retest | 4 | 1000 | 0.94-0.95 | 0.94 | | See subject information above. |
| Retest | 5 | 1000 | 0.95-0.95 | 0.95 | | See subject information above. |
| Retest | 6 | 1000 | 0.95-0.95 | 0.95 | | See subject information above. |
| Retest | 7 | 1000 | 0.95-0.95 | 0.95 | | See subject information above. |
| Retest | 8 | 1000 | 0.95-0.96 | 0.95 | | See subject information above. |
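The alternate-form rows report the reliability of the median of three probe scores, derived from the average inter-form correlation. AIMSweb's exact computation is not reproduced here; a standard way to estimate the reliability of a k-form composite from the average single-form correlation is the Spearman-Brown prophecy formula, sketched below (the 0.91 single-form correlation is an illustrative assumption, not a value from the manual).

```python
def spearman_brown(r_single: float, k: int = 3) -> float:
    """Estimated reliability of a composite of k parallel forms,
    given the average correlation between single forms (Spearman-Brown)."""
    return k * r_single / (1 + (k - 1) * r_single)

# Hypothetical average inter-form correlation in the low 0.90s:
composite = spearman_brown(0.91)  # ~0.97, in line with the tabled medians
```

This also illustrates why single-form correlations around 0.90 can support three-probe medians near the 0.97 values shown above.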

 

Reliability of the Slope: Convincing Evidence

| Type of Reliability | Age or Grade | n (range) | Coefficient (median) | SEM | Information (including normative data) / Subjects |
|---|---|---|---|---|---|
| Split-half (odd & even data points) | 1 | 7812 | 0.92 | 0.22 | Average # of weeks per student = 27.6 (range 16-46); average # of data points per student = 19.2 (range 13-30). |
| Split-half (odd & even data points) | 2 | 20199 | 0.78 | 0.27 | Average # of weeks per student = 30.4 (range 16-49); average # of data points per student = 20.9 (range 13-30). |
| Split-half (odd & even data points) | 3 | 18272 | 0.72 | 0.28 | Average # of weeks per student = 30.4 (range 16-44); average # of data points per student = 20.6 (range 13-30). |
| Split-half (odd & even data points) | 4 | 11598 | 0.56 | 0.35 | Average # of weeks per student = 30.7 (range 16-44); average # of data points per student = 19.8 (range 13-30). |
| Split-half (odd & even data points) | 5 | 9259 | 0.66 | 0.32 | Average # of weeks per student = 30.6 (range 17-44); average # of data points per student = 19.6 (range 13-30). |
| Split-half (odd & even data points) | 6 | 4233 | 0.63 | 0.33 | Average # of weeks per student = 30.8 (range 17-43); average # of data points per student = 19.7 (range 13-30). |
| Split-half (odd & even data points) | 7 | 2471 | 0.63 | 0.35 | Average # of weeks per student = 30.9 (range 17-41); average # of data points per student = 19.3 (range 13-30). |
| Split-half (odd & even data points) | 8 | 1960 | 0.61 | 0.39 | Average # of weeks per student = 30.6 (range 16-40); average # of data points per student = 19.1 (range 13-30). |

 

Validity of the Performance Level Score: Convincing Evidence

| Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient (range) | Coefficient (median) | Information (including normative data) / Subjects |
|---|---|---|---|---|---|---|
| Predictive | 2 | PSSA | ~200 | 0.69-0.71 | 0.71 | 1-year interval (Keller-Margulis et al., 2008) |
| Predictive | 4 | PSSA | ~200 | 0.67-0.69 | 0.69 | |
| Predictive | 3 (F,W) | MCA | 2051 | 0.68-0.70 | 0.69 | Silberglitt & Hintze (2005) |
| Predictive | 3 (F,W) | PSSA | 185 | 0.65-0.66 | 0.65 | Shapiro et al. (2006) |
| Predictive | 4 (F,W) | MAT8 | 213 | 0.71-0.72 | 0.71 | |
| Predictive | 5 (F,W) | PSSA | 185 | 0.68-0.69 | 0.68 | |
| Predictive | 3 (F) | MAP | 137 | | 0.76 | Andren (2010) |
| Predictive | 3 (F,W) | NECAP | 137 | 0.68-0.71 | 0.69 | |
| Predictive | 3 (F) | NCEGT | 1087 | | 0.69 (0.67) | 2009-2010 (see Classification Accuracy for demographics) |
| Predictive | 4 (F) | NCEGT | 1174 | | 0.70 (0.65) | |
| Predictive | 5 (F) | NCEGT | 1088 | | 0.68 (0.66) | |
| Predictive | 6 (F) | ISAT | 1326 | | 0.64 (0.64) | |
| Predictive | 7 (F) | ISAT | 1328 | | 0.63 (0.65) | |
| Predictive | 8 (F) | ISAT | 911 | | 0.60 (0.62) | |
| Predictive | 3 (W) | NCEGT | 1087 | | 0.71 (0.70) | |
| Predictive | 4 (W) | NCEGT | 1174 | | 0.71 (0.67) | |
| Predictive | 5 (W) | NCEGT | 1088 | | 0.67 (0.66) | |
| Predictive | 6 (W) | ISAT | 1326 | | 0.65 (0.66) | |
| Predictive | 7 (W) | ISAT | 1328 | | 0.63 (0.65) | |
| Predictive | 8 (W) | ISAT | 911 | | 0.60 (0.62) | |
| Construct | 3 | MCA | 2126 | | 0.71 | Silberglitt & Hintze (2005) |
| Construct | 3 | PSSA | 185 | | 0.67 | Shapiro et al. (2006) |
| Construct | 4 | MAT8 | 213 | | 0.70 | |
| Construct | 5 | PSSA | 206 | | 0.67 | |
| Construct | 2, 3, 4 | MAP | 71-85 | 0.68-0.72 | 0.70 | Merino & Beckman (2010) |
| Construct | 3 (F,W) | MAP | 137 | 0.77-0.81 | 0.79 | Andren (2010) |
| Construct | 3 (S) | NCEGT | 1087 | | 0.72 (0.71) | 2009-2010 (same samples as for the Classification Accuracy analyses) |
| Construct | 4 (S) | NCEGT | 1174 | | 0.72 (0.68) | |
| Construct | 5 (S) | NCEGT | 1088 | | 0.69 (0.67) | |
| Construct | 6 (S) | ISAT | 1326 | | 0.64 (0.65) | |
| Construct | 7 (S) | ISAT | 1328 | | 0.62 (0.64) | |
| Construct | 8 (S) | ISAT | 911 | | 0.60 (0.62) | |

Content:

The passages used in AIMSweb R-CBM were developed to represent the types of narrative text that students in a particular grade typically encounter in school. The creation and refinement of the set of passages followed a careful development process documented by Howe and Shinn (2002). Key components of this process included:

  • authors with experience writing for students at various grade levels
  • specifications for readability (such as number of syllables and sentences per 100 words)
  • monitoring readability using the Fry formula and Lexile scaling
  • field testing the passages and eliminating those with low alternate-form reliability, an atypical score level, or an inappropriate readability index or Lexile score
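The Fry procedure mentioned above plots two statistics: average sentences per 100 words and average syllables per 100 words. A rough, hypothetical checker for those passage specifications might look like the sketch below; the syllable count is a vowel-group heuristic, so it only approximates hand counts.

```python
import re

def fry_inputs(text: str):
    """Return (sentences per 100 words, syllables per 100 words),
    the two statistics plotted on the Fry readability graph.
    Syllables are estimated by counting vowel groups, which only
    approximates dictionary syllabification."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = len(re.findall(r"[.!?]+", text)) or 1

    def syllables(word: str) -> int:
        groups = len(re.findall(r"[aeiouy]+", word.lower()))
        if word.lower().endswith("e") and groups > 1:
            groups -= 1  # treat a final silent "e" as non-syllabic
        return max(1, groups)

    scale = 100 / max(1, len(words))
    return sentences * scale, sum(syllables(w) for w in words) * scale
```

In a development workflow like the one described, such a check would flag candidate passages whose readability statistics fall outside the grade-level specification before field testing.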

Predictive Validity of the Slope of Improvement: Partially Convincing Evidence

| Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient (median) | Information (including normative data) / Subjects |
|---|---|---|---|---|---|
| Predictive | Grade 3 | State Exam | 396 | 0.23 | Predictive validity of slope for total sample = 0.23. Reliability of slope based on 20 data points collected over the course of 20 weeks (i.e., one data point per week). |
| Predictive | Grade 1 | State Exam | 199 | 0.42 | Predictive validity of slope for total sample = 0.42. Reliability of slope based on 8 data points collected over the course of 8 weeks (i.e., one data point per week). |
| Predictive | Grade 2 | State Exam | 235 | 0.43 | Predictive validity of slope for total sample = 0.43. Reliability of slope based on 8 data points collected over the course of 8 weeks (i.e., one data point per week). |

 

Disaggregated Reliability and Validity Data: Convincing Evidence

Disaggregated Reliability of the Performance Level Score:

| Type of Reliability | Age or Grade | n (range) | Coefficient (range) | Coefficient (median) | SEM | Information (including normative data) / Subjects |
|---|---|---|---|---|---|---|
| Alternate Form | 1 (W,S) | 100 | 0.97-0.97 | 0.97 | 6.1 | African American students in the alternate-form sample described in GOM 1. Reliability of the median of 3 probe scores administered at a benchmark testing period. |
| Alternate Form | 2 | 54 | 0.96-0.97 | 0.96 | 6.3 | See above. |
| Alternate Form | 3 | 77 | 0.97-0.97 | 0.97 | 6.5 | See above. |
| Alternate Form | 4 | 51 | 0.97-0.97 | 0.97 | 6.6 | See above. |
| Alternate Form | 5 | 46 | 0.97-0.97 | 0.97 | 6.8 | See above. |
| Alternate Form | 6 | 59 | 0.96-0.97 | 0.97 | 8.0 | See above. |
| Alternate Form | 7 | 88 | 0.96-0.97 | 0.96 | 6.6 | See above. |
| Alternate Form | 8 | 130 | 0.97-0.97 | 0.97 | 6.9 | See above. |
| Alternate Form | 1 (W,S) | 105 | 0.96-0.97 | 0.96 | 5.1 | Hispanic students in the alternate-form sample described in GOM 1. Reliability of the median of 3 probe scores administered at a benchmark testing period. |
| Alternate Form | 2 | 58 | 0.96-0.97 | 0.96 | 6.2 | See above. |
| Alternate Form | 3 | 68 | 0.97-0.97 | 0.97 | 7.7 | See above. |
| Alternate Form | 4 | 69 | 0.97-0.97 | 0.97 | 6.6 | See above. |
| Alternate Form | 5 | 69 | 0.95-0.97 | 0.96 | 7.3 | See above. |
| Alternate Form | 6 | 88 | 0.96-0.97 | 0.96 | 6.9 | See above. |
| Alternate Form | 7 | 63 | 0.97-0.97 | 0.97 | 7.0 | See above. |
| Alternate Form | 8 | 44 | 0.95-0.96 | 0.95 | 6.2 | See above. |
| Alternate Form | 1 (W,S) | 449 | 0.97-0.97 | 0.97 | 6.0 | White non-Hispanic students in the alternate-form sample described in GOM 1. Reliability of the median of 3 probe scores administered at a benchmark testing period. |
| Alternate Form | 2 | 443 | 0.96-0.97 | 0.96 | 6.9 | See above. |
| Alternate Form | 3 | 511 | 0.96-0.97 | 0.96 | 7.5 | See above. |
| Alternate Form | 4 | 399 | 0.97-0.97 | 0.97 | 6.8 | See above. |
| Alternate Form | 5 | 468 | 0.97-0.97 | 0.97 | 7.4 | See above. |
| Alternate Form | 6 | 608 | 0.97-0.97 | 0.97 | 7.2 | See above. |
| Alternate Form | 7 | 487 | 0.96-0.97 | 0.97 | 8.0 | See above. |
| Alternate Form | 8 | 508 | 0.96-0.97 | 0.97 | 6.5 | See above. |
| Alternate Form | 1 (W,S) | 65 | 0.96-0.97 | 0.97 | 5.3 | ELL students in the alternate-form sample described in GOM 1. Reliability of the median of 3 probe scores administered at a benchmark testing period. |
| Alternate Form | 2 | 75 | 0.96-0.97 | 0.97 | 6.4 | See above. |
| Alternate Form | 3 | 74 | 0.96-0.97 | 0.97 | 6.5 | See above. |
| Alternate Form | 4 | 51 | 0.96-0.97 | 0.96 | 5.9 | See above. |
| Alternate Form | 5 | 60 | 0.96-0.96 | 0.96 | 6.7 | See above. |
| Alternate Form | 6 | 77 | 0.95-0.95 | 0.95 | 8.0 | See above. |
| Alternate Form | 7 | 98 | 0.96-0.96 | 0.96 | 7.2 | See above. |
| Alternate Form | 8 | 82 | 0.97-0.97 | 0.97 | 6.2 | See above. |
| Alternate Form | 1 (W,S) | 175 | 0.97-0.97 | 0.97 | 5.6 | Students receiving free/reduced lunch in the alternate-form sample described in GOM 1. Reliability of the median of 3 probe scores administered at a benchmark testing period. |
| Alternate Form | 2 | 133 | 0.96-0.97 | 0.96 | 6.6 | See above. |
| Alternate Form | 3 | 162 | 0.96-0.97 | 0.97 | 6.9 | See above. |
| Alternate Form | 4 | 157 | 0.97-0.97 | 0.97 | 6.9 | See above. |
| Alternate Form | 5 | 181 | 0.96-0.97 | 0.96 | 7.6 | See above. |
| Alternate Form | 6 | 246 | 0.95-0.96 | 0.96 | 8.0 | See above. |
| Alternate Form | 7 | 139 | 0.97-0.97 | 0.97 | 7.4 | See above. |
| Alternate Form | 8 | 100 | 0.96-0.97 | 0.97 | 6.7 | See above. |

Disaggregated Validity of the Performance Level Score

| Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient (median) | Information (including normative data) / Subjects |
|---|---|---|---|---|---|
| Predictive | 3 (F) | NCEGT | 201 | 0.72 (0.65) | African American students in the Classification Accuracy samples. |
| Predictive | 4 (F) | NCEGT | 246 | 0.64 (0.59) | See above. |
| Predictive | 5 (F) | NCEGT | 210 | 0.64 (0.61) | See above. |
| Predictive | 6 (F) | ISAT | 144 | 0.56 (0.57) | See above. |
| Predictive | 7 (F) | ISAT | 100 | 0.56 (0.55) | See above. |
| Predictive | 8 (F) | ISAT | 96 | 0.56 (0.55) | See above. |
| Predictive | 3 (W) | NCEGT | 201 | 0.72 (0.67) | See above. |
| Predictive | 4 (W) | NCEGT | 246 | 0.68 (0.62) | See above. |
| Predictive | 5 (W) | NCEGT | 210 | 0.66 (0.64) | See above. |
| Predictive | 6 (W) | ISAT | 144 | 0.66 (0.65) | See above. |
| Predictive | 7 (W) | ISAT | 100 | 0.55 (0.55) | See above. |
| Predictive | 8 (W) | ISAT | 96 | 0.57 (0.54) | See above. |
| Predictive | 3 (F) | NCEGT | 103 | 0.70 (0.60) | Hispanic students in the Classification Accuracy samples. |
| Predictive | 4 (F) | NCEGT | 127 | 0.68 (0.58) | See above. |
| Predictive | 5 (F) | NCEGT | 81 | 0.55 (0.43) | See above. |
| Predictive | 6 (F) | ISAT | 228 | 0.67 (0.62) | See above. |
| Predictive | 7 (F) | ISAT | 211 | 0.58 (0.54) | See above. |
| Predictive | 8 (F) | ISAT | 177 | 0.64 (0.64) | See above. |
| Predictive | 3 (W) | NCEGT | 103 | 0.70 (0.61) | See above. |
| Predictive | 4 (W) | NCEGT | 127 | 0.73 (0.65) | See above. |
| Predictive | 5 (W) | NCEGT | 81 | 0.59 (0.50) | See above. |
| Predictive | 6 (W) | ISAT | 228 | 0.70 (0.66) | See above. |
| Predictive | 7 (W) | ISAT | 211 | 0.62 (0.57) | See above. |
| Predictive | 8 (W) | ISAT | 177 | 0.65 (0.65) | See above. |
| Construct | 3 (S) | NCEGT | 201 | 0.68 (0.61) | African American students in the Classification Accuracy samples. |
| Construct | 4 (S) | NCEGT | 246 | 0.70 (0.64) | See above. |
| Construct | 5 (S) | NCEGT | 210 | 0.66 (0.63) | See above. |
| Construct | 6 (S) | ISAT | 144 | 0.63 (0.61) | See above. |
| Construct | 7 (S) | ISAT | 100 | 0.54 (0.54) | See above. |
| Construct | 8 (S) | ISAT | 96 | 0.64 (0.61) | See above. |
| Construct | 3 (S) | NCEGT | 103 | 0.70 (0.63) | Hispanic students in the Classification Accuracy samples. |
| Construct | 4 (S) | NCEGT | 127 | 0.70 (0.63) | See above. |
| Construct | 5 (S) | NCEGT | 81 | 0.58 (0.50) | See above. |
| Construct | 6 (S) | ISAT | 228 | 0.68 (0.63) | See above. |
| Construct | 7 (S) | ISAT | 211 | 0.57 (0.56) | See above. |
| Construct | 8 (S) | ISAT | 177 | 0.61 (0.62) | See above. |

Disaggregated Predictive Validity of the Slope of Improvement

| Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient (range) | Information (including normative data) / Subjects |
|---|---|---|---|---|---|
| Predictive | Grade 3 | State Exam | 396 | 0.23 to 0.25 | Predictive validity of slope for Caucasian sample = 0.25; for African American sample = 0.23; for Hispanic sample = 0.24. |

 

Alternate Forms: Convincing Evidence

1. Evidence that alternate forms are of equal and controlled difficulty or, if IRT based, evidence of item or ability invariance:

Passages were written to serve as Standard Reading Assessment Passages that produce reliable and valid judgments about general reading achievement for individual students. The data generated are sensitive to between-person differences (i.e., educational need) and within-person differences (i.e., sensitivity to improvement). For grades 1-7, a stratified random sample of 25 students per grade served as the initial field-testing pool, and all students read all passages. The median alternate-form reliability across all grades was 0.86. For primer and grade 8 passages, a horizontal field-testing model with substantially larger samples (369 and 183 students, respectively) was used, in which more students each read fewer randomly assigned passages, with at least 30 paired comparisons. This strategy resulted in alternate-form reliabilities of 0.87 and 0.90, respectively.

2. Number of alternate forms of equal and controlled difficulty:

20 alternate forms in grade 1 and 30 alternate forms in grades 2-8.

Sensitive to Student Improvement: Convincing Evidence

1. Describe evidence that the monitoring system produces data that are sensitive to student improvement (i.e., when student learning actually occurs, student performance on the monitoring tool increases on average).

To assess the sensitivity of the AIMSweb R-CBM assessment to student improvement, data from a 2-year period were analyzed for students in grades 2 (n = 142), 3 (n = 96), 4 (n = 78), and 5 (n = 65) who were receiving Tier 2 RTI supplemental instruction. The intervention for students in grades 2 and 3 consisted of intensified instruction and intervention in phonemic segmentation, alphabetic principle, decoding, encoding, word analysis, vocabulary development, sight word recognition, fluency, and comprehension (i.e., Wilson Fundations). Students in grades 4 and 5 received intensified instruction and intervention in word attack basics (e.g., letter sounds, letter combinations, short vowels), decoding strategies, regular and irregular words, high utility words, rate, and fluency (i.e., Corrective Reading). At each grade level, the average rate of improvement (ROI) of students receiving intervention was compared with the median ROI of general-education students who were not receiving supplemental intervention. Results of one-sample t-tests were statistically significant (p < .01) at each grade level. Students in grades 2 through 5 who received intervention showed average rates of improvement of 3.40, 4.06, 4.33, and 3.84 words read correctly per minute (WRC) per week, respectively, exceeding the median ROIs for general-education students by between 2.20 and 3.53 WRC per week. These results provide evidence that the AIMSweb R-CBM measure is sensitive to validated interventions.

End-of-Year Benchmarks: Convincing Evidence

1. Are benchmarks for minimum acceptable end-of-year performance specified in your manual or published materials?

Yes.

a. Specify the end-of-year performance standards:

Customers can (1) define their own benchmark targets based on norm tables or other data; (2) use AIMSweb presets, which are based on the score at the 50th percentile from the AIMSweb national norms; (3) use DIBELS presets; or (4) use the AIMSweb test correlation feature to generate benchmark targets that predict success on high-stakes testing.

b. Basis for specifying minimum acceptable end-of-year performance:

Norm-referenced and criterion-referenced

c. Specify the benchmarks:

d. Basis for specifying these benchmarks?

Norm-referenced/Criterion-referenced

Normative profile:

Representation: National
Date: 2001-2008
Number of States: 49 & DC
Size: 288,105

Procedure for specifying benchmarks for end-of-year performance levels:

By establishing the predictive relationship between R-CBM and a criterion measure, AIMSweb users can select the scores that predict passing with an 80% or 90% probability.
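One common way to turn such a predictive relationship into a benchmark is to fit a logistic regression of pass/fail on R-CBM score and invert it at the desired probability. The sketch below assumes already-fitted coefficients; `b0` and `b1` are hypothetical illustration values, not AIMSweb parameters.

```python
import math

def cut_score(p_pass: float, b0: float, b1: float) -> float:
    """Invert a fitted logistic model
    P(pass) = 1 / (1 + exp(-(b0 + b1 * wrc)))
    to find the score at which the passing probability equals p_pass."""
    return (math.log(p_pass / (1 - p_pass)) - b0) / b1

# Hypothetical coefficients from a logistic fit of state-test passing on WRC:
wrc_80 = cut_score(0.80, b0=-6.0, b1=0.08)  # benchmark target for 80% probability
```

Raising the target probability (say, from 80% to 90%) necessarily raises the resulting cut score, since the logistic curve is monotonic in the predictor.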

Rates of Improvement Specified: Convincing Evidence

1. Is minimum acceptable growth (slope of improvement or average weekly increase in score by grade level) specified in manual or published materials?

Yes.

a. Specify the growth standards:

Grade ROI
1 1.36
2 1.17
3 1.03
4 0.81
5 0.89
6 1.69
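These normative ROIs can be used to project an expected end-of-year score from a fall baseline. The sketch below is an illustration, not an AIMSweb feature: the 36-week year and the 50-WRC baseline are assumptions.

```python
def projected_score(baseline_wrc: float, roi_per_week: float, weeks: int = 36) -> float:
    """Project an end-of-year score from a fall baseline and a weekly ROI.
    The 36-week school year is an assumption; adjust to the local calendar."""
    return baseline_wrc + roi_per_week * weeks

# Grade 2 normative ROI from the table above (1.17 WRC/week),
# hypothetical fall baseline of 50 WRC:
spring_estimate = projected_score(baseline_wrc=50, roi_per_week=1.17)  # about 92 WRC
```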

b. Basis for specifying minimum acceptable growth:

Criterion-referenced and norm-referenced

2. Normative profile:

Representation: National
Date: 2001-2008
Number of States: 49 & DC
Size: 288,105

3. Procedure for specifying criterion for adequate growth:

By establishing the predictive relationship between R-CBM and a criterion measure, AIMSweb users can select the scores that predict passing with an 80% or 90% probability.

Decision Rules for Changing Instruction: Convincing Evidence

Specification of validated decision rules for when changes to instruction need to be made: The newest version of the AIMSweb online system, to be released for piloting in the fall of 2012 and made available to all users no later than the fall of 2013, applies a statistical procedure to the student's monitoring scores in order to provide empirically based guidance about whether the student is likely to meet, fall short of, or exceed the goal. The calculation procedure (presented below) is fully described in the AIMSweb Progress Monitoring Guide (Pearson, 2012) and can be implemented immediately by AIMSweb users with a spreadsheet or a simple program. Once the new AIMSweb online system is fully distributed, the user will not have to do any calculations to obtain this data-based guidance.

The decision rule is based on a 75% confidence interval for the student's predicted score at the goal date. This confidence interval is student-specific and takes into account the number and variability of monitoring scores and the duration of monitoring. Starting at the sixth week of monitoring, when there are at least four monitoring scores, the AIMSweb report following each monitoring administration includes one of the following statements:

  • Statement A: "The student is projected to not reach the goal." Appears if the confidence interval is completely below the goal score.
  • Statement B: "The student is projected to exceed the goal." Appears if the confidence interval is completely above the goal score.
  • Statement C: "The student is on track to reach the goal. The projected score at the goal date is between X and Y" (where X and Y are the bottom and top of the confidence interval). Appears if the confidence interval includes the goal score.

If Statement A appears, the user has a sound basis for deciding that the current intervention is not sufficient and a change to instruction should be made. If Statement B appears, there is an empirical basis for deciding that the goal is not sufficiently challenging and should be increased. If Statement C appears, the student's progress is not clearly different from the aimline, so there is no compelling reason to change the intervention or the goal; however, the confidence-interval range enables the user to see whether the goal is near the upper or lower limit of the range, which would signal that the student's progress is trending below or above the goal.

A 75% confidence interval was chosen for this application because it balances the costs of the two types of decision errors. Incorrectly deciding that the goal will not be reached (when in truth it will be reached) has a moderate cost: an intervention that is working will be replaced by a different intervention. Incorrectly deciding that the goal may be reached (when in truth it will not be reached) also has a moderate cost: an ineffective intervention will be continued rather than being replaced. Because both kinds of decision errors have costs, it is appropriate to use a modest confidence level.
 
Calculation of the 75% confidence interval for the score at the goal date:

1. Calculate the trend line: the ordinary least-squares regression line through the student's monitoring scores.
2. Calculate the projected score at the goal date: the value of the trend line at the goal date.
3. Calculate the standard error of estimate (SEE) of the projected score at the goal date, using the following formula:

   SEE = { [sum((y − y′)²) / (k − 2)] × [1 + 1/k + (GW − mean(w))² / sum((w − mean(w))²)] }^(1/2)

   where k = the number of completed monitoring administrations, w = the week number of a completed administration, GW = the week number of the goal date, y = a monitoring score, and y′ = the predicted monitoring score at that week (from the student's trend line). The means and sums are calculated across all of the completed monitoring administrations up to that date.
4. Add and subtract 1.25 times the SEE to the projected score, and round to the nearest whole numbers.
 
Evidentiary basis for these decision rules: The decision rules are statistically rather than empirically based. The guidance statements that result from applying the 75% confidence interval to the projected score are correct probabilistic statements under two assumptions:

  • The student's progress can be described by a linear trend line. If the pattern of the student's monitoring scores is obviously curvilinear, then the projected score based on a linear trend will likely be misleading. The AIMSweb Progress Monitoring Guide trains users to take non-linearity into account when interpreting progress-monitoring data.
  • The student will continue to progress at the same rate as to that point. This is an unavoidable assumption for a decision system based on extrapolating from past growth.

Even though the rules are not derived from data, it is useful to observe how they work in a sample of real data. For this purpose, we selected random samples of students in the AIMSweb 2010-2011 database who were progress-monitored on either Reading Curriculum-Based Measurement (R-CBM) or Math Computation (M-COMP). All students scored below the 25th percentile in the fall screening administration of R-CBM or M-COMP. The R-CBM sample consisted of 1,000 students (200 each at grades 2 through 6) who had at least 30 monitoring scores, and the M-COMP sample included 500 students (100 per grade) with a minimum of 28 monitoring scores. This analysis was only a rough approximation, because we did not know each student's actual goal or whether the intervention or goal was changed during the year. To perform the analyses, we first set an estimated goal for each student by using the ROI at the 85th percentile of AIMSweb national ROI norms to project the score at the 30th monitoring administration. Next, we defined "meeting the goal" as having a mean score on the last three administrations (e.g., the 28th through 30th administrations of R-CBM) that was at or above the goal score. At each monitoring administration for each student, we computed the projected score at the goal date and the 75% confidence interval for that score, and recorded which of the three decision statements was generated (projected not to meet goal, projected to exceed goal, or on track/no change).
 
In this analysis, accuracy of guidance to change (that is, accuracy of projections that the student will not reach the goal or will exceed the goal) reached a high level (80%) by about the 13th to 15th monitoring administration, on average. The percentage of students receiving guidance to not change (i.e., their trendline was not far from the aimline) would naturally tend to decrease over administrations as the size of the confidence interval decreased. At the same time, however, there was a tendency for the trendline to become closer to the aimline over time as it became more accurately estimated, and this worked to increase the percentage of students receiving the “no change” guidance.

Decision Rules for Increasing Goals: Convincing Evidence

Specification of validated decision rules for when increases in goals need to be made: The newest version of the AIMSweb online system, to be released for piloting in the fall of 2012 and made available to all users no later than the fall of 2013, applies a statistical procedure to the student’s monitoring scores in order to provide empirically-based guidance about whether the student is likely to meet, fall short of, or exceed their goal. The calculation procedure (presented below) is fully described in the AIMSweb Progress Monitoring Guide (Pearson, 2012) and can be implemented immediately by AIMSweb users if they create a spreadsheet or simple software program. Once the new AIMSweb online system is fully distributed, the user will not have to do any calculations to obtain this data-based guidance. The decision rule is based on a 75% confidence interval for the student’s predicted score at the goal date. This confidence interval is student-specific and takes into account the number and variability of monitoring scores and the duration of monitoring. Starting at the sixth week of monitoring, when there are at least four monitoring scores, the AIMSweb report following each monitoring administration includes one of the following statements: “The student is projected to not reach the goal.” This statement appears if the confidence interval is completely below the goal score. “The student is projected to exceed the goal.” This statement appears if the confidence interval is completely above the goal score. “The student is on track to reach the goal. The projected score at the goal date is between X and Y” (where X and Y are the bottom and top of the confidence interval). This statement appears if the confidence interval includes the goal score. If Statement A appears, the user has a sound basis for deciding that the current intervention is not sufficient and a change to instruction should be made. 
If Statement B appears, there is an empirical basis for deciding that the goal is not sufficiently challenging and should be increased. If Statement C appears, the student’s progress is not clearly different from the aimline and so there is not a compelling reason to change the intervention or the goal; however, the presentation of the confidence-interval range enables the user to see whether the goal is near the upper limit or lower limit of the range, which would signal that the student’s progress is trending below or above the goal.  A 75% confidence interval was chosen for this application because it balances the costs of the two types of decision errors. Incorrectly deciding that the goal will not be reached (when in truth it will be reached) has a moderate cost: an intervention that is working will be replaced by a different intervention. Incorrectly deciding that the goal may be reached (when in truth it will not be reached) also has a moderate cost: an ineffective intervention will be continued rather than being replaced. Because both kinds of decision errors have costs, it is appropriate to use a modest confidence level.
 
Calculation of the 75% confidence interval for the score at the goal date:

1. Calculate the trend line. This is the ordinary least-squares regression line through the student's monitoring scores.
2. Calculate the projected score at the goal date. This is the value of the trend line at the goal date.
3. Calculate the standard error of estimate (SEE) of the projected score at the goal date, using the following formula:

SEE = { [sum(y − y′)² / (k − 2)] × [1 + 1/k + (GW − mean(w))² / sum(w − mean(w))²] }^(1/2)

where
k = number of completed monitoring administrations,
w = week number of a completed administration,
GW = week number of the goal date,
y = monitoring score,
y′ = predicted monitoring score for that week (from the student's trend line).
The means and sums are calculated across all of the completed monitoring administrations up to that date.

4. Add and subtract 1.25 times the SEE to and from the projected score, and round to the nearest whole numbers.
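The calculation can be sketched end to end with the Python standard library (the function name is illustrative; the 1.25 multiplier for the 75% interval is the one given in the text):

```python
from statistics import mean

def projection_interval(weeks, scores, goal_week, multiplier=1.25):
    """Fit an OLS trend line through (week, score) pairs, project the
    score at the goal date, and return (low, high), the rounded 75%
    confidence interval described in the text. Variable names follow
    the formula: k, w, GW, y, y'."""
    k = len(scores)
    if k < 4:
        raise ValueError("at least four monitoring scores are needed")
    w_bar, y_bar = mean(weeks), mean(scores)
    sxx = sum((w - w_bar) ** 2 for w in weeks)           # sum(w - mean(w))^2
    slope = sum((w - w_bar) * (y - y_bar)
                for w, y in zip(weeks, scores)) / sxx
    intercept = y_bar - slope * w_bar
    projected = intercept + slope * goal_week            # trend-line value at GW
    residual_ss = sum((y - (intercept + slope * w)) ** 2
                      for w, y in zip(weeks, scores))    # sum(y - y')^2
    see = ((residual_ss / (k - 2))
           * (1 + 1 / k + (goal_week - w_bar) ** 2 / sxx)) ** 0.5
    return (round(projected - multiplier * see),
            round(projected + multiplier * see))
```

With perfectly linear data the residuals are zero and the interval collapses to the projected score itself; noisier or shorter score histories widen the interval, which is why the rule waits for at least four monitoring scores.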
 
Evidentiary basis for these decision rules: The decision rules are statistically rather than empirically based. The guidance statements that result from applying the 75% confidence interval to the projected score are correct probabilistic statements under two assumptions:

1. The student's progress can be described by a linear trend line. If the pattern of the student's monitoring scores is obviously curvilinear, then the projected score based on a linear trend will likely be misleading. The AIMSweb Progress Monitoring Guide provides training on the need for users to take non-linearity into account when interpreting progress-monitoring data.
2. The student will continue to progress at the same rate as they have been progressing up to that time. This is an unavoidable assumption for any decision system based on extrapolating from past growth.

Even though the rules are not derived from data, it is useful to observe how they work in a sample of real data. For this purpose, we selected random samples of students in the AIMSweb 2010-2011 database who were progress-monitored on either Reading Curriculum-Based Measurement (R-CBM) or Math Computation (M-COMP). All students scored below the 25th percentile on the fall screening administration of R-CBM or M-COMP. The R-CBM sample consisted of 1,000 students (200 each at grades 2 through 6) who had at least 30 monitoring scores, and the M-COMP sample included 500 students (100 per grade) with a minimum of 28 monitoring scores. This analysis was only a rough approximation, because we did not know each student's actual goal or whether the intervention or goal was changed during the year. To perform the analyses, we first set an estimated goal for each student by using the ROI at the 85th percentile of AIMSweb national ROI norms to project the student's score at the 30th monitoring administration.
Next, we defined “meeting the goal” as having a mean score on the last three administrations (e.g., the 28th through 30th administrations of R-CBM) that was at or above the goal score. At each monitoring administration for each student, we computed the projected score at the goal date and the 75% confidence interval for that score, and recorded which of the three decision statements was generated (projected not to meet goal, projected to exceed goal, or on-track/no-change).
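The goal-setting and goal-attainment criteria used in this analysis can be sketched as follows. This is a simplified illustration, not the analysis code itself: the function name is hypothetical, and the 85th-percentile ROI value would come from the AIMSweb national ROI norms, which are not reproduced here.

```python
from statistics import mean

def met_goal(scores, baseline, roi_85th, n_admins=30, weeks_per_admin=1):
    """Sketch of the evaluation criterion in the text: set the goal by
    projecting the baseline score forward at the 85th-percentile rate
    of improvement (roi_85th, in score points per week), then check
    whether the mean of the last three administrations reaches it."""
    goal = baseline + roi_85th * n_admins * weeks_per_admin
    return mean(scores[-3:]) >= goal
```

For instance, a student starting at 10 with an 85th-percentile ROI of 2 points per week over 30 weekly administrations would have a goal of 70, and would be counted as meeting it only if the mean of the final three scores reached 70.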
 
In this analysis, accuracy of guidance to change (that is, accuracy of projections that the student will not reach the goal or will exceed the goal) reached a high level (80%) by about the 13th to 15th monitoring administration, on average. The percentage of students receiving guidance to not change (i.e., their trendline was not far from the aimline) would naturally tend to decrease over administrations as the size of the confidence interval decreased. At the same time, however, there was a tendency for the trendline to become closer to the aimline over time as it became more accurately estimated, and this worked to increase the percentage of students receiving the “no change” guidance.

Improved Student Achievement: Data Unavailable

Improved Teacher Planning: Data Unavailable