DIBELS Next

Area: DORF (DIBELS Oral Reading Fluency)

Categories covered in this profile: Cost, Technology, Human Resources, and Accommodations for Special Needs; Service and Support; Purpose and Other Implementation Information; Usage and Reporting.

Amplify: The basic pricing plan is an annual per-student license of $14.90. For users already using an mCLASS assessment product, adding mCLASS:DIBELS Next costs $6 per student.

Sopris: There are three purchasing options for implementing Progress Monitoring materials in Year 1:

1) Progress Monitoring via Online Test Administration and Scoring

2) Progress Monitoring materials as part of the purchase of Classroom Sets, which also include Benchmark materials and DIBELS Next Survey

3) Individual Progress Monitoring materials.

DIBELS Next Classroom Sets contain everything needed for one person to conduct the Benchmark Assessment for 25 students and the Progress Monitoring Assessment for up to five students. These easy-to-implement kits simplify the distribution and organization of DIBELS Next materials.

DMG: Materials may be downloaded at no cost from DMG at http://dibels.org/next. Minimal reproduction costs are associated with printing.

Testers require 4-8 hours of training. Examiners must, at a minimum, be paraprofessionals.

Training manuals and materials are field tested and are included in the cost of the tool.

Amplify’s Customer Care Center offers complete user-level support from 7:00 a.m. to 7:00 p.m. EST, Monday through Friday. Customers may contact a customer support representative via telephone, e-mail, or electronically through the mCLASS website. Additionally, customers have self-service access to instructions, documents, and frequently asked questions on the website. The research staff and product teams are available to answer questions about the content within the assessments.

Accommodations:

DIBELS Next is an assessment instrument well suited to capturing the developing reading skills of special education students learning to read, with a few exceptions: a) students who are deaf; b) students who have fluency-based speech disabilities, e.g., stuttering, oral apraxia; c) students who are learning to read in a language other than English or Spanish; d) students with severe disabilities. Use of DIBELS Next is appropriate for all other students, including those in special education for whom reading connected text is an IEP goal. For students receiving special education, it may be necessary to adjust goals and timelines. Approved accommodations are available in the administration manual.

Where to obtain:

Amplify Education, Inc.
55 Washington Street, Suite 900
Brooklyn, NY 11201
1-800-823-1969, option 1
www.amplify.com

Sopris Learning
17855 Dallas Parkway, Suite 400, Dallas, TX 75287-6816
http://www.soprislearning.com

DMG
859 Willamette Street, Suite 320, Eugene, OR 97401
541-431-6931
(888) 399-1995
http://dibels.org

DIBELS Next measures are brief, powerful indicators of foundational early literacy skills that: are quick to administer and score; serve as universal screening (or benchmark assessment) and progress monitoring; identify students in need of intervention support; evaluate the effectiveness of interventions; and support the RtI/Multi-tiered model. DIBELS Next comprises six measures: First Sound Fluency (FSF), Letter Naming Fluency (LNF), Phoneme Segmentation Fluency (PSF), Nonsense Word Fluency (NWF), DIBELS Oral Reading Fluency (DORF), and Daze. 

DIBELS Oral Reading Fluency (DORF) is a measure of advanced phonics and word attack skills, accurate and fluent reading of connected text, and reading comprehension. The DORF passages and procedures are based on the program of research and development of Curriculum-Based Measurement of reading by Stan Deno and colleagues at the University of Minnesota (Deno, 1989). There are two components to DORF: oral reading fluency and passage retell.

Administration of the test takes one minute, plus a maximum of one minute for the retell.

There are 20 alternate forms per measure.

Raw scores and developmental benchmarks are available. The assessor shows the reading passage to the student, and the student reads the passage. Following the passage reading, the student is asked to retell the story if he or she read 40 or more words correctly on the passage.

Scores: 1) The number of words read correctly in 1 minute. 2) The percentage of words read accurately in 1 minute. 3) The number of words spoken in the retell that relate to what was read. 4) The quality of the retell response, rated against a rubric for number of details, meaningful sequence, and main idea. Raw scores, cut points, and benchmark goals are all grade-specific but are not strictly based on grade norms.
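To make the first two scores and the retell-eligibility rule concrete, here is a minimal sketch (hypothetical function and field names, not part of the published scoring materials):

```python
def score_dorf(words_attempted: int, errors: int) -> dict:
    """Compute DORF words-correct and accuracy for a 1-minute read.

    Assumes words_attempted > 0 is the number of words reached in one
    minute and errors is the number of words read incorrectly.
    """
    words_correct = words_attempted - errors          # score 1: words correct per minute
    accuracy = words_correct / words_attempted * 100  # score 2: percent read accurately
    # Retell is administered only when the student reads 40+ words correctly.
    administer_retell = words_correct >= 40
    return {
        "words_correct": words_correct,
        "accuracy_pct": round(accuracy, 1),
        "administer_retell": administer_retell,
    }

# Example: a student reads 52 words with 4 errors in one minute.
print(score_dorf(52, 4))
# {'words_correct': 48, 'accuracy_pct': 92.3, 'administer_retell': True}
```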


Reliability of the Performance Level Score: Convincing Evidence

Type of Reliability | Age or Grade | n (range) | Coefficient Median | SEM
Test-Retest | 1 | 28 | 0.95 | N/A
Alternate-form | 1 | 28 | 0.98 | 5.56
Test-Retest | 2 | 21 | 0.91 | N/A
Alternate-form | 2 | 24 | 0.96 | 8.00
Inter-rater | 2 | 25 | 0.99 | N/A
Test-Retest | 3 | 27 | 0.93 | N/A
Alternate-form | 3 | 30 | 0.97 | 7.00
Inter-rater | 3 | 25 | 0.99 | N/A
Test-Retest | 4 | 21 | 0.97 | N/A
Alternate-form | 4 | 30 | 0.95 | 8.53
Inter-rater | 4 | 24 | 0.99 | N/A
Test-Retest | 5 | 23 | 0.97 | N/A
Alternate-form | 5 | 25 | 0.96 | 7.66
Inter-rater | 5 | 28 | 0.99 | N/A
Alternate-form | 6 | 61 | 0.94 | 7.00
Inter-rater | 6 | 20 | 0.99 | N/A
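For context, SEM values of this kind are conventionally related to the reliability coefficient by the standard classical-test-theory formula (a general relation, not one quoted from the DIBELS technical reports):

```latex
\mathrm{SEM} = SD \sqrt{1 - r_{xx}}
```

For example, if the grade 2 alternate-form SEM of 8.00 were derived this way from r = 0.96, it would imply a score standard deviation of about 8.00 / sqrt(0.04) = 40 words correct.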

Information (including normative data) / Subjects:

Test-Retest: Participants were a stratified random sample drawn from a single school district that was selected based upon student DIBELS performance from the beginning-of-year benchmark assessment.

Alternate-form: Participants were a stratified random sample drawn from thirteen schools across five states based on beginning of year DIBELS performance.

Inter-rater: Participants were students randomly selected from five schools.

See The DIBELS Next Technical Adequacy Brief and the DIBELS Next Technical Manual for further sample data.

Reliability of the Slope: Unconvincing Evidence

Type of Reliability | Age or Grade | n (range) | Coefficient Median | SEM
HLM | 1 | 356 | 0.82 | 10.33
HLM | 2 | 2051 | 0.77 | 11.29
HLM | 3 | 843 | 0.55 | 11.12
HLM | 4 | 1010 | 0.56 | 10.50
HLM | 5 | 610 | 0.50 | 10.39
HLM | 6 | 102 | 0.50 | 10.96

Information (including normative data) / Subjects:

Reliability of slope was computed using progress monitoring data from the 2013-2014 school year. There were approximately 130,000 students in first through sixth grades from 1,121 schools within 388 districts. These data are reported in the DIBELS Next Technical Adequacy Brief and the Technical Adequacy Supplement for DIBELS Next Oral Reading Fluency.
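For readers unfamiliar with HLM-based reliability of slope, the conventional definition from standard multilevel-modeling practice (stated here for orientation, not quoted from the DIBELS reports) is the ratio of true between-student slope variance to the total variance of an individual student's estimated slope:

```latex
\lambda_j = \frac{\tau_{11}}{\tau_{11} + V_j}
```

where \(\tau_{11}\) is the variance of true slopes across students and \(V_j\) is the error variance of student \(j\)'s estimated slope.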

 

Validity of the Performance Level Score: Convincing Evidence

Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient Median
Predictive | 1 | GRADE Total | 196 | 0.64
Concurrent | 1 | GRADE Total | 196 | 0.75
Concurrent | 1 | NAEP | 23 | 0.97
Predictive | 2 | GRADE Total | 215 | 0.76
Concurrent | 2 | GRADE Total | 215 | 0.73
Concurrent | 2 | NAEP | 23 | 0.91
Predictive | 3 | GRADE Total | 190 | 0.67
Concurrent | 3 | GRADE Total | 190 | 0.66
Concurrent | 3 | NAEP | 23 | 0.96
Predictive | 4 | GRADE Total | 190 | 0.74
Concurrent | 4 | GRADE Total | 190 | 0.69
Concurrent | 4 | NAEP | 23 | 0.89
Predictive | 5 | GRADE Total | 194 | 0.69
Concurrent | 5 | GRADE Total | 194 | 0.65
Concurrent | 5 | NAEP | 23 | 0.96
Predictive | 6 | GRADE Total | 103 | 0.64
Concurrent | 6 | GRADE Total | 103 | 0.61

Information (including normative data) / Subjects:

Validity with GRADE Total Test: Participants were a stratified random sample drawn from thirteen schools across five states based on beginning of year DIBELS performance.

Validity with NAEP: The NAEP standard 4th-grade reading passage, from the NAEP 2002 Special Study of Oral Reading (Daane et al., 2005), was administered concurrently with the DIBELS Next Oral Reading Fluency passages to all grades in the Readability Study. The participants were students from two schools.

 

Predictive Validity of the Slope of Improvement: Unconvincing Evidence

Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient Median
Predictive Validity for DORF WC | Grade 2 to predict Grade 3 | IREAD3 | 8182 | 0.27
Concurrent Validity for DORF ACC | 3 | ISTEP+ ELA | 2491 | 0.17
Predictive Validity for DORF ACC | Grade 2 to predict Grade 3 | IREAD3 | 8158 | 0.20
Concurrent Validity for Retell | 3 | ISTEP+ ELA | 2443 | 0.42
Predictive Validity for Retell | Grade 2 to predict Grade 3 | IREAD3 | 6673 | 0.28

Information (including normative data) / Subjects:

DORF WC (predictive): Validity of slope was computed using DIBELS DORF data from school year 2011-2012 to predict IREAD3 data from school year 2012-2013. 12% African American, 10% Hispanic, 2% Asian, 5% multi-race; 30% subsidized lunch; 8% special education; 7% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 10.35).

DORF ACC (concurrent): Validity of slope was computed using data from school year 2011-2012. 17% African American, 14% Hispanic, 5% Asian, 7% multi-race; 14% subsidized lunch; 8% special education; 12% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 12.49).

DORF ACC (predictive): Validity of slope was computed using DIBELS DORF-Accuracy data from school year 2011-2012 to predict IREAD3 data from school year 2012-2013. 11% African American, 10% Hispanic, 2% Asian, 5% multi-race; 30% subsidized lunch; 8% special education; 7% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 10.33).

Retell (concurrent): Validity of slope was computed using data from school year 2011-2012. 17% African American, 14% Hispanic, 5% Asian, 7% multi-race; 14% subsidized lunch; 8% special education; 12% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 12.39).

Retell (predictive): Validity of slope was computed using DIBELS Retell data from school year 2011-2012 to predict IREAD3 data from school year 2012-2013. 11% African American, 10% Hispanic, 2% Asian, 5% multi-race; 27% subsidized lunch; 6% special education; 6% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 9.98).

 

Disaggregated Reliability and Validity Data: Unconvincing Evidence

Disaggregated Reliability of Slope

Type of Reliability | Subgroup | Age or Grade | n (range) | Coefficient Median
HLM | Caucasian | 1 | 1643 | 0.71
HLM | African American | 1 | 380 | 0.77
HLM | Hispanic | 1 | 363 | 0.75
HLM | Caucasian | 2 | 5016 | 0.64
HLM | African American | 2 | 1139 | 0.69
HLM | Hispanic | 2 | 1092 | 0.69

Information (including normative data) / Subjects:

Reliability of slope was computed using data from the 2011-2012 school year. Sample demographics: 20% subsidized lunch, 10% special education, 9% English language learners. Weekly assessments over 12 months. First grade: 6-28 assessments; mean = 9.65. Second grade: 6-47 assessments; mean = 12.86.

Disaggregated Validity of Performance Level Score

Type of Validity

Subgroup

Age or Grade

Test or Criterion

n (range)

Coefficient Median

Predictive

Caucasian

Grade 2 to predict Grade 3

IREAD3

14751

0.66

Predictive

African American

Grade 2 to predict Grade 3

IREAD3

2634

0.66

Predictive

Hispanic

Grade 2 to predict Grade 3

IREAD3

2167

0.70

Information (including normative data) / Subjects:

Validity was computed using DORF WC data from the 2011-2012 school year to predict IREAD3 data from the 2012-2013 school year. Approximately 27% of participants were eligible for subsidized lunch, 5% were classified as Special Education students, and 4% used English as a second language. 

Disaggregated Predictive Validity of the Slope of Improvement

DORF – Words Correct (WC)

Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient Median
Predictive Validity (Caucasian) | Grade 2 to predict Grade 3 | IREAD3 | 4,182 | 0.27
Predictive Validity (African American) | Grade 2 to predict Grade 3 | IREAD3 | 891 | 0.21
Predictive Validity (Hispanic) | Grade 2 to predict Grade 3 | IREAD3 | 855 | 0.30

Validity of slope was computed using DIBELS DORF data from school year 2011-2012 to predict IREAD3 data from school year 2012-2013. 30% subsidized lunch; 8% special education; 7% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 10.35).

DORF – Accuracy (ACC)

Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient Median
Concurrent Validity (Caucasian) | 3 | ISTEP+ ELA | 1,303 | 0.17
Concurrent Validity (African American) | 3 | ISTEP+ ELA | 412 | 0.18
Concurrent Validity (Hispanic) | 3 | ISTEP+ ELA | 347 | 0.17
Predictive Validity (Caucasian) | Grade 2 to predict Grade 3 | IREAD3 | 4,174 | 0.18
Predictive Validity (African American) | Grade 2 to predict Grade 3 | IREAD3 | 882 | 0.15
Predictive Validity (Hispanic) | Grade 2 to predict Grade 3 | IREAD3 | 857 | 0.27

Concurrent validity of slope was computed using data from school year 2011-2012. 14% subsidized lunch; 8% special education; 12% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 12.49). Predictive validity of slope was computed using DIBELS DORF-Accuracy data from school year 2011-2012 to predict IREAD3 data from school year 2012-2013. 30% subsidized lunch; 8% special education; 7% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 10.33).

DORF – Retell

Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient Median
Concurrent Validity (Caucasian) | 3 | ISTEP+ ELA | 1,282 | 0.41
Concurrent Validity (African American) | 3 | ISTEP+ ELA | 410 | 0.33
Concurrent Validity (Hispanic) | 3 | ISTEP+ ELA | 344 | 0.40
Predictive Validity (Caucasian) | Grade 2 to predict Grade 3 | IREAD3 | 3,476 | 0.29
Predictive Validity (African American) | Grade 2 to predict Grade 3 | IREAD3 | 700 | 0.27
Predictive Validity (Hispanic) | Grade 2 to predict Grade 3 | IREAD3 | 658 | 0.32

Concurrent validity of slope was computed using data from school year 2011-2012. 14% subsidized lunch; 8% special education; 12% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 12.39). Predictive validity of slope was computed using DIBELS Retell data from school year 2011-2012 to predict IREAD3 data from school year 2012-2013. 27% subsidized lunch; 6% special education; 6% English as a second language. Weekly assessments over 12 months (i.e., 6-40 assessments; mean = 9.98).

 

Alternate Forms: Partially Convincing Evidence

Provide evidence that alternate forms are of equal and controlled difficulty or, if IRT based, provide evidence of item or ability invariance (attach documentation of direct evidence).

What is the number of alternate forms of equal and controlled difficulty?

Grades 1 through 6 each have 20 alternate forms, plus six benchmark forms in first grade and nine benchmark forms in grades 2 through 6.

If IRT based, provide evidence of item or ability invariance:

The DIBELS Next Readability Tech Report provides evidence of alternate form equivalence. Grade 1: Table 24, page 65; Grade 2: Table 27, page 68; Grade 3: Table 30, page 71; Grade 4: Table 33, page 74; Grade 5: Table 36, page 77; Grade 6: Table 39, page 80.
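For orientation, the Rasch model referenced in these tables expresses the probability of a correct response in terms of person ability \(\theta\) and item (here, passage) difficulty \(b\); this is the general model form, and the tabled difficulty parameters appear to be reported on a transformed scale rather than in raw logits (an observation, not a statement from the technical report):

```latex
P(X = 1 \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}
```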

Table 24
Passage Selection and Placement Considerations for Grade 1 DIBELS Next Progress Monitoring Passages

Passage | Genre | N | IRT Rasch Model Difficulty Parameter | Alternate-Form Reliability
Progress Monitor 1 | Narrative | 23 | 55.34 | 0.92
Progress Monitor 2 | Expository | 23 | 39.17 | 0.95
Progress Monitor 3 | Narrative | 23 | 47.25 | 0.94
Progress Monitor 4 | Expository | 23 | 21.72 | 0.96
Progress Monitor 5 | Narrative | 23 | 64.09 | 0.95
Progress Monitor 6 | Narrative | 23 | 39.17 | 0.94
Progress Monitor 7 | Expository | 23 | 47.25 | 0.95
Progress Monitor 8 | Narrative | 23 | 74.8 | 0.94
Progress Monitor 9 | Expository | 23 | 21.72 | 0.95
Progress Monitor 10 | Narrative | 23 | 39.17 | 0.97
Progress Monitor 11 | Narrative | 23 | 74.8 | 0.95
Progress Monitor 12 | Expository | 23 | 30.77 | 0.94
Progress Monitor 13 | Narrative | 23 | 47.25 | 0.93
Progress Monitor 14 | Narrative | 23 | 74.8 | 0.92
Progress Monitor 15 | Narrative | 23 | 55.34 | 0.95
Progress Monitor 16 | Expository | 23 | 47.25 | 0.95
Progress Monitor 17 | Narrative | 23 | 74.8 | 0.94
Progress Monitor 18 | Expository | 23 | 39.17 | 0.95
Progress Monitor 19 | Expository | 23 | 39.17 | 0.95
Progress Monitor 20 | Narrative | 23 | 74.8 | 0.94

 

Table 27
Passage Selection and Placement Considerations for Grade 2 DIBELS Next Progress Monitoring Passages

Passage | Genre | N | IRT Rasch Model Difficulty Parameter | Alternate-Form Reliability
Progress Monitor 1 | Expository | 23 | 63.76 | 0.9
Progress Monitor 2 | Narrative | 24 | 39.45 | 0.93
Progress Monitor 3 | Narrative | 24 | 46.2 | 0.84
Progress Monitor 4 | Narrative | 24 | 30.88 | 0.9
Progress Monitor 5 | Expository | 25 | 57.98 | 0.9
Progress Monitor 6 | Expository | 23 | 30.88 | 0.92
Progress Monitor 7 | Narrative | 25 | 57.98 | 0.92
Progress Monitor 8 | Narrative | 23 | 63.76 | 0.91
Progress Monitor 9 | Narrative | 24 | 63.76 | 0.86
Progress Monitor 10 | Expository | 25 | 46.2 | 0.92
Progress Monitor 11 | Narrative | 25 | 63.76 | 0.87
Progress Monitor 12 | Expository | 25 | 52.22 | 0.87
Progress Monitor 13 | Narrative | 25 | 52.22 | 0.88
Progress Monitor 14 | Expository | 25 | 57.98 | 0.91
Progress Monitor 15 | Expository | 24 | 30.88 | 0.91
Progress Monitor 16 | Expository | 25 | 46.2 | 0.91
Progress Monitor 17 | Narrative | 25 | 52.22 | 0.92
Progress Monitor 18 | Narrative | 25 | 46.2 | 0.92
Progress Monitor 19 | Expository | 24 | 39.45 | 0.91
Progress Monitor 20 | Expository | 25 | 63.76 | 0.9

 

Table 30
Passage Selection and Placement Considerations for Grade 3 DIBELS Next Progress Monitoring Passages

Passage | Genre | N | IRT Rasch Model Difficulty Parameter | Alternate-Form Reliability
Progress Monitor 1 | Narrative | 22 | 50.07 | 0.84
Progress Monitor 2 | Expository | 22 | 50.07 | 0.94
Progress Monitor 3 | Expository | 22 | 21.05 | 0.93
Progress Monitor 4 | Narrative | 22 | 62.07 | 0.94
Progress Monitor 5 | Narrative | 22 | 56.53 | 0.92
Progress Monitor 6 | Narrative | 22 | 42.27 | 0.94
Progress Monitor 7 | Narrative | 22 | 50.07 | 0.88
Progress Monitor 8 | Expository | 22 | 56.53 | 0.93
Progress Monitor 9 | Narrative | 22 | 6.14 | 0.94
Progress Monitor 10 | Narrative | 22 | 56.53 | 0.94
Progress Monitor 11 | Expository | 22 | 56.53 | 0.9
Progress Monitor 12 | Expository | 22 | 21.05 | 0.93
Progress Monitor 13 | Narrative | 22 | 50.07 | 0.93
Progress Monitor 14 | Narrative | 22 | 67.12 | 0.88
Progress Monitor 15 | Expository | 22 | 21.05 | 0.93
Progress Monitor 16 | Narrative | 22 | 62.07 | 0.91
Progress Monitor 17 | Narrative | 22 | 56.53 | 0.92
Progress Monitor 18 | Expository | 22 | 50.07 | 0.94
Progress Monitor 19 | Expository | 22 | 50.07 | 0.94
Progress Monitor 20 | Narrative | 22 | 62.07 | 0.92

 

Table 33
Passage Selection and Placement Considerations for Grade 4 DIBELS Next Progress Monitoring Passages

Passage | Genre | N | IRT Rasch Model Difficulty Parameter | Alternate-Form Reliability
Progress Monitor 1 | Narrative | 23 | 53.44 | 0.93
Progress Monitor 2 | Expository | 23 | 53.44 | 0.89
Progress Monitor 3 | Narrative | 23 | 42.49 | 0.92
Progress Monitor 4 | Narrative | 23 | 48.17 | 0.9
Progress Monitor 5 | Expository | 23 | 53.44 | 0.89
Progress Monitor 6 | Narrative | 22 | 41.49 | 0.9
Progress Monitor 7 | Expository | 23 | 42.49 | 0.89
Progress Monitor 8 | Expository | 23 | 72.67 | 0.9
Progress Monitor 9 | Narrative | 23 | 29.31 | 0.91
Progress Monitor 10 | Expository | 22 | 47.64 | 0.9
Progress Monitor 11 | Narrative | 23 | 48.17 | 0.91
Progress Monitor 12 | Narrative | 23 | 29.31 | 0.91
Progress Monitor 13 | Narrative | 23 | 58.41 | 0.88
Progress Monitor 14 | Expository | 23 | 67.86 | 0.9
Progress Monitor 15 | Expository | 22 | 67.8 | 0.9
Progress Monitor 16 | Narrative | 23 | 48.17 | 0.91
Progress Monitor 17 | Expository | 23 | 48.17 | 0.92
Progress Monitor 18 | Expository | 23 | 36.28 | 0.89
Progress Monitor 19 | Narrative | 23 | 48.17 | 0.87
Progress Monitor 20 | Narrative | 23 | 58.41 | 0.9

 

Table 36
Passage Selection and Placement Considerations for Grade 5 DIBELS Next Progress Monitoring Passages

Passage | Genre | N | IRT Rasch Model Difficulty Parameter | Alternate-Form Reliability
Progress Monitor 1 | Expository | 23 | 52 | 0.94
Progress Monitor 2 | Narrative | 23 | 58.31 | 0.92
Progress Monitor 3 | Expository | 23 | 19.31 | 0.93
Progress Monitor 4 | Narrative | 23 | 64.85 | 0.92
Progress Monitor 5 | Expository | 23 | 58.31 | 0.92
Progress Monitor 6 | Expository | 23 | 30.33 | 0.92
Progress Monitor 7 | Expository | 23 | 58.31 | 0.94
Progress Monitor 8 | Narrative | 23 | 52 | 0.91
Progress Monitor 9 | Narrative | 23 | 46.08 | 0.93
Progress Monitor 10 | Expository | 23 | 40.57 | 0.9
Progress Monitor 11 | Expository | 23 | 64.85 | 0.92
Progress Monitor 12 | Narrative | 23 | 25.1 | 0.94
Progress Monitor 13 | Expository | 23 | 52 | 0.89
Progress Monitor 14 | Expository | 23 | 77.83 | 0.9
Progress Monitor 15 | Expository | 23 | 35.39 | 0.9
Progress Monitor 16 | Expository | 23 | 40.57 | 0.91
Progress Monitor 17 | Narrative | 23 | 64.85 | 0.94
Progress Monitor 18 | Narrative | 23 | 40.57 | 0.91
Progress Monitor 19 | Narrative | 23 | 52 | 0.92
Progress Monitor 20 | Expository | 23 | 71.4 | 0.93

 

Table 39
Passage Selection and Placement Considerations for Grade 6 DIBELS Next Progress Monitoring Passages

Passage | Genre | N | IRT Rasch Model Difficulty Parameter | Alternate-Form Reliability
Progress Monitor 1 | Expository | 24 | 45.21 | 0.86
Progress Monitor 2 | Expository | 24 | 69.46 | 0.8
Progress Monitor 3 | Expository | 23 | 32.12 | 0.82
Progress Monitor 4 | Expository | 23 | 49.07 | 0.82
Progress Monitor 5 | Narrative | 24 | 56.69 | 0.84
Progress Monitor 6 | Expository | 23 | 45.21 | 0.87
Progress Monitor 7 | Expository | 23 | 52.87 | 0.83
Progress Monitor 8 | Narrative | 24 | 64.84 | 0.84
Progress Monitor 9 | Expository | 23 | 52.87 | 0.86
Progress Monitor 10 | Narrative | 23 | 45.21 | 0.83
Progress Monitor 11 | Expository | 23 | 64.84 | 0.83
Progress Monitor 12 | Expository | 23 | 32.12 | 0.83
Progress Monitor 13 | Narrative | 24 | 49.07 | 0.86
Progress Monitor 14 | Expository | 24 | 56.69 | 0.84
Progress Monitor 15 | Expository | 23 | 41.21 | 0.84
Progress Monitor 16 | Narrative | 22 | 54.43 | 0.87
Progress Monitor 17 | Expository | 24 | 69.46 | 0.88
Progress Monitor 18 | Narrative | 24 | 36.93 | 0.85
Progress Monitor 19 | Expository | 23 | 49.07 | 0.83
Progress Monitor 20 | Expository | 23 | 58.71 | 0.86

 

Sensitive to Student Improvement: Convincing Evidence

Describe evidence that the monitoring system produces data that are sensitive to student improvement (i.e., when student learning actually occurs, student performance on the monitoring tool increases on average).

For DORF WC: Slopes on the progress-monitoring tool are significantly greater than zero; the slopes differ significantly across special education, low-achieving, average-achieving, and high-achieving students; and the slopes are significantly greater when effective practices (e.g., high-fidelity implementation) are in place.

DORF WC slopes by special education status:

Grade | Group | n | Slope | SE
1 | All | 3277 | 3.33 | 0.04
1 | Special Ed | 277 | 2.32 | 0.12
1 | Non Special Ed | 1775 | 3.46 | 0.06
2 | All | 9617 | 3.78 | 0.02
2 | Special Ed | 861 | 2.62 | 0.07
2 | Non Special Ed | 4634 | 3.78 | 0.04
3 | All | 4223 | 0.68 | 0.03
3 | Special Ed | 444 | 0.73 | 0.04
3 | Non Special Ed | 2470 | 0.66 | 0.04
4 | All | 1397 | 0.61 | 0.10
4 | Special Ed | 59 | 0.54 | 0.20
4 | Non Special Ed | 541 | 0.62 | 0.16
5 | All | 1116 | 0.02 | 0.11
5 | Special Ed | 65 | 0.24 | 0.18
5 | Non Special Ed | 274 | 0.04 | 0.21
6 | All | 117 | 1.00 | 0.40
6 | Special Ed | N/A | N/A | N/A
6 | Non Special Ed | N/A | N/A | N/A

DORF WC slopes by achievement level:

Grade | Group | n | Slope | SE
3 | High Achieving | 491 | 1.17 | 0.36
3 | Average Achieving | 2650 | 0.74 | 0.04
3 | Low Achieving | 601 | 0.77 | 0.03

DORF WC slopes by fidelity of implementation:

Grade | Group | n | Slope | SE
1 | High Fidelity | 24908 | 12.09 | 0.07
1 | Low Fidelity | 6791 | 10.71 | 0.13
2 | High Fidelity | 29189 | 10.05 | 0.04
2 | Low Fidelity | 1893 | 9.50 | 0.17

Significance tests:

Grade | Special Ed vs. Not | High vs. Avg vs. Low Achieving | High vs. Low Fidelity
1 | Yes | N/A | Yes
2 | Yes | N/A | Yes

High fidelity of implementation was defined by selecting students who had scores from all three benchmark periods and who were progress monitored more than six times if they were at risk at the beginning of the year.
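A minimal sketch of that fidelity filter (hypothetical column names; the actual selection was performed on the DIBELS database):

```python
import pandas as pd

def high_fidelity(students: pd.DataFrame) -> pd.Series:
    """Flag students meeting the high-fidelity definition above.

    Hypothetical columns: 'boy_score', 'moy_score', 'eoy_score' (benchmark
    scores), 'at_risk_boy' (bool), 'n_progress_monitorings' (int).
    """
    has_all_benchmarks = (
        students[["boy_score", "moy_score", "eoy_score"]].notna().all(axis=1)
    )
    # At-risk students must also have been progress monitored more than six times.
    pm_ok = ~students["at_risk_boy"] | (students["n_progress_monitorings"] > 6)
    return has_all_benchmarks & pm_ok
```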
 

End-of-Year Benchmarks: Convincing Evidence

Are benchmarks for minimum acceptable end-of-year performance specified in your manual or published materials?

Yes.

Specify the end-of-year performance standards:

Three end-of-year performance standards are specified: Well Below Benchmark, Below Benchmark, and At or Above Benchmark. These standards indicate increasing odds of achieving At or Above Benchmark status at the next benchmark administration.

What is the basis for specifying minimum acceptable end-of-year performance?

Criterion-Referenced                    

Specify the benchmarks:

Grade | Score Level | Likely Need for Support | End of Year Words Read Correctly
First Grade | At or Above Benchmark | Likely to Need Core Support | 47+
First Grade | Below Benchmark | Likely to Need Strategic Support | 32-46
First Grade | Well Below Benchmark | Likely to Need Intensive Support | 0-31
Second Grade | At or Above Benchmark | Likely to Need Core Support | 87+
Second Grade | Below Benchmark | Likely to Need Strategic Support | 65-84
Second Grade | Well Below Benchmark | Likely to Need Intensive Support | 0-64
Third Grade | At or Above Benchmark | Likely to Need Core Support | 100+
Third Grade | Below Benchmark | Likely to Need Strategic Support | 80-99
Third Grade | Well Below Benchmark | Likely to Need Intensive Support | 0-79
Fourth Grade | At or Above Benchmark | Likely to Need Core Support | 115+
Fourth Grade | Below Benchmark | Likely to Need Strategic Support | 95-114
Fourth Grade | Well Below Benchmark | Likely to Need Intensive Support | 0-94
Fifth Grade | At or Above Benchmark | Likely to Need Core Support | 130+
Fifth Grade | Below Benchmark | Likely to Need Strategic Support | 105-129
Fifth Grade | Well Below Benchmark | Likely to Need Intensive Support | 0-104
Sixth Grade | At or Above Benchmark | Likely to Need Core Support | 120+
Sixth Grade | Below Benchmark | Likely to Need Strategic Support | 95-119
Sixth Grade | Well Below Benchmark | Likely to Need Intensive Support | 0-94
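As an illustration, a minimal sketch that classifies an end-of-year words-correct score against the cut points above (hypothetical function, not official scoring software; note the source table lists second grade as 87+ / 65-84 / 0-64, leaving scores of 85-86 unassigned, so this sketch folds them into the Below Benchmark band):

```python
# End-of-year DORF words-correct cut points from the table above:
# grade -> (benchmark goal, strategic-support cut).
EOY_CUTS = {1: (47, 32), 2: (87, 65), 3: (100, 80),
            4: (115, 95), 5: (130, 105), 6: (120, 95)}

def benchmark_status(grade: int, words_correct: int) -> str:
    goal, cut = EOY_CUTS[grade]
    if words_correct >= goal:
        return "At or Above Benchmark (likely to need core support)"
    if words_correct >= cut:
        return "Below Benchmark (likely to need strategic support)"
    return "Well Below Benchmark (likely to need intensive support)"

print(benchmark_status(3, 92))
# Below Benchmark (likely to need strategic support)
```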

What is the basis for specifying these benchmarks?

Criterion-Referenced

If criterion-referenced, describe procedure for specifying benchmarks for end-of-year performance levels:

The DIBELS Next benchmark goals provide targeted levels of skill that students need to achieve by specific times to be considered to be making adequate progress. The Group Reading Assessment and Diagnostic Evaluation (GRADE; Williams, 2001), a high-quality, nationally norm-referenced assessment, was used as an external criterion in the Benchmark Goal Study. In that study, a GRADE Total Test Raw Score at or above the 40th percentile was used as one approximation of adequate reading skill. The intent is to develop generalizable benchmark goals and cut points that are relevant and appropriate for a wide variety of reading outcomes, across a wide variety of states and regions, and for diverse groups of students.

The principal vision for DIBELS Next is a step-by-step one. Student skills at or above benchmark at the beginning of the year put the odds in favor of the student achieving the middle-of-year benchmark goal. In turn, students with skills at or above benchmark in the middle of the year have the odds in favor of achieving the end-of-year benchmark goal. Finally, students with skills at or above benchmark at the end of the year have the odds in favor of having adequate reading skills on a wide, general variety of external measures of reading proficiency.

The fundamental logic for developing the benchmark goals and cut points for risk was to begin with the external outcome goal and work backward through that step-by-step system. We first obtained an external criterion measure (the GRADE Total Test Raw Score) at the end of the year with a level of performance that would represent adequate reading skills (the GRADE Total Test Raw Score at the 40th percentile rank). Next, we specified the benchmark goal and cut point for risk on end-of-year DORF WC with respect to the end-of-year external criterion. Then, using the DORF WC end-of-year goal as an internal criterion, we established the benchmark goals and cut points for risk on middle-of-year DORF WC. Finally, we established the benchmark goals and cut points for risk on beginning-of-year DORF WC using the middle-of-year DORF WC goal as an internal criterion.
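One simple way to operationalize this "work backward from the criterion" logic is to fit a logistic model of criterion success on the earlier score and invert it at a chosen probability. The sketch below illustrates the idea only; the variable names are hypothetical and the probability threshold is a placeholder, not the value used in the Benchmark Goal Study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cut_score(scores: np.ndarray, met_criterion: np.ndarray,
              target_prob: float) -> float:
    """Score at which the predicted probability of meeting the criterion
    crosses target_prob, from a univariate logistic model."""
    model = LogisticRegression().fit(scores.reshape(-1, 1), met_criterion)
    b0, b1 = model.intercept_[0], model.coef_[0, 0]
    # Invert p = 1 / (1 + exp(-(b0 + b1 * x))) for x.
    return (np.log(target_prob / (1 - target_prob)) - b0) / b1

# Hypothetical use: end-of-year DORF WC scores vs. an indicator for meeting
# the GRADE 40th-percentile criterion.
# goal = cut_score(eoy_wc, met_grade_criterion, target_prob=0.80)
```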

Rates of Improvement Specified: Convincing Evidence

Decision Rules for Changing Instruction: Unconvincing Evidence

Specification of validated decision rules for when changes to instruction need to be made:

Yes.

Specify the decision rules:

We recommend using a goal-oriented rule for evaluating a student’s response to intervention that is straightforward for teachers to understand and use. Decisions about a student’s progress are based on comparisons of DIBELS scores that are plotted on a graph and the aimline, or expected rate of progress. We suggest that educational professionals consider instructional modifications when student performance falls below the aimline for three consecutive points (Kaminski, Cummings, Powell-Smith, and Good, 2008). 
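A minimal sketch of this three-point aimline rule (hypothetical helper names; the published materials describe the rule in prose, not code):

```python
def aimline(baseline: float, goal: float, n_weeks: int) -> list[float]:
    """Expected score at each weekly check, linear from baseline to goal."""
    step = (goal - baseline) / n_weeks
    return [baseline + step * week for week in range(n_weeks + 1)]

def consider_instructional_change(scores, expected) -> bool:
    """True when the three most recent points all fall below the aimline."""
    recent = list(zip(scores, expected))[-3:]
    return len(recent) == 3 and all(s < e for s, e in recent)

# Example: baseline 40 WC, goal 60 WC over 10 weeks; five weekly scores so far.
exp = aimline(40, 60, 10)
obs = [41, 43, 42, 43, 44]
print(consider_instructional_change(obs, exp[1:len(obs) + 1]))  # True
```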

Evidentiary basis for these decision rules:

This recommended decision rule is based on early work with CBM (Fuchs, 1988, 1989) and precision teaching (White & Haring, 1980) and allows for a minimum of three data points to be gathered before any decision is made. As when validating a student’s need for support, a pattern of performance is considered before making individual student decisions (Kaminski, Cummings, Powell-Smith, and Good, 2008).

Kaminski, R. A., Cummings, K., Powell-Smith, K. A., & Good, R. H. (2008). Best practices in using Dynamic Indicators of Basic Early Literacy Skills for formative assessment and evaluation. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology V (Vol. 4, pp. 1181-1204). Bethesda, MD: NASP Publications.

Fuchs, L. S. (1988). Effects of computer-managed instruction on teachers' implementation of systematic monitoring programs and student achievement. Journal of Educational Research, 81, 294-304.

Fuchs, L. S. (1989). Evaluating solutions: Monitoring progress and revising intervention plans. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children. New York: Guilford Press.

White, O. R., & Haring, N. G. (1980). Exceptional teaching (2nd ed.). Columbus, OH: Merrill.

Decision Rules for Increasing Goals: Unconvincing Evidence

Specification of validated decision rules for when increases in goals need to be made: 

Yes.

Specify the decision rules: 

In general, it is recommended that support be continued until a student achieves at least three points at or above the goal. If a decision is made to discontinue support, it is recommended that progress monitoring be continued weekly for at least 1 month to ensure that the student is able to maintain growth without the supplemental support. The frequency of progress monitoring will be faded gradually as the child’s progress continues to be sufficient (Kaminski, Cummings, Powell-Smith, and Good, 2008). 
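The complementary check on the goal side of the rule, again as a hedged sketch rather than official tooling:

```python
def consider_fading_support(scores, goal: float) -> bool:
    """True when the three most recent points are at or above the goal."""
    return len(scores) >= 3 and all(s >= goal for s in scores[-3:])

# Per the guidance above, a True result would be followed by roughly one
# month of continued weekly monitoring before support is faded.
print(consider_fading_support([58, 61, 60, 62], goal=60))  # True
```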

Evidentiary basis for these decision rules: 

This recommended decision rule is based on early work with CBM (Fuchs, 1988, 1989) and precision teaching (White & Haring, 1980) and allows for a minimum of three data points to be gathered before any decision is made. As when validating a student’s need for support, a pattern of performance is considered before making individual student decisions (Kaminski, Cummings, Powell-Smith, and Good, 2008).

Kaminski, R. A., Cummings, K., Powell-Smith, K. A., & Good, R. H. (2008). Best practices in using Dynamic Indicators of Basic Early Literacy Skills for formative assessment and evaluation. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology V (Vol. 4, pp. 1181-1204). Bethesda, MD: NASP Publications.

Fuchs, L. S. (1988). Effects of computer-managed instruction on teachers' implementation of systematic monitoring programs and student achievement. Journal of Educational Research, 81, 294-304.

Fuchs, L. S. (1989). Evaluating solutions: Monitoring progress and revising intervention plans. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children. New York: Guilford Press.

White, O. R., & Haring, N. G. (1980). Exceptional teaching (2nd ed.). Columbus, OH: Merrill.

Improved Student Achievement: Unconvincing Evidence

Description of evidence that teachers’ use of the tool results in improved student achievement based on an empirical study that provides this evidence.

Study: Unpublished study.

Sample:

Number of students in product/experimental condition: 22,662

Number of students in control condition: 33,104

Characteristics of students in sample and how they were selected for participation in study:

A large mid-western state implemented the technology-supported DIBELS DORF assessment, administered to Grade 2 students multiple times each school year. While DIBELS is implemented in that state, schools can choose whether to participate. If a school opts in, all teachers must use DIBELS for benchmark and progress monitoring purposes. However, individual students may be exempted from the assessment based on eligibility guidance provided by the state, administrators' teaching-time trade-offs, the presence of an IEP or special education designation excluding a student from the assessment, mobility, illness, or other absence.

This study compared the performance of students who were administered DIBELS DORF measures with high-fidelity implementation to those who were either exempted from DIBELS or assessed with low-fidelity implementation. The study used Grade 2 students' DIBELS DORF performance from 2011-2012 to predict their performance on the Grade 3 statewide IREAD3 test one year later (2012-2013). It was hypothesized that students who were administered the DIBELS DORF with high-fidelity implementation would demonstrate higher IREAD3 test scores than students who were exempted from DIBELS DORF in Grade 2.

The study included data from a total of 1,264 schools. Data from 55,766 Grade 3 students in the 2012-2013 school year were analyzed. Among those students, 11% were African American and 10% Hispanic; 6% were English learners; 16% were special education students; and 62% were eligible for free or reduced-price lunch. Of the 55,766 students, 22,662 were administered DIBELS DORF in Grade 2 with high-fidelity implementation during the 2011-2012 school year, while 33,104 students were either exempted from DIBELS DORF or identified as low-fidelity implementation in Grade 2.

Design: Random assignment was not used.

As a statewide initiative supporting school-level implementation, random assignment to conditions was not available for this study. Therefore, this study employed propensity score matching in the analysis. Propensity score methods rely on a model of treatment assignment to identify comparable individuals on the basis of similar probabilities of receiving treatment (Fan & Nowell, 2011). The DIBELS database offered a wide variety of student characteristics from which to build the propensity score. To ensure appropriate characteristics were chosen, this study used an iterative model selection procedure with different subsets of characteristics, as well as the full set, to perform the propensity score matching. The propensity score was estimated using a logistic regression model. For each model, model performance and the balance of covariates across treatment and comparison groups were examined. The model that demonstrated the desired statistical characteristics was then used to estimate the treatment probabilities for the control and treatment groups. We then adopted covariate adjustment using the propensity score (Austin, 2011): the outcome variable (student achievement test score) was regressed on the propensity score and an indicator variable denoting treatment status. The covariates included were special education status, English learner status, and eligibility for free or reduced-price lunch; other demographic characteristics that did not perform well in the matching model were excluded.

Unit of assignment: Students

Unit of analysis: Students

Duration of product implementation: One year

Describe analysis: The three chosen covariates (special education status, English learner status, and eligibility for free or reduced-price lunch) represented the largest demographic differences between the two conditions. A propensity score model was built whereby each student was assigned a propensity score: the predicted value of the logistic regression model estimated using the selected covariates. A linear regression analysis was then conducted on the whole data set, regressing statewide achievement test scores on the binary treatment indicator and the propensity scores.
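A compact sketch of this two-step analysis (hypothetical column names; the study's actual modeling details may differ):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

def propensity_adjusted_effect(df: pd.DataFrame):
    """Covariate adjustment using the propensity score, as described above.

    Hypothetical columns: 'sped', 'ell', 'frl' (0/1 covariates), 'treated'
    (administered DORF with high fidelity), 'outcome' (IREAD3 score).
    """
    X = df[["sped", "ell", "frl"]].to_numpy()
    # Step 1: estimate the propensity score with a logistic model.
    pscore = LogisticRegression().fit(X, df["treated"]).predict_proba(X)[:, 1]
    # Step 2: regress the outcome on treatment status and the propensity score.
    design = sm.add_constant(np.column_stack([df["treated"], pscore]))
    return sm.OLS(df["outcome"], design).fit()

# results = propensity_adjusted_effect(students)
# print(results.params[1])  # adjusted treatment coefficient (b)
```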

Fidelity:

Description of when and how fidelity of treatment information was obtained: Fidelity of implementation was based on criteria informed by NCRTI (2010). The following criteria were applied to identify high-fidelity implementations: (1) students were administered DIBELS DORF (including DORF WC, DORF ACC, and Retell) screening tools at each benchmark period; (2) low-performing students who were identified by the screening tools were progress monitored as needed according to the DIBELS user guide.

Results on the fidelity of treatment implementation measure: There were 29,881 students who received at least one DIBELS DORF progress monitoring during the school year; however, 7,219 of them were identified as low-fidelity implementation. Therefore, among the 55,766 students, 22,662 were administered DIBELS DORF in Grade 2 with high-fidelity implementation during the 2011-2012 school year, while 33,104 students were either exempted from DIBELS DORF or identified as low-fidelity implementation in Grade 2.

Measures:

External outcome measures used in the study, along with psychometric properties:

Measure name | Reliability statistics (specify type of reliability, e.g., Cronbach's alpha, IRT reliability, temporal stability, inter-rater)
IREAD3 | Not available

 

Results:

Results of the study: The results suggested that, after controlling for the propensity score, administration of DIBELS DORF was positively associated with IREAD3 test scores (b = 23.04, SE = 0.54, p < 0.01). Students who were administered DIBELS DORF had higher IREAD3 test scores (M = 523.14, SD = 61.55) than students who were not administered DIBELS DORF or were identified as low-fidelity implementation (M = 489.52, SD = 69.29).

Effect sizes for each outcome measure:

Measure name | Effect size
IREAD3 | 0.35
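The reported effect size is consistent with standardizing the adjusted treatment coefficient by a pooled standard deviation computed from the group SDs above; this is an inference from the reported numbers, not a computation described in the study:

```latex
d \approx \frac{b}{SD_{\text{pooled}}}
  = \frac{23.04}{\sqrt{\frac{22{,}661\,(61.55)^2 + 33{,}103\,(69.29)^2}{55{,}764}}}
  \approx \frac{23.04}{66.3} \approx 0.35
```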

 

Summary of conclusions and explanation of conditions to which effects should be generalized:

Overall, the administration of DIBELS DORF measures provides broad, useful information about students' skill levels and identifies those students in need of further assessment and support. This study used a large sample and propensity score matching to reduce bias across treatment and comparison conditions. The results revealed that when schools and teachers administered DIBELS DORF to Grade 2 students, IREAD3 test scores were higher in Grade 3. Given the large, diverse nature of the sample, these results should generalize beyond the sample.

Other related references or information

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46, 399-424.

Fan, X., & Nowell, D. L. (2011). Using propensity score matching in educational research. Gifted Child Quarterly, 55(1), 74-79.

National Center on Response to Intervention. (2010). Essential components of RTI—A closer look at response to intervention. Washington, DC: U.S. Department of Education, Office of Special Education Programs, National Center on Response to Intervention.

Improved Teacher Planning: Data Unavailable