Momentary Time-Sampling

Scale: Academic Engagement

Descriptive Information

Usage

Acquisition and Cost

Program Specifications and Requirements

Training

This review focuses on examinations of the properties of data derived from MTS with 15-second intervals for measuring academic engagement, when this construct is defined as including both active (e.g., writing an essay) and passive (e.g., looking at a teacher while she teaches) engagement. In other words, a student displaying either active or passive engagement is considered to be academically engaged.

MTS is intended for use in grades K-12 for all students, including students with disabilities and English language learners.

Prevalence/duration is the most frequent score resulting from MTS, and may be calculated by summing the number of intervals scored as an occurrence and dividing this value by the total number of intervals observed. Frequency can also be calculated according to formulas found in Suen & Ary (1989) when certain criteria for the interval length and behavior stream are met.
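The prevalence calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `mts_prevalence` and the example scores are hypothetical and are not drawn from any of the reviewed studies.

```python
def mts_prevalence(interval_scores):
    """Return the proportion of observed intervals scored as an occurrence.

    interval_scores: sequence of 0/1 (or False/True) values, one per interval.
    """
    scores = list(interval_scores)
    if not scores:
        raise ValueError("at least one interval must be observed")
    # Sum the intervals scored as an occurrence, divide by all intervals observed.
    return sum(1 for s in scores if s) / len(scores)


# Hypothetical 5-minute observation with 15-second intervals (20 intervals).
example_scores = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1,
                  1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
prevalence = mts_prevalence(example_scores)
print(f"Engaged in {prevalence:.0%} of intervals")
```

The frequency formulas from Suen & Ary (1989) are not reproduced here, because they apply only when the stated criteria for interval length and behavior stream are met.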

Momentary time-sampling for academic engagement is a non-commercial intervention and, therefore, does not have a formal pricing plan.

Momentary time-sampling is described in numerous books, articles, and presentations. Its methods are simple and transparent, and as a result, MTS may be considered free to use.

This review focuses on examinations of the properties of data derived from Momentary time-sampling. MTS is a behavior assessment methodology within systematic direct observation wherein an observation period is divided into intervals, and behavior during each interval is scored as an occurrence if the behavior is occurring at the moment the interval begins or ends (depending on the specific procedures used). Like other time-sampling and interval-recording procedures, MTS can be used with a number of interval lengths, observation durations, and target behaviors. This review focuses on the use of MTS with 15-second intervals, with a target behavior of academic engagement (defined as including both passive and active engagement, as described below). The studies reviewed are those that explicitly examine the reliability, validity, and levels of performance of data derived from MTS with 15-second intervals for academic engagement.
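As a rough illustration of the scoring rule described above, the following Python sketch scores a continuous record of engagement at the moment each 15-second interval begins or ends. The function name `score_mts`, the `sample_at` parameter, and the example engagement spans are all assumptions made for illustration; actual MTS is scored live by an observer rather than from a timed record.

```python
def score_mts(engaged_spans, observation_seconds, interval_seconds=15,
              sample_at="end"):
    """Score each interval 1 if the student is engaged at the sampled moment.

    engaged_spans: list of (start, stop) times in seconds during which the
    student was engaged; spans are treated as half-open, [start, stop).
    sample_at: "end" samples the last moment of each interval, "start" the first.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if sample_at not in ("start", "end"):
        raise ValueError("sample_at must be 'start' or 'end'")
    n_intervals = observation_seconds // interval_seconds
    offset = interval_seconds if sample_at == "end" else 0
    scores = []
    for i in range(n_intervals):
        moment = i * interval_seconds + offset
        engaged = any(start <= moment < stop for start, stop in engaged_spans)
        scores.append(1 if engaged else 0)
    return scores


# Hypothetical 2-minute record: engaged from 0-40 s, 55-100 s, and 110-120 s.
spans = [(0, 40), (55, 100), (110, 120)]
end_scores = score_mts(spans, observation_seconds=120, sample_at="end")
start_scores = score_mts(spans, observation_seconds=120, sample_at="start")
print(end_scores, start_scores)
```

Only the behavior at the sampled moment matters, so what happens during the rest of each interval does not affect the score.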

The training of observers varies by study and is generally over 4 hours.

There are no minimum qualifications for the rater.

 

Sensitive to Student Change: Convincing Evidence

Describe evidence that the monitoring system produces data that are sensitive to detect incremental change (e.g., small behavior change in a short period of time such as every 20 days or more frequently depending on the purpose of the construct).

Evidence on the sensitivity to change of data derived from MTS with 15-second intervals for academic engagement is somewhat difficult to identify, given that systematic direct observation (SDO) procedures are often put forth as the gold standard against which other methods are evaluated.

Levels of Performance Specified: Partially Convincing Evidence

Specify the levels of performance:

Fellers & Saudargas, 1987. Observed behavior of two groups of 15 female students (LD and non-LD; total n = 30) across grades 2, 4, and 5 from public elementary schools. LD and non-LD students were matched based on classroom (i.e., one for each group drawn from each classroom). Observed using SECOS system, which utilizes a combined definition of academic engagement called “schoolwork” with 15s MTS procedures. Students were observed at least three times for 20 minutes across two weeks.

Percentage of total intervals during which “seatwork” was indicated, as mean (M) and standard deviation (SD).

LD group. M = 68.3%, SD = 12.7%.

Non-LD group. M = 73.9%, SD = 14.3%.

Fellers, G., & Saudargas, R. A. (1987). Classroom Behaviors of LD and Nonhandicapped Girls. Learning Disability Quarterly, 10(3), 231. http://doi.org/10.2307/1510495

 

Slate & Saudargas, 1986a (“Differences in learning disabled and average students’ classroom behaviors”). Observed behavior of two groups of 14 male students (LD and non-LD; total n = 28) across grades 3, 4, and 5 from public elementary schools. Of LD group, White = 7, Black = 7. Of non-LD group, White = 6, Black = 8. Observed using SECOS system, which utilizes a combined definition of academic engagement called “schoolwork” with 15s MTS procedures. Students were observed four to six times for 20 minutes across 10 weeks.

Percentage of total intervals during which “seatwork” was indicated, as mean (M) and standard deviation (SD).

LD group. M = 67.9%, SD = 12.1%.

Non-LD group. M = 68.1%, SD = 8.53%.

Slate, J. R., & Saudargas, R. A. (1986). Differences in Learning Disabled and Average Students’ Classroom Behaviors. Learning Disability Quarterly, 9(1), 61. http://doi.org/10.2307/1510402


Slate & Saudargas, 1986b (“Differences in the classroom behaviors of behaviorally disordered and regular class children”). Observed behavior of two groups of 13 male students (behaviorally disordered [BD] and non-BD; total n = 26) across grades 3, 4, and 5 from public elementary schools. Observed using SECOS system, which utilizes a combined definition of academic engagement called “schoolwork” with 15s MTS procedures. Students were observed four times for 20 minutes, with each individual student’s observations occurring within a single two week period.

Percentage of total intervals during which “seatwork” was indicated, as mean (M) and standard deviation (SD).

BD group. M = 66.83%, SD = 14.38%.

Non-BD group. M = 67.52%, SD = 7.40%.

Slate, J. R., & Saudargas, R. A. (1986). Differences in the classroom behaviors of behaviorally disordered and regular class children. Behavioral Disorders, 45–53.

 

Zigmond, Kerr, & Schaeffer, 1988. Observed behavior of three groups of students: students with LD, students with emotional disturbance (ED), and a control group of students. Observed using 15s MTS procedures of on-task behavior. Students were observed twice weekly for 30 minutes.

LD group: n = 36. Male = 28, Female = 8. Grades 9 to 11.

ED group: n = 8.  Male = 7, Female = 1. Grades 9 to 12.

Control students: typical students, randomly selected at each observation of a student with LD or ED.

Number of total intervals during which “on-task” was indicated, as mean (M) and standard deviation (SD). Total intervals = 15.

LD group. M = 8.49, SD = 2.734

ED group. M = 8.78, SD = 1.974

Control group. M = 8.82, SD = 1.742

Zigmond, N., Kerr, M. M., & Schaeffer, A. (1988). Behavior patterns of learning disabled and non-learning-disabled adolescents in high school academic classes. Remedial and Special Education, 9(2), 6–11.

 

Describe how the levels of performance are used for progress monitoring:

Unclear. Although these levels of performance are available within the literature, it is likely expected that MTS would be used in a single-case progress-monitoring framework with within-student levels of performance at intervention compared to those of baseline in order to evaluate response to intervention.

Data to Support Intervention Change: Data Unavailable

Data to Support Intervention Choice: Data Unavailable

Reliability: Convincing Evidence

Describe and provide evidence demonstrating the reliability of your progress monitoring tool. In your description, include information about the intended purpose of your tool and an argument to demonstrate why the reliability data available are aligned with and support the tool’s reliability given its intended purpose.

Reliability evidence for 15-second MTS procedures with academic engagement (when explicitly examined for psychometric properties) has chiefly been evaluated using generalizability theory. After calculating variance components for the specified model and its original specifications, coefficients for relative (Ep2) and absolute (phi) decision-making may be derived for both the original measurement model and models which utilize distinct combinations of procedures (e.g., raters, observations). Thus, Ep2 and phi coefficients are both reported as applicable for a given study. When applicable, the manipulated facets are also described in order to demonstrate change in these coefficients as a result of manipulated facet conditions. Although decision-making may occur relative to a criterion using MTS (as demonstrated using Ep2), it may be expected that most decision-making using MTS will occur using absolute, within-student comparisons. Thus, the phi coefficient may often provide the most relevant coefficient of dependability for this instrument. When describing results of D studies, wherein facets are manipulated to examine the resulting changes in relative and absolute decision-making coefficients, the question asked is over how many levels of a facet an observation would need to be averaged in order to generate a dependable estimate of behavior. Thus, as long as some amount of variance is attributable to a facet, increasing the number of levels over which it will be averaged will increase the dependability of that estimate (depending upon whether that facet is included in the calculation of Ep2 or phi). As described below, reliability evidence generally suggests that data from 15-second MTS procedures for academic engagement need to be averaged over multiple observations in order to generate a single, dependable estimate. However, the number of observations and their length vary from study to study.
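To make the D-study logic concrete, the sketch below computes Ep2 and phi for a simplified single-facet design in which persons are crossed with observation occasions. This is an assumption made for illustration: the reviewed studies used richer models (e.g., days, occasions within days, raters), and the variance components shown are hypothetical rather than taken from any study. The function name `d_study_coefficients` is likewise illustrative.

```python
def d_study_coefficients(var_person, var_occasion, var_residual, n_occasions):
    """Return (Ep2, phi) for a persons x occasions design.

    Relative error contains only the person-by-occasion (residual) variance;
    absolute error also contains the occasion main-effect variance. Both are
    divided by the number of occasions over which scores are averaged.
    """
    if n_occasions < 1:
        raise ValueError("n_occasions must be at least 1")
    relative_error = var_residual / n_occasions
    absolute_error = (var_occasion + var_residual) / n_occasions
    ep2 = var_person / (var_person + relative_error)
    phi = var_person / (var_person + absolute_error)
    return ep2, phi


# Hypothetical variance components (not drawn from any reviewed study).
components = dict(var_person=0.5, var_occasion=0.1, var_residual=0.4)
for n in (1, 5, 10, 20):
    ep2, phi = d_study_coefficients(n_occasions=n, **components)
    print(f"{n:>2} occasions: Ep2 = {ep2:.2f}, phi = {phi:.2f}")
```

Because both error terms shrink as the number of occasions grows, both coefficients rise with more observations, and phi never exceeds Ep2 in this design.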

For kindergarten-age students, Briesch, Chafouleas, and Riley-Tillman (2010) demonstrated that at least five days of observations would be required in order to generate an Ep2 or phi above .80, regardless of whether one or three observations were conducted per day. For elementary-age students, Hintze and Matthews (2004) demonstrated coefficients above .80 only when a considerable number of observations were conducted. To wit, an Ep2 and a phi of .83 were observed for 40 days of observations, using 4 observations per day. Ep2 and phi for an estimate of academic engagement derived from 3 days of observations using 1 observation per day were both only .25. For middle-school students, Briesch, Volpe, and Ferguson (2014) examined phi coefficients for two subsamples: one group comparable to a typical classroom setting, and one for students demonstrating behavior problems. Using only one observer, four observations, each 20 minutes in length, would yield a coefficient of .82 for students who may be comparable to students found in a typical classroom. For students with more significant problem behaviors, one observer conducting eight 20-minute observations would result in a phi of .81. Also for middle-school students, Ferguson, Briesch, Volpe, and Daniels (2012) examined over how many 5-minute observations an estimate would need to be averaged in order to demonstrate adequate phi coefficients. Coefficients of at least .80 were demonstrated for nine 5-minute observations (45-min total) on each of 3 days, five 5-minute observations (25-min total) on each of 4 days, and three 5-minute observations (15-min total) on each of 5 days.

One study (Wood, Hojnoski, Laracy, & Olson, 2015) explicitly examined the reliability of MTS through inter-observer agreement procedures with an early childhood sample. In this study, the authors reported a mean percent agreement of 95.5% across two research raters (range = 91.2% to 100%) and a kappa of .89.
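For readers less familiar with these two agreement indices, the following sketch computes interval-by-interval percent agreement and Cohen's kappa for two observers scoring the same intervals. The function names and the example scores are hypothetical, and the exact agreement procedure used by Wood et al. (2015) is not specified here.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of intervals on which two raters gave the same 0/1 score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of intervals")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning binary (0/1) interval scores."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    p_a = sum(rater_a) / n  # proportion of intervals rater A scored as 1
    p_b = sum(rater_b) / n  # proportion of intervals rater B scored as 1
    # Agreement expected by chance, from each rater's marginal proportions.
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores from two observers over the same ten intervals.
observer_1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
observer_2 = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
print(f"Percent agreement = {agreement:.1%}, kappa = {kappa:.2f}")
```

Kappa is lower than percent agreement because it discounts the agreement that two raters would reach by chance given how often each scores an occurrence.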

SUBSCALE:  Academic engagement

FORM: n/a

AGE RANGE: Early childhood/Kindergarten

Type of Reliability

Coefficient

SEM

n (examinees)

n (raters)

Sample Information (including normative data)/Demographics

G Theory (variance component estimates  and resulting G and D coefficients from original and D study)

 

Briesch, Chafouleas, & Riley-Tillman, 2010

Ep2 for one observation per day across:

 

1 day = .50

5 days = .83

10 days = .91

15 days = .93

20 days = .98

100 days = .99

Not reported

12

2

Examinee sample: 12 students. Mean age = 5 years 11 months. White = 10, African-American = 1, Asian = 1. Female = 7, Male = 5.

 

SDO rater sample: 2 researchers, trained with videos to 95% IOA criterion between observers (kappa = .89). Training lasted 8 hours.

See above

 

Briesch, Chafouleas, & Riley-Tillman, 2010

Ep2 for three observations per day across:

 

1 day = .73

5 days = .93

10 days = .96

15 days = .97

20 days = .98

100 days = .99

Not reported

12

2

See above

See above

 

Briesch, Chafouleas, & Riley-Tillman, 2010

Phi for one observation per day across:

 

1 day = .48

5 days = .82

10 days = .90

15 days = .93

20 days = .97

100 days = .99

Not reported

12

2

See above

See above

 

Briesch, Chafouleas, & Riley-Tillman, 2010

Phi for three observations per day across:

 

1 day = .70

5 days = .92

10 days = .96

15 days = .97

20 days = .97

100 days = .99

Not reported

12

2

See above

Interobserver agreement

 

Wood, Hojnoski, Laracy, & Olson, 2015

 

 

Observation period ranged from 10:19 to 19:59 (min:sec), mean of 14 min.

 

Mean IOA (percent agreement) = 95.5%, range = 91.2% - 100%.

 

Kappa = .89

Not reported

24

3

Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38 – 65 months. Mean age = 51 months, SD = 8.30 months. Majority Caucasian. Primary Language of English = 23, Spanish = 1. Special education services for speech/language = 6. Special education services for unidentified needs = 1.

 

Rater sample: 3 researchers. Trained using three training videos with criterion of 85% agreement on 3 consecutive videos met.

 

Briesch, A. M., Chafouleas, S. M., & Riley-Tillman, T. C. (2010). Generalizability and dependability of behavior assessment methods to estimate academic engagement: A comparison of systematic direct observation and direct behavior rating. School Psychology Review, 39(3), 408.

Wood, B. K., Hojnoski, R. L., Laracy, S. D., & Olson, C. L. (2015). Comparison of Observational Methods and Their Relation to Ratings of Engagement in Young Children. Topics in Early Childhood Special Education, 0271121414565911.

SUBSCALE: Academic engagement

FORM: n/a

AGE RANGE: Elementary school

Type of Reliability

Coefficient

SEM

n (examinees)

n (raters)

Sample Information (including normative data)/Demographics

G Theory (variance component estimates  and resulting coefficients from original and D study)

 

Hintze & Matthews, 2004

 

Ep2 (length of observation = 15 minutes, observers = 1)

 

- 10 days, 2 obs per day = .63

- 10 days, 1 obs per day = .50

- 3 days, 1 obs per day = .25

- 20 days, 2 obs per day = .71

- 40 days, 4 obs per day = .83

Not reported

14

5

Examinee sample: 14 students. 100% in fifth grade. Female = 7, male = 7. Mean age = 12 years 2 months (SD = 1.5 months). Gen ed = 12, special education = 3. Caucasian = 12, African American = 2.

 

Rater sample: 5 school psychology graduate students. 100% female. Trained for 4 hours against master-coded video. All demonstrated >90% agreement with master codes.

See above.

 

Hintze & Matthews, 2004

 

Phi (length of observation = 15 minutes, observers = 1)

 

- 10 days, 2 obs per day = .62

- 10 days, 1 obs per day = .46

- 3 days, 1 obs per day = .25

- 20 days, 2 obs per day = .62

- 40 days, 4 obs per day = .83

Not reported

14

5

Examinee sample: 14 students. 100% in fifth grade. Female = 7, male = 7. Mean age = 12 years 2 months (SD = 1.5 months). Gen ed = 12, special education = 3. Caucasian = 12, African American = 2.

 

Rater sample: 5 school psychology graduate students. 100% female. Trained for 4 hours against master-coded video. All demonstrated >90% agreement with master codes.

Hintze, J. M., & Matthews, W. J. (2004). The Generalizability of Systematic Direct Observations Across Time and Setting: A Preliminary Investigation of the Psychometrics of Behavioral Observation. School Psychology Review, 33(2), 258-270.

SUBSCALE: Academic engagement

FORM: n/a

AGE RANGE: Middle school

Type of Reliability

Coefficient

SEM

n (examinees)

n (raters)

Sample Information (including normative data)/Demographics

G Theory (variance component estimates  and resulting coefficients from original and D study)

 

Briesch, Volpe, & Ferguson, 2014

Phi (general group)

 

- 2 observers, 20 min period, 5 observations = .87

- 1 observer, 20 min period, 2 observations = .71

- 1 observer, 20 min period, 4 observations = .82

- 1 observer, 20 min period, 10 observations = .91

 

 

Phi (eligible group)

 

- 2 observers, 20 min period, 5 days = .75

- 1 observer, 20 min period, 4 observations = .70

- 1 observer, 20 min period, 8 observations = .81

- 1 observer, 20 min period, 10 observations = .84

Not reported

16

4

Examinee sample: 16 students. 100% in 7th grade. Male = 12, female = 4. 100% ethnic minority group.

 

Two subsamples: general classroom group and eligible for intervention group.

 

Rater sample: 4 school psychology graduate students, trained with videos to 95% IOA criterion between observer and researcher.

G Theory (variance component estimates  and resulting coefficients from original and D study)

 

Ferguson, Briesch, Volpe, & Daniels, 2012

Rater not in model, all observations based on assumption of single rater.

 

2 days, 6 five-minute observations (original study conditions).

 

Ep2 = .71

Phi = .70

 

 

Not reported

20

2

Examinee sample: 20 students. 100% in 7th grade. 11 = male, 9 = female. 100% students of color.

 

Rater sample: 2 school psychology graduate students, trained using three 10-min videos, demonstrating 88% IOA / .74 kappa.

See above.

 

Ferguson, Briesch, Volpe, & Daniels, 2012

Phi

 

- 2 days, 20 five-minute observations = .74

- 3 days, 3 five-minute observations = .71

- 4 days, 2 five-minute observations = .72

- 5 days, 2 five-minute observations = .76

- 3 days, 9 five-minute observations = .80

- 4 days, 5 five-minute observations = .81

- 5 days, 3 five-minute observations = .80

 

Not reported

20

2

Examinee sample: 20 students. 100% in 7th grade. 11 = male, 9 = female. 100% students of color.

 

Rater sample: 2 school psychology graduate students, trained using three 10-min videos, demonstrating 88% IOA / .74 kappa.

See above.

 

Ferguson, Briesch, Volpe, & Daniels, 2012

Ep2 (observations within one day)

 

1 five-minute obs = .46

2 five-minute obs = .63

3 five-minute obs = .72

4 five-minute obs = .77

5 five-minute obs = .81

6 five-minute obs = .83

7 five-minute obs = .85

8 five-minute obs = .87

9 five-minute obs = .88

10 five-minute obs = .89

11 five-minute obs = .90

12 five-minute obs = .91

 

Not reported

20

2

Examinee sample: 20 students. 100% in 7th grade. 11 = male, 9 = female. 100% students of color.

 

Rater sample: 2 school psychology graduate students, trained using three 10-min videos, demonstrating 88% IOA / .74 kappa.

See above.

 

Ferguson, Briesch, Volpe, & Daniels, 2012

Phi (observations within one day)

 

1 five-minute obs = .43

2 five-minute obs = .61

3 five-minute obs = .70

4 five-minute obs = .75

5 five-minute obs = .79

6 five-minute obs = .82

7 five-minute obs = .84

8 five-minute obs = .86

9 five-minute obs = .87

10 five-minute obs = .89

11 five-minute obs = .89

12 five-minute obs = .90

 

Not reported

20

2

Examinee sample: 20 students. 100% in 7th grade. 11 = male, 9 = female. 100% students of color.

 

Rater sample: 2 school psychology graduate students, trained using three 10-min videos, demonstrating 88% IOA / .74 kappa.

 

Briesch, A. M., Volpe, R. J., & Ferguson, T. D. (2014). The influence of student characteristics on the dependability of behavioral observation data. School Psychology Quarterly, 29(2), 171–181. http://doi.org/10.1037/spq0000042

Ferguson, T. D., Briesch, A. M., Volpe, R. J., & Daniels, B. (2012). The influence of observation length on the dependability of data. School Psychology Quarterly, 27(4), 187–197. http://doi.org/10.1037/spq0000005

 

Validity: Convincing Evidence

Describe and provide evidence demonstrating the validity of your progress monitoring tool. In your description, include information about the intended purpose of your tool and an argument to demonstrate why the validity data available are aligned with and support the tool’s validity given its intended purpose.

As is true for information regarding sensitivity to change, validity evidence for estimates of academic engagement derived from 15-second MTS procedures is sparse, given that time-sampling procedures in general, and MTS specifically, are often viewed as a gold standard measure. However, Wood, Hojnoski, Laracy, and Olson (2015) recently examined the error of MTS-derived estimates of prevalence when compared to those derived from continuous observation. MTS was found to be the least error-prone estimate when compared to partial-interval (PI) and whole-interval (WI) sampling. Absolute mean error (across students) was 6.28%, while mean measurement error that maintained the direction of over- or underestimation was -3.35%. The Pearson correlation coefficient between MTS-derived estimates and those from continuous observation was .83, and Spearman’s rho, a non-parametric rank-order correlation coefficient, was .71 when MTS-derived estimates were compared to expert rankings of student engagement. In a less quantitative study, Saudargas and Zanolli (1990) used visual analysis to examine patterns of engagement estimates derived from both continuous observation and MTS. In almost all cases, trends in the two data patterns were consistent across days, even when level was discrepant. Quantitative results reported by the authors indicate that there was a less than 9% discrepancy between scores derived from MTS and continuous observation for 18 of 22 observations (82%).
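The error and correlation statistics described above can be illustrated with a short Python sketch. The data are hypothetical percentages for five students, and the function names are assumptions made for illustration; they do not reproduce the analysis code or data of Wood et al. (2015).

```python
from math import sqrt


def error_summary(mts_estimates, continuous_estimates):
    """Return (mean signed error, mean absolute error) of MTS vs. continuous."""
    diffs = [m - c for m, c in zip(mts_estimates, continuous_estimates)]
    n = len(diffs)
    mean_signed = sum(diffs) / n                     # keeps over/underestimation
    mean_absolute = sum(abs(d) for d in diffs) / n   # ignores direction
    return mean_signed, mean_absolute


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical percent-of-time-engaged estimates for five students.
mts = [70, 55, 90, 40, 65]
continuous = [75, 60, 88, 50, 66]
signed_error, absolute_error = error_summary(mts, continuous)
r = pearson_r(mts, continuous)
print(f"Mean error = {signed_error:.2f}, absolute mean error = "
      f"{absolute_error:.2f}, r = {r:.2f}")
```

A negative mean signed error, as in the reviewed study, indicates that MTS tended to underestimate the continuous-observation value on average, while the absolute mean error shows the typical size of the discrepancy regardless of direction.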

SUBSCALE: Academic engagement

FORM: n/a

AGE RANGE: Early education

Type of Validity

Test or Criterion

Coefficient

n (examinees)

n (raters)

Sample Information (including normative data)/Demographics

Convergent/ accuracy (Comparison of MTS-derived data and those obtained from continuous duration recording [CDR])

 

Wood, Hojnoski, Laracy, & Olson, 2015

n/a

Absolute mean measurement error against CDR = 6.28%

 

Mean measurement error against CDR = -3.35%

 

Relative percent difference with CDR. Average = 8.31%, SD = 7.65, Range = 0.49% to 31.59%.

 

Correlation (across students) with CDR = .83, p < .001

Absolute difference with CDR. Average = .06, SD = .05. T-test sig at p < .001.

Correlations between MTS-derived rank ordering of student engagement and teacher- or expert-nominated rankings of student engagement, using Spearman’s rho.

- Rho (teacher ranking) = .34 (p = .108)

- Rho (expert ranking) = .71 (p < .001)

24

3

Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38 – 65 months. Mean age = 51 months, SD = 8.30 months. Majority Caucasian. Primary Language of English = 23, Spanish = 1. Special education services for speech/language = 6. Special education services for unidentified needs = 1.

 

Rater sample: 3 researchers. Trained using three training videos with criterion of 85% agreement on 3 consecutive videos met.

Wood, B. K., Hojnoski, R. L., Laracy, S. D., & Olson, C. L. (2015). Comparison of Observational Methods and Their Relation to Ratings of Engagement in Young Children. Topics in Early Childhood Special Education, 0271121414565911.

SUBSCALE:  Academic engagement

FORM: n/a

AGE RANGE: Elementary

Type of Validity

Test or Criterion

Coefficient

n (examinees)

n (raters)

Sample Information (including normative data)/Demographics

Convergent/ accuracy (Visual analysis of discrepancy between MTS-derived scores and those obtained using continuous observation)

 

Saudargas & Zanolli, 1990

n/a

Mostly based upon visual analysis from 20-min observation periods, which suggests similar patterns of behavior for most days. In almost all cases, trend followed across days, even when level is discrepant. Quantitative results reported by authors: less than 9% discrepancy identified between scores derived from MTS and continuous observation for 18 of 22 observations (82%).

16

2

Examinee sample: 16 students. Grade 1 = 2, Grade 2 = 1, Grade 3 = 5, Grade 4 = 8.

 

Rater sample: 2 graduate students. Trained using videotapes.

Saudargas, R. A., & Zanolli, K. (1990). Momentary Time Sampling as an Estimate of Percentage Time: A Field Validation. Journal of Applied Behavior Analysis, 23(4), 533–537.

 

Disaggregated Reliability and Validity Data: Data Unavailable

Assessment Format: Direct observation

Rater / Scorer: External observer

Usability Study Conducted: Yes

The broader class of systematic direct observation (SDO) methodologies, which includes MTS, has been examined in a combined usability and social validity study conducted by Riley-Tillman, Chafouleas, Briesch, and Eckert (2008). The total sample size across two samples of school psychologists was 191 (92 in Study 1, 99 in Study 2). Most respondents worked in public schools (83.7% in Study 1, 88.9% in Study 2), were female (76.1%, 74.7%), practiced with a “Masters plus 30” credential (48.9%, 41.4%), and were fairly evenly split across years in practice, urbanicity, and age group served. Results from responses to 16 Likert-type-scaled items (1 = strongly disagree, 6 = strongly agree) indicated that SDO procedures were generally perceived as acceptable to very acceptable (mean scores for positively-worded items were 4.4 to 5.1 across samples). Items specific to the time and intrusiveness upon teachers/staff, school psychologists, and the general classroom environment were rated from a mean of 2.0 to 2.8 using the scale described above, indicating that the procedures were perceived as low to moderately intrusive. To wit, each of these items began with the stem “The use of this technique was overly intrusive on…”. Mean responses to the item “This technique provides a feasible method of assessing the effectiveness of an intervention” were 4.7 and 4.8 across samples.

Riley-Tillman, T., Chafouleas, S., Briesch, A., & Eckert, T. (2008). Daily Behavior Report Cards and Systematic Direct Observation: An investigation of the acceptability, reported training and use, and decision reliability among school psychologists. Journal of Behavioral Education, 17(4), 313-327. doi:10.1007/s10864-008-9070-5