Behavior Intervention Monitoring Assessment System (BIMAS)

Scale: Negative Affect


The Behavior Intervention Monitoring Assessment System (BIMAS) is a measure of social, emotional, behavioral, and academic functioning in children and adolescents between the ages of 5 and 18 years. The BIMAS Standard includes 34 change-sensitive items that can be used for universal screening and for assessing response to intervention. The BIMAS Standard items were developed based on years of research and a scientific model for item selection called Intervention Item Selection Rules (IISRs; Meier, 1997, 1998, 2000, 2004). The BIMAS Flex is an extended component of the BIMAS. Positively or negatively worded items can be selected to customize treatment goals for individual students and to create three-to-five item mini Flex assessments for frequent progress monitoring. The BIMAS has three Behavioral Concern scales (Conduct, Negative Affect, and Cognitive/Attention) and two Adaptive scales (Social and Academic Functioning). To facilitate between-rater comparisons, the BIMAS offers a parallel set of items across its rater forms: Teacher (34 items; 5-18 years); Parent (34 items; 5-18 years); Self-Report (12-18 years); and Clinician (31 items; 5-18 years; non-norm referenced; individualized item responses only).

BIMAS is intended for use with students in kindergarten through twelfth grade, including students in general education, students with disabilities, and English language learners. It was designed to be a brief, repeatable, multi-informant measure for behavioral screening, progress monitoring, outcome assessment, and program evaluation within the RTI framework.

The Behavioral Concern scales identify risks in Conduct, Negative Affect, and Cognitive/Attention domains. The Adaptive scales include mainly items with positive content and are used to assess increases in adaptive functioning that are often the targets of intervention in the Social and Academic Functioning domains.

The tool provides information on student behavior in English.

MHS, Inc.

3770 Victoria Park Avenue

Toronto, ON, M2H 3M6

Phone: 1-800-268-6011

Website: www.mhs.com

The BIMAS is sold by annual per-student subscription, which includes unlimited use of the BIMAS Standard as well as unlimited test scoring and/or report generation for each student for the entire year. An unlimited number of user accounts at different access levels can be set up on BIMAS Online, both to facilitate sharing of data among school staff and to ensure the security of data at various levels within a school district (e.g., teacher: class-level access; school principal: school-level access; district superintendent: district-level access). A minimum purchase of 25 student subscriptions is required. Orders of 1 to 9,999 subscriptions are priced at $4.00 per student per year. Orders of 10,000 to 39,999 are sold at a discounted rate of $3.00 per student per year, and orders of 40,000 and above are discounted to $2.00 per student per year. The price for clinical clients (non-school district) is $4.00 per client per year. In addition, each BIMAS Technical Manual (paper) costs $80.00. The annual subscription includes complimentary customer and technical support as well as access to all Help Files in PDF format available on BIMAS Online.
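As a rough illustration of the pricing tiers quoted above, the following sketch estimates the annual subscription cost for a school order. The function name `annual_cost` is hypothetical, and two points are assumptions rather than statements from the publisher: that the tier rate applies to every subscription in the order (not only to those above the tier boundary), and that the 25-student minimum is enforced as a hard lower limit.

```python
def annual_cost(students: int) -> float:
    """Estimate the yearly school subscription cost in USD.

    Assumption: the rate of the tier an order falls into applies to
    the whole order. The $80.00 paper manual is not included.
    """
    if students < 25:
        # Assumption: the stated 25-student minimum is a hard limit.
        raise ValueError("minimum purchase is 25 student subscriptions")
    if students <= 9_999:
        rate = 4.00
    elif students <= 39_999:
        rate = 3.00
    else:
        rate = 2.00
    return students * rate


if __name__ == "__main__":
    for n in (25, 500, 10_000, 40_000):
        print(n, annual_cost(n))
```

Under the whole-order assumption, an order just above a tier boundary can cost less than one just below it, so a district close to 10,000 or 40,000 students may want to confirm the exact terms with the publisher.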

This program can be rated or scored by a general education teacher, special education teacher, parent, child, school psychologist, or a clinician. Interpretation of BIMAS scores requires the completion of graduate-level courses in tests and measurement at a university or equivalent documented training.

The recommended administration setting is a general education classroom, special education classroom, school office, recess, lunchroom, or home.

BIMAS is designed for use with large groups, small groups, and individual students.

The BIMAS uses a rating scale for the assessment format.

Administration time per student is 5-10 minutes with additional scoring time per student being 2-5 minutes.

There is no limit to how many administrators can rate a student concurrently.

Alternate forms are available.

Less than 30 minutes of training is required for the rater.

There are no minimum qualifications for the rater. However, results obtained from BIMAS assessments should always be interpreted by an assessor with a B-level qualification, that is, someone who has completed graduate-level courses in tests and measurement at a university or has received equivalent documented training.

Training manuals and materials are field-tested and included in the cost of the intervention.

Ongoing technical support regarding the use of the BIMAS as an assessment tool, as well as the use of BIMAS Online, is available through the Customer Service and Technical Support of the publisher, Multi-Health Systems, Inc.

 

Sensitive to Student Change: Unconvincing Evidence

 

During the standardization of the BIMAS, a validity study was conducted to evaluate the effectiveness of the BIMAS as a progress monitoring tool. Specifically, a few sites that participated in the data collection also agreed to offer pre- and post-treatment data on some of the students who were receiving mental health services. In one such study, the BIMAS was used to monitor the progress of 46 students who participated in an anger management intervention. The sample consisted of 32 males and 14 females. There were 30 African American, 2 Hispanic, and 14 White students. The students’ ages ranged from 12 to 18 years old. The BIMAS was used to collect data prior to, and after, the completion of the intervention. The mean time interval in days between Time 1 and Time 2 administration was 33.79 (SD = 3.65) on the Teacher form, 31.09 (SD = 11.25) on the Parent form, and 34.71 (SD = 1.36) on the Self-Report. The students’ behaviors were rated by their teachers, parents, and by the youth themselves. Results from this study provide evidence that the BIMAS is sensitive to change, as it effectively captured significant decreases in the Behavioral Concern scales as well as significant increases in the Adaptive scales. Effect sizes (presented as Cohen’s d ratios) revealed a large effect (i.e., all effect sizes ≥ |0.8|) across all rater-types and all scales, with the exception of the Social scale on the BIMAS–T where a moderate effect was found (d = −0.7). All effect sizes were in the expected direction, indicating an improvement in functioning: positive effect sizes were found on the Behavioral Concern scales (i.e., scores decreased from pre-test to post-test), and negative effect sizes were found on the Adaptive scales (i.e., scores increased from pre-test to post-test).

For the Negative Affect subscale, Cohen's d equaled 1.0 for the Teacher form, 1.7 for the Parent form, and 1.8 for the Self-Report form.     

Levels of Performance Specified: Convincing Evidence

 

Are levels of performance specified in your manual or published materials?

Yes

Specify the levels of performance:

The methods for evaluating levels of performance, as indexed by standard scores on the BIMAS Standard form (the standardized form for both screening and progress monitoring) and on the BIMAS Flex form (the customizable form for progress monitoring with mini-assessments of 1-5 items), are specified in the Interpretation and Case Study chapters of the manual. The manual provides specific examples.

Describe how the levels of performance are used for progress monitoring:

In this protocol, scale descriptors (interpretive guidelines) in the categories of High Risk, Some Risk, and Low Risk are used to interpret standard scores, or levels of performance, on the Negative Affect scale. The manual also lists some of the common characteristics of youth with high scores on the BIMAS Negative Affect scale. In addition, the BIMAS offers Item Descriptors (Concern, Mild Concern, No Concern) for the interpretation of item-level scores for goal-setting and customized intervention design and monitoring. These Item Descriptors were developed by computing the normative sample cumulative frequencies, as well as means and standard deviations, for every Negative Affect scale item.

What is the basis for specifying levels of performance?

Norm-referenced. At the scale level, Negative Affect scale standard scores that fall in the top 15% of the normative group distribution are considered to demonstrate a high level of concern. At the item level, the approach used to designate the risk levels of a BIMAS Behavioral Concern scale item score is similar to the technique used by Naglieri, McNeish, and Bardos (1991), Naglieri, LeBuffe, and Pfeiffer (1994), and LeBuffe and Naglieri (2003); they all suggested that an individual item score that falls in the top 15% of the normative group distribution (e.g., exceeds the mean normative item score plus one standard deviation) can be considered problematic. As such, a Concern item score denotes an item response that is more than 1 SD above the mean of the normative sample or at or above the 85th percentile. A Mild Concern item score is equal to 1 SD above the mean or falls between the 75th and 84th percentiles. Finally, a No Concern item score on any of the three BIMAS Behavioral Concern scales denotes an item score that is less than M + 1 SD or lower than the 75th percentile.
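The percentile version of the item-level rule described above can be sketched as a small lookup. This is only an illustration: the function name `item_descriptor` is hypothetical, and the input is assumed to be the percentile rank of an item score within the normative sample, which in practice would come from the normative tables in the manual.

```python
def item_descriptor(percentile_rank: float) -> str:
    """Map a normative percentile rank (0-100) for a Behavioral Concern
    item score to the Item Descriptor named in the text.

    Assumption: only the percentile cut-offs (85th and 75th) are used;
    the mean-plus-one-SD wording of the rule is not modeled here.
    """
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    if percentile_rank >= 85:
        return "Concern"
    if percentile_rank >= 75:
        return "Mild Concern"
    return "No Concern"


if __name__ == "__main__":
    for p in (50, 75, 84, 85, 99):
        print(p, item_descriptor(p))
```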

If norm-referenced, describe the normative profile:

Representation: National & Local

Date: 2009

Number of States: 30

Regions: 4

Size: 3,500

Gender: 50% Male, 50% Female

SES:

46.6% Parents graduated high school

27.2% Parents had 1-3 years of college

26.2% Parents had 4 or more years of college

Race/Ethnicity:

61.9% White, Non-Hispanic

15.7% Black, Non-Hispanic

61.9% Hispanic

3.8% Asian/Pacific Islander

3.5% Other

15.7% Unknown

If criterion-referenced, describe procedures for specifying levels of performance:

Not applicable.

Describe any other procedures for specifying levels of performance:

None

 

Data to Support Intervention Change: Unconvincing Evidence

Are validated decision rules for when changes to the intervention need to be made specified in your manual or published materials?

Yes

Specify the decision rules here: 

Decision rules for when changes in response to intervention have occurred are based on visual display of data patterns in time series graphs, Reliable Change Index (RCI) as well as effect size. 

What is the evidentiary basis for these decision rules?

The BIMAS provides three ways to gauge the amount of improvement of repeated measures:

  1. Visual displays

Time series graphs, which illustrate variability in Negative Affect scale scores over a period of time.

  2. Reliable change index (RCI)

Based on the Jacobson and Truax (1991) method, the RCI takes into account the standard error of difference between two scores when determining whether a change in T-scores between two administrations is statistically significant (the standard error of difference is computed with standard error of prediction estimates). The RCI values needed to establish statistical significance, when comparing Time 1 to Time 2 scores, were calculated for each scale on the BIMAS Standard, at the p < 0.10 level of significance. The p < 0.10 criterion was used in order to ensure that important changes (both increases and decreases in scores) are not missed. If the absolute difference between two administrations is equal to or greater than the corresponding RCI value, then the amount of change in that area of functioning is statistically significant. This means that the measured change is a function of an actual difference between test scores and not the result of random fluctuations in behavior or error in measurement. This calculation method is integrated into the BIMAS Standard Individual Progress Report: Significant Change Over Time.

  3. Effect size (ES) estimates

The strength of change, based on an individual’s raw scores, is computed between two BIMAS Negative Affect scale scores over time using Clement’s (1999) formula:

ES = (Time 2 Mean – Time 1 Mean) / Standard Deviation at Time 1

where Time 1 Mean denotes the mean of all item scores for an individual on a BIMAS scale at time 1, which is typically the baseline BIMAS assessment prior to intervention, and Time 2 Mean denotes the mean of all item scores for the same individual on the same scale at time 2 (a follow-up assessment during a course of intervention or a follow-up assessment at the termination of intervention). The BIMAS Standard Individual Progress Report: Significant Change Over Time provides the option to display ES and also provides interpretive guidelines based on the criteria created by Clement (1999), which range from Much Worse, Worse, No Change, and Improved to Much Improved.
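The two numeric decision aids described above can be sketched in a few lines. Several things here are assumptions and not the publisher's implementation: the function names are hypothetical; the RCI threshold uses the textbook Jacobson and Truax form with the standard error of measurement (S_diff = sqrt(2 × SEM²)), whereas the text says the BIMAS computes the standard error of difference from standard error of prediction estimates; 1.645 is the two-tailed critical z for p < 0.10; and the ES uses the sample (n − 1) standard deviation of the Time 1 item scores.

```python
import math
from statistics import mean, stdev

Z_CRIT_P10 = 1.645  # two-tailed critical z value for p < 0.10


def rci_threshold(sem: float, z_crit: float = Z_CRIT_P10) -> float:
    """Smallest absolute T-score change treated as reliable.

    Assumption: textbook Jacobson-Truax standard error of difference,
    S_diff = sqrt(2 * SEM^2); the BIMAS manual may use other inputs.
    """
    s_diff = math.sqrt(2.0 * sem ** 2)
    return z_crit * s_diff


def is_reliable_change(t_time1: float, t_time2: float, sem: float) -> bool:
    """True if the absolute T-score change meets or exceeds the threshold."""
    return abs(t_time2 - t_time1) >= rci_threshold(sem)


def effect_size(time1_items: list[float], time2_items: list[float]) -> float:
    """ES = (Time 2 Mean - Time 1 Mean) / Standard Deviation at Time 1.

    Assumption: sample standard deviation (n - 1) of Time 1 item scores.
    """
    sd1 = stdev(time1_items)
    if sd1 == 0:
        raise ValueError("Time 1 item scores have zero variance")
    return (mean(time2_items) - mean(time1_items)) / sd1


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(round(rci_threshold(3.94), 2))
    print(is_reliable_change(63.0, 53.0, sem=3.94))
    print(round(effect_size([3, 2, 3, 4], [1, 1, 2, 2]), 2))
```

On a Behavioral Concern scale such as Negative Affect, a negative ES from this formula corresponds to lower item scores at Time 2. The mapping from ES values to Clement's categories is left out because the thresholds are not given in the text.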

Data to Support Intervention Choice: Data Unavailable

Are validated decision rules for what intervention(s) to select specified in your manual or published materials?

No

Specify the decision rules here:

N/A

What is the evidentiary basis for these decision rules?

N/A

Reliability: Partially Convincing Evidence

 

McDougal, J. L., Bardos, A. N., & Meier, S. T. (2011). Behavior Intervention Monitoring Assessment System Technical Manual. Toronto, Canada: Multi-Health Systems.

Subscale(s): Negative Affect

Forms: Teacher / Parent / Self-Report

Age Range: 5-18 / 5-18 / 12-18

| Form | Type of Reliability | Coefficient | SEM | n (examinees) | n (raters) |
|---|---|---|---|---|---|
| Teacher | Alpha | 0.85 | 3.94 | 1,361 | 1,361 |
| Parent | Alpha | 0.82 | 4.28 | 1,400 | 1,400 |
| Self-Report | Alpha | 0.85 | 4.28 | 703 | 703 |

Sample Information / Demographics

Teacher: The normative sample included ratings of 50 males and 50 females at each age (from 5 through 18 years). Teachers completed the BIMAS–T for a normative sample of 1,400 youth. All of the teachers had known the students they were rating for at least 1 month. The sample characteristics were compared to the U.S. population (based on the 2000 U.S. Census report) on race/ethnicity and geographic region. The collected data were very similar to the U.S. Census in terms of race/ethnicity; however, some discrepancies existed between the actual collected data and Census targets for geographic region. To address these discrepancies, the sample was weighted through statistical procedures so that the weighted sample closely matched the U.S. Census statistics both in terms of race/ethnicity and geographic region distribution.

Parent: The BIMAS–Parent form was completed for a normative sample of 1,400 children and adolescents. The majority (n = 1,116; 79.7%) of the BIMAS–P normative sample comprised assessments completed by the youth’s biological mother, while the remaining assessments were completed by the youth’s biological father (n = 164; 11.7%) or by other significant adults (including non-biological parents and other relatives; n = 120; 8.5%). The 1,400 rated youth included 50 males and 50 females at each age (for ages 5 through 18 years). A statistical weighting procedure similar to the one described for the Teacher normative sample was applied to the BIMAS–P sample to correct for discrepancies in parent education level (PEL) and region. The resulting weighted sample therefore closely matched the U.S. Census statistics in terms of race/ethnicity, parent education level, and geographic region distribution.

Self-Report: The BIMAS–SR normative sample consisted of 700 youth aged 12 to 18 years old (350 males, 350 females, 100 youth in each age group by year). Table 9.6 describes the sample’s racial/ethnic distribution, which very closely approximated the U.S. Census. A similar statistical weighting procedure was applied to the BIMAS–SR sample so that the normative sample regional representation would be a close match to U.S. Census data.

 

McDougal, J. L., Bardos, A. N., & Meier, S. T. (2011). Behavior Intervention Monitoring Assessment System Technical Manual. Toronto, Canada: Multi-Health Systems.

Subscale(s): Negative Affect

Forms: Teacher / Parent / Self-Report

Age Range: 5-18 / 5-18 / 12-18

| Form | Type of Reliability | Coefficient | SEM | n (examinees) | n (raters) |
|---|---|---|---|---|---|
| Teacher | Test-retest | 0.85 | 3.86 | 112 | 112 |
| Parent | Test-retest | 0.91 | 3.05 | 79 | 79 |
| Self-Report | Test-retest | 0.87 | 3.58 | 52 | 52 |

 

 

McDougal, J. L., Bardos, A. N., & Meier, S. T. (2011). Behavior Intervention Monitoring Assessment System Technical Manual. Toronto, Canada: Multi-Health Systems.

Subscale(s): Negative Affect

Forms: Teacher to Parent

Age Range: 12-18

| Type of Reliability | Coefficient | SEM | n (examinees) | n (raters) |
|---|---|---|---|---|
| Inter-rater | 0.86 | | 162 | 162 |

 

 

McDougal, J. L., Bardos, A. N., & Meier, S. T. (2011). Behavior Intervention Monitoring Assessment System Technical Manual. Toronto, Canada: Multi-Health Systems.

Subscale(s): Negative Affect

Forms: Teacher to Self-Report

Age Range: 12-18

| Type of Reliability | Coefficient | SEM | n (examinees) | n (raters) |
|---|---|---|---|---|
| Inter-rater | 0.64 | | 162 | 162 |

 

 

McDougal, J. L., Bardos, A. N., & Meier, S. T. (2011). Behavior Intervention Monitoring Assessment System Technical Manual. Toronto, Canada: Multi-Health Systems.

Subscale(s): Negative Affect

Forms: Parent to Self-Report

Age Range: 12-18

| Type of Reliability | Coefficient | SEM | n (examinees) | n (raters) |
|---|---|---|---|---|
| Inter-rater | 0.69 | | 162 | 162 |

 

 

Validity: Partially Convincing Evidence

Content Validity of the BIMAS

The content validity evidence of the BIMAS began with early studies by Meier (1997, 1998, 2000, 2004) that included the examination of the relationships of specific items to external criteria (see Meier, 2000), further item/content reviews by the authors, and feedback by colleagues working in public schools and community mental health centers. These efforts resulted in a pool of items that were pilot tested in small scale studies. Following these earlier studies (e.g., Lerew, 2004) and reviews of the literature, a set of items were developed and proposed to represent behaviors that can be classified into: (a) externalizing behaviors (Conduct scale); (b) internalizing behaviors (Negative Affect scale); (c) behaviors that are related to attentive skills (Cognitive/Attention scale); and (d) items related to adaptive skills that were further divided into areas of social functioning (Social scale) and academic functioning (Academic Functioning Scale). 

 

Construct Validity of the BIMAS

Establishing the construct validity of the BIMAS subscales involved a multistep process that began with a series of confirmatory factor analyses (CFAs). Once the constructs representing the BIMAS subscales were formed, their relationships with related scales on the Conners Comprehensive Behavior Rating Scales™ (CBRS™; Conners, 2008), a behavior rating scale measuring similar constructs, were examined. In brief, ratings were obtained for three samples of non-clinical youth by Teachers (N = 112), Parents (N = 127), and Youth (N = 108) who completed the respective BIMAS and Conners CBRS rating forms (i.e., Conners CBRS−Teacher [Conners CBRS−T]; Conners CBRS−Parent [Conners CBRS−P]; Conners CBRS−Self-Report [Conners CBRS−SR]). The samples performed similarly across the two scales. The obtained correlations between the BIMAS and the Conners CBRS scales were significant (p < 0.01) and moderate to large in size, providing evidence that the BIMAS is measuring the constructs it was designed to measure. The results of these correlations are presented separately for each of the related BIMAS subscales below.

 

The BIMAS as a Screening Tool/Discriminant Validity

Once the relationships between the BIMAS and the CBRS had been obtained, supporting the constructs measured by the BIMAS, the performance of clinical and non-clinical samples was compared (see Table A below for a description of the clinical groups). Discriminant function analyses (DFA) were then performed for each subscale, including the calculation of overall correct classification rates, to support the use of the BIMAS as a screening tool for behavioral and/or emotional difficulties. The DFA were conducted using BIMAS T-score cut-offs of greater than T = 60 for the Behavioral Concern scales and less than T = 40 for the Adaptive scales for the clinical sample of students with a Disruptive Behavior Disorder. According to the National Center on Response to Intervention, “In screening, attention should focus on fidelity of implementation and selection of evidence based tools, with consideration for cultural and linguistic responsiveness and recognition of student strengths” (http://www.rti4success.org/essential-components-rti/universal-screening). The classification analysis findings presented for each scale were computed both with the BIMAS proposed cut-off scoring rules and with ethnicity and region controlled. The classification efficiency values calculated included:

•  Overall Correct Classification Rate. The percentage of correct group classifications made using the BIMAS T-scores.

•  Sensitivity. The percentage of clinical cases correctly predicted by the BIMAS T-scores to belong to the clinical group.

•  Specificity. The percentage of normative cases correctly predicted by the BIMAS T-scores to belong to the normative group.

•  Positive Predictive Power. The percentage of youth identified by the BIMAS T-scores as having a clinical condition who, based on previous diagnosis, actually have a clinical condition.

•  Negative Predictive Power. The percentage of youth identified by the BIMAS T-scores as not having a clinical condition who actually do not have a clinical condition.
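The five indices defined above all follow from a 2 × 2 table of screening decisions against known group membership. The sketch below shows the arithmetic; the class and function names are hypothetical, and the counts in the usage example are made up for illustration and are not taken from the BIMAS studies.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Cells of a 2 x 2 screening table."""
    tp: int  # clinical cases flagged by the T-score cut-off
    fn: int  # clinical cases not flagged
    tn: int  # normative cases not flagged
    fp: int  # normative cases flagged


def classification_efficiency(c: Counts) -> dict[str, float]:
    """Return the five classification indices as proportions (0-1)."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "overall_correct": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "positive_predictive_power": c.tp / (c.tp + c.fp),
        "negative_predictive_power": c.tn / (c.tn + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    stats = classification_efficiency(Counts(tp=80, fn=20, tn=90, fp=10))
    for name, value in stats.items():
        print(f"{name}: {value:.1%}")
```

Positive and negative predictive power depend on the proportion of clinical cases in the sample, so values obtained from a sample with roughly balanced groups will not carry over directly to universal screening, where clinical cases are much rarer.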

 

Table A. The Clinical Samples Used for the BIMAS Validation Studies

| Clinical Groups | Teacher | Parent | Self-Report | Total |
|---|---|---|---|---|
| Disruptive Behavioral Disorders | 123 | 70 | 65 | 258 |
| Attention Deficit/Hyperactivity Disorder | 109 | 117 | 89 | 315 |
| Anxiety | 55 | 67 | 56 | 178 |
| Depression | 60 | 73 | 62 | 195 |
| Pervasive Developmental Disorders | 95 | 86 | 65 | 246 |
| Learning Disorders | 45 | n/a | n/a | 45 |
| Developmental Delay | 30 | n/a | n/a | 30 |
| Other Clinical | 21 | 54 | 13 | 88 |
| Total | 538 | 467 | 350 | 1,355 |

Note:  Disruptive Behavior Disorders (DBD; includes Conduct Disorder, Oppositional Defiant Disorder); Attention-Deficit/Hyperactivity Disorder (ADHD); Anxiety (includes Generalized Anxiety Disorder, Obsessive Compulsive Disorder, and Social Phobia); Depression (includes Major Depressive Disorder and Dysthymia); Pervasive Developmental Disorders (PDD; includes Autistic Disorder and Asperger’s Disorder); Learning Disorders (LD; teacher report only); and Developmental Delay (DD; teacher report only).

 

The BIMAS as a Progress Monitoring Tool

A second major claim of the BIMAS is its use as a progress monitoring tool. A study was conducted with a group of students receiving treatment through an anger management intervention. Pre-post ratings were gathered from teachers, parents, and the students themselves to demonstrate the use of the BIMAS as a progress monitoring tool.

In summary, the validity evidence for each of the five BIMAS scales is presented as follows. The performance of, and relationship between, the BIMAS and the CBRS scales are presented first. Next, the performance of clinical and regular education samples of students is presented, followed by the classification efficiency of the BIMAS scale as a screening tool. Finally, the findings of a study that utilized the BIMAS as a progress monitoring tool are presented.

Validity Evidence for the BIMAS Negative Affect Scale: Results from Validity Studies   

Table B below describes the scores on the BIMAS Negative Affect scale and the mean scores on two Conners CBRS scales that assess negative mood. Mean scores were similar across the two behavior rating scales, and the scale correlations were moderate to strong across the three rater types, ranging from 0.38 to 0.70 (all p < 0.01), with the highest convergence occurring for the parent ratings.

 

Table B.  Descriptive Statistics on the BIMAS and the CBRS

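| Scales | Teacher (N = 112) Mean | Teacher SD | Parent (N = 127) Mean | Parent SD | Self-Report (N = 108) Mean | Self-Report SD |
|---|---|---|---|---|---|---|
| BIMAS Negative Affect scale | 48.5 | 7.9 | 49.2 | 10.1 | 50.8 | 10.2 |
| Conners CBRS Scale: Emotional Distress | 48.7 | 9.4 | 50.6 | 13.7 | 51.0 | 10.6 |
| Conners CBRS Scale: DSM-IV Major Depressive Episode | 49.8 | 9.1 | 50.0 | 13.1 | 52.0 | 11.5 |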

 

Table C.  Correlations Between the BIMAS Standard Negative Affect Scale T-scores and Relevant Conners CBRS Scales*

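| Conners CBRS Scale | BIMAS Negative Affect Scale: Teacher | BIMAS Negative Affect Scale: Parent | BIMAS Negative Affect Scale: Self-Report |
|---|---|---|---|
| Emotional Distress | 0.47 | 0.70 | 0.54 |
| DSM-IV Major Depressive Episode | 0.38 | 0.62 | 0.56 |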

Note: Adapted from Table 11.10 in McDougal, J. L., Bardos, A. N., & Meier, S. T. (2011). Behavior Intervention Monitoring Assessment System (BIMAS) Technical Manual. Toronto, Canada: Multi-Health Systems Inc.

* All correlations significant at p < 0.01 (2-tailed).

Table D below describes the clinical samples’ mean performance on the BIMAS Teacher, Parent, and Self-Report rating forms and compares their performance against the normative mean of 50 (SD = 10). Cohen’s d values are also reported to show the size of the differences between the groups and the norm. Shaded cells indicate the largest effect size across all clinical groups. As expected, the Anxiety and Depression group means showed the biggest difference from the norm with the largest effect sizes on the Negative Affect scale, indicating the efficacy of the Negative Affect scale in screening for affect-related internalizing problems. Discriminant analysis findings describing the overall classification and other screening efficiency indexes are presented in Table E for a sample of children diagnosed with Depression and Anxiety Disorders, followed by the findings of similar analyses controlling for ethnicity and region in Table F.

Table D.  Descriptive Statistics (T-scores) of the Different Clinical Samples and Effect Sizes across the BIMAS Rating Forms on the Negative Affect Scale

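| Clinical Groups | Statistic | Teacher | Parent | Self-Report |
|---|---|---|---|---|
| Disruptive Behavioral Disorders | M | 65.6 | 69.1 | 57.0 |
| | SD | 10.2 | 9.3 | 8.3 |
| | Cohen’s d | 1.6 | 1.9 | 0.7 |
| Attention Deficit/Hyperactivity Disorder | M | 63.6 | 57.9 | 55.7 |
| | SD | 8.4 | 9.1 | 8.8 |
| | Cohen’s d | 1.4 | 0.8 | 0.6 |
| Anxiety | M | 71.3 | 59.6 | 60.8 |
| | SD | 10.4 | 10.9 | 8.1 |
| | Cohen’s d | 2.1 | 1.0 | 1.1 |
| Depression | M | 72.0 | 63.6 | 65.2 |
| | SD | 8.9 | 9.2 | 8.3 |
| | Cohen’s d | 2.2 | 1.4 | 1.5 |
| Pervasive Developmental Disorders | M | 64.1 | 55.2 | 60.0 |
| | SD | 11.6 | 10.1 | 0.5 |
| | Cohen’s d | 1.4 | 0.5 | 1.0 |
| Learning Disorders | M | 64.5 | | |
| | SD | 10.2 | | |
| | Cohen’s d | 1.4 | | |
| Developmental Delay | M | 71.4 | | |
| | SD | 10.0 | | |
| | Cohen’s d | 2.1 | | |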

Note: Values adapted from Tables 11.17, 11.22 & 11.27 from the BIMAS Technical Manual.

 

Table E.  Efficiency of Classification of the Negative Affect Scale for a Sample of Children with Depression and Anxiety Disorders.

 

| Statistic | Teacher | Parent | Self |
|---|---|---|---|
| Overall Correct Classification | 88.3% | 79.7% | 54.1% |
| Sensitivity | 88.7% | 77.1% | 17.8% |
| Specificity | 88.0% | 82.0% | 97.0% |
| Positive Predictive Power | 85.0% | 80.0% | 87.5% |
| Negative Predictive Power | 91.0% | 79.4% | 50.0% |

 

Table F.  Efficiency of Classification of the Negative Affect scale for a Sample of Children with Depression and Anxiety Disorders Controlling for Ethnicity and Region

 

| Statistic | Teacher | Parent | Self |
|---|---|---|---|
| Overall Correct Classification | 88.3% | 82.1% | 77.1% |
| Sensitivity | 88.7% | 82.3% | 68.6% |
| Specificity | 88.0% | 82.0% | 87.0% |
| Positive Predictive Power | 85.0% | 79.9% | 86.2% |
| Negative Predictive Power | 91.0% | 84.2% | 70.2% |

 

As the classification statistics in Tables E and F show, the BIMAS Negative Affect scale demonstrates good sensitivity and specificity for internalizing behaviors on both the Teacher and Parent rater forms. The Self-Report form showed a lower sensitivity of 17.8% but a high specificity of 97.0%. When the Self-Report form is used as a screening measure, the overall correct classification rate is 54.1%, and a false positive is highly unlikely for a youth with a normal range of internalizing behaviors. One possible explanation for the lower sensitivity is that youth may be less willing to report affective problems on self-reports.

Taken together, the BIMAS Negative Affect scale showed efficacy in screening for internalizing problems: the Anxiety and Depression group means showed the biggest differences from the norm, with the largest effect sizes, and the scale showed adequate sensitivity for classifying children with internalizing-type problems.

To validate the use of the BIMAS as a progress monitoring tool, a study was conducted with a group of 46 children participating in an anger management intervention. The mean time interval between pre-test and post-test ranged from 31.09 to 34.71 days across the three rater forms. As the results in Table G and Figure 1 show, the BIMAS captured significant decreases in Negative Affect scale scores across all raters, with large effect sizes (1.0 to 1.8) from pre- to post-treatment, demonstrating the sensitivity of the items on the BIMAS Negative Affect scale in detecting changes in response to intervention as well as the valid use of the BIMAS as a progress monitoring tool.

 

Table G.  Anger Management Treatment Group: Pre-to Post-Treatment BIMAS Standard Negative Affect Scale T-scores

 

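| Time | Statistic | Teacher | Parent | Self |
|---|---|---|---|---|
| Pre | M | 63.0 | 60.8 | 59.2 |
| | SD | 10.7 | 9.5 | 9.8 |
| Post | M | 53.9 | 47.1 | 44.6 |
| | SD | 7.7 | 6.9 | 6.5 |
| | t | 6.6 | 10.4 | 11.5 |
| | Cohen’s d | 1.0 | 1.7 | 1.8 |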

Note. N = 46. All t values significant, p < 0.01.
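The effect sizes in Table G can be approximated from the reported means and standard deviations. The source does not state which standardizer was used for Cohen’s d; the sketch below assumes the root mean square of the pre- and post-treatment standard deviations, and the function name `cohens_d` is hypothetical. Under that assumption, the results round to the reported 1.0, 1.7, and 1.8.

```python
import math


def cohens_d(mean_pre: float, sd_pre: float,
             mean_post: float, sd_post: float) -> float:
    """Pre-minus-post mean difference divided by a pooled SD.

    Assumption: pooled SD = sqrt((sd_pre^2 + sd_post^2) / 2). A positive
    value means scores decreased from pre-test to post-test.
    """
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2.0)
    return (mean_pre - mean_post) / pooled_sd


if __name__ == "__main__":
    # Means and SDs taken from Table G (Negative Affect T-scores).
    table_g = {
        "Teacher": (63.0, 10.7, 53.9, 7.7),
        "Parent": (60.8, 9.5, 47.1, 6.9),
        "Self": (59.2, 9.8, 44.6, 6.5),
    }
    for rater, values in table_g.items():
        print(rater, round(cohens_d(*values), 2))
```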

Disaggregated Reliability and Validity Data: Partially Convincing Evidence

Disaggregated Reliability

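| Age Group | Teacher Male | Teacher Female | Teacher Total | Parent Male | Parent Female | Parent Total | Self-Report Male | Self-Report Female | Self-Report Total |
|---|---|---|---|---|---|---|---|---|---|
| 5-6 | 100 | 100 | 200 | 100 | 100 | 200 | | | |
| 7-9 | 150 | 150 | 300 | 150 | 150 | 300 | | | |
| 10-11 | 100 | 100 | 200 | 100 | 100 | 200 | | | |
| 12-13 | 100 | 100 | 200 | 100 | 100 | 200 | 100 | 100 | 200 |
| 14-16 | 150 | 150 | 300 | 150 | 150 | 300 | 150 | 150 | 300 |
| 17-18 | 100 | 100 | 200 | 100 | 100 | 200 | 100 | 100 | 200 |
| Total | 700 | 700 | 1400 | 700 | 700 | 1400 | 350 | 350 | 700 |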

 

Cronbach’s Alpha: Standard by Age Group (Combined Gender) and by Gender and Age

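| Gender | Age Group | Teacher | Parent | Self-Report |
|---|---|---|---|---|
| Total Sample | | 0.85 | 0.82 | 0.85 |
| Combined Gender | 5-6 | 0.79 | 0.69 | |
| | 7-9 | 0.75 | 0.77 | |
| | 10-11 | 0.84 | 0.80 | |
| | 12-13 | 0.87 | 0.85 | 0.85 |
| | 14-16 | 0.87 | 0.86 | 0.85 |
| | 17-18 | 0.86 | 0.82 | 0.85 |
| Male | 5-6 | 0.75 | 0.59 | |
| | 7-9 | 0.76 | 0.81 | |
| | 10-11 | 0.82 | 0.85 | |
| | 12-13 | 0.91 | 0.87 | 0.83 |
| | 14-16 | 0.89 | 0.86 | 0.83 |
| | 17-18 | 0.87 | 0.81 | 0.86 |
| Female | 5-6 | 0.84 | 0.76 | |
| | 7-9 | 0.74 | 0.74 | |
| | 10-11 | 0.86 | 0.67 | |
| | 12-13 | 0.91 | 0.82 | 0.86 |
| | 14-16 | 0.84 | 0.86 | 0.86 |
| | 17-18 | 0.86 | 0.83 | 0.86 |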

 

Standard Error of Measurement (SEM): Standard T-scores by Age Group (Combined Gender) and by Gender and Age

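| Gender | Age Group | Teacher | Parent | Self-Report |
|---|---|---|---|---|
| Total Sample | | 3.94 | 4.28 | 4.28 |
| Combined Gender | 5-6 | 3.83 | 5.22 | |
| | 7-9 | 4.30 | 4.55 | |
| | 10-11 | 3.65 | 4.37 | |
| | 12-13 | 3.76 | 3.95 | 3.94 |
| | 14-16 | 4.22 | 3.82 | 4.05 |
| | 17-18 | 4.07 | 4.47 | 4.07 |
| Male | 5-6 | 4.26 | 5.87 | |
| | 7-9 | 4.19 | 4.40 | |
| | 10-11 | 4.05 | 3.80 | |
| | 12-13 | 3.27 | 3.75 | 4.13 |
| | 14-16 | 4.29 | 3.78 | 4.36 |
| | 17-18 | 3.83 | 4.64 | 4.15 |
| Female | 5-6 | 3.08 | 4.34 | |
| | 7-9 | 4.64 | 4.77 | |
| | 10-11 | 3.26 | 5.30 | |
| | 12-13 | 4.33 | 4.56 | 4.16 |
| | 14-16 | 4.59 | 3.93 | 3.76 |
| | 17-18 | 4.29 | 4.45 | 3.90 |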

 

Because the validity analyses were computed using T-scores, whose norms already control for age and gender, disaggregated validity data are not available. Multivariate Analyses of Variance (MANOVAs) were employed to examine the relationships between gender and age and the BIMAS™ Standard raw scale scores. Results at the multivariate level revealed significant main effects for Age and Gender, as well as for the Age × Gender interaction, for all three raters. At the univariate level, Age was found to significantly affect all scales (see Table G.2), while Gender significantly affected the majority of the scales. These main effects were qualified by several significant Age × Gender interactions.

 

Age and Gender Effects: Multivariate Results

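| Rater | Independent Variable | Wilks' Lambda | F | df | p | Partial η² |
| --- | --- | --- | --- | --- | --- | --- |
| Teacher | Age | 0.71 | 16.81 | 25, 4381.2 | <0.001 | 0.07 |
| Teacher | Gender | 0.86 | 39.49 | 5, 1179 | <0.001 | 0.14 |
| Teacher | Age × Gender | 0.86 | 7.38 | 25, 4381.3 | <0.001 | 0.03 |
| Parent | Age | 0.85 | 9.38 | 25, 5113.1 | <0.001 | 0.03 |
| Parent | Gender | 0.95 | 15.21 | 5, 1376 | <0.001 | 0.05 |
| Parent | Age × Gender | 0.73 | 3.84 | 25, 5113.1 | <0.001 | 0.01 |
| Self-Report | Age | 0.72 | 16.81 | 24, 4381.3 | <0.001 | 0.07 |
| Self-Report | Gender | 0.92 | 11.81 | 5, 658 | <0.001 | 0.08 |
| Self-Report | Age × Gender | 0.86 | 7.38 | 25, 4381.3 | <0.001 | 0.03 |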
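As a rough illustration of how the multivariate effect sizes relate to Wilks' Lambda, the sketch below uses the definition common in statistical packages, partial η² = 1 − Λ^(1/s), where s is the smaller of the number of dependent variables and the effect's degrees of freedom. The source does not state which formula was used, so this definition, the function name, and the assumption of five dependent scales are all assumptions made for illustration. It is applied only to the Teacher rows above.

```python
def partial_eta_sq_from_wilks(wilks_lambda, n_dependent, df_effect):
    """Multivariate partial eta squared: 1 - lambda ** (1 / s).

    s = min(number of dependent variables, effect degrees of freedom).
    Assumed definition; not confirmed by the source document.
    """
    if not 0.0 < wilks_lambda <= 1.0:
        raise ValueError("Wilks' Lambda must be in (0, 1]")
    s = min(n_dependent, df_effect)
    return 1.0 - wilks_lambda ** (1.0 / s)


# Teacher rows, assuming 5 scales, 6 age groups (df = 5), 2 genders (df = 1).
for label, lam, df_effect in [("Age", 0.71, 5),
                              ("Gender", 0.86, 1),
                              ("Age x Gender", 0.86, 5)]:
    print(label, round(partial_eta_sq_from_wilks(lam, 5, df_effect), 2))
```

Under these assumptions the three Teacher rows come out at 0.07, 0.14, and 0.03, matching the tabled values after rounding.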

 

Age Effects: Univariate Results

| Rater | F | df | p | Partial η² |
| --- | --- | --- | --- | --- |
| Teacher | 4.82 | 5, 1183 | <0.001 | 0.02 |
| Parent | 9.68 | 5, 1380 | <0.001 | 0.03 |
| Self-Report | 8.16 | 2, 662 | 0.01 | 0.02 |

 

Gender Effects: Univariate Results

| Rater | F | df | p | Partial η² |
| --- | --- | --- | --- | --- |
| Teacher | 7.17 | 1, 1183 | 0.008 | 0.01 |
| Parent | 1.31 | 1, 1380 | 0.253 | 0.00 |
| Self-Report | 1.07 | 1, 662 | 0.301 | 0.00 |

 

Age × Gender Effects: Univariate Results

| Rater | F | df | p | Partial η² |
| --- | --- | --- | --- | --- |
| Teacher | 4.29 | 5, 1183 | 0.001 | 0.02 |
| Parent | 5.28 | 5, 1380 | <0.001 | 0.02 |
| Self-Report | 2.31 | 2, 622 | 0.100 | 0.01 |
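The univariate effect sizes in the tables above can be checked against the reported F statistics with the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch; the function name is hypothetical, and it is offered as a reader's cross-check rather than as the method used by the test authors.

```python
def partial_eta_sq_from_f(f_value, df_effect, df_error):
    """Partial eta squared recovered from a univariate F test."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df must be positive")
    ss_ratio = f_value * df_effect  # proportional to SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Values taken from the Teacher and Parent rows of the tables above.
print(round(partial_eta_sq_from_f(4.82, 5, 1183), 2))  # Teacher, Age: 0.02
print(round(partial_eta_sq_from_f(9.68, 5, 1380), 2))  # Parent, Age: 0.03
print(round(partial_eta_sq_from_f(7.17, 1, 1183), 2))  # Teacher, Gender: 0.01
```

All of these effects account for no more than about 3% of the variance, which is consistent with the description of the effects as statistically significant but small.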

 

Analyses were also conducted to determine whether similar scores would be obtained across the racial/ethnic groups in the normative sample, as a check on the generalizability of the BIMAS. In general, the more similar the results, the more utility the BIMAS has for use with diverse populations. Multivariate analyses of covariance (MANCOVAs) were conducted to analyze the effects of race/ethnicity on scores from the BIMAS–Teacher Standard (BIMAS–T Standard), BIMAS–Parent Standard (BIMAS–P Standard), and BIMAS–Self-Report Standard (BIMAS–SR Standard). The African American, Hispanic, and White groups were the only groups with sample sizes large enough to be included in the analyses. The demographic characteristics of the samples (e.g., age and gender of the rated youth) were controlled for by including these variables as covariates. Because differences in the composition of the groups (i.e., the covariates) were statistically controlled in these analyses, adjusted means are presented. In addition to significance levels (p < 0.05), an estimate of effect size (partial η²) is provided for every effect. The following planned comparisons were specified before the analyses were conducted in order to assess pairwise differences (to be examined only when the omnibus F-test was significant):

1. White versus African American

2. White versus Hispanic

Results at the multivariate level indicated significant, but small, effects of race/ethnicity for all rater types (see the first table below). Univariate results are presented in the three tables that follow it. They reveal several significant effects, which is not surprising given the large sample size, although all effect sizes are small. An examination of the means shows that ratings of White children were very close to the normative mean of 50, while ratings of African American and Hispanic children fell either slightly above or slightly below this mean (scores never deviated by more than 3.5 T-score points from it). Although some isolated differences were statistically significant, the small effect sizes show that there were very few meaningful differences in BIMAS scores across racial/ethnic groups, which supports the generalizability of the BIMAS across these groups.

 

Race/Ethnicity Effects: Multivariate Results

| Rater | Wilks' Lambda | F | df | p | Partial η² |
| --- | --- | --- | --- | --- | --- |
| Teacher | 0.93 | 8.35 | 10, 2174 | <0.001 | 0.04 |
| Parent | 0.98 | 2.33 | 10, 2578 | 0.010 | 0.01 |
| Self-Report | 0.90 | 15.21 | 10, 1212 | <0.001 | 0.05 |

 

Differences between Race/Ethnic Groups: BIMAS–Teacher Standard

 

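Race/Ethnicity groups: AA = African American; HI = Hispanic; WH = White.

| | AA (n = 250) | HI (n = 169) | WH (n = 677) | F (2, 1293) | p | Partial η² | Planned Comparisons |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M | 50.0 | 51.4 | 50.1 | 1.96 | 0.141 | 0.004 | n/a |
| SD | 7.6 | 8.5 | 8.8 | | | | |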

 

Differences between Race/Ethnic Groups: BIMAS–Parent Standard

 

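Race/Ethnicity groups: AA = African American; HI = Hispanic; WH = White.

| | AA (n = 236) | HI (n = 203) | WH (n = 859) | F (2, 1293) | p | Partial η² | Planned Comparisons |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M | 51.2 | 50.3 | 50.1 | 1.44 | 0.238 | 0.002 | n/a |
| SD | 6.5 | 7.3 | 5.7 | | | | |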

 

Differences between Race/Ethnic Groups: BIMAS–Self-Report Standard

 

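Race/Ethnicity groups: AA = African American; HI = Hispanic; WH = White.

| | AA (n = 127) | HI (n = 136) | WH (n = 352) | F (2, 610) | p | Partial η² | Planned Comparisons |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M | 50.6 | 46.7 | 49.9 | 6.76 | 0.001 | 0.022 | AA = WH; WH > HI |
| SD | 9.7 | 9.8 | 9.8 | | | | |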

 

 

Assessment Format: Rating Scale

Rater / Scorer: Teacher, Parent, Child, School Psychologist, Clinician

Usability Study Conducted: Yes