Summary: ICC Reliability



Dr David Michael Hooper
12-16-1998, 09:00 PM
Biomech-L,

Several weeks ago I posted an inquiry about intraclass correlations and
their application to inter- and intra-rater reliability. I would
like to thank those who responded and offer the information to
those who may be interested.

My conclusion is in general agreement with Patrick Neumann's. The data
we collected provide the opportunity to look at inter- and intra-rater
reliability individually, which is probably more useful than a
'general' reliability coefficient. Our first approach, described by
Jamie Tomlinson, has been to separate the data and evaluate the two
reliability scores for each condition. We have calculated three
within-rater reliability scores (three raters) and two between-rater
reliability scores (two test sessions). It seems to me that the
simplest and most logical way to truly assess the overall reliability
would be to have a large pool of testers collecting over several
sessions, and to compute a confidence interval for each of these
values. Establishing an ICC confidence interval was mentioned by
Danny Pincivero. I have directed my students to pursue this course
first and, when they have gotten that far, to consider the
model given by David Brodie and to explore generalizability
theory.
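As an illustration of that confidence-interval step, here is a minimal Python sketch (all data invented; `icc1_with_ci` is a hypothetical helper that computes a one-way ICC together with the exact F-based interval given by Shrout and Fleiss, 1979):

```python
# Hypothetical sketch: one-way ICC (ICC(1,1)) with an exact F-based
# confidence interval (Shrout & Fleiss, 1979).  Data are invented.
import numpy as np
from scipy.stats import f

def icc1_with_ci(data, alpha=0.05):
    """data: (n_subjects, k_raters) array of scores."""
    n, k = data.shape
    grand = data.mean()
    subj_means = data.mean(axis=1)
    # Between-subjects and within-subjects mean squares (one-way ANOVA)
    bms = k * ((subj_means - grand) ** 2).sum() / (n - 1)
    wms = ((data - subj_means[:, None]) ** 2).sum() / (n * (k - 1))
    icc = (bms - wms) / (bms + (k - 1) * wms)
    # Exact confidence limits via the F distribution
    fobs = bms / wms
    fl = fobs / f.ppf(1 - alpha / 2, n - 1, n * (k - 1))
    fu = fobs * f.ppf(1 - alpha / 2, n * (k - 1), n - 1)
    lower = (fl - 1) / (fl + k - 1)
    upper = (fu - 1) / (fu + k - 1)
    return icc, (lower, upper)
```

Run separately on each rater's (or each session's) slice of the data, this yields one interval per condition, which is the comparison described above.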

Thank you for your responses and I welcome any continued dialog on
this subject.

Best Wishes.

David

*****************************************************
Reference Summary:

Rankin, Stokes. Clinical Rehabilitation. 12: 187-199. 1998.

Denegar CR, Ball DW. Assessing reliability and precision of
measurement: an introduction to intraclass correlation and standard
error of measurement. J Sport Rehab. 2: 35-42. 1993.

Stratford PW, Goldsmith CH. Use of the standard error as reliability
index of interest: an applied example using elbow flexor data.
Physical Therapy. 77: 745-750. 1997.

Nunnally, Bernstein. Psychometric Theory. 1994.

Safrit, Wood. Measurement concepts in physical education. Human
Kinetics. 1989.

Shavelson, Webb. Generalizability Theory: A Primer. Sage
Publications.

Fleiss. The Design and Analysis of Clinical Experiments. 1986.

****************************************************
Original Post:


> > Hello,
> >
> > I am posting this on behalf of some students of mine. They have
> > conducted a reliability study in which three raters tested and
> > re-tested a group of twelve subjects on two different days.
> > Currently, they are attempting to calculate an intraclass
> > correlation coefficient (ICC) as described by Shrout and Fleiss
> > (1979) and Portney and Watkins (Book called 'Foundations of
> > Clinical Research).
> >
> > Portney and Watkins state that you can use any variable in the
> > analysis.
> >
> > 'The specific facets included in the denominator will vary,
> > depending on whether rater, occasions, or some other facet is the
> > variable of interest in the reliability study. For example, if we
> > include rater as a facet, then the total observed variance does
> > not, of course, include direct estimates of true variance (as
> > this is unknown). Theoretically, however, we can estimate true
> > score variance by looking at the difference between observed
> > variance among subjects and error variance. These estimates can
> > be derived from an analysis of variance.'
> >
> > The example given in the text has four raters evaluating six
> > subjects on a single day. In calculating the ICC, the
> > between-subjects mean square, error mean square, and between-raters
> > mean square are taken from a repeated measures ANOVA. We can follow
> > this and reproduce it quite easily by doing a repeated measures
> > ANOVA with a single effect of RATER. Now in my students' study,
> > there are main effects of both RATER and SESSION. We can't
> > decide which mean square terms to use because there are also
> > interactions involved.
> >
> > We could simplify it by calculating the ICCs separately for test
> > 1 and test 2 but then lose the effect of session. Perhaps the
> > study isn't suited for ICCs.
> >
> > Anyone have any advice on how to approach this, or references that
> > I can point them to?
> >
> > Thank you,
> > David
> >
> > David M. Hooper, Ph.D.
> > Department of Rehabilitation Sciences
> > University of East London
> > Romford Road
> > London E15 4LZ
> > Phone 0181-590-7000 (4025)
> d.m.hooper@uel.ac.uk

***************************************************
Complete Replies:

> From: "A.D.Pandyan"
> Subject: Re: Reliability

> Dear David,
> Try the following reference, and if you are using SPSS the
> downloads are available at the site mentioned below. Regards, David
>
> Rankin, Stokes (1998). Clinical Rehabilitation - 12: 187 - 199.
>
> Dr. Anand D. Pandyan
> Centre for Rehab. Eng. Studies
> M25-Stephenson Bldg
> University of Newcastle
> Newcastle Upon Tyne
> UK - NE1 7RU
>
> Tel ++ 44 (0)191 - 222 5434
> Fax ++ 44 (0)191 - 222 8600
> e-mail A.D.Pandyan@ncl.ac.uk
> http://www.ncl.ac.uk/crest/

************************************************

> From: Danny Pincivero
> Subject: Reliability

> Dr. Hooper:
>
> I just noticed your posting on BIOMECH-L regarding reliability
> calculations. An article that might be of some help is this one:
>
> Denegar CR, Ball DW. Assessing reliability and precision of
> measurement: an introduction to intraclass correlation and standard
> error of measurement. J Sport Rehab. 2: 35-42. 1993.
>
> From your repeated measures ANOVA table, the TMS in the ICC formula
> refers to the mean square value for your repeated-measures factor
> (i.e., time), while EMS refers to the mean square value for the
> associated error term. In calculating the SEMs for your reliability
> values, you might also want to think about calculating 95%
> confidence intervals for this value. For this, refer to this
> article:
>
> Stratford PW, Goldsmith CH. Use of the standard error as
> reliability index of interest: an applied example using elbow
> flexor data. Physical Therapy.
> 77:745-750. 1997.
>
> Hope this information helps.
>
> Danny
>
>
> Danny M. Pincivero, Ph.D., C.S.C.S.
> Assistant Professor - Physical Therapy
> Eastern Washington University
> Department of Physical Therapy
> Mail Stop 4, 526 5th Street
> Cheney, WA 99004-2431
> phone: (509) 623-4323
> fax: (509) 623-4321
> E-mail: dpincivero@mail.ewu.edu
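The SEM and confidence-interval idea in Danny's reply can be sketched as follows (a minimal illustration; the SD and ICC values are invented, and `sem`/`score_ci` are hypothetical helper names):

```python
import math

def sem(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)

def score_ci(observed, sd, icc, z=1.96):
    """Approximate 95% CI around a single observed score."""
    s = sem(sd, icc)
    return observed - z * s, observed + z * s

# e.g. with SD = 8.0 and ICC = 0.90, SEM is about 2.53
```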

****************************************
> From: Stephen Page
> Subject: Re: Reliability

> Dr. Hooper:
>
> In their book, Psychometric Theory, Nunnally and Bernstein (1994)
> offer several good alternative methods for measuring
> reliability...see if their suggestions help.
>
> Steve Page, Ph.D.
> The Kessler Institute for Rehabilitation

**********************************
> From: Jamie Tomlinson
> Subject: Re: Reliability

> Dr. Hooper:
> For the purpose of determining the reliability of the measurements,
> you have data that will allow you to calculate between-rater
> reliability as well as within-rater reliability (if the measures
> are assumed to be stable from Session 1 to Session 2). For the
> purposes of ICC calculation you must separate the data. The
> between-raters calculation will require that you have only one
> rating by each rater for each subject, for a total of 36 values. To
> calculate the within-rater ICC you will need to have data from only
> one rater (24 values).
>
> For the purpose of analysing the effects of RATER and SESSION and
> the interaction of these factors, you will use the data set that
> you have described.
>
> I hope this helps.
> Jamie
>
> Jamie Tomlinson
> Dept. of Physical Therapy, Beaver College
> 450 S Easton Rd, Glenside, PA 19038-3295
> Tel 215.572.2163, fax 215.572.2157
> www.beaver.edu
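The split Jamie describes can be sketched with an invented 3 raters x 2 sessions x 12 subjects data set (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# raters x sessions x subjects, invented scores
scores = rng.normal(50.0, 8.0, size=(3, 2, 12))

# Between-rater set: one rating per rater per subject (Session 1 only)
# -> 12 subjects x 3 raters = 36 values.
between = scores[:, 0, :].T   # shape (12, 3)

# Within-rater set: both sessions from a single rater
# -> 12 subjects x 2 sessions = 24 values.
within = scores[0, :, :].T    # shape (12, 2)
```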
*********************************

> From: "Brodie, David"
> Subject: RE: Reliability

> Dr. Hooper,
>
> I performed a reliability study that sounds very similar to your
> students. The model I used was as follows:
>
> ICC = N(JMS - EMS) / [N*JMS + k*AMS + (Nk - N - k)*EMS]
>
> where:
>
> N = number of jobs (subjects, in your case)
> k = number of analysts
> JMS = job mean square
> AMS = analyst mean square
> EMS = error mean square
>
> I was able to obtain the ICCs for which you can calculate confidence
> intervals. You can evaluate any differences in sessions based on
> the confidence intervals, or through performing t-tests. I used a
> stats consultant to help set up the analysis, and he seemed to think
> this was a valid way to evaluate reliability, and did not mention
> any other way to model the situation to evaluate the effect of
> session.
>
> Another statistical formula you may want to consider, which is often
> cited for reliability studies, is Cohen's Kappa. Fleiss (1986)
> provides a good explanation of this method in his book "The Design
> and Analysis of Clinical Experiments".
>
> I hope this helps. Have a good day,
>
> > David M. Brodie, M.Sc.
> > Ergonomist, OSH Engineering Unit
> > Workplace Safety and Health Branch
> > Manitoba Labour
> > 204-945-0704 Tel
> 204-945-4556 Fax
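Brodie's model above (which appears equivalent to Shrout and Fleiss's ICC(2,1) after multiplying numerator and denominator by N) can be sketched directly; the mean-square inputs would come from a two-way ANOVA table, and the example values below are invented:

```python
def icc_brodie(n, k, jms, ams, ems):
    """ICC from the model in Brodie's reply: n jobs (subjects),
    k analysts (raters); JMS, AMS, EMS = job, analyst, and error
    mean squares from a two-way ANOVA."""
    return n * (jms - ems) / (n * jms + k * ams + (n * k - n - k) * ems)

# e.g. n = 12 subjects, k = 3 raters, invented mean squares:
# icc_brodie(12, 3, 100.0, 5.0, 4.0)
```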

***************************************

> From: Patrick Neumann
> Subject: Re: Reliability

> Hi - you have a juicy stats problem here. I am no statistician but
> we have done a number of similar reliability studies recently so
> please accept this comment: What is your main research question? You
> have, from your data, the ability to look at 2 aspects of
> reliability. 1) Inter-observer reliability ( between observers) and
> 2) intra-observer reliability (between days within observers)
>
> I suggest that these are both of interest to someone who wishes to
> apply the method and so should be considered separately as they
> affect how the tool might be applied in the future (implications for
> study design). This greatly simplifies your problem and is perhaps
> more useful than an "overall" reliability term.
>
> Good luck;
> -P.
> Patrick Neumann
> Executive Co-ordinator
> Ergonomics Initiative
> University of Waterloo

******************************************

> From: Pat Patterson
> Subject: response to your reliability question

> Hi-
> One of my former students forwarded your inquiry to me and thought
> I might be able to offer some advice.
>
> You are correct in that your situation cannot easily be answered
> with classical test theory reliability approaches. A better
> approach would be to use generalizability theory, an approach that
> allows the error term to be differentiated. In your case, you have
> two facets that you wish to examine--rater and session.
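As a rough sketch of where the G-theory approach leads (hypothetical helper name and invented variance components; in practice the components would be estimated from the ANOVA expected mean squares):

```python
def g_coefficient(var_p, var_pr, var_ps, var_prs_e, n_raters, n_sessions):
    """Relative G coefficient for a persons x raters x sessions design:
    person (true-score) variance over itself plus the relative error,
    with interaction components averaged over the facet sample sizes."""
    rel_error = (var_pr / n_raters
                 + var_ps / n_sessions
                 + var_prs_e / (n_raters * n_sessions))
    return var_p / (var_p + rel_error)

# e.g. invented components: person 10, person-x-rater 2,
# person-x-session 1, residual 3, with 3 raters and 2 sessions:
# g_coefficient(10.0, 2.0, 1.0, 3.0, 3, 2)
```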
>
> There are a number of excellent references on G-theory but I will
> refer you to a chapter in a measurement text that uses an example
> very similar to your own.
>
> Safrit & Wood (1989). Measurement concepts in physical education.
> Published by Human Kinetics. The chapter by Jim Morrow on G-theory
> is quite easy to understand and may assist you. There are also some
> good references listed.
> Another excellent reference is a small book published by Sage
> Publications:
> Generalizability Theory: A Primer written by Richard Shavelson
> and Noreen Webb.
>
> Good luck! Hope this information helps a bit.
>
> Patricia Patterson, PhD
>
>
> Patricia Patterson
> Department of Exercise and Nutritional Sciences
> San Diego State University
> San Diego, Ca 92182
> (619) 594-1919
> Email: ppatters@mail.sdsu.edu

David M. Hooper, Ph.D.
Department of Rehabilitation Sciences
University of East London
Romford Road
London E15 4LZ
Phone 0181-590-7000 (4025)
d.m.hooper@uel.ac.uk

---------------------------------------------------------------
To unsubscribe send SIGNOFF BIOMCH-L to LISTSERV@nic.surfnet.nl
For information and archives: http://isb.ri.ccf.org/biomch-l
---------------------------------------------------------------