## Test-retest reliability: implications of single versus average measures in ICC models

Dear subscribers,

The literature is inconsistent about whether to choose ICC(2,1) or ICC(2,k), and how to interpret each, when assessing day-to-day test-retest reliability. I have a question about this and would be grateful if anyone could share their experience. Please also let me know if my understanding of ICCs, as described below, is wrong.

Following Shrout and Fleiss [1], the six ICC models are denoted ICC(n,k), where n = 1, 2, 3 is the main model and k indicates a single measure (k=1) or the average of several trials (k>1). For test-retest studies in which we wish to generalize the results and estimate trial-to-trial or day-to-day relative reliability, the second model (two-way random, n=2) is suitable [2]. I have no question about n here, so let's focus on choosing k.
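For concreteness, here are the two forms of the second model as I understand them from Shrout and Fleiss. The symbols are the usual two-way ANOVA mean squares: BMS between subjects, JMS between trials (their "judges"), and EMS residual. I write N for the number of subjects to avoid a clash with the model index n used above.

$$
\mathrm{ICC}(2,1) = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k-1)\,\mathrm{EMS} + k\,(\mathrm{JMS} - \mathrm{EMS})/N},
\qquad
\mathrm{ICC}(2,k) = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (\mathrm{JMS} - \mathrm{EMS})/N}
$$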

In a test-retest context, k is the number of trials in each session (the number of columns of data). Suppose we've collected 3 trials of data from 10 subjects on two different days. For within-day (intra-session) reliability we can use ICC(2,1) and ICC(2,3), and we can also compute ICC(2,2) after dropping the last column. An acceptable ICC(2,1) value says that the within-day recordings are reliable and that one can rely on the result of a single trial (perhaps the first). ICC(2,3), on the other hand, implies that the reader should rely on the average of the 3 trials. Accordingly, if ICC(2,1) turns out to be unacceptable, I think it is better to compute ICC(2,2) before ICC(2,3): if ICC(2,2) is acceptable, it tells readers that averaging two trials, rather than three, is enough.
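To make the within-day comparison concrete, here is a minimal Python sketch, assuming NumPy is available. The function name `icc2` and the simulated 10-by-3 data set are my own inventions for illustration, not real recordings. It computes ICC(2,1) and ICC(2,k) from a subjects-by-trials matrix using the two-way ANOVA mean squares of Shrout and Fleiss, first with all three trials and then with the last column dropped.

```python
import numpy as np


def icc2(scores):
    """Return (ICC(2,1), ICC(2,k)) for an (n_subjects, k_trials) array."""
    x = np.asarray(scores, dtype=float)
    n_subjects, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for a two-way ANOVA with one observation per cell.
    ss_subjects = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_trials = n_subjects * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_trials

    # Mean squares: between subjects, between trials, residual.
    bms = ss_subjects / (n_subjects - 1)
    jms = ss_trials / (k - 1)
    ems = ss_error / ((n_subjects - 1) * (k - 1))

    trial_term = (jms - ems) / n_subjects
    icc_single = (bms - ems) / (bms + (k - 1) * ems + k * trial_term)
    icc_average = (bms - ems) / (bms + trial_term)
    return icc_single, icc_average


# Invented example: 10 subjects, 3 trials on one day (not real data).
rng = np.random.default_rng(0)
true_score = rng.normal(50.0, 10.0, size=(10, 1))
day1 = true_score + rng.normal(0.0, 4.0, size=(10, 3))

single_3, average_3 = icc2(day1)          # ICC(2,1) and ICC(2,3)
single_2, average_2 = icc2(day1[:, :2])   # ICC(2,1) and ICC(2,2), last trial dropped

print(f"3 trials: ICC(2,1)={single_3:.3f}, ICC(2,3)={average_3:.3f}")
print(f"2 trials: ICC(2,1)={single_2:.3f}, ICC(2,2)={average_2:.3f}")
```

Note that the ICC(2,1) estimated from two columns need not equal the one estimated from three, since the mean squares change when a trial is removed.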

For day-to-day (inter-session) reliability in the above example, I think only ICC(2,1) would be informative, because ICC(2,k) with k equal to the number of days would tell the reader to rely on the average of the values from different days!
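One relation that may help frame the question: when both coefficients are computed from the same ANOVA table, the average-measures value follows from the single-measure value through the Spearman-Brown form, so ICC(2,k) is a projection from the same data and carries no independent evidence.

$$
\mathrm{ICC}(2,k) = \frac{k\,\mathrm{ICC}(2,1)}{1 + (k-1)\,\mathrm{ICC}(2,1)}
$$

As a purely illustrative number, with k = 2 days and ICC(2,1) = 0.60 this gives ICC(2,2) = 1.2 / 1.6 = 0.75, which describes the reliability of the two-day mean and not of a single day's measurement.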

Contrary to my understanding above, some researchers have used ICC(2,k) for day-to-day reliability assessment. For the example above, which ICC models would you suggest for intra-session and inter-session reliability?

Sorry for my lengthy post.

References:
1. Shrout, P.E. and J.L. Fleiss, Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 1979. 86(2): p. 420-428.
2. Weir, J.P., Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. The Journal of Strength & Conditioning Research, 2005. 19(1): p. 231-240.

Best regards,
Ali

-----------------------------------
M.A. Sanjari, PhD.
Director of Biomechanics Lab.,
Rehabilitation Research Center, TUMS.
Tehran University of Medical Sciences
Tel: (+98) 21 2225 9306
Fax: (+98) 21 2222 0946
www.sanjari.ir
-----------------------------------