kalbracht47

11-27-2007, 01:59 AM

Dear all,

many thanks to all of you who responded to my query about fitting a regression without an intercept term and determining the goodness of fit when both variables have errors. I still need some time to work through your suggestions, but they were really valuable in finding the right way.

All the responders agreed that r-square is not a valid measure of the goodness of such a regression.

I hope other people find the information below as useful as I did. I have posted my original query followed by a summary of the responses.

Many thanks again

Kirsten Albracht

The original query was:

-----------------------------

Dear Biomech-L readers,

I have some problems fitting a linear regression to my measured data, and especially determining the goodness of the fit. The regressions we need have two substantial differences from simple linear regression:

1) Due to physical considerations, we have a regression model with no intercept term, i.e. through the origin (y = b*x)

2) Both variables contain measurement errors

My question is: how do I calculate a 'valid' r-square and the standard error for such a regression?

In the literature (Casella, G. & Berger, R. L. 2002: Statistical Inference, Duxbury, pp. 581-583) we found that, because both variables contain errors, the orthogonal least-squares distance is used instead of the ordinary least-squares distance to fit the regression. In addition, we found that the calculation of r-square differs between the no-intercept model and the commonly used intercept model (Hahn, G. J. 1977: Journal of Quality Technology 9(2), pp. 56-61; Eisenhauer, J. G. 2003: Teaching Statistics 25(3), pp. 76-80). For the intercept model, r-square is the proportion of the initial variation, as measured by the sum of squares around the mean of Y, that is accounted for by the regression. For the no-intercept model, however, the variation around the fitted regression could exceed the variation around the mean, resulting in a negative value of r-square. Therefore, for the no-intercept model it is recommended to calculate r-square from the variation around the origin.

However, I am wondering whether I can also apply this calculation when I use the orthogonal least-squares distance to fit the regression.
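As background to the orthogonal (perpendicular) least-squares fit through the origin discussed in the query: the stationary points of the orthogonal sum of squares satisfy a quadratic in b, so the slope has a closed form. A minimal sketch in Python (the function name is my own; sums are taken around the origin because there is no intercept):

```python
import math

def orthogonal_fit_through_origin(x, y):
    """Fit y = b*x by minimising the sum of squared perpendicular
    distances to the line. Sums of squares are taken around the
    origin, since the model has no intercept."""
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    if sxy == 0:
        return 0.0  # degenerate case: no linear association through the origin
    # Setting the derivative of the orthogonal sum of squares to zero gives
    #   sxy*b^2 + (sxx - syy)*b - sxy = 0;
    # of the two roots, keep the one that minimises the sum of squares.
    disc = math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)
    roots = [((syy - sxx) + disc) / (2 * sxy),
             ((syy - sxx) - disc) / (2 * sxy)]

    def oss(b):  # orthogonal sum of squares for slope b
        return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / (1 + b * b)

    return min(roots, key=oss)
```

For data lying exactly on y = 2x, e.g. x = (1, 2, 3), y = (2, 4, 6), this returns a slope of 2. Note that, as one responder points out below, the result depends on the units of x and y, so the variables should be scaled so that their errors are comparable before fitting.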

The responses:

--------------------------------------

Dear Kirsten,

I do not have all the statistical subtleties at hand, but in a model y = b*x the maximum likelihood estimate for b is

b^ = cov(x,y) / var(x)

I used it in Gait & Posture 16: 78-86 (2002), eq. (2).

The r2 is not a good measure of the goodness of fit in this case, as in many other cases. Better is the standard error (r.m.s. error):

sigma = stdev(y - b*x) = sqrt( (1/n) * sum((y - b*x)^2) )
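The two formulas above can be sketched as follows (a minimal Python version; the function name is my own, and population cov/var with divisor n is assumed, matching the 1/n in the r.m.s. error):

```python
import math

def hof_fit(x, y):
    """Slope for the no-intercept model y = b*x via b = cov(x, y) / var(x),
    plus the r.m.s. error of the fit, sqrt((1/n) * sum((y - b*x)^2))."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    b = cov_xy / var_x
    # standard (r.m.s.) error of the residuals about the fitted line
    sigma = math.sqrt(sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / n)
    return b, sigma
```

For example, hof_fit([1, 2, 3, 4], [2, 4, 6, 8]) gives b = 2 with sigma = 0, since the points lie exactly on y = 2x.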

Now you still need a real mathematician to prove all these bold statements....

At Hof

Center for Human Movement Sciences

University of Groningen

PO Box 196

9700 AD Groningen

The Netherlands

---------------------------------------------------------------------------

Dear Kirsten,

Having errors in both variables places you outside most statistics books. There is no "optimal" solution. What you describe as "perpendicular" seems rather strange to me, as it depends heavily on the choice of units (going from cm to mm changes what is perpendicular). The solution is to make sure that the errors in both directions are of equal size (i.e. use a scaling that makes them equal).

Secondly, r2 is not a very good measure of the quality of the fit; it is more a measure of how well you sampled the space (the larger the range of values, the larger r2). If you have an error estimate for each data point, you can express the chance that the data points are distributed along the fit given their precision (a chi-squared test). I found this method in "Numerical Recipes in C".

You might run into problems with referees; I once had to put a figure

in a paper to explain the basic statistics of line-fitting (figure 2 in

the attached paper).

Good luck with finding a good way to describe your data.

--

Prof. dr. Jeroen B.J. Smeets

Faculty of Human Movement Sciences,

Vrije Universiteit

http://www.fbw.vu.nl/~JSmeets/

---------------------------------------------------------------------------

Dear Kirsten Albracht,

I am not sure you still need help, but in case you do...

What you want to do is actually a little tricky. There are analytical equations for the regression when only one of the variables has errors; when both variables are erroneous, there is no analytical solution, but it is doable. The following book should help:

Bevington, P. R. Data Reduction and Error Analysis for the Physical Sciences. New York, NY: McGraw-Hill, 1969.

I hope it helps,

Osmar.

--

Osmar Pinto Neto

Universidade do Vale do Paraíba - UNIVAP

http://www.univap.br/

---------------------------------------------------------------------------

Hello,

There is a zero-intercept regression file available on the MATLAB File Exchange. It contains m-files to perform the regression as well as some discussion (I think) of the statistical validity. If you have trouble finding it, let me know and I can look it up.

Good luck,

Eric Wolbrecht, PhD.

Assistant Professor

Mechanical Engineering Department

University of Idaho

---------------------------------------------------------------------------

Hi Kirsten-

Your case sounds like a candidate for a (standardized) major axis regression. This analysis is found in major statistical packages, and in R, the open-source statistics software. Look it up in your software, or find someone who has SPSS.
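As a rough illustration of the major-axis idea for a line through the origin: the geometric-mean (standardised major axis) slope can be computed from the uncorrected sums of squares, with the sign taken from the cross-product term. This is only a sketch under that assumption, not the output of a vetted statistical package:

```python
import math

def sma_slope_through_origin(x, y):
    """Geometric-mean (standardised major axis) slope for y = b*x.
    With no intercept, the sums of squares are taken around the origin;
    the sign of the slope comes from the cross-product sum(x*y)."""
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return math.copysign(math.sqrt(syy / sxx), sxy)
```

For x = (1, 2, 3), y = (2, 4, 6) this gives a slope of 2; unlike ordinary least squares, the result is symmetric in x and y (fitting x on y gives the reciprocal slope).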

Cheers- Roi

--

Kirsten Albracht

Institute for Biomechanics and Orthopaedics

German Sport University Cologne

Carl Diem Weg 6

50933 Cologne

Email: albracht@dshs-koeln.de

Tel.: +49 221 4982-5680

Fax.: +49 221 4971598
