Summary: Goodness of a regression without an intercept term



kalbracht47
11-27-2007, 01:59 AM
Dear all,

many thanks to all of you who responded to my query about fitting a
regression without an intercept term and determining the goodness of the
fit when both variables have errors. I still need some time to work
through your suggestions, but they were really valuable in finding the
right way.
All the responders agreed that rsquare is not a valid measure for the
goodness of such a regression.
I hope other people find the information below as useful as I did. I
have posted my original query followed by a summary of responses.


Many thanks again

Kirsten Albracht


The original query was:
-----------------------------

Dear Biomech-L readers,

I have some problems fitting a linear regression to my measured data,
especially in determining the goodness of the fit. The regressions we
need differ from simple linear regression in two substantial ways:

1) Due to physical considerations, we have a regression model with no
intercept term, i.e. through the origin (y = b*x)
2) Both variables are subject to measurement error

My question is: How do I calculate a 'valid' rsquare and the standard
error of such a regression?

In the literature (Casella, G. & Berger, R. L. 2002: Statistical
Inference, Duxbury, pp 581-583) we found that, when both variables are
subject to error, the orthogonal least-squares distance is used
instead of the ordinary least-squares distance to fit the regression. In
addition, we found that the calculation of rsquare differs between the
no-intercept model and the commonly used intercept model (Hahn, G.
J. 1977: Journal of Quality Technology 9(2), pp 56-61; Eisenhauer, J. G.
2003: Teaching Statistics 25(3), pp 76-80). For the intercept model,
rsquare is the proportion of the initial variation, as measured by the
sum of squares around the mean of Y, that is accounted for by the
regression. For the no-intercept model, however, the variation around
the fitted regression can exceed the variation around the mean,
resulting in a negative value of rsquare. Therefore, for the
no-intercept model it is recommended to calculate rsquare from the
variation around the origin.
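
To illustrate the two rsquare definitions described above, here is a
minimal Python sketch. The data are made up and the function names are
hypothetical; the fit is an ordinary least-squares line through the
origin (errors in y only), not the orthogonal fit.

```python
def fit_through_origin(x, y):
    """Ordinary least-squares slope for y = b*x (errors assumed in y only)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def r_square_variants(x, y, b):
    """Return (rsquare about the mean of y, rsquare about the origin)."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))  # around the fit
    ss_mean = sum((yi - y_mean) ** 2 for yi in y)             # around the mean
    ss_origin = sum(yi ** 2 for yi in y)                      # around the origin
    return 1.0 - ss_res / ss_mean, 1.0 - ss_res / ss_origin


# Made-up data with a large offset, so a line through the origin fits poorly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 10.5, 9.5, 10.2, 9.8]

b = fit_through_origin(x, y)
r2_mean, r2_origin = r_square_variants(x, y, b)
print("slope:", b)
print("rsquare about the mean:  ", r2_mean)
print("rsquare about the origin:", r2_origin)
```

With these data the mean-based value is negative while the origin-based
value lies between 0 and 1. The origin-based value can look high even
for a poor fit, so the two numbers are not comparable.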

However, I am wondering whether I can also apply this calculation when I
use the orthogonal least-squares distance to fit the regression.


The responses:
--------------------------------------

Dear Kirsten,
I do not have all statistical subtleties at hand, but in a model y = b*x
the maximum likelihood estimate for b is
b^ = cov(x,y)/var(x)
I used it in Gait & Posture 16: 78-86 (2002), eq (2).
The r2 is not a good measure of the goodness of fit in this case, as in
many other cases. A better one is the standard error (r.m.s. error):
sigma = stdev(y - b*x) = sqrt( (1/n) * sum( (y - b*x)^2 ) )

Now you still need a real mathematician to prove all these bold
statements....
At Hof
Center for Human Movement Sciences
University of Groningen
PO Box 196
9700 AD Groningen
The Netherlands
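
A minimal Python sketch of the r.m.s. error suggested above, on made-up
data with hypothetical function names. One point worth flagging: for a
line forced through the origin, the ordinary least-squares slope is
sum(x*y)/sum(x^2), whereas cov(x,y)/var(x) is the slope of a line that
has an intercept. The two coincide only when x and y both have zero
mean, so the sketch computes both for comparison.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope of y = b*x: sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def slope_cov_over_var(x, y):
    """cov(x, y) / var(x): the least-squares slope of a line WITH intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    return cov / var


def rms_error(x, y, b):
    """sqrt( (1/n) * sum( (y - b*x)^2 ) ) for the line y = b*x."""
    return math.sqrt(sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x))


# Made-up measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

b_origin = slope_through_origin(x, y)
b_covvar = slope_cov_over_var(x, y)
print("slope through origin:", b_origin, "rms:", rms_error(x, y, b_origin))
print("cov/var slope:       ", b_covvar, "rms:", rms_error(x, y, b_covvar))
```

The r.m.s. error is in the units of y, which makes it easier to judge
against the known measurement precision than a dimensionless rsquare.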

---------------------------------------------------------------------------
Dear Kirsten,

Having errors in both variables places you outside most statistics
books. There is no "optimal" solution. What you describe as
"perpendicular" seems rather strange to me, as it depends heavily on the
choice of units (going from cm to mm changes what is perpendicular). The
solution is to make sure that the errors in both directions are of equal
size (i.e. use a scaling that makes them equal).
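
The unit dependence described above can be illustrated with a short
Python sketch. It assumes a line through the origin and a known,
constant ratio delta = var(error in y) / var(error in x); the slope then
minimises sum((y - b*x)^2 / (delta + b^2)), which is often called Deming
regression, and delta = 1 gives the perpendicular (orthogonal) fit. The
data and the function name are made up for illustration.

```python
import math


def deming_slope_through_origin(x, y, delta=1.0):
    """Slope b of y = b*x minimising sum((y - b*x)^2 / (delta + b^2)).

    delta is the assumed ratio var(error in y) / var(error in x);
    delta = 1.0 gives the orthogonal fit. Requires sum(x*y) != 0.
    """
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = syy - delta * sxx
    return (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)


# Made-up data: x measured in cm.
x_cm = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.3, 1.8, 3.4, 3.7, 5.3]
x_mm = [10.0 * xi for xi in x_cm]  # the same measurements expressed in mm

# Orthogonal fit in each unit system: the two slopes describe different lines.
b_cm = deming_slope_through_origin(x_cm, y)
b_mm_orthogonal = deming_slope_through_origin(x_mm, y)

# Rescaling the error ratio together with the units restores consistency:
# the x error is 10 times larger in mm, so delta shrinks by a factor of 100.
b_mm_scaled = deming_slope_through_origin(x_mm, y, delta=1.0 / 100.0)

print("orthogonal, cm:          ", b_cm)
print("orthogonal, mm (x10):    ", 10.0 * b_mm_orthogonal)
print("scaled error ratio (x10):", 10.0 * b_mm_scaled)
```

Only the fit that carries the error ratio along with the change of units
reproduces the slope obtained in cm.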

Secondly, r2 is not a very good measure of the quality of the fit; it is
more a measure of how well you sampled the space (the larger the range
of values, the larger r2). If you have errors for each data point, you
can express the chance that the data points are distributed along the fit
given their precision (a chi-squared test). I found this method in
"Numerical Recipes in C".
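
A minimal Python sketch of such a chi-squared statistic for a line
through the origin. It assumes the "effective variance" form, in which
each squared residual is divided by sigma_y^2 + b^2 * sigma_x^2. The
data, the per-point errors, the slope b = 1.0 and the function name are
all made up for illustration.

```python
def chi_square_through_origin(x, y, sigma_x, sigma_y, b):
    """Chi-squared of the line y = b*x with errors in both variables.

    Each squared residual is weighted by the effective variance
    sigma_y^2 + (b * sigma_x)^2. Returns (chi2, degrees of freedom),
    with one fitted parameter (the slope) subtracted from n.
    """
    chi2 = sum(
        (yi - b * xi) ** 2 / (sy ** 2 + (b * sx) ** 2)
        for xi, yi, sx, sy in zip(x, y, sigma_x, sigma_y)
    )
    return chi2, len(x) - 1


# Made-up measurements with a standard error of 0.1 in both variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
sigma_x = [0.1] * len(x)
sigma_y = [0.1] * len(y)

chi2, dof = chi_square_through_origin(x, y, sigma_x, sigma_y, b=1.0)
print("chi2:", chi2, "dof:", dof, "reduced chi2:", chi2 / dof)
```

A reduced chi-squared near 1 suggests that the scatter around the line
is consistent with the stated measurement errors. If SciPy is available,
the corresponding probability could be obtained with
scipy.stats.chi2.sf(chi2, dof).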

You might run into problems with referees; I once had to put a figure
in a paper to explain the basic statistics of line-fitting (figure 2 in
the attached paper).

Good luck with finding a good way to describe your data.

--
Prof. dr. Jeroen B.J. Smeets
Faculty of Human Movement Sciences,
Vrije Universiteit
http://www.fbw.vu.nl/~JSmeets/

---------------------------------------------------------------------------

Dear Kirsten Albracht,
I am not sure you still need help, but in case you do...
What you want to do is actually a little tricky. There are analytical
equations for the regression when only one of the variables has errors;
when both variables have errors, there are actually no analytical
equations, but it is doable. The following book should help:
Bevington, P. R. Data reduction and error analysis for the physical
sciences. New York, NY, McGraw-Hill, 1969.
I hope it helps,
Osmar.

--
Osmar Pinto Neto
Universidade do Vale do Paraíba - UNIVAP
http://www.univap.br/

---------------------------------------------------------------------------

Hello,

There is a zero-intercept regression file available on the MATLAB File
Exchange. It contains m-files to perform the regression as well as some
discussion (I think) on the statistical validity. If you have trouble
finding it, let me know and I can look it up.

Good luck,

Eric Wolbrecht, PhD.
Assistant Professor
Mechanical Engineering Department
University of Idaho


---------------------------------------------------------------------------
Hi Kirsten-
Your case sounds like a candidate for a (standardized) major axis
regression. This analysis is found in major statistical packages, and
in R, the open-source statistics software. Look it up in your software,
or find someone who has SPSS.
Cheers- Roi
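
A minimal Python sketch of a standardized major axis slope for a line
forced through the origin. It assumes the common convention of using
uncentered sums for the no-intercept case, so the slope is
sign(sum(x*y)) * sqrt(sum(y^2) / sum(x^2)). The data and the function
name are made up, and this is not the implementation of any particular
statistics package.

```python
import math


def sma_slope_through_origin(x, y):
    """Standardized major axis slope of y = b*x using uncentered sums."""
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return math.copysign(math.sqrt(syy / sxx), sxy)


# Made-up data: x measured in cm, then the same values expressed in mm.
x_cm = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.3, 1.8, 3.4, 3.7, 5.3]
x_mm = [10.0 * xi for xi in x_cm]

b_cm = sma_slope_through_origin(x_cm, y)
b_mm = sma_slope_through_origin(x_mm, y)
print("SMA slope, x in cm:", b_cm)
print("SMA slope, x in mm:", b_mm, "(times 10:", 10.0 * b_mm, ")")
```

Unlike the plain perpendicular fit, this slope simply rescales with the
units of x and y, because both variables are standardized by their own
spread.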


--
Kirsten Albracht
Institute for Biomechanics and Orthopaedics
German Sport University Cologne
Carl Diem Weg 6
50933 Cologne

Email: albracht@dshs-koeln.de
Tel.: +49 221 4982-5680
Fax.: +49 221 4971598