Ton Van Den Bogert

07-31-1995, 06:30 AM

Dear Biomch-L:

Paolo de Leva asked some specific questions regarding artifacts
created by spline smoothing. I have not studied the mathematics
of splines extensively, but would like to clarify my observations
so as to avoid confusion. These observations are based on
experimentation with the Woltring implementation some ten years
ago, so my memory may not be accurate.

> 1) Are the artifacts present (a) only in the middle of each cubic or
>quintic polynomial, or (b) also in the extremes (corresponding to the instants
>when the raw data were measured)?
> In case (a) only interpolated data would be affected, while in case
> (b) also non-interpolated data might contain unwanted extra noise.
> Van den Bogert and Glossop wrote that they noticed artifacts in
> interpolated data (case a). I would like to know for sure if artifacts
> can be excluded at the extremes of each polynomial (case b).

I have noticed these artifacts whenever I used a smoothing spline
on data with gaps. Note that I'm still talking about 'smoothing
splines', not about 'interpolating splines' (splines that pass
exactly through all data points). The interpolation across the
gap suffered from large wave-like fluctuations, especially with
quintic splines. This was with kinematic data that were essentially
uniformly spaced, except for certain intervals where a marker was
'out of view'. I'm not sure whether the same artifacts would occur if
the data were uniformly spaced throughout and a spline were used to
resample them at a higher sampling rate. I suspect the same
thing could happen, to a lesser extent.

The answer to Paolo's question is (a). The artifacts affect only
the spline between data points, not the values right at the data points.
But the derivatives may be affected at the data points as well!

> 2) Does this uncertainty in the results have a clean mathematical
>explanation? It was not clear whether the artifacts were just observed
>by some researchers, USING SOME PARTICULAR SPLINE ROUTINES, or they are
>expected, embedded in the logic of the equations themselves, and cannot be
>avoided, whatever routine you are using.

A smoothing spline is created by minimizing an objective function that
combines the smoothness of the whole curve (the integral of the square
of the Nth derivative, with N = 2 for cubic and N = 3 for quintic
splines) with a close fit at the data points.
The relative weighting between the two criteria determines the
amount of smoothing. I think the problem with irregularly spaced
data is that widely spaced data points need more smoothing than
closely spaced ones. Since the whole spline is created
using a single smoothing parameter, there is insufficient
smoothing in those areas where the data are further apart.
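Written out, the objective function described above has roughly the following form. This is a sketch in standard textbook notation; the weights $w_i$, the smoothing parameter $\lambda$ and the derivative order $m$ are our own labels and may not match the symbols used in the GCVSPL memo.

$$
\min_{s}\; \sum_{i=1}^{n} w_i \bigl(y_i - s(t_i)\bigr)^2 \;+\; \lambda \int_{t_1}^{t_n} \bigl(s^{(m)}(t)\bigr)^2 \, dt
$$

Here $m = 2$ gives a cubic and $m = 3$ a quintic smoothing spline. Because one $\lambda$ weights the roughness penalty over the entire interval, an interval without samples contributes no data term at all, and the curve there is shaped only by the penalty and by the fit at the surrounding points.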

Another mathematical property of smoothing splines is that, for
equidistant data, they are equivalent to digital low-pass filters.
See the file GCVSPL MEMO, which can be obtained from
LISTSERV@nic.surfnet.nl. Roughly, a cubic spline is equivalent
to a 2nd-order Butterworth filter applied twice (forward and
backward in time), and a quintic spline to a 3rd-order
Butterworth filter applied twice. Since higher-order
filters tend to 'ring' near sharp transitions
in the data, this may help explain why higher-order splines tend
to create more artifacts, especially in the derivatives.
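The filter equivalence can be checked numerically. The Python sketch below (standard library only) builds the normalized analog Butterworth filter of order n from its poles and compares its two-pass gain with the idealized gain 1/(1 + omega^(2m)) of a smoothing spline penalizing the m-th derivative. That spline gain is the usual continuous-time approximation for long, equidistant data with the cutoff scaled to 1 rad/s; it is an assumption of this sketch, not a formula quoted from the GCVSPL memo, and all function names are hypothetical.

```python
import cmath
import math


def butterworth_poles(n):
    """Poles of the normalized analog Butterworth filter of order n (cutoff 1 rad/s)."""
    return [cmath.exp(1j * math.pi * (2 * k + n - 1) / (2 * n)) for k in range(1, n + 1)]


def butterworth_response(omega, n):
    """Complex frequency response H(j*omega) = 1 / prod(j*omega - p_k)."""
    s = 1j * omega
    denom = 1 + 0j
    for p in butterworth_poles(n):
        denom *= s - p
    return 1 / denom


def two_pass_gain(omega, n):
    """Amplitude gain after filtering forward and then backward in time:
    the phase shifts cancel and the magnitude is squared."""
    return abs(butterworth_response(omega, n)) ** 2


def spline_gain(omega, m):
    """Idealized gain of a smoothing spline penalizing the m-th derivative
    (m = 2: cubic, m = 3: quintic), for long equidistant data, with the
    smoothing parameter scaled so that the gain is 0.5 at omega = 1."""
    return 1 / (1 + omega ** (2 * m))


for order, name in ((2, "cubic"), (3, "quintic")):
    for omega in (0.5, 1.0, 2.0):
        print(f"{name:8s} omega={omega:3.1f}  spline={spline_gain(omega, order):.4f}  "
              f"2x Butterworth({order})={two_pass_gain(omega, order):.4f}")
```

At twice the cutoff frequency the cubic version passes 1/17 of the amplitude and the quintic version 1/65. The steeper roll-off of the higher order is the property that tends to go with more ringing near sharp transitions.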

The artifacts are definitely not a problem of any specific
software; they are inherent in the mathematics.

> 3) Are the artifacts mathematical singularities, that
>occur only in some precise cases, or they occur unpredictably?

They are predictable, and I have only had problems when
interpolating over relatively large gaps in the data.

In reply to Jesus Dapena's question:

>time. "Smoothing" means that the spline curve does not pass exactly through
>the raw data points; "interpolating" means that you are using the spline
>functions to calculate data for times in between the times of the original
>data points (although some people reserve the term "interpolating" only for
>zero smoothing).
> Or am I the one that missed the boat here??

No boat was missed. When I talk about interpolation, I'm still
using smoothing splines, but I evaluate the function at times when
no data are available: between samples and across gaps. The
'some people' are correct in their terminology, by the way. But
zero smoothing isn't used in biomechanics, as far as I know.

Since I have used the Woltring package, some final comments:

1. The Fortran version can be obtained by sending 'GET GCVSPL
FORTRAN' to LISTSERV@nic.surfnet.nl. A C version also exists, I
think. Its location was probably announced on Biomch-L, so finding
it would require a search through the archives.
2. I have never had good results when using the GCV (generalized
cross-validation) option, which automatically determines the
optimal amount of smoothing. The smoothed function is OK, but
the derivatives are much too noisy.
Do others have the same experience?
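For readers unfamiliar with the GCV option, here is a rough Python/NumPy sketch of the idea. It is not Woltring's GCVSPL code: it swaps the true spline for a discrete Whittaker-Henderson smoother (a penalty on d-th order differences of the smoothed values, a loose discrete analogue of the derivative penalty), and the function names, test signal and lambda grid are all hypothetical. GCV picks the lambda that minimizes n * RSS / (n - trace(A))^2, where A is the matrix mapping raw to smoothed data. Note that this criterion scores only the fit of the positions, which may be one reason why derivatives obtained with a GCV-chosen lambda can be noisier than one would like.

```python
import numpy as np


def whittaker_smoother(y, lam, d=2):
    """Discrete stand-in for a smoothing spline (Whittaker-Henderson smoother).

    Minimizes sum((y - z)**2) + lam * sum(diff(z, d)**2) over z and returns
    the smoothed values z together with the hat matrix A, where z = A @ y.
    d = 2 loosely mimics a cubic spline, d = 3 a quintic one.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)           # (n - d) x n difference operator
    A = np.linalg.inv(np.eye(n) + lam * D.T @ D)  # hat matrix
    return A @ y, A


def gcv_score(y, lam, d=2):
    """Generalized cross-validation score: n * RSS / (n - trace(A))**2."""
    n = len(y)
    z, A = whittaker_smoother(y, lam, d)
    rss = np.sum((y - z) ** 2)
    return n * rss / (n - np.trace(A)) ** 2


def gcv_lambda(y, lams, d=2):
    """Pick the smoothing parameter from a candidate grid by minimal GCV score."""
    scores = [gcv_score(y, lam, d) for lam in lams]
    return lams[int(np.argmin(scores))]


# Hypothetical test signal: a 1 Hz sine sampled at 100 Hz plus white noise.
dt = 0.01
t = np.arange(0.0, 2.0, dt)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * t) + 0.01 * rng.standard_normal(t.size)

lams = np.logspace(-2, 6, 33)
lam_best = gcv_lambda(y, lams)
z, _ = whittaker_smoother(y, lam_best)

# Compare the estimated velocity (finite differences of z) with the true one.
v_true = 2 * np.pi * np.cos(2 * np.pi * t)
v_est = np.gradient(z, dt)
print(f"GCV lambda = {lam_best:g}")
print(f"RMS velocity error = {np.sqrt(np.mean((v_est - v_true) ** 2)):.3f}")
```

Trying a few larger lambdas than the GCV choice and watching the velocity error is an easy way to explore the behaviour described above; the sketch makes no claim about what GCVSPL itself would produce on the same data.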

-- Ton van den Bogert
bogert@acs.ucalgary.ca