Herman J. Woltring
02231991, 12:03 PM
Dear BiomchL readers,
Review and rebuttal belong to the essences of academic life. While most of
this occurs in the privacy of the submission and acceptance process for refe
reed (archive) journals such as the Journal of Biomechanics, the debate may
become a public one (once a paper has been published) by counterpublications
or letter(s) to the Editor. Of course, if a paper is published without an
extensive review procedure, one may either leave the matter as it is, or opt
for a public response. In the present case, the latter seems justified, as
a number of readers on this list have been using some of the published files
on the BiomchL file server.
During last week's 1st European Conference of Biomedical Engineering, I was
pleased to hear a reference to my own work on natural spline smoothing and
differentiation (see the relevant package which can be retrieved via the
request SEND GCVSPL FORTRAN to LISTSERV@HEARN  subscribers only). In their
paper "A Procedure for Quantitative Comparison of Movement Data", coauthored
by Moreno D'Amico, Giancarlo Ferrigno, and Giorgio C. Santambrogio from the
Centro di Bioingegneria, Dipartimento di Bioengignerio, Politecnico di Milano,
Fondazione Pro Juventute in Milan, Italy, a general procedure encompassing
data acquisition, pretreatment, check of steadystate, normalisation,
stratification, collection, statistic testing, and I.E.D. computation is
provided, with cited references to the authors' own work only. Smoothing
and derivative estimation were covered in a few lines under "pretreatment".
During the oral presentation, however, the first author confined himself to
a comparison between his own smoothing and differentiation procedure (an AR
model with automatic order determination) and the GCVSPL package, claiming
that the GCVSPL package was generally viewed as the best in the litterature.
(While this was a very flattering statement, I have never come across such a
claim.) More seriously, some of the visitors to this lecture were annoyed
that the spoken presentation covered only a small part of the published
abstract. (There were 5 parallel sessions, and one had to shuttle between
sessions on the basis of the conference Proceedings. When a subsequent
speaker proved to be a noshow, the paper was debated at length, but no
further details could be obtained on the remainder of the published abstract.)
The speaker proceeded by comparing the results of his package with those from
GCVSPL when operated in quintic spline mode, using 3rd derivatives obtained
with either package. While third derivatives have, indeed, been assessed with
a natural, quintic spline precursor to the GCVSPL package (see my Human Movement
Science 1985 paper), this was done to demonstrate that this is not a proper
procedure. Use of natural splines imposes zero end conditions from certain
derivatives and up. This is fine if the data meet such a condition sufficient
ly; in the opposite case, serious artefacs may occur t h r o u g h o u t the
data record, and not just at the record end(s) as suggested by, e.g., Hatze in
his 1981/1 J. of Biomechanics paper.
For these reasons, it was recommended in the 1986 ESB Berlin Proceedings (not
in the 1986 Advances in Engineering Software paper describing the GCVSPL
package, as I said during the debate at the Nice meeting) that the half spline
order should be higher than the highest derivative sought. This leads to the
following table:
Spline order Spline name Derivative
2 Linear Smoothed data only
4 Cubic Up to and including 1st Derivative
6 Quintic ,, ,, ,, ,, 2nd Derivative
8 Heptic ,, ,, ,, ,, 3rd Derivative
etc.
(estimating 3rd and higher derivatives is, however, a highly illposed problem
and may fail unless the data are very accurate).
In fact, the smoothing constraint occurs in the same, lowest derivative that
imposes zeroend boundary conditions in the case of natural splines. Thus,
even if no zero boundary constraints are imposed (as when using periodic or
complete splines), the same table as above should be used in order to avoid
a negative bias in the highest derivative sought. While one might use higher
order splines than advocated above, this has the disadvantage of a steeper
transition bandwidth in the equivalent Butterworth filter: see the paper in
the 1986 ESB Berlin Proceedings.
Given the above choice of the spline order, GCVSPL allows a variety of options
to obtain `optimal' results. Some of these options require iterative operation
of the package, and this may naturally be timeconsuming. My Italian colleague
provided some timing comparisons, and concluded that his package is much faster
than the spline package. However, he did not state whether this was done in the
MODE < 0 mode (where previously assessed matrices are reused). Using GCVSPL
with the provided test data, such differences can be substantial. Furthermore,
I should add that I have never bothered to look at great length into computatio
nal speed ; one may easily experiment with this, by changing the TOL parameter
in the GCVSPL subroutine (currently 1D6) into a larger value (e.g., 1D3).
With a little experimenting today, I found no serious differences in the
quality of the estimation procedure, while timing improvements on the order
of 30% could be reached. However, timing is of limited importance if (as in
videobased movement analysis) other parts of the data processing chain require
much time, too.
Furthermore, the speaker claimed that his (third) derivatives were less noisy
than the spline package; however, I do not recall seeing any comparison between
known, true derivatives and estimated ones with either procedure. At any rate,
it may be useful to experiment with initial guesses for the smoothing parameter,
by first running GCVSPL in MODE=1 with a high smoothing parameter (or in MODE=4
with a large number of degrees of freedom), and than run the package in MODE=2.
This will force the package to start searching for the optimal amount of smooth
ing from a low, equivalent cutoff frequency.
The frequencydomain explanation is as follows. Assuming a signal + noise power
spectrum as depicted below,

xxxxxxxxx
 xxxx
 xxx x
 xx x x
 xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
+
0 freqency > Fnyquist
the highfrequency peak may be `signal' or `noise'; neither algorithm knows
this a priori. If we have reason to believe that the high peak is a true
signal component, it should clearly be taken along; in the opposite case
(e.g., mains hum at 50, 60, 100, or 120 Hz), it should clearly be discarded.
An iterative procedure such as GCVSPL may discard the peak if searching from
below unless it is too strong, and accept it unconditionally if searching from
above. N.B.: If the noisy data truly represent a lowpass signal with `white',
additive, uncorrelated noise (i.e., with a flat spectrum), the highest frequency
is choosen where the data spectrum becomes essentially flat. Note, however,
that this is only true if there is sufficient redundancy in the data, i.e., a
significant highfrequency part of the spectrum is, indeed, flat: the criteria
via which both procedures estimate an optimum choice are based on noisy data,
and are, therefore, noisy stochastics themselves requiring sufficient
redundancy.
One should note also that finding an optimal amount of smoothing on the
original data level does not necessarily imply that also the derivatives of
this optimally estimated signal are in some way optimal. As apparent from
the Berlin 1986 paper (see also the paper "Effects of Data Smoothing on the
Reconstruction of Helical Axis Parameters in Human Joint Kinematics" by De
Lange et al., J. of Biomech. Engng 112, May 1990, 107113, Fig. 4), more
smoothing can, in fact, be appropriate. Actually, this finding may appease
the fear of some users of automatic cutoff determining algorithms that
unwarranted oversmoothing is performed; at the same time, the results
obtained f o r t h e g i v e n d a t a were quite satisfactory.
Hatze's 1981 paper justifiably addressed the issue of optimality between
derivatives.
The morale of this (possibly unusual) rebuttal is a simple one: never trust an
automatic algorithm's results blindly. Try to understand what the algorithm
does, and play around with it until you know its pro's and con's. For the
GCVSPL package, a wise activity is to run it for a given set of data at a
logarithmically increasing series of smoothing parameters in MODE=1 (first
time +1), and plot the residual variance and GCV function values. This will
provide an idea of any multiple optima caused by highfrequency peaks such as
the one depicted above.
Reference: M. D'Amico & G. Ferrigno, Techniques for the Evaluation of Deriva
tives from Noisy Biomechanical Displacement Data Using a ModelBased Bandwidth
Selection Procedure. Med. & Biolog. Engng & Comp. 28(1990), 407415.
P.S. Since the Italian group is not, to my knowledge, subscribing to BiomchL,
I'll send a copy of this posting to them by facsimile. Hopefully, they will
respond on the list. Regretfully, I may not be accessible for a while by
email as of the first of March, but I'll keep in touch with this list somehow.
Herman J. Woltring
Brussellaan 29
NL5628 TB EINDHOVEN
The Netherlands
Tel. (private) +31.40.480869
Voice/fax/modem +31.40.413744
Review and rebuttal belong to the essences of academic life. While most of
this occurs in the privacy of the submission and acceptance process for refe
reed (archive) journals such as the Journal of Biomechanics, the debate may
become a public one (once a paper has been published) by counterpublications
or letter(s) to the Editor. Of course, if a paper is published without an
extensive review procedure, one may either leave the matter as it is, or opt
for a public response. In the present case, the latter seems justified, as
a number of readers on this list have been using some of the published files
on the BiomchL file server.
During last week's 1st European Conference of Biomedical Engineering, I was
pleased to hear a reference to my own work on natural spline smoothing and
differentiation (see the relevant package which can be retrieved via the
request SEND GCVSPL FORTRAN to LISTSERV@HEARN  subscribers only). In their
paper "A Procedure for Quantitative Comparison of Movement Data", coauthored
by Moreno D'Amico, Giancarlo Ferrigno, and Giorgio C. Santambrogio from the
Centro di Bioingegneria, Dipartimento di Bioengignerio, Politecnico di Milano,
Fondazione Pro Juventute in Milan, Italy, a general procedure encompassing
data acquisition, pretreatment, check of steadystate, normalisation,
stratification, collection, statistic testing, and I.E.D. computation is
provided, with cited references to the authors' own work only. Smoothing
and derivative estimation were covered in a few lines under "pretreatment".
During the oral presentation, however, the first author confined himself to
a comparison between his own smoothing and differentiation procedure (an AR
model with automatic order determination) and the GCVSPL package, claiming
that the GCVSPL package was generally viewed as the best in the litterature.
(While this was a very flattering statement, I have never come across such a
claim.) More seriously, some of the visitors to this lecture were annoyed
that the spoken presentation covered only a small part of the published
abstract. (There were 5 parallel sessions, and one had to shuttle between
sessions on the basis of the conference Proceedings. When a subsequent
speaker proved to be a noshow, the paper was debated at length, but no
further details could be obtained on the remainder of the published abstract.)
The speaker proceeded by comparing the results of his package with those from
GCVSPL when operated in quintic spline mode, using 3rd derivatives obtained
with either package. While third derivatives have, indeed, been assessed with
a natural, quintic spline precursor to the GCVSPL package (see my Human Movement
Science 1985 paper), this was done to demonstrate that this is not a proper
procedure. Use of natural splines imposes zero end conditions from certain
derivatives and up. This is fine if the data meet such a condition sufficient
ly; in the opposite case, serious artefacs may occur t h r o u g h o u t the
data record, and not just at the record end(s) as suggested by, e.g., Hatze in
his 1981/1 J. of Biomechanics paper.
For these reasons, it was recommended in the 1986 ESB Berlin Proceedings (not
in the 1986 Advances in Engineering Software paper describing the GCVSPL
package, as I said during the debate at the Nice meeting) that the half spline
order should be higher than the highest derivative sought. This leads to the
following table:
Spline order Spline name Derivative
2 Linear Smoothed data only
4 Cubic Up to and including 1st Derivative
6 Quintic ,, ,, ,, ,, 2nd Derivative
8 Heptic ,, ,, ,, ,, 3rd Derivative
etc.
(estimating 3rd and higher derivatives is, however, a highly illposed problem
and may fail unless the data are very accurate).
In fact, the smoothing constraint occurs in the same, lowest derivative that
imposes zeroend boundary conditions in the case of natural splines. Thus,
even if no zero boundary constraints are imposed (as when using periodic or
complete splines), the same table as above should be used in order to avoid
a negative bias in the highest derivative sought. While one might use higher
order splines than advocated above, this has the disadvantage of a steeper
transition bandwidth in the equivalent Butterworth filter: see the paper in
the 1986 ESB Berlin Proceedings.
Given the above choice of the spline order, GCVSPL allows a variety of options
to obtain `optimal' results. Some of these options require iterative operation
of the package, and this may naturally be timeconsuming. My Italian colleague
provided some timing comparisons, and concluded that his package is much faster
than the spline package. However, he did not state whether this was done in the
MODE < 0 mode (where previously assessed matrices are reused). Using GCVSPL
with the provided test data, such differences can be substantial. Furthermore,
I should add that I have never bothered to look at great length into computatio
nal speed ; one may easily experiment with this, by changing the TOL parameter
in the GCVSPL subroutine (currently 1D6) into a larger value (e.g., 1D3).
With a little experimenting today, I found no serious differences in the
quality of the estimation procedure, while timing improvements on the order
of 30% could be reached. However, timing is of limited importance if (as in
videobased movement analysis) other parts of the data processing chain require
much time, too.
Furthermore, the speaker claimed that his (third) derivatives were less noisy
than the spline package; however, I do not recall seeing any comparison between
known, true derivatives and estimated ones with either procedure. At any rate,
it may be useful to experiment with initial guesses for the smoothing parameter,
by first running GCVSPL in MODE=1 with a high smoothing parameter (or in MODE=4
with a large number of degrees of freedom), and than run the package in MODE=2.
This will force the package to start searching for the optimal amount of smooth
ing from a low, equivalent cutoff frequency.
The frequencydomain explanation is as follows. Assuming a signal + noise power
spectrum as depicted below,

xxxxxxxxx
 xxxx
 xxx x
 xx x x
 xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
+
0 freqency > Fnyquist
the highfrequency peak may be `signal' or `noise'; neither algorithm knows
this a priori. If we have reason to believe that the high peak is a true
signal component, it should clearly be taken along; in the opposite case
(e.g., mains hum at 50, 60, 100, or 120 Hz), it should clearly be discarded.
An iterative procedure such as GCVSPL may discard the peak if searching from
below unless it is too strong, and accept it unconditionally if searching from
above. N.B.: If the noisy data truly represent a lowpass signal with `white',
additive, uncorrelated noise (i.e., with a flat spectrum), the highest frequency
is choosen where the data spectrum becomes essentially flat. Note, however,
that this is only true if there is sufficient redundancy in the data, i.e., a
significant highfrequency part of the spectrum is, indeed, flat: the criteria
via which both procedures estimate an optimum choice are based on noisy data,
and are, therefore, noisy stochastics themselves requiring sufficient
redundancy.
One should note also that finding an optimal amount of smoothing on the
original data level does not necessarily imply that also the derivatives of
this optimally estimated signal are in some way optimal. As apparent from
the Berlin 1986 paper (see also the paper "Effects of Data Smoothing on the
Reconstruction of Helical Axis Parameters in Human Joint Kinematics" by De
Lange et al., J. of Biomech. Engng 112, May 1990, 107113, Fig. 4), more
smoothing can, in fact, be appropriate. Actually, this finding may appease
the fear of some users of automatic cutoff determining algorithms that
unwarranted oversmoothing is performed; at the same time, the results
obtained f o r t h e g i v e n d a t a were quite satisfactory.
Hatze's 1981 paper justifiably addressed the issue of optimality between
derivatives.
The morale of this (possibly unusual) rebuttal is a simple one: never trust an
automatic algorithm's results blindly. Try to understand what the algorithm
does, and play around with it until you know its pro's and con's. For the
GCVSPL package, a wise activity is to run it for a given set of data at a
logarithmically increasing series of smoothing parameters in MODE=1 (first
time +1), and plot the residual variance and GCV function values. This will
provide an idea of any multiple optima caused by highfrequency peaks such as
the one depicted above.
Reference: M. D'Amico & G. Ferrigno, Techniques for the Evaluation of Deriva
tives from Noisy Biomechanical Displacement Data Using a ModelBased Bandwidth
Selection Procedure. Med. & Biolog. Engng & Comp. 28(1990), 407415.
P.S. Since the Italian group is not, to my knowledge, subscribing to BiomchL,
I'll send a copy of this posting to them by facsimile. Hopefully, they will
respond on the list. Regretfully, I may not be accessible for a while by
email as of the first of March, but I'll keep in touch with this list somehow.
Herman J. Woltring
Brussellaan 29
NL5628 TB EINDHOVEN
The Netherlands
Tel. (private) +31.40.480869
Voice/fax/modem +31.40.413744