Xpost: Raw data archiving, publication, and science

H.j. Woltring, Fax/tel +31.40.413 744
06-01-1992, 07:06 AM
Dear Biomch-L readers,

The following posting was received today from Psycoloquy; some of the issues
might be relevant for the Biomch-L archives, too, both on the Biomch-L List-
Server and at ftp.nici.kun.nl:/pub/biomch-l under anonymous ftp. For example,
during the past few days I have been reviewing some journal manuscripts in
which published software in the biokinematics field was compared with
unpublished material.

Regards -- hjw

- = - = - = - = - = - = - = - = - = - = - = - = - = - = - = - = - = -

Received: Mon, 1 Jun 92 19:45 MET
Date: Fri, 29 May 1992 01:59:36 EST
From: Stevan Harnad
Subject: psycoloquy.92.3.29.data-archive.1.skoyles (198 lines)
Sender: "PSYCOLOQUY: Refereed Electronic Journal of Peer Discussion"

To: Multiple recipients of


This target article has been accepted for publication in PSYCOLOQUY.
Commentary is now invited. Commentaries should not exceed 100 lines.
Each should have a keyword-indexable title and the commentator's full
name and affiliation. Please submit commentaries to:

psyc@pucc.bitnet or psyc@pucc.princeton.edu
psycoloquy.92.3.29.data-archive.1.skoyles Friday May 29 1992
ISSN 1055-0143 (21 paragraphs, 2 references, 182 lines)
Copyright 1992 John R. Skoyles


John R. Skoyles
Department of Psychology
University College London

1.0 ABSTRACT: American Psychological Association (APA) journals do
not publish raw data, hence data are effectively inaccessible. I
propose that authors of research papers should transfer their data
to an Internet site so it can be accessed over Internet by anonymous
ftp. I suggest that such data archiving would (1) make fraud easier
to detect, (2) encourage scientific criticism and (3) aid the
scientific process in general. Nor should it be difficult to implement.

KEYWORDS: data archiving, deception, electronic retrieval, error
detection, ftp, fraud, meta-analysis, statistics

1.1 Experimental data are rarely published. Usually we are happy with
their authors' own statistical treatment. But not always. Researchers
do not always fully analyse their data; sometimes editors restrict
their publication space; and sometimes we have an idea we would like to
try out on those data. It would be nice if the experimental data we read
about were easy to access. I suggest that the near-universal use
of computers and the Internet mail and file transfer system has made
this possible. PSYCOLOQUY is archived and easily accessed through
anonymous ftp: There is no reason why archived research data should not
be equally accessible. Though there are several potential problems with
ftp archiving of published data, the benefits would, I believe, vastly
outweigh them.

2.1 Here follows a case for the ftp archiving of data published in APA
(American Psychological Association) journals. I then raise a few objections
and finally consider how it might be implemented. Note that when I refer
to ftp this also applies to other forms of electronic data transfer.

3.1 First, electronic data archiving should be easy to implement and
will become increasingly so. Most researchers now (unlike, say, even
two years ago) would have little trouble archiving their data upon
publication. Most Results sections are based upon ASCII data files
analyzed by computer (usually with a statistical package such as SPSS or
BMDP). Most researchers should have their raw data stored in a form
(i.e. file and subdirectory names) which makes it easy for other
researchers to use. The commands and procedures for transferring it to
a central data archive will be familiar to most psychologists (if not,
most departments have people who will help). Of course, all the details
about the research will be contained in the published paper, so these
need not be stored. Indeed, the names of journals, their volume and
issue numbers, make a convenient directory and subdirectory structure
for organising the archive. There is something self-evident about what
data are contained in /JEPHPP/18/1/SMITH/EXP1. And just as it is easy
to MSEND data to an archive so it is easy to MGET them for reanalysis.
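
As a minimal sketch of the directory convention described in 3.1, the
following Python fragment builds an archive path from a paper's citation
details. The function name archive_path and its parameters are hypothetical
and chosen only for illustration; the article itself gives just the example
path /JEPHPP/18/1/SMITH/EXP1, and the upper-casing of names is an assumption
taken from that example.

```python
from pathlib import PurePosixPath


def archive_path(journal, volume, issue, author, experiment):
    """Build a journal/volume/issue/author/experiment archive path.

    Hypothetical helper: the layout follows the example in paragraph 3.1.
    Upper-casing every component is an assumption based on that example.
    """
    parts = [journal, str(volume), str(issue), author, experiment]
    return PurePosixPath("/", *(part.upper() for part in parts))


# Reproduces the path quoted in paragraph 3.1.
print(archive_path("JEPHPP", 18, 1, "Smith", "exp1"))
```

Deriving the path from the citation means a reader holding only the printed
paper can work out where its data should be, without consulting an index.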

3.2.1 Second, the scientific ethic is to make error correction as easy
as possible. Scientists are not always entirely competent or honest.
Numerous cases of fraud and intellectual dishonesty have occurred in
psychology (as elsewhere in science). Researchers are subject to
enormous pressures to publish but unfortunately this normally requires
positive findings. This puts pressure on researchers to rerun analyses
(changing criteria for categorising data, excluding subjects, treating
missing data, etc.) when only negative findings turn up. It is not
clear how many researchers resist these pressures on the integrity of
data analysis. At present, it is difficult to check. In a recent case
reported in *Science*, two psychologists were only able to check the
data analysis of another psychologist through the intervention of
lawyers (Palca 1991).

3.2.2 There is public disquiet in the US Congress (notably, on the
part of Congressman John Dingell) concerning fraud and intellectual
dishonesty in science. Research on published fraudulent papers has
revealed many defects (Stewart & Feder 1987). It is likely that any
archived data would contain even more accessible and noticeable defects
(in their data distributions, treatment and analysis). Archiving data
would thus make it easier to detect both fraud and intellectual

3.3 Third, much honestly obtained and analyzed data is incompetently
handled, yet many legitimate criticisms never arise because of
difficulties accessing data. At present, if you suspect that a
researcher's own analysis gives only part of the story or is
misleading, you face an involved process of contacting them for the
original data (something inconvenient to all concerned). Archiving data
would increase the opportunities for legitimate criticism of published

3.4 Fourth, researchers ask different questions. Sometimes a
researcher may wish to reanalyse data to answer questions the original
authors ignored. People carrying out meta-analyses will often want to
check the quality of the work they are using. At present this is not

3.5 Fifth, students could gain much by examining real research papers
and then "playing around" with their data, seeing the effects of
different data-analytic strategies. They might even find things
overlooked by their authors.

3.6 Sixth, much data is accidentally lost (despite APA's requirement
that authors retain their data for a number of years). An ftp archive
would make a convenient data backup.

3.7 Seventh, scientific papers are printed on paper -- this, not the
nature of science, is the reason data are not normally made accessible
at this time. Science is about open communication that maximally
exposes ideas and arguments to criticism (one legitimate criticism of
an idea is the way its data are handled). Printed paper is a convenient
means for opening written ideas to criticism, but it is unsuitable for
making data accessible to criticism (it limits the quantity which can
be published and communicates in a form that is inconvenient for
computer reanalysis). Print has until recently been the only means for
disseminating scientific ideas and data. Hence the tradition has arisen
of limiting the dissemination of data. We should recognise the
opportunity that electronic archives provide for breaking with this.

4.0 There are some reasons against ftp archiving:

4.1 Certain classes of data (e.g., clinical data) may have to be
excluded to preserve the confidentiality and privacy of those from whom
it is collected. This constraint does not apply to large portions of
psychology, however, such as research on animals, reaction time studies on
student subjects, or computer simulations.

4.2 Researchers certainly have the right to the "first go" at their
data. However, the fact of publication, unless contrary notice is
given, usually signifies that the data have already been substantially
analyzed, and frequently no further analysis is intended.

4.3 There is another entirely invalid objection. Many researchers
will be uncomfortable with their data being ftp archived because none
of us are perfect. If our data can be reanalyzed we may be shown to
have carried out, quite unintentionally, inappropriate or misleading
analysis. To some extent the present state of affairs is quite
convenient for hiding the fact that many researchers could be better
statisticians and could keep better records.

5.0 Since impracticability may be an objection, I describe how an ftp
archive might work:

5.1 The archive would have to be moderated by an archivist. Journal
editors, for example, could contact the archivist, who would in turn
contact the paper's chief author, providing a password and a temporary
directory into which raw data files could be transferred. Researchers
would be free to create the subdirectories they felt best organised the
data and to write a brief contents file. The archivist would transfer
the files to a permanent directory. A standard note on the front page
of the published paper would state whether its data had been archived.
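
The deposit procedure in 5.1 might be sketched as follows, with a local
file system standing in for the archive server. Everything named here is an
assumption made for illustration: the functions open_deposit and
publish_deposit, the contents file name CONTENTS, and the rule that a
deposit without a contents file is refused. The article describes only the
steps, not an implementation.

```python
import secrets
import shutil
from pathlib import Path


def open_deposit(staging_root, paper_id):
    """Archivist step: create a temporary directory and a one-off password.

    Both are then sent to the paper's chief author (not shown here).
    """
    staging_dir = Path(staging_root) / paper_id
    staging_dir.mkdir(parents=True, exist_ok=False)
    return staging_dir, secrets.token_urlsafe(12)


def publish_deposit(staging_dir, permanent_dir):
    """Archivist step: move a finished deposit to its permanent directory.

    Assumed policy: refuse deposits that lack a CONTENTS file, and never
    overwrite an existing permanent directory.
    """
    staging_dir = Path(staging_dir)
    permanent_dir = Path(permanent_dir)
    if not (staging_dir / "CONTENTS").is_file():
        raise ValueError("deposit has no CONTENTS file")
    if permanent_dir.exists():
        raise FileExistsError(str(permanent_dir))
    permanent_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(staging_dir), str(permanent_dir))
    return permanent_dir
```

Keeping the temporary and permanent directories separate means that only
the archivist ever writes to the public part of the archive, while authors
remain free to organise their own subdirectories before the transfer.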

5.2 I suggest that not only the raw data be stored but also the
statistical and data analysis programs (SPSS or BMDP; or uncompiled
Basic, Pascal or C) used to analyse them. Without these programs,
tracing the transformation of the raw data into the reported
statistical findings would be much more difficult.

5.3 Parallel to the archive there should be a directory for comments
by people who have accessed the data, to record their findings. Anyone
wanting to reexamine anyone's data would be interested in any previous
reanalyses, good and bad.
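
One way to keep the comments directory of 5.3 parallel to the data archive
is to mirror the data path under a second root, so that the location of
comments on any data set can be derived from the location of the data. The
sketch below assumes two hypothetical roots, /data and /comments; neither
name comes from the article.

```python
from pathlib import PurePosixPath


def comments_path(data_path, data_root="/data", comments_root="/comments"):
    """Map an archived data directory to its parallel comments directory.

    The root names /data and /comments are assumptions for illustration.
    Raises ValueError if data_path does not lie under data_root.
    """
    relative = PurePosixPath(data_path).relative_to(data_root)
    return PurePosixPath(comments_root) / relative


print(comments_path("/data/JEPHPP/18/1/SMITH/EXP1"))
```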

5.4 There is no reason such a data archive could not grow to
cover non-APA journals, theses, and nonpublished data (for example,
unpublished negative findings).

5.5 Such a system would of course involve some cost and effort,
perhaps even some inconvenience. However, with the public and
congressional concern about whether scientists are maximally ensuring
the integrity of their data, an ftp archive would show a commitment from
the psychological community to ensuring honesty in published
psychological research.


Palca, J. (1991). News and Comment: Get-the-lead-out guru challenged.
Science 253: 842-844.

Stewart, W. W. & Feder, N. (1987). The integrity of the scientific
literature. Nature 325: 207-214.