Xpost: Raw data archiving, publication, and science

    Dear Biomch-L readers,

    The following posting was received today from Psycoloquy; some of the issues
    might be relevant for the Biomch-L archives, too, both on the Biomch-L List-
    Server and at ftp.nici.kun.nl:/pub/biomch-l under anonymous ftp. For example,
    during the past few days I have been reviewing some journal manuscripts in
    which published software in the biokinematics field was compared with
    unpublished material.

    Regards -- hjw

    - = - = - = - = - = - = - = - = - = - = - = - = - = - = - = - = - = -

    Received: Mon, 1 Jun 92 19:45 MET
    Date: Fri, 29 May 1992 01:59:36 EST
    From: Stevan Harnad
    Subject: psycoloquy.92.3.29.data-archive.1.skoyles (198 lines)
    Sender: "PSYCOLOQUY: Refereed Electronic Journal of Peer Discussion"

    To: Multiple recipients of

    Skoyles: FTP INTERNET DATA ARCHIVING: A Cousin for PSYCOLOQUY

    This target article has been accepted for publication in PSYCOLOQUY.
    Commentary is now invited. Commentaries should not exceed 100 lines.
    Each should have a keyword-indexable title and the commentator's full
    name and affiliation. Please submit commentaries to:

    psyc@pucc.bitnet or psyc@pucc.princeton.edu
    -----------------------------------------------------------------------
    psycoloquy.92.3.29.data-archive.1.skoyles Friday May 29 1992
    ISSN 1055-0143 (21 paragraphs, 2 references, 182 lines)
    Copyright 1992 John R. Skoyles

    FTP INTERNET DATA ARCHIVING: A Cousin for PSYCOLOQUY

    John R. Skoyles
    Department of Psychology
    University College London
    WC1E 6BT, UK
    ucjtprs@ucl.ac.uk

    1.0 ABSTRACT: American Psychological Association (APA) journals do
    not publish raw data, hence data are effectively inaccessible. I
    propose that authors of research papers should transfer their data
    to an Internet site so it can be accessed over Internet by anonymous
    ftp. I suggest that such data archiving would (1) make fraud easier
    to detect, (2) encourage scientific criticism and (3) aid the
    scientific process in general. Nor should it be difficult to
    implement.

    KEYWORDS: data archiving, deception, electronic retrieval, error
    detection, ftp, fraud, meta-analysis, statistics

    1.1 Experimental data are rarely published. Usually we are happy with
    their author's own statistical treatment. But not always. Researchers
    do not always fully analyse their data; sometimes editors restrict
    their publication space; and sometimes we have an idea we would like to
    try out on those data. It would be nice if the experimental data we read
    about were easy to access. I suggest that the approaching-universal use
    of computers and the Internet mail and file transfer system have made
    this possible. PSYCOLOQUY is archived and easily accessed through
    anonymous ftp: There is no reason why archived research data should not
    be equally accessible. Though there are several potential problems with
    ftp archiving of published data, the benefits would, I believe, vastly
    outweigh them.

    2.1 Here follows a case for the ftp archiving of data published in APA
    (American Psychological Association) journals. I raise a few objections
    and last consider how it might be implemented. Note that when I refer
    to ftp this also applies to other forms of electronic data transfer.

    3.1 First, electronic data archiving should be easy to implement and
    will become increasingly so. Most researchers now (unlike, say, even
    two years ago) would have little trouble archiving their data upon
    publication. Most Results sections are based upon computer analyzed
    ASCII data files (usually by a statistical package such as SPSS or
    BMDP). Most researchers should have their raw data stored in a form
    (i.e. file and subdirectory names) which makes it easy for other
    researchers to use. The commands and procedures for transferring it to
    a central data archive will be familiar to most psychologists (if not,
    most departments have people who will help). Of course, all the details
    about the research will be contained in the published paper, so these
    need not be stored. Indeed, the names of journals, their volume and
    issue numbers, make a convenient directory and subdirectory structure
    for organising the archive. There is something self-evident about what
    data are contained in /JEPHPP/18/1/SMITH/EXP1. And just as it is easy
    to MSEND data to an archive so it is easy to MGET them for reanalysis.
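
    The directory convention and retrieval step described above can be
    sketched as follows. This is only an illustration under stated
    assumptions: the function names archive_path and fetch_directory are
    hypothetical, the upper-casing of path components simply mirrors the
    /JEPHPP/18/1/SMITH/EXP1 example, and the experiment directory is
    assumed to hold plain files only (no nested subdirectories). The ftp
    argument is expected to behave like Python's ftplib.FTP.

```python
import os
import posixpath


def archive_path(journal, volume, issue, author, experiment):
    """Build an archive path of the form /JOURNAL/VOLUME/ISSUE/AUTHOR/EXPERIMENT.

    Hypothetical helper: the layout follows the example in paragraph 3.1,
    and upper-casing every component is an assumption made for consistency.
    """
    parts = [journal, str(volume), str(issue), author, experiment]
    return posixpath.join("/", *(p.strip().upper() for p in parts))


def fetch_directory(ftp, remote_dir, local_dir):
    """Copy every file in remote_dir to local_dir, roughly what MGET does.

    `ftp` is any object offering cwd(), nlst() and retrbinary() in the
    style of ftplib.FTP. Assumes remote_dir contains plain files only.
    Returns the sorted list of file names that were fetched.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    fetched = []
    for name in sorted(ftp.nlst()):
        base = posixpath.basename(name)
        # Write each remote file to a local file of the same name.
        with open(os.path.join(local_dir, base), "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        fetched.append(base)
    return fetched
```

    With a real server one would open the connection using
    ftplib.FTP(host) followed by login() with no arguments (an anonymous
    login), then call
    fetch_directory(ftp, archive_path("JEPHPP", 18, 1, "Smith", "exp1"), "exp1").
    The host name is left open here because no archive site is named in
    the text.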

    3.2.1 Second, the scientific ethic is to make error correction as easy
    as possible. Scientists are not always entirely competent or honest.
    Numerous cases of fraud and intellectual dishonesty have occurred in
    psychology (as elsewhere in science). Researchers are subject to
    enormous pressures to publish but unfortunately this normally requires
    positive findings. This puts pressure on researchers to rerun analyses
    (changing criteria for categorising data, excluding subjects, treating
    missing data, etc.) when only negative findings turn up. It is not
    clear how many researchers resist these pressures on the integrity of
    data analysis. At present, it is difficult to check. In a recent case
    reported in *Science*, two psychologists were only able to check the
    data analysis of another psychologist through the intervention of
    lawyers (Palca 1991).

    3.2.2 There is public disquiet in the US Congress (notably, on the
    part of Congressman John Dingell) concerning fraud and intellectual
    dishonesty in science. Research on published fraudulent papers has
    revealed many defects (Stewart & Feder 1987). It is likely that any
    archived data would contain even more accessible and noticeable defects
    (in their data distributions, treatment and analysis). Archiving data
    would thus make it easier to detect both fraud and intellectual
    dishonesty.

    3.3 Third, much honestly obtained and analyzed data is incompetently
    handled, yet many legitimate criticisms never arise because of
    difficulties accessing data. At present, if you suspect that a
    researcher's own analysis gives only part of the story or is
    misleading, you face an involved process of contacting them for the
    original data (something inconvenient to all concerned). Archiving data
    would increase the opportunities for legitimate criticism of published
    work.

    3.4 Fourth, researchers ask different questions. Sometimes a
    researcher may wish to reanalyse data to answer questions the original
    authors ignored. People carrying out meta-analyses will often want to
    check the quality of the work they are using. At present this is not
    possible.

    3.5 Fifth, students could gain much by examining real research papers
    and then "playing around" with their data, seeing the effects of
    different data-analytic strategies. They might even find things
    overlooked by their authors.

    3.6 Sixth, much data is accidentally lost (despite APA's requirement
    that authors retain their data for a number of years). An ftp archive
    would make a convenient data backup.

    3.7 Seventh, scientific papers are printed on paper -- this, not the
    nature of science, is the reason data are not normally made accessible
    at this time. Science is about open communication that maximally
    exposes ideas and arguments to criticism (one legitimate criticism of
    an idea is the way its data are handled). Printed paper is a convenient
    means for opening written ideas to criticism, but it is unsuitable for
    making data accessible to criticism (it limits the quantity which can
    be published and communicates in a form that is inconvenient for
    computer reanalysis). Print has until recently been the only means for
    disseminating scientific ideas and data. Hence the tradition has arisen
    of limiting the dissemination of data. We should recognise the
    opportunity that electronic archives provide for breaking with this.

    4.0 There are some reasons against ftp archiving:

    4.1 Certain classes of data (e.g., clinical data) may have to be
    excluded to preserve the confidentiality and privacy of those from whom
    it is collected. This constraint does not apply to large portions of
    psychology, however, such as research on animals, reaction time studies on
    student subjects, or computer simulations.

    4.2 Researchers certainly have the right to the "first go" at their
    data. However, the fact of publication, unless contrary notice is
    given, usually signifies that the data have already been substantially
    analyzed, and frequently no further analysis is intended.

    4.3 There is another entirely invalid objection. Many researchers
    will be uncomfortable with their data being ftp archived because none
    of us are perfect. If our data can be reanalyzed we may be shown to
    have carried out, quite unintentionally, inappropriate or misleading
    analysis. To some extent the present state of affairs is quite
    convenient for hiding the fact that many researchers could be better
    statisticians and could keep better records.

    5.0 Since impracticability may be an objection, I describe how an ftp
    archive might work:

    5.1 The archive would have to be moderated by an archivist. Journal
    editors, for example, could contact the archivist, who would in turn
    contact the paper's chief author, providing a password and a temporary
    directory into which raw data files could be transferred. Researchers
    would be free to create the subdirectories they felt best organised the
    data and to write a brief contents file. The archivist would transfer
    the files to a permanent directory. A standard note on the front page
    of the published paper would state whether its data had been archived.
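
    The archivist's final step, moving a submission from its temporary
    directory into the permanent archive, might look roughly like the
    sketch below. Everything named here is an assumption for
    illustration: the function promote_submission, the file name
    CONTENTS, and the rule that a submission without a contents file is
    refused (the proposal only says researchers are free to write one).
    The sketch works on a local filesystem and says nothing about
    passwords or the ftp server itself.

```python
import os
import shutil

# Hypothetical name for the brief contents file mentioned in paragraph 5.1.
CONTENTS_FILE = "CONTENTS"


def promote_submission(temp_dir, archive_root, relative_path):
    """Move a researcher's upload from temp_dir into the permanent archive.

    relative_path is the journal-based location, e.g. "/JEPHPP/18/1/SMITH".
    The researcher's own subdirectories are copied unchanged.
    Returns the permanent directory that was created.
    """
    # Assumption of this sketch: require a contents file before archiving.
    if not os.path.isfile(os.path.join(temp_dir, CONTENTS_FILE)):
        raise ValueError("submission has no contents file")

    destination = os.path.join(archive_root, *relative_path.strip("/").split("/"))
    # Never overwrite data that have already been archived.
    if os.path.exists(destination):
        raise FileExistsError(destination)

    os.makedirs(os.path.dirname(destination), exist_ok=True)
    shutil.copytree(temp_dir, destination)
    # Remove the temporary directory only after the copy has succeeded.
    shutil.rmtree(temp_dir)
    return destination
```

    Copying first and deleting afterwards means a failed transfer leaves
    the researcher's upload intact in the temporary directory.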

    5.2 I suggest that not only the raw data be stored but also the
    statistical and data analysis programs (SPSS or BMDP; or uncompiled
    Basic, Pascal or C) used to analyse them. Without these programs,
    tracing the transformation of the raw data into the reported
    statistical findings would be much more difficult.

    5.3 Parallel to the archive there should be a directory for comments
    by people who have accessed the data, to record their findings. Anyone
    wanting to reexamine anyone's data would be interested in any previous
    reanalyses, good and bad.

    5.4 There is no reason such a data archive could not grow to
    cover non-APA journals, theses, and nonpublished data (for example,
    unpublished negative findings).

    5.5 Such a system would of course involve some cost and effort,
    perhaps even some inconvenience. However, with the public and
    congressional concern about whether scientists are maximally ensuring
    the integrity of their data, an ftp archive would show a commitment from
    the psychological community to ensuring honesty in published
    psychological research.

    REFERENCES.

    Palca, J. (1991). News and Comment: Get-the-lead-out guru challenged.
    Science 253: 842-844.

    Stewart, W. W. & Feder, N. (1987). The integrity of the scientific
    literature. Nature 325: 207-214.