Learning in Multi-Linked Manipulator Control

    Dear Biomch-L readers,

    One of our subscribers asked me last evening to be removed from the list
    because of its apparently too-medical contents. In the wake of the recent
    postings on X-Ray System Collision Detection & Avoidance and on Artificial
    Neural / Logic Networks, I thought that the cross-posting below might set
    the balance straight.

    For those of you who do not have FTP, the BITFTP@PUCC.BITNET service can
    be useful. On the Internet, ftpmail@decwrl.dec.com is a recent alternative
    (send HELP in the body of an email note).
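
    For example, to fetch the report announced below through the ftpmail
    service, one would mail it a note whose body looks roughly like the
    following (this command set is quoted from memory, so do send HELP
    first to verify it):

        connect svr-ftp.eng.cam.ac.uk
        binary
        chdir reports
        get tham_tr104.ps.Z
        quit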

    Regards -- hjw.
    ------------------------------------
    Article 5759 in comp.ai.neural-nets:
    From: ckt@eng.cam.ac.uk (C.K. Tham)
    Subject: Reinforcement Learning for Robot Control
    Message-ID:
    Date: 1 Sep 92 13:10:22 GMT
    Sender: ckt@eng.cam.ac.uk (C.K. Tham)
    Distribution: comp.ai.neural-nets
    Organization: Cambridge University Engineering Department, UK

    TECHNICAL REPORT ANNOUNCEMENT

    The following technical report is available via anonymous ftp:

    REINFORCEMENT LEARNING FOR MULTI-LINKED MANIPULATOR CONTROL
    by Chen K. Tham & Richard W. Prager.
    (Technical Report CUED/F-INFENG/TR104)

    Abstract:

    We present a trajectory planning and obstacle avoidance method which uses
    Reinforcement Learning to learn the appropriate real-valued torques to
    apply at each joint of a simulated two-link manipulator in order to move
    the end-effector to a desired destination in the workspace. The inputs to
    the controller are the joint positions and velocities, which are fed
    directly into a Cerebellar Model Arithmetic Computer (CMAC) (Albus, 1975).
    In each state, the expected reward and the appropriate torques for each
    joint are learnt through self-experimentation, using a combination of the
    Temporal Difference (TD) technique (Sutton, 1987) and stochastic
    hill-climbing (Williams, 1988). Actions which cause the manipulator to
    reach the desired destination are rewarded, whereas actions which drive a
    joint against its limits or cause collisions with obstacles are punished
    by an amount proportional to the velocity before impact. After training,
    the manipulator is able to move along smooth, collision-free paths from
    different start positions in the workspace to the destination.
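
    For readers curious about the mechanics, the sketch below (in
    present-day Python/NumPy, which of course postdates the report) shows
    the general shape of such a learner: a CMAC maps the joint positions
    and velocities to a handful of active cells, a TD-trained critic
    estimates the expected reward, and the per-joint torque means are
    nudged by stochastic hill-climbing. Every constant, name, and tiling
    detail here is an illustrative assumption, not taken from the report.

        import numpy as np

        # All parameter values below are assumptions for illustration,
        # not figures from the report.
        N_TILINGS = 8              # overlapping CMAC tilings
        TILES_PER_DIM = 6          # coarseness of each tiling
        N_JOINTS = 2
        STATE_DIM = 2 * N_JOINTS   # joint positions and velocities
        GAMMA = 0.95               # TD discount factor (assumed)
        ALPHA_V = 0.1 / N_TILINGS  # learning rate, expected reward
        ALPHA_U = 0.01 / N_TILINGS # learning rate, torque means
        SIGMA = 0.5                # exploration noise on the torques

        rng = np.random.default_rng(0)
        # each tiling is displaced by a random offset, the usual CMAC trick
        offsets = rng.random((N_TILINGS, STATE_DIM)) / TILES_PER_DIM

        def cmac_features(state):
            # state is assumed rescaled to [0, 1]^STATE_DIM; each tiling
            # activates one cell, so a state touches N_TILINGS weights
            idx = []
            for t in range(N_TILINGS):
                cell = np.floor((state + offsets[t]) * TILES_PER_DIM)
                cell = np.clip(cell.astype(int), 0, TILES_PER_DIM - 1)
                idx.append((t, *cell))
            return idx

        grid = (N_TILINGS,) + (TILES_PER_DIM,) * STATE_DIM
        V = np.zeros(grid)                # critic: expected reward
        U = np.zeros(grid + (N_JOINTS,))  # actor: mean torque per joint

        def value(idx):
            return sum(V[i] for i in idx)

        def mean_torque(idx):
            return sum(U[i] for i in idx)

        def act(state):
            # explore by perturbing the current mean torques
            idx = cmac_features(state)
            return mean_torque(idx) + rng.normal(0.0, SIGMA, N_JOINTS)

        def learn(state, torque, reward, next_state, done):
            # TD error: did the action turn out better than expected?
            idx, nidx = cmac_features(state), cmac_features(next_state)
            target = reward + (0.0 if done else GAMMA * value(nidx))
            td_error = target - value(idx)
            perturbation = torque - mean_torque(idx)
            for i in idx:
                V[i] += ALPHA_V * td_error
                # hill-climbing: shift the torque means toward
                # perturbations that beat the current expectation
                U[i] += ALPHA_U * td_error * perturbation

    The punishment described in the abstract would enter here simply as a
    negative reward, proportional to the joint velocities, on the step at
    which a collision or joint-limit violation occurs.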

    The file is in compressed PostScript format (961935 bytes).

    Procedure for obtaining the report:

    unix> ftp svr-ftp.eng.cam.ac.uk
    Name: anonymous
    Password: (your e-mail address)
    ftp> cd reports
    ftp> binary
    ftp> get tham_tr104.ps.Z
    ftp> quit
    unix> uncompress tham_tr104.ps.Z
    unix> lpr tham_tr104.ps (or your local PostScript print command)

    The authors welcome comments and suggestions from readers.

    ----------------------------------------------------------------------------
    Chen K. THAM, E-mail: ckt@eng.cam.ac.uk
    Speech, Vision and Robotics Group, Tel. : +44 223 332754
    Cambridge University Engineering Department, Fax : +44 223 332662
    Trumpington Street,
    Cambridge CB2 1PZ,
    United Kingdom.