PDA

View Full Version : Learning in Multi-Linked Manipulator Control



H.j. Woltring, Fax/tel +31.40.413 744
09-02-1992, 01:10 PM
Dear Biomch-L readers,

One of our subscribers asked me last evening to be removed from the list
because its apparently too medical contents. In the wake of the recent
postings on X-Ray System Collision Detection & Avoidance and on Artificial
Neural / Logic Networks, I thought that the below X-posting might set the
balance straight.

For those of you who do not have FTP, the BITFTP@PUCC.BITNET service can
be useful. On the Internet, ftpmail@decwrl.dec.com is a recent alternative
(send HELP in the body of an email note).

Regards -- hjw.
------------------------------------
Article 5759 in comp.ai.neural-nets:
From: ckt@eng.cam.ac.uk (C.K. Tham)
Subject: Reinforcement Learning for Robot Control
Message-ID:
Date: 1 Sep 92 13:10:22 GMT
Sender: ckt@eng.cam.ac.uk (C.K. Tham)
Distribution: comp.ai.neural-nets
Organization: Cambridge University Engineering Department, UK

TECHNICAL REPORT ANNOUNCEMENT

The following technical report is available via anonymous ftp:

REINFORCEMENT LEARNING FOR MULTI-LINKED MANIPULATOR CONTROL
by Chen K. Tham & Richard W. Prager.
(Technical Report CUED/F-INFENG/TR104)

Abstract:

We present a trajectory planning and obstacle avoidance method which uses
Reinforcement Learning to learn the appropriate real-valued torques to
apply at each joint of a simulated two-linked manipulator in order to move
the end-effector to a desired destination in the workspace. The inputs to
the controller are the joint positions and velocities which are fed
directly into a Cerebellar Model Arithmetic Computer (CMAC) (Albus,75). In
each state, the expected reward and appropriate torques for each joint are
learnt through self-experimentation using a combination of the Temporal
Difference (TD) technique (Sutton,87) and stochastic hillclimbing
(Williams,88). Actions which cause the manipulator to reach the desired
destination are rewarded whereas actions which lead to collisions with
either joint limits or obstacles are punished by an amount proportional to
the velocity before collision. After training, the manipulator is able to
move along smooth collision-free paths from different start positions in
the workspace to the destination.

The file is in compressed Postscript format (length 961935 bytes).

Procedure for obtaining the report:

unix> ftp svr-ftp.eng.cam.ac.uk
Name: anonymous
Password: (your e-mail address)
ftp> cd reports
ftp> binary
ftp> get tham_tr104.ps.Z
ftp> quit
unix> uncompress tham_tr104.ps.Z
unix> lpr tham_tr104.ps .. etc. .. to print

The authors welcome comments and suggestions from readers.

----------------------------------------------------------------------------
Chen K. THAM, E-mail: ckt@eng.cam.ac.uk
Speech, Vision and Robotics Group, Tel. : +44 223 332754
Cambridge University Engineering Department, Fax : +44 223 332662
Trumpington Street,
Cambridge CB2 1PZ,
United Kingdom.