PhD Thesis Proposal - A Predictive Actor-Critic: An Actor-Critic that Learns Internal Models

Who: Farzaneh Sheikhnezhad Fard

Title: A Predictive Actor-Critic: An Actor-Critic that Learns Internal Models

Examining Committee:

Dr. Thomas Trappenberg - Faculty of Computer Science (Supervisor)
Dr. Evangelos Milios - Faculty of Computer Science (Reader)
Dr. Malcolm Heywood - Faculty of Computer Science (Reader)
Dr. Mae Seto - Faculty of Computer Science (External Examiner)

Abstract:

In reinforcement learning, an agent must learn behavior through trial-and-error interactions with an environment. Reinforcement learning methods fall into two categories: (a) model-free and (b) model-based. Here, the model is knowledge of the state transition probability and reward functions, that is, knowledge of the next state and the expected reward that follow from the agent's current state and chosen action. Model-based solutions learn such a model and use it to derive a controller, while model-free methods learn a controller without learning an explicit model.
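To make the distinction concrete, the following is a minimal illustrative sketch (not taken from the thesis) contrasting the two settings on a toy deterministic chain environment; the environment, rewards, and hyperparameters are assumptions chosen purely for illustration.

import random

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def step(s, a):
    # Hypothetical toy dynamics: action 0 moves left, action 1 moves right;
    # entering the right-most state yields reward 1.
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

# Model-free: tabular Q-learning learns a controller from sampled transitions
# without ever storing an explicit model of step().
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(500):
    s = 0
    for _ in range(20):
        if random.random() < 0.1:
            a = random.randrange(N_ACTIONS)                       # exploration
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])      # greedy action
        s_next, r = step(s, a)
        Q[s][a] += 0.1 * (r + GAMMA * max(Q[s_next]) - Q[s][a])   # TD update
        s = s_next

# Model-based: with access to the transition and reward model (here, step()
# itself), a controller can be derived by planning, e.g. value iteration.
V = [0.0] * N_STATES
for _ in range(100):
    V = [max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in range(N_ACTIONS))
         for s in range(N_STATES)]

print("model-free greedy policy:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
print("planned state values:", [round(v, 2) for v in V])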

Model-free reinforcement learning methods have demonstrated success in recent years by learning to perform complex tasks without a priori knowledge of system dynamics, and the use of deep neural networks further improves their performance. In contrast, model-based solutions tend to perform better and converge faster. However, obtaining the model is difficult or even impossible in some cases.

From the literature, we know that the brain takes advantage of both model-free and model-based control systems. We propose a novel architecture that shows how these two control systems can cooperate. Our proposed model combines a model-free and a model-based control system in a manner consistent with neuroscientific findings in the literature. The present study tests the ability of such a model to control a simulated two-joint robotic arm during multiple reaching tasks, including (A) both static and dynamic target properties, (B) the absence of visual input, and (C) adaptation to slowly changing kinematics. We show that such an architecture is capable of rapidly learning the dynamics of the system and demonstrates superior performance compared to existing state-of-the-art models. We also extend the proposed model and present a second version in which the arbitration between the two control systems is explainable.
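The abstract does not describe the architecture's internals, so the following is a purely hypothetical sketch (not the author's Predictive Actor-Critic) of the general idea of arbitrating between a model-free controller and a learned forward (internal) model; the linear predictor, error threshold, and interfaces are all assumptions chosen for illustration.

import numpy as np

class ForwardModel:
    # Hypothetical learned internal model: predicts the next state from (state, action).
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, s, a):
        return self.W @ np.concatenate([s, a])

    def update(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = s_next - self.W @ x
        self.W += self.lr * np.outer(err, x)   # simple delta-rule update
        return float(np.linalg.norm(err))      # prediction error drives arbitration

class Arbiter:
    # Chooses which controller acts: model-based when the forward model is
    # reliable, model-free otherwise (the threshold is an arbitrary choice).
    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.recent_error = float("inf")

    def observe(self, prediction_error):
        self.recent_error = prediction_error

    def use_model_based(self):
        return self.recent_error < self.threshold

In such a scheme, the running prediction error of the internal model provides an explicit, inspectable quantity governing which controller is in charge, which is one way arbitration between the two systems could be made explainable; the mechanism actually used in the thesis may differ.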

Time:

Location:

FCS Room 142