Causal-Proto SSRL: Learning Dynamic-Necessary State Variables for Multi-Environment Reinforcement Learning
Zhang Meng 1, Zhang Chunhong 2, Hu Zheng 1*, Zhuang Benhui 1
1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876
2. Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876
*Corresponding author
Funding: none
Opened online: 27 March 2024
Accepted by: none
Citation: Zhang Meng, Zhang Chunhong, Hu Zheng. Causal-Proto SSRL: Learning Dynamic-Necessary State Variables for Multi-Environment Reinforcement Learning [OL]. [27 March 2024]. http://en.paper.edu.cn/en_releasepaper/content/4762946
 
 
The ability to learn directly from high-dimensional observations such as pixels allows reinforcement learning (RL) to be applied more widely. However, high-dimensional observations entangle task-relevant and task-irrelevant information, as well as information that is related to actions but unnecessary for the task, which introduces non-essential dependencies and thus harms the generalization and robustness of reinforcement learning. In order to learn abstract state representations from high-dimensional observations that generalize across multiple tasks and remain robust in environments with different task-unnecessary information, this paper formulates the POMDP as a Partially Observable Temporal Causal Dynamic Model (POTCDM) and proposes a self-supervised RL method with causal representation learning, Causal-Proto RL. This method separates encoded observations into dynamic-necessary and dynamic-unnecessary state variables, where only the dynamic-necessary state variables are fed into the RL agent, while simultaneously predicting the causal relationships. The method is pretrained in the absence of task-specific rewards using an intrinsic reward based on curiosity about causal relationships, and is then applied to multiple difficult downstream tasks. This paper evaluates the algorithm in the DeepMind Control Suite. The algorithm performs as well as other state-of-the-art self-supervised RL methods on a series of downstream tasks in environments identical to those used for pretraining, and demonstrates generalization and robustness in downstream task environments that differ from pretraining.
Keywords: artificial intelligence, reinforcement learning, causal learning, representation learning
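
To make the separation described in the abstract concrete, the sketch below shows how an encoder might split its latent output into dynamic-necessary and dynamic-unnecessary state variables and pass only the former to the policy. This is a minimal illustrative sketch in PyTorch under our own assumptions, not the authors' implementation; all names and dimensions (SplitStateEncoder, obs_dim, necessary_dim, the toy policy head) are invented for this example.

# Minimal sketch (hypothetical, not the authors' code): an encoder whose latent
# state is split into dynamic-necessary and dynamic-unnecessary variables, with
# only the dynamic-necessary part reaching the policy.
import torch
import torch.nn as nn

class SplitStateEncoder(nn.Module):
    def __init__(self, obs_dim=84 * 84, necessary_dim=32, unnecessary_dim=32):
        super().__init__()
        # A simple MLP backbone stands in for the pixel encoder.
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, necessary_dim + unnecessary_dim),
        )
        self.necessary_dim = necessary_dim

    def forward(self, obs):
        z = self.backbone(obs)
        # Split the latent into dynamic-necessary and dynamic-unnecessary parts.
        z_nec = z[:, : self.necessary_dim]
        z_unn = z[:, self.necessary_dim :]
        return z_nec, z_unn

encoder = SplitStateEncoder()
policy = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 6))  # e.g. 6 actions

obs = torch.rand(8, 84 * 84)        # batch of flattened pixel observations
z_nec, z_unn = encoder(obs)
action_logits = policy(z_nec)       # only dynamic-necessary variables feed the policy

In the paper's setting, the split would additionally be shaped by predicting the causal relationships among state variables and by the curiosity-driven intrinsic reward during pretraining; the sketch only illustrates the routing of the two groups of variables.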
 
 
 
