AI-Optimizer is a next-generation deep reinforcement learning suite, providing rich algorithm libraries ranging from model-free to model-based RL algorithms, and from single-agent to multi-agent algorithms. Moreover, AI-Optimizer contains a flexible and easy-to-use distributed training framework for efficient policy training.

AI-Optimizer now provides the following built-in libraries, and more libraries and implementations are coming soon.

- Multiagent Reinforcement Learning
- Self-supervised Representation Reinforcement Learning
- Offline Reinforcement Learning
- Transfer and Multi-task Reinforcement Learning
- Model-based Reinforcement Learning

The Multiagent RL repo contains the released code of representative research works of TJU-RL-Lab on Multiagent Reinforcement Learning (MARL).

Multi-agent reinforcement learning (MARL) has successfully addressed many complex real-world problems, such as playing the game of Go (AlphaGo, AlphaGo Zero), playing real-time multi-player strategy games (StarCraft II, Dota 2, Honor of Kings), playing card games (Poker, no-limit Poker), robotic control, and autonomous driving (Smarts). However, in addition to the challenges that arise in single-agent RL, MARL faces several challenges of its own in theoretical analysis. We summarize below the challenges that we regard as fundamental in developing theories for MARL.
Our goal is to design MARL algorithms that can solve or alleviate the problems mentioned above and promote the deployment of MARL in more real-world applications.
We organize our studies around the challenges mentioned above. To address the curse of dimensionality, we design a series of scalable multiagent neural networks that efficiently reduce the size of the search space by leveraging permutation invariance and permutation equivariance, explicitly taking action semantics into consideration, etc. To better balance the exploration–exploitation tradeoff, we propose Progressive Mutual Information Collaboration to achieve more efficient cooperative exploration... An overall picture of the proposed methods is shown below.
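
As a minimal sketch of the two properties named above (not the networks released in this repo), the following plain-Python example uses two hypothetical helpers, `invariant_pool` and `equivariant_map`. Sum-pooling over entities gives the same output for every ordering of the entities, while a shared per-entity function permutes its outputs together with its inputs. The feature values are made up for illustration.

```python
from itertools import permutations


def invariant_pool(entity_features):
    """Sum-pool per-entity feature vectors; the result ignores entity order."""
    dim = len(entity_features[0])
    return [sum(f[i] for f in entity_features) for i in range(dim)]


def equivariant_map(entity_features):
    """Apply one shared function to every entity, using a pooled context.

    Reordering the inputs reorders the outputs in exactly the same way.
    """
    pooled = invariant_pool(entity_features)
    return [[x + p for x, p in zip(f, pooled)] for f in entity_features]


# Three entities (e.g. observed teammates), each with a 2-d feature vector.
entities = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
reference = equivariant_map(entities)

for perm in permutations(range(len(entities))):
    shuffled = [entities[i] for i in perm]
    # Invariance: the pooled vector is identical for every ordering.
    assert invariant_pool(shuffled) == invariant_pool(entities)
    # Equivariance: outputs are the reference outputs in the same new order.
    assert equivariant_map(shuffled) == [reference[i] for i in perm]
```

Because all n! orderings of n entities map to the same pooled vector, a model built on such a layer does not have to treat each ordering as a separate input, which is one way the search space can shrink.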

The main contributions of this repository are:

- For beginners who are interested in MARL, our easy-marl codebase and our ZhiHu blogs on MARL and communication-based MARL can serve as an introductory tutorial.
- For researchers, we provide a systematic overview of typical challenges in MARL from different perspectives, each of which is a valuable research direction and contains a series of recent research works. We hope that our research works and the corresponding released code make it easier for researchers to design new algorithms. For example, given the significant interest in designing novel MARL architectures over the past few years, the research direction of scalable multiagent networks is of clear interest to the MARL community. Permutation invariance and permutation equivariance in the design of MARL agents have so far drawn less attention than they deserve, so the idea presented in the API paper is relevant to MARL researchers.
- For practitioners, we release a series of efficient, scalable, high-performing, and easy-to-use MARL algorithms that achieve superior performance on the typical benchmarks of the MARL research community. For example, the API-QMIX, API-VDN, API-MAPPO, and API-MADDPG algorithms proposed in our paper "API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks" achieve state-of-the-art performance on the StarCraft Multi-Agent Challenge (SMAC) and Multi-agent Particle Environment benchmarks, reaching 100% win rates in almost all hard and super-hard SMAC scenarios (never achieved before).

See more here.
Current deep RL methods still typically rely on active data collection to succeed, which hinders their application in the real world, especially when data collection is dangerous or expensive. Offline RL (also known as batch RL) is a data-driven RL paradigm concerned with learning exclusively from static datasets of previously collected experiences. In this setting, a behavior policy interacts with the environment to collect a set of experiences, which can later be used to learn a policy without further interaction. This paradigm can be extremely valuable in settings where online interaction is impractical. However, current offline RL methods face three challenges:

* Low performance ceiling: The quality of the offline data bounds the performance of offline reinforcement learning algorithms. How can low-quality offline data be expanded without additional interaction to raise that ceiling?
* Poor algorithm performance: Existing off-policy/offline algorithms train on the offline data distribution. When the learned policy interacts with the environment, the distribution of visited state-action pairs may differ from the offline data (distributional shift). In this situation, the Q-values of such pairs are easily overestimated, which degrades overall performance. How can data outside the offline data distribution (out-of-distribution, OOD) be characterized to avoid overestimation?
* Difficulty in applying the algorithm: Because dataset quality is limited, the learned policy cannot be deployed directly in a production environment, and further online learning is required. How should data sampling in the online training phase be designed to avoid a sudden drop in initial policy performance caused by redundant data generated by the distribution change, and to converge quickly to the optimal solution within a limited number of interactions?

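
To make the overestimation problem under distributional shift concrete, here is a toy tabular sketch. It is not one of the algorithms in this repository and does not use d3rlpy; the two-state dataset, the helper `fit_q`, and the inflated initial value `q_init=5.0` (standing in for an arbitrary estimate on unseen pairs) are all assumptions made for illustration. It compares a naive Q-learning backup that maximizes over all actions with a backup restricted to actions that appear in the dataset, in the spirit of batch-constrained methods.

```python
# Static dataset of (state, action, reward, next_state, done) transitions.
# Action 1 is never taken in state 1, so the pair (1, 1) is out-of-distribution.
DATASET = [
    (0, 0, 0.0, 1, False),
    (1, 0, 1.0, 1, True),
]
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9


def fit_q(dataset, q_init, in_support_only, sweeps=50):
    """Run repeated Q-learning backups over a fixed dataset (no new data)."""
    q = [[q_init] * N_ACTIONS for _ in range(N_STATES)]
    seen = {(s, a) for s, a, *_ in dataset}
    for _ in range(sweeps):
        for s, a, r, s2, done in dataset:
            if done:
                target = r
            else:
                # Naive: max over every action, including unseen ones.
                # Constrained: max only over actions observed in state s2.
                actions = [b for b in range(N_ACTIONS)
                           if not in_support_only or (s2, b) in seen]
                target = r + GAMMA * max((q[s2][b] for b in actions),
                                         default=0.0)
            q[s][a] = target
    return q


naive = fit_q(DATASET, q_init=5.0, in_support_only=False)
constrained = fit_q(DATASET, q_init=5.0, in_support_only=True)

# Under the dataset's behavior, the true value of (state 0, action 0) is 0.9.
print("naive       Q(0, 0):", naive[0][0])
print("constrained Q(0, 0):", constrained[0][0])
```

In the naive run, the never-updated estimate for the unseen pair (1, 1) leaks into the value of state 0 through the max operator, whereas the constrained run recovers the value supported by the data.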
This repository contains the code of representative benchmarks and algorithms on the topic of Offline Reinforcement Learning. The repository is developed on top of d3rlpy (https://github.com/takuseno/d3rlpy), following its MIT license, to shed light on research into the above three challenges. While inheriting its advantages, the additional features include (or will include):

- A unified algorithm framework with rich and fair comparisons between different algorithms:
  - REDQ
  - UWAC
  - BRED
  - …
- Abundant and real-world datasets:
  - Real-world industrial datasets
  - Multimodal datasets
  - Augmented datasets (and corresponding methods)
  - Datasets obtained using representation learning (and corresponding methods)
- More easy-to-use logging system support:
  - Wandb

The SSRL repo contains the released code of representative research works of TJU-RL-Lab on Self-supervised Representation Learning for RL.
To the best of our knowledge, this is the first code repository for SSRL established by following a systematic research taxonomy and a unified algorithmic framework.
Since the RL agent always receives, processes, and delivers all kinds of data in the learning process (i.e., the typical Agent-Environment Interface), properly representing such "data" is naturally key to the effectiveness and efficiency of RL.

In this branch, we focus on three key questions:

- What should a good representation for RL be? (Theory)
- How can we obtain or realize such good representations? (Methodology)
- How can we make use of good representations to improve RL? (Downstream Learning Tasks & Application)

Taking Self-supervised Learning (SSL) as our major paradigm for representation learning, we carry out our studies from four perspectives:

- State Representation
- Action Representation
- Policy Representation
- Environment (and Task) Representation

These four perspectives are the major elements involved in the general Agent-Environment Interface of RL. They play roles such as input and optimization target in the RL process. The representation of these elements has a great impact on sample efficiency, convergence optimality, and cross-environment generalization.
The central contribution of this repo is a Unified Algorithmic Framework (Implementation Design) of SSRL algorithms. The framework provides a unified interpretation of almost all existing SSRL algorithms. Moreover, the framework can also serve as a paradigm for devising new methods.
Our ultimate goal is to promote the establishment of an SSRL ecosystem, which is illustrated below.
Towards addressing the key problems of RL, we study SSRL with four types of representations. For research from all four perspectives, a unified framework of algorithm and implementation serves as the underpinning. The representations studied from different perspectives further boost various downstream RL tasks. Finally, this promotes the deployment of RL in real-world applications.

See more here.
With this repo and our research works, we want to draw the attention of the RL community to studies on Self-supervised Representation Learning for RL.
We also look forward to feedback in any form to promote more in-depth research.
Recently, Deep Reinforcement Learning (DRL) has achieved a lot of success in human-level control problems, such as video games, robot control, autonomous vehicles, smart grids and so on. However, DRL is still faced with the sample-inefficiency problem especially when the state-action space becomes large, which makes it difficult to learn from scratch. This means the agent has to use a large number of samples to learn a good policy. Furthermore, the sample-inefficiency problem is much more severe in Multiagent Reinforcement Learning (MARL) due to the exponential increase of the state-action space.
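
The exponential growth mentioned above can be checked with simple arithmetic. The sketch below assumes, for illustration only, that every agent has the same number of discrete actions; `joint_action_space_size` is a hypothetical helper, not part of this repo.

```python
def joint_action_space_size(n_agents, actions_per_agent):
    """Number of joint actions when each agent picks one of its own actions."""
    return actions_per_agent ** n_agents


# With 5 actions per agent, the joint action space grows as 5 ** n_agents.
for n_agents in (1, 2, 5, 10):
    print(n_agents, joint_action_space_size(n_agents, 5))
# prints:
# 1 5
# 2 25
# 5 3125
# 10 9765625
```

Going from one agent to ten multiplies the number of joint actions by almost two million in this setting, which is why learning from scratch over the joint space needs so many more samples.
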
**Sample-inefficien
$ claude mcp add AI-Optimizer \
-- python -m otcore.mcp_server <graph>