github.com/TJU-DRL-LAB/AI-Optimizer @main sqlite

3,745 symbols 11,088 edges 424 files 607 documented · 16%

README

AI-Optimizer

AI-Optimizer is a next-generation deep reinforcement learning suite, providing rich algorithm libraries that range from model-free to model-based RL algorithms and from single-agent to multi-agent algorithms. Moreover, AI-Optimizer contains a flexible and easy-to-use distributed training framework for efficient policy training.

AI-Optimizer now provides the following built-in libraries, and more libraries and implementations are coming soon.

- Multiagent Reinforcement Learning
- Self-supervised Representation Reinforcement Learning
- Offline Reinforcement Learning
- Transfer and Multi-task Reinforcement Learning
- Model-based Reinforcement Learning

Multiagent Reinforcement Learning (MARL)

The Multiagent RL repo contains the released code of representative research works of TJU-RL-Lab on Multiagent Reinforcement Learning (MARL).

❓ Problem to Solve

Four representative applications of recent successes of MARL: unmanned aerial vehicles, game of Go, Poker games, and team-battle video games.

Multi-agent reinforcement learning (MARL) has successfully addressed many complex real-world problems, such as playing the game of Go (AlphaGo, AlphaGo Zero), playing real-time multi-player strategy games (StarCraft II, Dota 2, Honor of Kings), playing card games (Poker, no-limit Poker), robotic control and autonomous driving (Smarts). However, MARL faces several challenges in theoretical analysis, in addition to those that arise in single-agent RL. We summarize below the challenges that we regard as fundamental in developing theories for MARL.

The curse of dimensionality (scalability) issue
Non-stationarity
Non-Unique Learning Goals
Exploration–exploitation tradeoff
Multiagent credit assignment problem
Partial observability
Hybrid action

Our goal is to design MARL algorithms that solve or alleviate the problems mentioned above and promote the deployment of MARL in more real-world applications.

⭐️ Core Directions

We organize our studies around the challenges mentioned above. To address the curse of dimensionality, we design a series of scalable multiagent neural networks that efficiently reduce the size of the search space by leveraging the permutation invariance and permutation equivariance properties, explicitly taking the action semantics into consideration, etc. To better balance the exploration–exploitation tradeoff, we propose Progressive Mutual Information Collaboration to achieve more efficient cooperative exploration... An overall picture of the proposed methods is shown below.

our solutions
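The permutation invariance and permutation equivariance properties mentioned above can be illustrated with a small, self-contained sketch. This is not the API network from this repository; it is a generic sum-pooling encoder, and the function names (`embed`, `invariant_encoder`, `equivariant_encoder`) and the toy weights are hypothetical. It only shows what the two properties mean: reordering the entities leaves the invariant output unchanged, and reorders the equivariant output in the same way.

```python
import itertools
import math

def embed(obs, weight=0.5, bias=0.1):
    # Shared embedding applied identically to every entity (hypothetical toy weights).
    return [math.tanh(weight * x + bias) for x in obs]

def invariant_encoder(entity_obs):
    # Sum-pool the per-entity embeddings; a sum does not depend on entity order.
    embedded = [embed(obs) for obs in entity_obs]
    return [sum(column) for column in zip(*embedded)]

def equivariant_encoder(entity_obs):
    # One output row per entity: its own embedding plus the pooled context.
    pooled = invariant_encoder(entity_obs)
    return [[e + p for e, p in zip(embed(obs), pooled)] for obs in entity_obs]

def all_close(a, b, tol=1e-9):
    # Element-wise comparison that tolerates floating-point summation order.
    return len(a) == len(b) and all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))

entities = [[0.1, -0.3], [0.7, 0.2], [-0.5, 0.9]]
reference_pooled = invariant_encoder(entities)
reference_rows = equivariant_encoder(entities)

invariance_holds = True
equivariance_holds = True
for order in itertools.permutations(range(len(entities))):
    shuffled = [entities[i] for i in order]
    # Invariance: the pooled vector is the same for every ordering.
    if not all_close(invariant_encoder(shuffled), reference_pooled):
        invariance_holds = False
    # Equivariance: the output rows are permuted exactly like the inputs.
    shuffled_rows = equivariant_encoder(shuffled)
    expected_rows = [reference_rows[i] for i in order]
    if not all(all_close(r, e) for r, e in zip(shuffled_rows, expected_rows)):
        equivariance_holds = False

print(invariance_holds, equivariance_holds)
```

Because the encoder treats all orderings of the same entities as one input, a network built this way does not need to learn each ordering separately, which is the sense in which such designs shrink the search space.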

💦 Contribution

The main contributions of this repository are:

For beginners who are interested in MARL, our easy-marl codebase and ZhiHu blogs: MARL and communication-based MARL can serve as a preliminary tutorial.
For researchers, we provide a systematic overview of typical challenges in MARL from different perspectives, each of which is a valuable research direction and contains a series of recent research works. We hope that our research works and the corresponding released code make it easier for researchers to design new algorithms.
For example, given the significant interest in designing novel MARL architectures over the past few years, the research direction of scalable multiagent networks is of clear interest to the MARL community. The notions of permutation invariance and permutation equivariance in the design of MARL agents have drawn less attention than they deserve, so the idea presented in the API paper is interesting and highly relevant to MARL researchers.
For practitioners, we release a series of efficient, scalable, well-performing and easy-to-use MARL algorithms that achieve superior performance on the typical benchmarks of the MARL research community.
For example, the API-QMIX, API-VDN, API-MAPPO and API-MADDPG algorithms proposed in our paper "API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks" achieve state-of-the-art performance on the StarCraft Multi-Agent Challenge (SMAC) and Multi-agent Particle Environment benchmarks, reaching 100% win rates in almost all hard and super-hard SMAC scenarios (never achieved before).
We strongly recommend that practitioners try our API-Network solution FIRST when solving practical MARL problems (because it is very easy to use and works very well)! We hope our work can promote the deployment of MARL in more real-world applications.

See more here.

Offline-rl-algorithms (Offrl)

❓ Problem to Solve

Current deep RL methods still typically rely on active data collection to succeed, which hinders their application in the real world, especially when data collection is dangerous or expensive. Offline RL (also known as batch RL) is a data-driven RL paradigm concerned with learning exclusively from static datasets of previously collected experiences. In this setting, a behavior policy interacts with the environment to collect a set of experiences, which can later be used to learn a policy without further interaction. This paradigm can be extremely valuable in settings where online interaction is impractical. However, current offline RL methods face three challenges:

- Low upper limit on performance: The quality of the offline data determines the performance of offline reinforcement learning algorithms. How can low-quality offline data be expanded without additional interaction to raise the learning upper limit of offline reinforcement learning algorithms?
- Poor algorithm performance: Existing off-policy/offline algorithms train on the offline data distribution. When the learned policy interacts with the environment, the distribution of visited state-action pairs may differ from that of the offline data (distributional shift). In this situation, the Q-values of such pairs are easily overestimated, which hurts overall performance. How can data outside the offline data distribution (out-of-distribution, OOD) be characterized to avoid overestimation?
- Difficulty in applying the algorithm: Because of the limited quality of the dataset, the learned policy cannot be deployed directly in the production environment, and further online learning is required. How should data sampling be designed in the online training phase to avoid a sudden drop in the initial performance of the policy caused by the redundant data generated by the distribution change, and to converge quickly to the optimal solution within a limited number of interactions?
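One reason Q-values are prone to overestimation is that value-based updates take a maximum over estimated action values, and a maximum over noisy estimates is biased upward. The sketch below is a generic illustration of this maximization bias, not an algorithm from this repository. It assumes every action has a true value of 0 and that the estimates carry zero-mean Gaussian noise; the function name `mean_max_estimate` and all parameters are hypothetical.

```python
import random

def mean_max_estimate(num_actions, noise_std, num_trials, seed=0):
    # Average of max_a Q_hat(a) when the true Q-value of every action is 0.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trials):
        # Each estimate is the true value (0) plus zero-mean noise.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        total += max(estimates)
    return total / num_trials

# With a single action there is nothing to maximize over, so the bias is near 0.
bias_one_action = mean_max_estimate(num_actions=1, noise_std=1.0, num_trials=5000)
# With ten equally good actions the max systematically picks up positive noise.
bias_ten_actions = mean_max_estimate(num_actions=10, noise_std=1.0, num_trials=5000)

print(bias_one_action, bias_ten_actions)
```

In the offline setting the estimates for state-action pairs that are rare or absent in the dataset tend to be the noisiest, and no further interaction is available to correct them, which is why out-of-distribution actions are a particular concern.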

💦 Contribution

This repository contains the code of representative benchmarks and algorithms on the topic of Offline Reinforcement Learning. The repository is developed based on d3rlpy (https://github.com/takuseno/d3rlpy) under the MIT license to shed light on research into the above three challenges. While inheriting its advantages, the additional features include (or will include):

- A unified algorithm framework with rich and fair comparisons between different algorithms:
  - REDQ
  - UWAC
  - BRED
  - …
- Abundant and real-world datasets:
  - Real-world industrial datasets
  - Multimodal datasets
  - Augmented datasets (and corresponding methods)
  - Datasets obtained using representation learning (and corresponding methods)
- Support for more easy-to-use logging systems:
  - Wandb

Ecology of Offline RL

Self-supervised Reinforcement Learning (SSRL)

The SSRL repo contains the released code of representative research works of TJU-RL-Lab on Self-supervised Representation Learning for RL.

To the best of our knowledge, this is the first code repository for SSRL established by following a systematic research taxonomy and a unified algorithmic framework.

❓ Problem to Solve

Since the RL agent always receives, processes, and delivers all kinds of data in the learning process (i.e., the typical Agent-Environment Interface), how to properly represent such "data" is naturally the key point to the effectiveness and efficiency of RL.

In this branch, we focus on three key questions:

- What should a good representation for RL be? (Theory)
- How can we obtain or realize such good representations? (Methodology)
- How can we make use of good representations to improve RL? (Downstream Learning Tasks & Application)

⭐️ Core Idea

Taking Self-supervised Learning (SSL) as our major paradigm for representation learning, we carry out our studies from four perspectives:

- State Representation
- Action Representation
- Policy Representation
- Environment (and Task) Representation

These four perspectives are the major elements involved in the general Agent-Environment Interface of RL. They play the roles of input, optimization target, and so on in the RL process. The representation of these elements has a great impact on sample efficiency, convergence optimality and cross-environment generalization.

The central contribution of this repo is a unified algorithmic framework (implementation design) for SSRL algorithms. The framework provides a unified interpretation of almost all existing SSRL algorithms. Moreover, the framework can also serve as a paradigm for devising new methods.

Our ultimate goal is to promote the establishment of the ecology of SSRL, which is illustrated below.

To address the key problems of RL, we study SSRL with four types of representations. For research from all four perspectives, a unified framework of algorithm and implementation serves as the underpinning. The representations studied from different perspectives further boost various downstream RL tasks. Finally, this promotes the deployment of RL in real-world applications.

Ecology of SSRL

See more here.

💦 Contribution

With this repo and our research works, we want to draw the attention of the RL community to studies on Self-supervised Representation Learning for RL.

For people who are interested in RL, our introduction in this repo and our blogs can serve as a preliminary tutorial.
For cutting-edge RL researchers, we believe that our research ideas and the proposed SSRL framework are insightful and inspiring, opening up new angles for future work on more advanced RL.
For RL practitioners (especially those who work in related fields), we provide advanced RL algorithms with strong performance in online RL (e.g., PPO-PeVFA), hybrid-action decision-making (e.g., HyAR), policy adaptation from offline experience (e.g., PAnDR) ..., which can be adopted or further developed for associated academic and industrial problems.

We also look forward to feedback in any form to promote more in-depth research.

Transfer and Multi-task Reinforcement Learning

Recently, Deep Reinforcement Learning (DRL) has achieved a lot of success in human-level control problems, such as video games, robot control, autonomous vehicles, smart grids and so on. However, DRL is still faced with the sample-inefficiency problem especially when the state-action space becomes large, which makes it difficult to learn from scratch. This means the agent has to use a large number of samples to learn a good policy. Furthermore, the sample-inefficiency problem is much more severe in Multiagent Reinforcement Learning (MARL) due to the exponential increase of the state-action space.
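The exponential growth mentioned above can be made concrete with a short calculation. This sketch assumes, purely for illustration, that every agent has the same number of discrete actions, so the joint action space has size `actions_per_agent ** num_agents`; the function name and the choice of 5 actions per agent are hypothetical.

```python
def joint_action_count(actions_per_agent, num_agents):
    # Each agent picks independently, so the joint choices multiply.
    return actions_per_agent ** num_agents

# Hypothetical setting: 5 discrete actions per agent.
for num_agents in (1, 2, 5, 10):
    print(num_agents, joint_action_count(5, num_agents))
# 1 5
# 2 25
# 5 3125
# 10 9765625
```

Going from one agent to ten multiplies the number of joint actions by roughly two million in this toy setting, which gives a sense of why learning from scratch needs many more samples in MARL.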

❓ Problem to Solve

**Sample-inefficien

Core symbols most depended-on inside this repo

append · called by 347 · multiagent-rl/easy-marl/buffer.py

mean · called by 137 · modelbased-rl/Dreamer/ED2-Dreamer/tools.py

get · called by 120 · modelbased-rl/Dreamer/ED2-Dreamer/tools.py

called by 74 · offline-rl-algorithms/E2O/PEX-main/pex/networks/policy.py

get · called by 73 · modelbased-rl/PlaNet/planet/tools/attr_dict.py

append · called by 66 · offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/online/buffers.py

step · called by 57 · offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/envs/wrappers.py

copy · called by 52 · modelbased-rl/BMPO/models/fc.py

Shape

Method 2,746

Class 507

Function 484

Route 8

Languages

Python 100%

Modules by API surface

modelbased-rl/PlaNet/planet/control/wrappers.py · 109 symbols

multiagent-rl/easy-marl/envs/continuous_mpe/multiagent/rendering.py · 69 symbols

modelbased-rl/Dreamer/ED2-Dreamer/wrappers.py · 68 symbols

modelbased-rl/Dreamer/Vanilla_Dreamer/wrappers.py · 67 symbols

modelbased-rl/Dreamer/ED2-Dreamer/tools.py · 60 symbols

modelbased-rl/Dreamer/Vanilla_Dreamer/tools.py · 58 symbols

offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/dataset.py · 51 symbols

offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/models/torch/encoders.py · 46 symbols

offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/preprocessing/reward_scalers.py · 45 symbols

offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/models/torch/policies.py · 44 symbols

offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/base.py · 43 symbols

offline-rl-algorithms/E2O/d3rlpy_new/d3rlpy/torch_utility.py · 40 symbols

Dependencies from manifests, versioned

Click 7.0 · 1×

Cython 0.29.1 · 1×

GitPython 2.1.11 · 1×

Keras-Applications 1.0.6 · 1×

Keras-Preprocessing 1.0.5 · 1×

Markdown 3.0.1 · 1×

Pillow 6.2.0 · 1×

PySocks 1.6.8 · 1×

PyWavelets 1.0.1 · 1×

PyYAML 4.2b4 · 1×

Werkzeug 0.15.3 · 1×

absl-py 0.6.1 · 1×

For agents

$ claude mcp add AI-Optimizer \
  -- python -m otcore.mcp_server <graph>
