Convenient Reinforcement Learning With Stable-Baselines3

Starling-7B: LLM with Reinforcement Learning from AI Feedback

Hands-On Deep Q-Learning

Beyond Q-Star: OpenAI’s AGI breakthrough possible with PPO

A Gentle Introduction to Deep Reinforcement Learning in JAX

Unleashing ChatGPT AI-1: Constructing an Advanced LLM-Based System

A Practitioner’s Guide to Reinforcement Learning

Decision Science Meets Design

SB3, the Swiss Army Knife of Applied RL

Entropy-Regularized Reinforcement Learning Explained