Introduction
Player of Games (PoG) is my autonomous cryptocurrency trading system – a collection of interconnected microservices that train, optimize, and execute algorithmic trading strategies using reinforcement learning. The system is designed to continuously improve itself, discovering optimal trading parameters through machine learning rather than manual tuning.
The name comes from Iain M. Banks’ novel – fitting for a system that learns to “play” the markets.
System Architecture
The system consists of three main components that work together:
           ┌─────────────────┐
           │       UCB       │  Port 9001
           │  (RL Training)  │
           └────────┬────────┘
                    │ Models & Signals
                    ▼
┌─────────────────┐    ┌─────────────────┐
│     Trader      │    │  TraderLighter  │
│  (HyperLiquid)  │    │    (Lighter)    │
│    Port 9002    │    │    Port 9102    │
└────────┬────────┘    └────────┬────────┘
         │                      │
         ▼                      ▼
    HyperLiquid              Lighter
     Exchange                Exchange
UCB – The Brain
UCB (Upper Confidence Bound) is the reinforcement learning training service – the brain of the operation. It trains trading strategies with Proximal Policy Optimization (PPO) and uses the UCB bandit algorithm to decide which regions of the parameter space to explore next, balancing configurations that already score well against ones that have barely been tried.
Key capabilities:
- PPO Training – Episode-based policy training with GPU acceleration
- UCB Optimizer – Multi-armed bandit approach with temporal encoding that analyzes reward trends, momentum, and volatility
- Multi-Symbol Support – Train strategies independently on different trading pairs (ETH, BTC, SOL, etc.)
- Gymnasium Environment – Realistic market simulation using actual historical candle data
- Statistics Dashboard – Performance matrices showing win rates, drawdowns, and PnL across all strategy-symbol combinations
The system continuously trains and evaluates strategies, automatically identifying the best performers for each trading pair.
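To make the exploration idea concrete, here is a minimal sketch of the textbook UCB1 rule, where each "arm" stands for one candidate parameter set. It is not PoG's optimizer: the temporal encoding of reward trends, momentum, and volatility is left out, and the class name, the exploration weight `c`, and the simulated rewards are all assumptions made for illustration.

```python
import math
import random


class UCB1:
    """Textbook UCB1 bandit; each arm is one candidate parameter set."""

    def __init__(self, n_arms, c=2.0):
        self.c = c                    # exploration weight (assumed value)
        self.counts = [0] * n_arms    # number of times each arm was tried
        self.means = [0.0] * n_arms   # running mean reward per arm

    def select(self):
        # Try every arm once before applying the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(self.c * math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        # Pick the arm with the highest upper confidence bound.
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_demo(rounds=2000, seed=0):
    """Simulate three parameter sets with made-up average rewards."""
    rng = random.Random(seed)
    true_means = [0.1, 0.5, 0.3]      # hypothetical, for illustration only
    bandit = UCB1(len(true_means))
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, rng.gauss(true_means[arm], 0.1))
    return bandit.counts
```

Calling `run_demo()` returns how often each arm was tried. The arm with the highest average reward ends up with most of the trials, while the weaker arms still get sampled occasionally as the confidence term grows.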
Trader – HyperLiquid Execution
Trader is the live execution service for HyperLiquid perpetual futures. It fetches trained models from UCB and executes trades in real time.
Key capabilities:
- Auto-Strategy Selection – Automatically enables the highest-reward strategy per product
- Signal Processing – Models output HOLD, BUY, SELL, or EXIT actions
- Risk Management – Configurable risk per trade, stop losses, and per-strategy kill switches
- Position Sync – Real-time position monitoring with external close detection
- Analytics Dashboard – Equity curves, trade statistics, win rates, and R:R ratios
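The signal and risk bullets above can be sketched roughly as follows. This is an illustration under assumptions rather than Trader's code: the `Action` enum mirrors the four outputs named above, while `RiskConfig`, `position_size`, `plan_order`, the 1% default risk, and the percentage-based stop are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    HOLD = 0
    BUY = 1
    SELL = 2
    EXIT = 3


@dataclass
class RiskConfig:
    risk_per_trade: float = 0.01   # fraction of equity risked (assumed default)
    kill_switch: bool = False      # per-strategy switch that blocks new trades


def position_size(equity, entry, stop, cfg):
    """Size the position so hitting the stop loses about equity * risk_per_trade."""
    if cfg.kill_switch:
        return 0.0
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("stop must differ from entry")
    return equity * cfg.risk_per_trade / stop_distance


def plan_order(action, equity, price, stop_pct, cfg):
    """Turn a model action into an order plan, or None when nothing is opened."""
    if action in (Action.HOLD, Action.EXIT):
        return None  # EXIT closes an existing position; not modeled here
    is_long = action is Action.BUY
    stop = price * (1 - stop_pct) if is_long else price * (1 + stop_pct)
    size = position_size(equity, price, stop, cfg)
    if size == 0.0:
        return None  # kill switch engaged
    return {"side": "long" if is_long else "short", "size": size, "stop": stop}
```

With 10,000 of equity, a 1% risk budget, and a stop 2% below a 2,000 entry, the stop distance is 40 and the planned size is 100 / 40 = 2.5 units.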
TraderLighter – Lighter Exchange Execution
TraderLighter mirrors the Trader service but executes on Lighter – a perpetual futures platform built as a ZK rollup on Ethereum.
It shares the same architecture and capabilities as Trader, adapted for Lighter’s unique requirements:
- Client-side transaction signing via native Go binaries
- Market indices instead of symbol strings
- ZK rollup architecture for verifiable trades
- Precise decimal scaling per market
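Two of these points, market indices and per-market decimal scaling, are easy to show in a small sketch. The market table, the decimal counts, and the field names in the returned dictionary are hypothetical and do not come from the Lighter SDK. The idea is only that prices and sizes are converted to scaled integers with `Decimal`, so float rounding never reaches the signed transaction.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_DOWN


@dataclass(frozen=True)
class MarketSpec:
    market_index: int     # integer id used instead of a symbol string
    price_decimals: int   # decimal places kept for prices
    size_decimals: int    # decimal places kept for order sizes


# Hypothetical values for illustration only.
MARKETS = {
    "ETH": MarketSpec(market_index=0, price_decimals=2, size_decimals=4),
    "BTC": MarketSpec(market_index=1, price_decimals=1, size_decimals=5),
}


def to_scaled_int(value, decimals):
    """Convert a decimal string to an integer with `decimals` implied places."""
    scaled = Decimal(value) * (10 ** decimals)
    # Round down so an order never exceeds the intended price or size.
    return int(scaled.to_integral_value(rounding=ROUND_DOWN))


def build_order(symbol, price, size):
    """Map a symbol-based order to index-based, integer-scaled fields."""
    spec = MARKETS[symbol]
    return {
        "market_index": spec.market_index,
        "price": to_scaled_int(price, spec.price_decimals),
        "base_amount": to_scaled_int(size, spec.size_decimals),
    }
```

Passing prices and sizes as strings keeps them exact. For example, `build_order("ETH", "2000.456", "0.12345")` yields price `200045` and base amount `1234` under the assumed scaling.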
How It Works Together
- Training Phase – UCB trains strategies using historical market data, running thousands of episodes to find optimal parameters
- Deployment – Best-performing models are marked for deployment per trading pair
- Execution – Trader/TraderLighter fetch deployed models and begin generating signals
- Live Trading – When models signal BUY/SELL, orders are placed with automatic stop-loss management
- Continuous Improvement – UCB keeps training, and when better models emerge, they’re automatically deployed
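The deployment and continuous-improvement steps boil down to "keep the highest-reward model per pair, and swap when a better one appears". A minimal sketch of that selection logic follows, assuming a hypothetical `TrainedModel` record with a single `mean_reward` score. The real services exchange models over their APIs, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainedModel:
    symbol: str          # trading pair, e.g. "ETH"
    strategy: str        # strategy name
    mean_reward: float   # evaluation score (assumed to be the ranking metric)


def select_deployments(models):
    """Return the highest-reward model for each symbol."""
    best = {}
    for model in models:
        current = best.get(model.symbol)
        if current is None or model.mean_reward > current.mean_reward:
            best[model.symbol] = model
    return best


def models_to_swap(deployed, candidates):
    """Return symbol -> model for pairs where a strictly better model emerged."""
    best = select_deployments(candidates)
    return {
        symbol: model
        for symbol, model in best.items()
        if symbol not in deployed
        or model.mean_reward > deployed[symbol].mean_reward
    }
```

A trader process could call `models_to_swap` on each polling cycle and replace only the models it returns, leaving pairs with no improvement untouched.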
Technology Stack
- ML Framework: Stable Baselines 3 (PPO), Gymnasium, PyTorch
- Backend: FastAPI, Uvicorn, WebSockets
- Database: SQLite with SQLAlchemy
- Exchanges: HyperLiquid SDK, Lighter SDK
- Frontend: Vanilla JavaScript, Chart.js
Current Status
The system is live and trading. UCB is at v2.7.0 with multi-symbol support, pause/resume capabilities, and comprehensive statistics dashboards. Trader is at v1.9.5 with recent improvements to position sync and stop-loss management.
Different strategies are deployed to different pairs, depending on which strategy scores best in RL training for that pair – for example, momentum-based strategies might perform better on ETH while pattern-recognition strategies excel on BTC.
What’s Next
The beauty of this architecture is its extensibility. New strategies can be added simply by creating a Python class that inherits from BaseStrategy – UCB will automatically discover it, train it, and if it performs well, deploy it to live trading.
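As a rough illustration of how subclass discovery can work in Python, the sketch below registers every subclass of a base class through `__init_subclass__`. Only the name `BaseStrategy` comes from this post. The registry, the `decide` method, and `MomentumStrategy` are hypothetical, and UCB's actual discovery mechanism may differ.

```python
from abc import ABC, abstractmethod


class BaseStrategy(ABC):
    """Base class; every subclass registers itself when it is defined."""

    registry = {}  # class name -> strategy class (hypothetical registry)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        BaseStrategy.registry[cls.__name__] = cls

    @abstractmethod
    def decide(self, closes):
        """Return "BUY", "SELL", or "HOLD" for a list of closing prices."""


class MomentumStrategy(BaseStrategy):
    """Toy example: follow the price change over the last `lookback` candles."""

    def __init__(self, lookback=3):
        self.lookback = lookback

    def decide(self, closes):
        if len(closes) <= self.lookback:
            return "HOLD"  # not enough history yet
        change = closes[-1] - closes[-1 - self.lookback]
        if change > 0:
            return "BUY"
        if change < 0:
            return "SELL"
        return "HOLD"


def discovered_strategies():
    """Names a trainer could iterate over without any manual wiring."""
    return sorted(BaseStrategy.registry)
```

Defining the class is the only step needed for it to show up in `discovered_strategies()`, which is the property the extensibility claim above relies on.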
The system continues to learn and adapt, playing the perpetual game of the markets.