Deep RL Stock Trading
Springer 2022 — Co-Author — DRL with short selling
Rutgers
Problem
Most deep RL stock trading research restricts agents to long-only positions: buy, hold, or sell shares already owned. Real trading also includes short selling, which requires different risk management (margin requirements, and losses that are unbounded in principle) and introduces asymmetric reward dynamics. The research question: can a DRL agent learn profitable long+short strategies, and how does allowing short positions affect policy behavior and risk-adjusted returns?
Approach
We built a custom OpenAI Gym environment that models a trading account with margin requirements for short positions. The action space has three position states per asset: long, flat, and short. The state combines a window of price history, technical indicators (RSI, MACD, Bollinger Bands), and the current portfolio state. We trained and evaluated multiple DRL algorithms (DQN, PPO, A3C) with and without short-selling capability, measuring cumulative return, Sharpe ratio, and maximum drawdown across multiple market regimes.
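The environment described above can be illustrated with a minimal sketch. This is not the paper's environment: it is a single-asset toy that mirrors the Gym `reset`/`step` interface without importing the library, and it leaves out the technical indicators. The class name `LongShortEnv`, the fixed position size `units`, and the 50% `margin_rate` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical action encoding: the target position for the asset.
SHORT, FLAT, LONG = -1, 0, 1


@dataclass
class LongShortEnv:
    """Single-asset toy environment with a Gym-style reset/step interface."""

    prices: List[float]
    window: int = 3                # number of past prices included in the state
    initial_cash: float = 10_000.0
    margin_rate: float = 0.5       # assumed: equity must cover 50% of short notional
    units: float = 10.0            # assumed fixed position size
    t: int = field(init=False, default=0)
    position: int = field(init=False, default=FLAT)
    equity: float = field(init=False, default=0.0)

    def reset(self):
        self.t = self.window - 1
        self.position = FLAT
        self.equity = self.initial_cash
        return self._obs()

    def _obs(self):
        # State = price window + current position + account equity.
        hist = self.prices[self.t - self.window + 1 : self.t + 1]
        return (*hist, float(self.position), self.equity)

    def step(self, action: int):
        # Callers should stop stepping once `done` is True.
        price = self.prices[self.t]
        # A short the account cannot margin is rejected and treated as flat.
        if action == SHORT and self.equity < self.margin_rate * self.units * price:
            action = FLAT
        self.position = action
        self.t += 1
        # Mark the position to market at the next price.
        pnl = self.position * self.units * (self.prices[self.t] - price)
        self.equity += pnl
        done = self.t == len(self.prices) - 1 or self.equity <= 0
        return self._obs(), pnl, done, {}
```

A short taken before a price drop earns a positive reward here, while the same action is silently downgraded to flat when equity falls below the margin requirement. A fuller version would add the indicator features to `_obs` and support multiple assets.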
Architecture
[Figure: Deep RL Stock Trading system diagram]
Key Technical Decisions
Sharpe ratio as secondary reward signal
Optimizing only for cumulative return produces highly volatile agents. Adding a Sharpe ratio component to the reward steered agents toward risk-adjusted returns. The weight on that component became a risk-tolerance hyperparameter: varying it produced a spectrum of agent behaviors, from aggressive return seeking to conservative, lower-volatility trading.
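One way such a reward could be shaped is sketched below. The exact formulation used in the paper is not given here, so the additive form, the rolling window of recent returns, and the names `shaped_reward` and `risk_weight` are assumptions.

```python
import math
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], eps: float = 1e-8) -> float:
    """Per-step Sharpe ratio (risk-free rate assumed to be zero)."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))
    # With no measurable volatility the ratio is undefined; return 0 instead.
    if std < eps:
        return 0.0
    return mean / std


def shaped_reward(step_return: float,
                  recent_returns: Sequence[float],
                  risk_weight: float = 0.5) -> float:
    """Raw return plus a weighted Sharpe term (hypothetical additive form).

    risk_weight = 0 recovers the pure-return reward; larger values favor
    steadier return streams over the same window.
    """
    return step_return + risk_weight * sharpe_ratio(recent_returns)
```

Sweeping `risk_weight` over a small grid and training one agent per value is a simple way to explore the risk-tolerance spectrum described above.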
Multiple DRL algorithms for comparison
Rather than championing a single algorithm, the paper provides comparative analysis across DQN, PPO, and A3C. This empirically grounded the finding that policy gradient methods (PPO, A3C) adapt more gracefully to the non-stationary nature of financial time series than value-based DQN.
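A comparison like this depends on computing the same metrics for every agent's equity curve. The sketch below shows the three metrics named in this write-up under common conventions; the function names, the zero risk-free rate, and the 252-period annualization are assumptions rather than details taken from the paper.

```python
import math
from typing import Dict, Sequence


def cumulative_return(equity: Sequence[float]) -> float:
    """Total return from the first to the last point of the equity curve."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def annualized_sharpe(equity: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of period returns (risk-free rate assumed zero)."""
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    if len(rets) < 2:
        return 0.0
    mean = sum(rets) / len(rets)
    std = math.sqrt(sum((r - mean) ** 2 for r in rets) / (len(rets) - 1))
    if std == 0.0:
        return 0.0
    return mean / std * math.sqrt(periods_per_year)


def summarize(equity: Sequence[float]) -> Dict[str, float]:
    """Collect all three metrics for one agent's run."""
    return {
        "cumulative_return": cumulative_return(equity),
        "sharpe": annualized_sharpe(equity),
        "max_drawdown": max_drawdown(equity),
    }
```

Calling `summarize` on each algorithm's out-of-sample equity curve, once with and once without short selling enabled, yields directly comparable rows.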
Results
- ✓ Short-selling capability improved risk-adjusted returns (Sharpe ratio) vs. long-only agents
- ✓ PPO outperformed DQN and A3C on out-of-sample test periods
- ✓ Published in Springer 2022 as co-author
- ✓ Custom trading environment handles margin requirements and position sizing
Tech Stack
Links