Research | Springer 2022 | Co-Author | Not on GitHub

Deep RL Stock Trading

Springer 2022 — Co-Author — DRL with short selling

Rutgers


Problem

Most deep RL stock trading research restricts agents to long-only positions — buy, hold, or sell what you own. Real trading includes short selling, which requires different risk management and introduces asymmetric reward dynamics. The research question: can a DRL agent learn profitable long+short strategies, and how does allowing short positions affect policy behavior and risk-adjusted returns?
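The asymmetry mentioned above can be made concrete with a small calculation. This is only an illustrative sketch: `position_return` is a hypothetical helper, not a function from the paper, and it ignores fees, borrow costs, and margin calls.

```python
def position_return(entry_price: float, exit_price: float, side: int) -> float:
    """Fractional return of a unit position; side is +1 (long) or -1 (short)."""
    if side not in (1, -1):
        raise ValueError("side must be +1 (long) or -1 (short)")
    return side * (exit_price - entry_price) / entry_price


# Long: the worst case is the price falling to zero, a loss of 100%.
long_worst = position_return(100.0, 0.0, side=1)

# Short: the best case is the price falling to zero, a gain capped at 100%.
short_best = position_return(100.0, 0.0, side=-1)

# Short: the price can keep rising, so the loss has no floor.
# Here the price triples and the short loses 200% of the entry value.
short_after_tripling = position_return(100.0, 300.0, side=-1)
```

A long position has bounded loss and unbounded gain; a short position has bounded gain and unbounded loss, which is why it calls for different risk management.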

Approach

We built a custom OpenAI Gym environment modeling a trading account with margin requirements for short positions. The action space includes three position states per asset: long, flat, short. The state space is a window of price history, technical indicators (RSI, MACD, Bollinger Bands), and current portfolio state. We trained and evaluated multiple DRL algorithms (DQN, PPO, A3C) with and without short-selling capability, measuring cumulative return, Sharpe ratio, and maximum drawdown across multiple market regimes.
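As a rough illustration of the environment described above, the sketch below follows the classic Gym `reset`/`step` interface for a single asset with long, flat, and short actions. It is not the paper's environment: the class name `LongShortEnv`, the `short_margin` parameter, and the rule that short exposure is limited to equity divided by the margin requirement are all assumptions made for this example. The observation holds only recent returns and the current position (no technical indicators), and the class does not import Gym so that it stays self-contained.

```python
from typing import Dict, List, Tuple

SHORT, FLAT, LONG = 0, 1, 2  # discrete actions for one asset


class LongShortEnv:
    """Toy single-asset trading environment with a Gym-style API (assumed design)."""

    def __init__(self, prices: List[float], window: int = 3,
                 initial_equity: float = 10_000.0, short_margin: float = 1.5):
        if len(prices) < window + 2:
            raise ValueError("need at least window + 2 prices")
        self.prices = prices
        self.window = window
        self.initial_equity = initial_equity
        # Assumption: a short of notional N needs short_margin * N in equity.
        self.short_margin = short_margin
        self.reset()

    def _observation(self) -> List[float]:
        start = self.t - self.window
        rets = [self.prices[i + 1] / self.prices[i] - 1.0
                for i in range(start, self.t)]
        return rets + [float(self.position)]

    def reset(self) -> List[float]:
        self.t = self.window
        self.equity = self.initial_equity
        self.position = 0  # -1 short, 0 flat, +1 long
        return self._observation()

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict[str, float]]:
        if action not in (SHORT, FLAT, LONG):
            raise ValueError("action must be SHORT, FLAT, or LONG")
        self.position = action - 1
        # Exposure as a fraction of equity; shorts are scaled down by the margin.
        if self.position == 1:
            exposure = 1.0
        elif self.position == -1:
            exposure = -1.0 / self.short_margin
        else:
            exposure = 0.0
        price_ret = self.prices[self.t + 1] / self.prices[self.t] - 1.0
        reward = exposure * price_ret  # fractional change in equity this step
        self.equity *= 1.0 + reward
        self.t += 1
        done = self.t >= len(self.prices) - 1 or self.equity <= 0.0
        return self._observation(), reward, done, {"equity": self.equity}


# Usage: go long into a rise, then short into a fall.
env = LongShortEnv([100.0, 101.0, 102.0, 103.0, 104.0, 102.0])
first_obs = env.reset()
_, long_reward, done_after_long, _ = env.step(LONG)
_, short_reward, done_after_short, info = env.step(SHORT)
```

In this toy setup the reward is the raw per-step return; a real environment would also need transaction costs, position sizing, and the indicator features listed above.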

Architecture

Deep RL Stock Trading — system diagram

Diagram nodes: Market Data (yfina… | Custom Trading Gym | State: OHLCV + ind… | Action: long/flat/… | DRL Agent (DQN/PPO… | Return / Sharpe / …

Key Technical Decisions

01

Sharpe ratio as secondary reward signal

Optimizing for cumulative return alone produces agents whose returns are extremely volatile. Adding a Sharpe ratio term to the reward steered agents toward risk-adjusted returns. The weight on that term acts as a risk-tolerance hyperparameter, and varying it produced a meaningful spectrum of agent risk profiles.
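One way such a shaped reward could look is sketched below. The exact form used in the paper is not given here, so the additive combination, the rolling window, the zero risk-free rate, and the names `rolling_sharpe`, `shaped_reward`, and `risk_weight` are all assumptions for illustration.

```python
import statistics
from typing import Sequence


def rolling_sharpe(returns: Sequence[float], eps: float = 1e-8) -> float:
    """Per-step Sharpe ratio over a window, assuming a zero risk-free rate."""
    if len(returns) < 2:
        return 0.0  # not enough data to estimate volatility
    return statistics.fmean(returns) / (statistics.stdev(returns) + eps)


def shaped_reward(step_return: float, recent_returns: Sequence[float],
                  risk_weight: float) -> float:
    """Step return plus a weighted Sharpe term; risk_weight=0 recovers plain return."""
    return step_return + risk_weight * rolling_sharpe(recent_returns)


# Two return histories with the same mean (1.05% per step) but different volatility.
steady = [0.010, 0.012, 0.009, 0.011]
volatile = [0.080, -0.060, 0.070, -0.048]

reward_steady = shaped_reward(0.01, steady, risk_weight=0.1)
reward_volatile = shaped_reward(0.01, volatile, risk_weight=0.1)
reward_unshaped = shaped_reward(0.01, volatile, risk_weight=0.0)
```

With the same step return, the steadier history earns the larger shaped reward. Because per-step Sharpe values can be much larger than per-step returns, the weight would need tuning so that the risk term does not swamp the return term.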

02

Multiple DRL algorithms for comparison

Rather than championing a single algorithm, the paper provides comparative analysis across DQN, PPO, and A3C. This empirically grounded the finding that policy gradient methods (PPO, A3C) adapt more gracefully to the non-stationary nature of financial time series than value-based DQN.

Results

  • Short-selling capability improved risk-adjusted returns (Sharpe ratio) vs. long-only agents
  • PPO outperformed DQN and A3C on out-of-sample test periods
  • Published in Springer 2022 as co-author
  • Custom trading environment handles margin requirements and position sizing
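The evaluation metrics named in this write-up (cumulative return, Sharpe ratio, maximum drawdown) have standard definitions, sketched below from a series of per-period returns. The annualization factor of 252 trading days and the zero default risk-free rate are assumptions of this sketch, not settings taken from the paper.

```python
import math
import statistics
from typing import Sequence


def cumulative_return(returns: Sequence[float]) -> float:
    """Compounded return over the whole series."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252,
                 risk_free_per_period: float = 0.0) -> float:
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_per_period for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0.0:
        return 0.0  # undefined for a constant series; reported as 0 here
    return statistics.fmean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Example: +10%, then -20%, then +5%.
example = [0.10, -0.20, 0.05]
total = cumulative_return(example)   # 1.10 * 0.80 * 1.05 - 1, about -7.6%
drawdown = max_drawdown(example)     # drop from 1.10 to 0.88, about 20%
sharpe = sharpe_ratio(example)
```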

Tech Stack

PyTorch, OpenAI Gym, Python, Pandas, NumPy, yfinance
