Research · IEEE TENSYMP 2023 · First Author · Not on GitHub

Deep RL Traffic Control

IEEE TENSYMP 2023 — First Author — Deep RL for intelligent traffic signal control

Rutgers RUCI Lab

Problem

Fixed-time traffic signal controllers are ubiquitous even though they are suboptimal under varying demand: they use static phase durations calibrated for average historical conditions and cannot adapt to the real-time traffic state. Actuated controllers improve on this but follow simple rule-based logic. The research question: can a deep reinforcement learning agent learn a better signal control policy from simulation experience alone?

Approach

I formulated adaptive traffic signal control as a continuous-control Markov decision process (MDP). The state is the queue length and waiting time for each approach at the intersection. The action is continuous: the agent outputs a phase duration in seconds. The reward is the negative total vehicle delay, so maximizing reward minimizes delay, which stands in for maximizing throughput. I chose DDPG (Deep Deterministic Policy Gradient) because it handles continuous action spaces. Training ran in the SUMO traffic simulator through an OpenAI Gym interface I built for it. The agent was evaluated against fixed-time, Webster-optimal, and actuated control baselines.
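The formulation above (state, continuous action, negative-delay reward) can be sketched as a Gym-style environment. This is a minimal illustration, not the interface from the paper: it replaces SUMO with a toy queue model so it runs with the standard library alone, and the class name, green-time bounds, arrival rate, discharge rate, and four-approach layout are all assumptions. It follows the classic Gym convention where `step` returns `(obs, reward, done, info)`.

```python
import random


class ToyIntersectionEnv:
    """Gym-style single-intersection environment (toy stand-in for SUMO)."""

    MIN_GREEN, MAX_GREEN = 5.0, 60.0  # assumed bounds on phase duration (s)
    N_APPROACHES = 4                  # assumed four-approach intersection
    SAT_FLOW = 0.5                    # assumed discharge rate on green (veh/s)

    def __init__(self, arrival_rate=0.1, horizon=50, seed=0):
        self.arrival_rate = arrival_rate  # assumed mean arrivals (veh/s/approach)
        self.horizon = horizon            # number of phases per episode
        self.rng = random.Random(seed)

    def _obs(self):
        # State: queue length and a waiting-time proxy for each approach.
        return list(self.queues) + list(self.waits)

    def reset(self):
        self.queues = [0.0] * self.N_APPROACHES
        self.waits = [0.0] * self.N_APPROACHES
        self.phase = 0  # index of the approach that gets green next
        self.t = 0
        return self._obs()

    def step(self, action):
        # Action: a continuous phase duration in seconds, clipped to the bounds.
        green = min(max(float(action), self.MIN_GREEN), self.MAX_GREEN)
        delay = 0.0
        for i in range(self.N_APPROACHES):
            self.queues[i] += self.rng.uniform(0.0, 2.0 * self.arrival_rate) * green
            if i == self.phase:
                # The approach with green discharges up to SAT_FLOW * green vehicles.
                self.queues[i] -= min(self.queues[i], self.SAT_FLOW * green)
                self.waits[i] = 0.0
            else:
                # Crude proxy for waiting time: seconds since this approach had green.
                self.waits[i] += green
            # Crude delay estimate: vehicles still queued times the phase length.
            delay += self.queues[i] * green
        self.phase = (self.phase + 1) % self.N_APPROACHES
        self.t += 1
        done = self.t >= self.horizon
        return self._obs(), -delay, done, {}  # reward is negative total delay


# Roll out a random policy for one episode.
env = ToyIntersectionEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done, _ = env.step(random.uniform(5.0, 60.0))
    total += reward
print("episode return:", round(total, 1))
```

In a SUMO-backed version, the body of `step` would instead advance the simulator and read queues and waiting times from it; the `reset`/`step` contract seen by the agent would stay the same.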

Architecture

Deep RL Traffic Control — system diagram

SUMO Simulator · Custom Gym Env · State: queue + wait · Reward: −delay · DDPG Agent · Actor (phase dur) · Critic (Q-value)
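The Actor and Critic roles in the diagram correspond to three standard DDPG building blocks, sketched here in plain Python rather than PyTorch to stay dependency-free. The function names, the green-time bounds, and the `gamma` and `tau` defaults are illustrative assumptions, and the tanh squashing is one common way to bound an actor output, not necessarily the one used in the paper.

```python
import math

MIN_GREEN, MAX_GREEN = 5.0, 60.0  # assumed phase-duration bounds (s)


def scale_action(raw):
    """Actor side: squash an unbounded output into a phase duration."""
    unit = (math.tanh(raw) + 1.0) / 2.0  # maps to [0, 1]
    return MIN_GREEN + unit * (MAX_GREEN - MIN_GREEN)


def td_target(reward, next_q, done, gamma=0.99):
    """Critic side: regression target r + gamma * Q'(s', mu'(s')).

    next_q is the target critic's value for the target actor's action in the
    next state; the bootstrap term is dropped when the episode has ended.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging that slowly moves target weights toward online weights."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


print(scale_action(0.0))                       # midpoint of the assumed bounds
print(td_target(-10.0, -100.0, done=False))
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

The critic is trained to regress toward `td_target`, and the actor is updated to increase the critic's Q-value for the action it outputs.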

Key Technical Decisions

01

DDPG over DQN for continuous action space

Discretizing phase duration into bins (as DQN requires) introduces quantization error, and shrinking that error means adding bins and enlarging the action space. DDPG directly outputs a continuous duration, enabling finer-grained signal timing and a simpler policy representation.
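The trade-off between bin count and quantization error can be made concrete with a small calculation. This is an illustrative sketch: the 5 to 60 second range and the evenly spaced bins are assumptions, not values from the paper, and `n_bins` must be at least 2.

```python
def nearest_bin(duration, n_bins, lo=5.0, hi=60.0):
    """Snap a duration to the closest of n_bins evenly spaced values in [lo, hi]."""
    step = (hi - lo) / (n_bins - 1)
    index = round((duration - lo) / step)
    index = min(max(index, 0), n_bins - 1)  # clip to the valid bin range
    return lo + index * step


def worst_case_error(n_bins, lo=5.0, hi=60.0):
    """Largest gap between a desired duration in [lo, hi] and its nearest bin."""
    return (hi - lo) / (n_bins - 1) / 2.0


for n in (4, 12, 56):
    print(n, "bins -> worst-case error", round(worst_case_error(n), 2), "s")
```

Under these assumed bounds, 12 bins leave a worst-case error of 2.5 s, and it takes 56 discrete actions to bring that down to 0.5 s. A continuous actor avoids the trade-off entirely.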

02

Single-intersection scope for the paper

Multi-intersection coordination adds the challenge of multi-agent credit assignment. Isolating the single-intersection problem allowed rigorous comparison against established baselines and a clean ablation study. Multi-intersection extension is noted as future work.
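Among the established baselines, a Webster-optimal controller is usually derived from Webster's cycle-length formula, C0 = (1.5 L + 5) / (1 − Y), where L is the total lost time per cycle in seconds and Y is the sum of the critical flow ratios across phases. The sketch below shows that textbook calculation. How the baseline was configured in the paper is not stated here, so the function names and example inputs are assumptions.

```python
def webster_cycle(lost_time, flow_ratios):
    """Webster's optimal cycle length C0 = (1.5 L + 5) / (1 - Y), in seconds."""
    y_total = sum(flow_ratios)
    if not 0.0 < y_total < 1.0:
        raise ValueError("sum of critical flow ratios must be in (0, 1)")
    return (1.5 * lost_time + 5.0) / (1.0 - y_total)


def webster_greens(lost_time, flow_ratios):
    """Split effective green time across phases in proportion to flow ratio."""
    cycle = webster_cycle(lost_time, flow_ratios)
    y_total = sum(flow_ratios)
    return [(y / y_total) * (cycle - lost_time) for y in flow_ratios]


# Hypothetical two-phase example: 10 s lost time, flow ratio 0.25 per phase.
print(webster_cycle(10.0, [0.25, 0.25]))
print(webster_greens(10.0, [0.25, 0.25]))
```

Because the formula takes demand as a fixed input, the resulting plan is still static, which is what makes it a useful reference point for an adaptive agent.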

Results

  • 17–23% reduction in average vehicle delay vs. fixed-time control
  • Outperformed Webster-optimal and actuated controllers across all traffic scenarios
  • Published as first author at IEEE TENSYMP 2023
  • Custom OpenAI Gym interface to SUMO simulator contributed to research infrastructure

Tech Stack

PyTorch · SUMO · OpenAI Gym · Python · NumPy
