The problem
Cooperative Intelligent Transportation Systems (C-ITS) have a specific coordination problem that classical traffic-light optimization does not: the vehicles themselves are the decision agents. There is no central signal head at the intersection deciding who goes. There is a fleet of connected vehicles approaching a shared conflict zone, each with its own local view, each latency-bound to sub-100ms decisions, and none of them are allowed to assume a working uplink to a cloud coordinator.
The paper we submitted to IEEE ICCIES 2025 — "Swarm Intelligence-Based Cooperative Intelligent Transportation System" — was about the decision layer that sits underneath that. Given a four-way intersection, a set of approaching CAVs (connected autonomous vehicles), and no central authority, how do the agents negotiate ordering and speed profiles fast enough that the intersection clears without a stop?
The constraint we actually cared about was not throughput. It was behavior under partial connectivity. Every C-ITS paper I read at the time reported gorgeous throughput curves under the assumption that every agent could talk to every other agent, every tick. In our simulation, that assumption held for exactly zero of the real-world V2X traces we could get our hands on.
Why swarm heuristics over MARL
The reflex, in 2024–2025, was to reach for multi-agent reinforcement learning. QMIX, MADDPG, MAPPO — the shelf was full. And on the paper benchmarks, MARL wins.
We didn't pick MARL. Three reasons:
- Convergence under non-stationarity. Every vehicle's policy is another vehicle's environment. MARL papers handle this with centralized training and decentralized execution, which needs a training-time oracle we did not have and could not fake.
- Explainability at review time. A swarm heuristic answers "why did the vehicle yield?" with a pheromone value and a local rule. A neural policy answers with an activation vector. Guess which one gets through peer review faster.
- Failure mode when connectivity drops. A swarm agent that loses its neighbors falls back to a conservative local rule and stops. A MARL agent runs a policy trained on a joint observation it no longer has. In our early runs, the MARL fallback was worse than "just stop."
Swarm intelligence, specifically an ACO-flavored (ant colony optimization) heuristic with a PSO-flavored (particle swarm optimization) velocity update for the speed profile, was the boring choice that composed cleanly with the constraint. Each vehicle deposits a virtual pheromone on the intersection lanes it plans to cross and reads its neighbors' pheromones through V2X broadcasts; deposited pheromone decays over time. The intersection clears in the order that emerges from the pheromone gradient, not the order a central authority picks.
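The deposit, decay, and read cycle can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the classic ACO evaporation rule (each tick multiplies every level by `1 - rho`), and the `PheromoneField` class, its methods, and the lane labels are hypothetical names.

```python
from collections import defaultdict


class PheromoneField:
    """Hypothetical per-vehicle view of virtual pheromone on intersection lanes."""

    def __init__(self, rho):
        if not 0.0 <= rho <= 1.0:
            raise ValueError("rho must lie in [0, 1]")
        self.rho = rho                      # per-tick evaporation fraction (assumed)
        self.level = defaultdict(float)     # lane label -> current pheromone level

    def deposit(self, lane, mass):
        """Add pheromone for a lane, e.g. on hearing a neighbor's V2X broadcast."""
        self.level[lane] += mass

    def evaporate(self):
        """Apply one tick of decay to every lane that already has pheromone."""
        for lane in self.level:
            self.level[lane] *= 1.0 - self.rho


if __name__ == "__main__":
    field = PheromoneField(rho=0.1)
    field.deposit("N->S", 1.0)   # a neighbor announces it will cross north to south
    field.evaporate()
    field.evaporate()
    print(field.level["N->S"])   # the announced intent has faded after two ticks
```

A vehicle that hears nothing for a while sees its field fade toward zero, which is the behavior the degraded-connectivity argument leans on.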
The tradeoff
| Axis | MARL (QMIX / MAPPO family) | Our swarm heuristic |
|---|---|---|
| Peak throughput in fully-connected simulation | higher | comparable |
| Behavior under 30–50% packet loss | degrades sharply | graceful degradation |
| Training data required | large — millions of joint episodes | none — heuristic parameters only |
| Explainability to a traffic engineer | opaque activations | pheromone value + local rule |
| Compute at the vehicle | GPU-class for inference on some architectures | fits on the ECU we targeted |
| Time to a working baseline | weeks | days |
| Failure mode on comms drop | policy runs on stale joint obs | falls back to local yield rule |
| Formal safety-argument story | hard | tractable |
The trade was explicit. We traded ceiling throughput for floor safety, and we traded end-to-end learned behavior for something a domain reviewer could actually read.
The decision loop, roughly

    # Per-vehicle decision loop, called every planning tick (~50ms in sim).
    # The two things that mattered were the pheromone decay rate and the
    # yield-rule threshold; everything else was second-order.
    def swarm_decide(self, neighbors, intersection):
        # 1. Read pheromones from neighbors' V2X broadcasts (may be partial).
        field = pheromone_field(neighbors, decay=self.rho)

        # 2. Score each candidate maneuver: {go, yield, slow}.
        scored = {}
        for m in candidate_maneuvers(self.state, intersection):
            conflict = field.conflict_score(m.path, m.arrival_window)
            urgency = self.urgency(m)  # local: fuel, delay, priority
            safety = self.safety_margin(m, neighbors)
            scored[m] = (safety, -conflict, urgency)  # lex order

        # 3. Pick best; if conflict above threshold, fall back to yield rule.
        best = max(scored, key=scored.get)
        if field.conflict_score(best.path, best.arrival_window) > self.yield_tau:
            best = local_yield_rule(self.state, intersection)  # comms-independent

        # 4. Deposit pheromone on chosen path for downstream agents.
        self.broadcast_pheromone(best.path, mass=self.tau_dep)
        return best

The local_yield_rule at step 3 is the entire reason the paper cleared review. It is a boring right-of-way rule, the same one a human driver would use at an unsignalized intersection with no other information. It is what runs when V2X is dead. Everything above it is optimization; that line is the safety floor.
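For a sense of how small such a fallback can be, here is a sketch of one possible comms-independent rule. The post does not spell out the actual rule, so this assumes a priority-to-the-right convention driven only by onboard sensing; `Approach`, `priority_to_right_rule`, and the `"go"`/`"yield"` return values are hypothetical.

```python
from enum import Enum


class Approach(Enum):
    """Arm of a four-way intersection a vehicle arrives from, in clockwise order."""
    NORTH = 0
    EAST = 1
    SOUTH = 2
    WEST = 3


def priority_to_right_rule(own_arm, sensed_arms):
    """Yield when onboard sensors see a vehicle on the arm to our right.

    Uses no V2X input. `sensed_arms` is the set of arms where a waiting or
    approaching vehicle is detected locally (an assumed sensing interface).
    """
    # Arriving from the south means heading north, so the east arm is on the right;
    # with arms numbered clockwise that is always "one step counter-clockwise".
    right_arm = Approach((own_arm.value - 1) % 4)
    return "yield" if right_arm in sensed_arms else "go"


if __name__ == "__main__":
    print(priority_to_right_rule(Approach.SOUTH, {Approach.EAST}))
    print(priority_to_right_rule(Approach.SOUTH, {Approach.WEST}))
```

With a vehicle on all four arms, every agent yields under this rule. That is a stop rather than a collision, which matches the conservative behavior described above, but a real rule would need a tie-breaker to avoid deadlock.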
The result table and one honest ablation
The paper reports the throughput and average intersection-clearing time under three connectivity regimes: full V2X, 30% packet loss, and 50% packet loss. Full-connectivity numbers are competitive with the MARL baselines we could get to converge; the interesting result is the shape of the degradation curve. Ours slopes; theirs drops off a cliff.
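The three regimes are easy to reproduce in a toy harness. The sketch below assumes independent per-packet drops, which is a simplification: the post does not say how loss was modelled, and real V2X loss tends to be bursty. The `deliver` function and its parameters are hypothetical.

```python
import random


def deliver(broadcasts, loss_rate, rng):
    """Keep each broadcast independently with probability 1 - loss_rate."""
    return [b for b in broadcasts if rng.random() >= loss_rate]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are reproducible
    sent = list(range(10_000))      # stand-ins for neighbor pheromone broadcasts
    for loss_rate in (0.0, 0.3, 0.5):
        received = deliver(sent, loss_rate, rng)
        print(f"loss={loss_rate:.0%} delivered={len(received) / len(sent):.1%}")
```

Feeding only the delivered subset into each agent's pheromone read is enough to see whether a decision rule degrades gradually or collapses.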
The honest ablation is the one on pheromone decay rate rho. There is a sweet spot around a decay half-life that matches the typical intersection-crossing time — decay too fast and neighbors don't have time to read your intent, decay too slow and stale intent pollutes the field long after the vehicle has passed. The paper reports the sweep. What the paper does not fully advertise is that this parameter is the load-bearing knob of the entire system. If a downstream implementer misses this, the whole thing degrades to random.
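To make the half-life framing concrete, the sketch below converts a target half-life into a per-tick evaporation fraction and shows how much of a deposit survives one crossing. It assumes multiplicative per-tick evaporation and the ~50 ms planning tick mentioned earlier; the 4-second crossing time and the function names are illustrative assumptions, not values from the paper.

```python
def per_tick_evaporation(half_life_s, tick_s=0.05):
    """Evaporation fraction rho such that (1 - rho) ** n halves the level
    after n = half_life_s / tick_s ticks."""
    return 1.0 - 0.5 ** (tick_s / half_life_s)


def remaining_after(rho, seconds, tick_s=0.05):
    """Fraction of a deposit still present after `seconds` of per-tick decay."""
    return (1.0 - rho) ** (seconds / tick_s)


if __name__ == "__main__":
    crossing_time_s = 4.0   # hypothetical time to clear the conflict zone
    for half_life_s in (0.5, 4.0, 32.0):
        rho = per_tick_evaporation(half_life_s)
        left = remaining_after(rho, crossing_time_s)
        print(f"half-life {half_life_s:>4}s -> rho={rho:.4f}, left after crossing={left:.3f}")
```

Under these assumptions, a half-life far shorter than the crossing time leaves well under 1% of the deposit by the time a neighbor needs it, a matched half-life leaves half, and a much longer one leaves roughly 92% lingering after the vehicle has gone.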
I mention it here because it's the thing I'd flag first to anyone building on the work.
The parts we cut
Two things did not make the submitted version.
The RL baseline that didn't converge in time. We ran a MAPPO baseline against the same intersection scenario, and it never got to a policy we were willing to compare on. The training was under-budgeted — a few days of GPU time we did not really have — and the reward shaping was doing more work than it should have. In our simulation, the swarm heuristic outperformed the MAPPO agent, but I do not believe that comparison. A properly-trained MARL agent could plausibly meet or beat the swarm on peak throughput in the fully-connected regime. The paper claims a different thing — behavior under degraded connectivity — and we cut the half-cooked MARL numbers rather than defend a comparison we knew was thin.
The microscopic-traffic-sim adapter. Most of our simulation ran in a custom lightweight harness — enough to model vehicle kinematics, V2X packet drops, and intersection geometry, but not mixed traffic with human-driven vehicles. I had a partial adapter to SUMO that would have let us run the swarm agents with human-driven traffic as background. It ran, it produced numbers, but the numbers were sensitive to SUMO configuration in ways I could not fully explain in the review window. We cut it. That cut is the one I regret — the follow-up work has to build that adapter from scratch.
What I'd take further at CMU
The ICCIES paper is the ceiling of what the swarm-only formulation can do. The natural next questions:
- Learned pheromone deposition. The decay rate rho is a hand-tuned scalar. In reality it should be a policy: a small model that decides how much pheromone to deposit given local state. That is a MARL problem again, but a much smaller one, and the safety floor is still the local yield rule.
- Formal guarantees on the yield rule. We argued informally that the fallback is safe. A responsibility-sensitive-safety or barrier-function certificate would let the whole system inherit that guarantee.
- A real SUMO integration, done properly. The adapter that got cut is the piece the community will actually want to reproduce — with human-driven background traffic, calibrated geometries, and reproducible seeds.
- Heterogeneous fleets. Every simulation ran with identical agents. The real question is what happens when a subset runs the swarm policy and the rest run something else — MARL, legacy ADAS, or a human driver.
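As a pointer for the formal-guarantees item, this is the kind of closed-form check a responsibility-sensitive-safety argument builds on: the published RSS minimum longitudinal gap between a rear and a front vehicle travelling in the same direction. It is a sketch of that one formula only, not a certificate for the yield rule, and the parameter values in the example are made up.

```python
def rss_min_gap(v_rear, v_front, response_time_s, a_accel_max, b_brake_min, b_brake_max):
    """RSS minimum safe longitudinal gap in meters (speeds in m/s, accelerations in m/s^2).

    Worst case assumed by RSS: the rear vehicle accelerates at a_accel_max for
    response_time_s, then brakes at only b_brake_min, while the front vehicle
    brakes as hard as b_brake_max.
    """
    v_after_response = v_rear + response_time_s * a_accel_max
    gap = (
        v_rear * response_time_s
        + 0.5 * a_accel_max * response_time_s ** 2
        + v_after_response ** 2 / (2.0 * b_brake_min)
        - v_front ** 2 / (2.0 * b_brake_max)
    )
    return max(gap, 0.0)   # a negative value means any gap is already safe


if __name__ == "__main__":
    # Illustrative numbers only: both vehicles at 10 m/s, 1 s response time.
    print(rss_min_gap(10.0, 10.0, 1.0, a_accel_max=2.0, b_brake_min=4.0, b_brake_max=8.0))
```

An intersection version would need the lateral and crossing-path cases as well; the longitudinal form is just the simplest place to start.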
Cooperative ITS is a problem area where the ceiling is set by the modelling assumptions, not the algorithms. The paper bet on a specific set of assumptions — partial connectivity is the default, explainability is not optional, and the safety floor has to hold when the optimization ceiling doesn't. That bet held for review. What comes after is a different set of bets.
Full paper: IEEE ICCIES 2025 (document 11033077).
See also
- HNSW or IVF-PQ? What I actually chose at 2M documents — a different flavor of "the paper picks the boring option and the boring option was correct."
More on ongoing research directions at CMU MS-AIE is on the projects page.