Making A* Fast — JPS+ and Flow Fields in an RTS
A deep walkthrough of how the remaster accelerated the classic A* pathfinder with jump tables and shared-goal flow fields.
Video length target: ~8-10 minutes. Recording setup: Editor split with pathfinder source on one side, a scratch diagram canvas on the other. In-game footage for the cold open and outro. Prereqs for viewer: Working knowledge of A*. No prior exposure to JPS+ or flow fields assumed.
Cold open (0:00 – 0:45)
[VISUAL: Split-screen. Same map, same query, 200 units commanded to attack-move across an open plain. Left pane: classic A* visualised, open-list expansions blooming outward cell by cell. Right pane: the remaster, rays sweeping straight across the same terrain and stopping only at jump points. Same path, dramatically fewer expansions.]
Say:
"Both panes are running the same pathfinder. Same map, same command, same deterministic result. The left took hundreds of cell expansions per unit. The right took tens. Today we're going to look at the two pieces of code that made that difference — and why they are carefully designed to not break anything."
Act 1 — Why A* gets expensive in an RTS (0:45 – 2:00)
[VISUAL: Editor — GeneralsMD/Code/GameEngine/Source/GameLogic/AI/AIPathfind.cpp. Scroll through examineNeighboringCells. Highlight the eight-neighbour expansion loop.]
Say:
"Classic A* with an octile heuristic. For every cell popped off the open list, you look at eight neighbours, compute a tentative cost, push them back on. Correct, well-understood, and in an RTS it gets expensive fast."
Say:
"Two things compound. Unit count: a single attack-move command can fire a hundred findPath calls. And goal overlap: those hundred calls usually go to the same place. You end up doing almost the same search over and over."
[VISUAL: Overlay with profiler numbers from the memory notes — findPath p99 at 53 ms, worst case 564 ms, six consecutive frames over 373 ms. Stutter visible.]
Say:
"This isn't theoretical. Our own profile showed single frames spending over half a second inside one findPath call. That's a hitch a player will feel. So we added two fast paths on top of classic A* — both of which fall back the moment their preconditions don't hold."
Act 2 — JPS+ intuition (2:00 – 4:00)
[VISUAL: Scratch diagram. A grid with a unit at one end and open terrain ahead. Classic A* expands every cell on the ray.]
Say:
"Here's the insight behind JPS+, which stands for Jump Point Search Plus. If a ray of cells is completely uniform and open, expanding every one of them during search is wasted work. None of them are decision points. The path can't turn until something forces it to turn."
[VISUAL: Mark one cell on the grid where an obstacle sits diagonally adjacent. Label it "forced neighbour — first cell where the path could turn."]
Say:
"A jump point is that first cell where the path might need to turn. JPS online computes these by scanning at query time. JPS+ pre-computes them. For every cell, every direction, every locomotor class, you store: how far can I sweep from here before I hit a jump point?"
[VISUAL: Open GeneralsMD/Code/GameEngine/Include/GameLogic/AIPathfindPrecomputed.h. Scroll to the comment block describing the encoding.]
Say:
"That's what this table stores. Int16 per cell per direction. The sign of the value is the trick."
// Jump value encoding per cell per direction: // v > 0 : step exactly v cells in this direction to reach a jump point. // v == 0 : direction is blocked (adjacent cell not jumpable). // v < 0 : no jump point in this ray; |v| open cells before obstacle/edge. // Using Int16: |v| ≤ 32767 cells is plenty (largest map is ~1024 on a side). typedef Short PFJumpVal;
Say:
"Positive: jump exactly this far. Zero: blocked, don't try. Negative: open run with no jump point on the ray — still useful if the goal happens to lie on it."
[VISUAL: Scroll to the PFLocoClass enum.]
Say:
"One table per locomotor class. Infantry, hover, cliff-climber, rubble-walker, crusher. Five classes cover every unit. One table per class, indexed by cell times eight directions plus the direction index."
// Per class: flat array of PFJumpVal[cellIndex * 8 + dir]. std::vector<PFJumpVal> m_jump[static_cast<Int>( PFLocoClass::PF_LOCO_CLASS_COUNT )];
Act 3 — Plugging JPS+ into A* (4:00 – 5:30)
[VISUAL: Jump to AIPathfind.cpp, Pathfinder::appendJumpPointSuccessors around line 6387. Scroll to the early-exit block.]
Say:
"Here's the consumer. This runs at the top of the classic expansion step. The first thing it does is bail out of anything it can't handle."
if ( attackDistance != NO_ATTACK ) return 0; if ( m_isTunneling ) return 0; if ( locomotorSet.isDownhillOnly() ) return 0; if ( !goalCell ) return 0; if ( parentCell->getLayer() != LAYER_GROUND ) return 0;
Say:
"Attack-distance paths, tunneling, downhill-only, bridge and wall layers — none of those match the straight-ray assumption. Return zero, classic A* handles it. This is the pattern you'll see everywhere in this system. Narrow the case, bail on anything weird."
[VISUAL: Scroll to the per-direction loop. Pause on the allied-unit check.]
Say:
"Inside the loop we iterate all eight directions. For each, look up the jump value. If the goal lies on this ray, jump directly to it. Otherwise, if the entry is positive, jump to the forced-neighbour point. Then — critically — we walk the ray one cell at a time to confirm no allied unit is sitting in the middle of it."
if ( c->getPosUnit() != INVALID_ID ) break; if ( c->getGoalUnit() != INVALID_ID ) break;
Say:
"An allied unit blocker mid-ray would produce a path that tries to walk through a friendly. Can't have that. If we hit one, truncate to the cell just before, and let classic expansion pick up from there."
Say:
"Notice what this function does not do. It does not replace A*. It adds extra successors to the open list. Classic eight-neighbour expansion still runs right after. Worst case, JPS+ contributes zero successors and you paid for a handful of lookups. There is no path regression possible here."
Act 4 — Flow fields for shared goals (5:30 – 7:00)
[VISUAL: Back to the cold-open visual, but now focus on the 200 units.]
Say:
"JPS+ makes each query cheaper. Flow fields attack a different problem. If a hundred units are going to the same cell, we shouldn't run a hundred searches. We should run one search, and let everyone consult the answer."
[VISUAL: Open AIPathfindFlowField.h. Scroll to the per-cell encoding comment.]
Say:
"A flow field is a Dijkstra that runs outward from the goal, and for every cell it visits it records: from here, what's the next direction to step? One byte per cell."
enum : UnsignedByte
{
PF_FLOW_GOAL = 0xFE,
PF_FLOW_UNREACHABLE = 0xFF
};
Say:
"Zero through seven are the eight compass directions. 0xFE is the goal itself, stop here. 0xFF means unreachable. A unit queries the field at its start cell, reads a direction, walks one cell, reads again, until it hits the goal. Per-unit cost is O of path length, not O of search."
[VISUAL: Scroll to PathfindFlowFieldCache::getOrBuild.]
Say:
"Building a field is expensive, roughly ten to thirty milliseconds on a big map. So we cache. An LRU, 32 entries, keyed on locomotor class and goal cell."
/// Fetch (or synchronously build + cache) the flow field for this goal /// and locomotor class. Returns nullptr if the cache is not allocated or /// the goal cell is outside the map extent. PathfindFlowField* getOrBuild( PFLocoClass cls, Int goalCellX, Int goalCellY );
Say:
"First unit to a novel goal pays the Dijkstra cost. The other 199 are cache hits, O of 1 each. The entire attack-move collapses into one search plus a bunch of reads."
Act 5 — Determinism is the rule, not a feature (7:00 – 8:00)
[VISUAL: Back to AIPathfindPrecomputed.h. Highlight the header comment block about no RNG, no float keys.]
Say:
"This is a lockstep multiplayer engine. Every client runs the simulation, there is no server truth. If two clients produce different paths for the same unit, they desync within seconds. So neither of these accelerators is allowed to introduce any source of divergence."
Say:
"Integer costs throughout. Ten units for an orthogonal step, fourteen for a diagonal — the classic octile ratio, rounded to integers. No floats in the search. Min-heap ties broken by monotonic insertion sequence, so same inputs always produce the same pop order."
[VISUAL: Scroll to PFBuildStatus and zoneDistReady in the header.]
Say:
"Async rebuilds are the subtle part. When the map mutates — a structure goes up, a zone is dirtied — we kick a worker thread to rebuild the jump tables. While the worker runs, the accelerator reports not-ready, and consumers fall back to classic A*. The readiness flag is a release-acquire atomic pair."
Bool zoneDistReady() const { return m_zoneDistReady.load( std::memory_order_acquire ); }
Say:
"Acquire on the reader, release on the writer. A partially-built table read as ready would corrupt pathing, maybe silently. Release-acquire guarantees that when a reader sees ready, it sees the fully-written table. Every cell-mutation site also calls
waitForAsyncbefore writing, so the worker never races with a grid change."
Act 6 — The fallback is the load-bearing wall (8:00 – 8:30)
[VISUAL: Zoom out. Show a diagram: classic A* as the trunk, JPS+ and flow fields as side branches, each with an arrow labelled "fail open → A*" pointing back to the trunk.]
Say:
"Count the bail-outs we've seen. Attack distance. Tunneling. Downhill. Non-ground layer. Missing goal. Table not ready. Exotic locomotor. Allied unit on the ray. Cache miss plus unreachable start cell. Every one of those paths ends in 'return nullptr, classic A* runs'."
Say:
"This is what 'additive optimisation' means in an engine with a twenty-year-old determinism contract. Don't replace the old thing. Sidecar a new thing that can bail. The speedup is free. The correctness burden did not move."
Outro (8:30 – 9:00)
[VISUAL: Back to the split-screen cold open. Fade the left pane out. The right pane keeps playing.]
Say:
"Pathfinding in a game engine is rarely a clean rewrite. It's usually a layering exercise. You find the case that matters, you write a fast path for it, you make absolutely sure the fast path fails open. Do that a few times and the pathfinder stops being your stutter."
Takeaway:
Two additive accelerators. JPS+ gives per-query speedup via precomputed jump tables. Flow fields give per-batch speedup via shared-goal Dijkstra caching. Both fall back to classic A* the instant a precondition doesn't hold. Integer costs, atomic readiness, deterministic tiebreaks — lockstep stays lockstep.
B-roll suggestions
- Side-by-side of classic A* expansion vs. JPS+ sweep over the same query.
- Time-lapse of 200 units attack-moving in a single tide, the moment the first unit pays the Dijkstra cost and the rest stream out behind.
- Scrolling through the Int16 jump table as a heatmap coloured by jump distance.
- Frame-time graph before and after, with the p99 spike collapsing.