By GitHub Copilot

GPU Volumetric Particles

How missile exhaust and contrails are staged on the CPU, simulated on the GPU, and rendered as a fixed-size smoke pool.

This system is not the old Generals particle engine. It is a newer side path for volumetric smoke trails. The classic particle system still owns the gameplay-facing emitters. The GPU particle system only takes over the expensive smoke body for missile exhaust and aircraft contrails, then leaves the original ribbon or sprite path in place for coverage.

That split explains most of the code shape. Render::GPUParticleSystem in Renderer/GPUParticles.h is a renderer-owned singleton with a fixed pool of 16,384 particles:

class GPUParticleSystem
{
public:
    static GPUParticleSystem& Instance();
    // ...
    static constexpr uint32_t MAX_PARTICLES = 16384;
};

It does not know about weapons, objects, or emitters. It accepts world-space positions and velocities, advances a structured buffer in a compute pass, and renders billboard quads later in the frame.

How a trail reaches the GPU

The handoff starts in the classic particle code. In the D3D11 shim path at D3D11Shims.cpp:9276, the contrail/trail iteration walks linked particles inside a trail system and emits extra GPU smoke along each segment between prev and p. The code interpolates along the segment, spaces emissions every ~2 or 4 world units, adds a little random jitter, and caps per-segment emission at 20. That produces a denser, less visibly segmented smoke column than the original trail sprites alone.
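
The per-segment emission described above can be sketched roughly as follows. This is a minimal illustration, not the shim code itself: the Float3 stand-in, the PointsAlongSegment helper, and the spacing and jitter parameters are all hypothetical names chosen for the example. Only the cap of 20 per segment and the 2 or 4 unit spacing come from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal stand-in for the engine's Float3 (assumption: the real type differs).
struct Float3 { float x, y, z; };

// Hypothetical helper: computes the emission points for one trail segment.
//   spacing   : world units between emissions (the text quotes roughly 2 or 4)
//   jitter    : maximum random offset per axis, in world units (assumed)
//   maxPerSeg : per-segment emission cap (20 in the text)
std::vector<Float3> PointsAlongSegment(const Float3& prev, const Float3& p,
                                       float spacing, float jitter,
                                       int maxPerSeg = 20)
{
    assert(spacing > 0.0f && maxPerSeg > 0);

    const float dx = p.x - prev.x, dy = p.y - prev.y, dz = p.z - prev.z;
    const float length = std::sqrt(dx * dx + dy * dy + dz * dz);

    // One emission per `spacing` units, at least one, never more than the cap.
    const int count = std::min(static_cast<int>(length / spacing) + 1, maxPerSeg);

    // Uniform random offset in [-jitter, +jitter].
    auto rnd = [jitter] {
        return (static_cast<float>(std::rand()) / RAND_MAX * 2.0f - 1.0f) * jitter;
    };

    std::vector<Float3> points;
    points.reserve(count);
    for (int i = 0; i < count; ++i) {
        // t runs over (0, 1], so the last point sits at p and none sits at prev.
        const float t = static_cast<float>(i + 1) / static_cast<float>(count);
        points.push_back({ prev.x + dx * t + rnd(),
                           prev.y + dy * t + rnd(),
                           prev.z + dz * t + rnd() });
    }
    return points;
}
```

In this sketch each returned point would be passed to GPUParticleSystem::Emit along with the trail velocity. A very long segment still emits only 20 particles, so the cap bounds the cost of a fast-moving emitter.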

Each call lands in GPUParticleSystem::Emit:

void GPUParticleSystem::Emit(const Float3& position, const Float3& velocity, int type)
{
    if (m_stagingCount >= 256) return;  // staging buffer cap

    GPUParticle& p = m_stagingBuffer[m_stagingCount++];
    memset(&p, 0, sizeof(p));
    p.position = position;
    p.velocity = velocity;
    p.alive = 1;

    if (type == 0) {  // missile exhaust
        p.lifetime = 1.5f + (rand() % 60) * 0.01f;
        p.startSize = 6.0f + (rand() % 100) * 0.03f;
        p.color = { 0.8f, 0.78f, 0.75f };
    } else {          // contrail
        p.lifetime = 3.0f + (rand() % 150) * 0.01f;
        p.startSize = 1.5f + (rand() % 30) * 0.01f;
        p.color = { 0.7f, 0.72f, 0.75f };
    }
}

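Emit does not talk to the GPU immediately. It writes a packed GPUParticle into m_stagingBuffer[256] and leaves the particle in that CPU queue. With the constants above, missile exhaust lives 1.5 to 2.09 seconds with a start size of 6.0 to 8.97, and a contrail particle lives 3.0 to 4.49 seconds with a start size of 1.5 to 1.79.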

The ring buffer and dispatch

The actual upload happens once per frame in GPUParticleSystem::Update, called from W3DDisplay::draw at W3DDisplay.cpp:989. FlushStagingToGPU copies staged entries into the GPU structured buffer one slot at a time with UpdateSubresource, starting at m_nextEmitSlot. The write head treats the pool as a ring buffer: when it wraps past the last slot, it silently overwrites the oldest particles.
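
The wrap-around behaviour can be modelled on the CPU alone. The sketch below is an assumption-laden stand-in: RingPool and Flush are hypothetical names, each slot holds a plain integer id instead of a GPUParticle, and an array write replaces the per-slot UpdateSubresource call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU-only model of the ring write. Not the real FlushStagingToGPU.
struct RingPool {
    static constexpr uint32_t MAX_PARTICLES = 16384;

    std::vector<int> slots = std::vector<int>(MAX_PARTICLES, -1); // -1 = never written
    uint32_t nextEmitSlot = 0;                                    // mirrors m_nextEmitSlot

    // Writes staged ids into consecutive slots, wrapping at the end of the pool.
    void Flush(const std::vector<int>& staged)
    {
        for (int id : staged) {
            assert(nextEmitSlot < MAX_PARTICLES);
            slots[nextEmitSlot] = id;  // overwrites whatever particle lived here
            nextEmitSlot = (nextEmitSlot + 1) % MAX_PARTICLES;
        }
    }
};
```

With the write head two slots from the end, flushing three particles fills the last two slots and then overwrites slot 0, which is the oldest entry in the pool.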

Once uploads are flushed, Update binds the compute shader and dispatches one thread group per 256 particles:

uint32_t groups = (MAX_PARTICLES + 255) / 256;  // 64 groups for 16k particles
ctx->Dispatch(groups, 1, 1);

The compute constants are simple: delta time, turbulence, drag, gravity, pool size, wind direction. The D3D11 path uses a fixed 1.0f / 30.0f timestep, which intentionally matches the game logic cadence rather than the raw render delta.
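
The compute shader itself is not reproduced here, so the following is only a CPU-side sketch of what one fixed step could look like given the constants listed. The struct layouts, the Step function, the exponential drag model, and the choice of z as the up axis are all assumptions. Turbulence and pool size are left out.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };  // stand-in for the engine type

// Assumed layout; the field names mirror the constants named in the text.
struct SimConstants {
    float  deltaTime;   // fixed 1/30 s on the D3D11 path
    float  turbulence;  // not used in this sketch
    float  drag;        // velocity damping per second (assumed semantics)
    float  gravity;     // downward acceleration along z (assumed axis and sign)
    Float3 wind;        // wind vector added to the particle's motion (assumed)
};

// Hypothetical particle record; the real GPUParticle has more fields.
struct Particle {
    Float3 position;
    Float3 velocity;
    float  age;
    float  lifetime;
    int    alive;
};

// Advances one particle by exactly one fixed timestep.
void Step(Particle& p, const SimConstants& c)
{
    if (!p.alive) return;  // dead slots are skipped, never compacted

    const float dt   = c.deltaTime;
    const float damp = std::exp(-c.drag * dt);

    p.velocity.x *= damp;
    p.velocity.y *= damp;
    p.velocity.z  = p.velocity.z * damp - c.gravity * dt;

    p.position.x += (p.velocity.x + c.wind.x) * dt;
    p.position.y += (p.velocity.y + c.wind.y) * dt;
    p.position.z += (p.velocity.z + c.wind.z) * dt;

    p.age += dt;
    if (p.age >= p.lifetime) p.alive = 0;  // expires in place
}
```

Because dt is a constant rather than the measured frame time, the number of steps a particle takes before it expires depends only on its lifetime.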

Rendering is the last step. GPUParticleSystem::Render updates billboard-facing constants, binds the particle SRV and a procedural noise texture, then draws MAX_PARTICLES * 6 indices. The vertex shader decides which entries are alive, and the CPU never compacts the buffer. Dead particles cost vertex shader invocations but produce no fragments.

Why it lives beside the old particle system

The key design choice is that the classic particle system still decides where trails exist. The GPU path is only a renderer-side enhancement layer. That keeps gameplay code simple: the engine already has trail emitters, trail lifetimes, and trail placement; the GPU system piggybacks on that data instead of teaching the object system about a second particle engine.

It also keeps the feature optional. g_debugDisableVolumetricTrails in W3DDisplay.cpp can disable the whole pass, and the classic trail rendering still produces something readable. For a while on the classic-rendering branch it was default-off because of a drift bug in the volumetric streak path (see bug_contrail_volumetric_drift.md); it's now default-on with the classic ribbon also always drawn as a safety net.

Quirks

  • The pool is destructive by design. The comment in GPUParticles.h calls it out: this is a fixed-size FIFO ring buffer, and the oldest smoke is overwritten when the pool fills. There is no LRU and no priority scheme; the newest particle always wins.

  • Emission is best-effort. The CPU staging buffer holds at most 256 new particles per frame (if (m_stagingCount >= 256) return;). Dense scenes, such as multiple simultaneous SCUD launches over a Chinook column, silently drop smoke before it ever reaches the compute pass. There's no overflow counter, so the drop is invisible.

  • The timestep is fixed at 30 Hz in W3DDisplay.cpp:995. That makes the motion stable and logic-aligned, but it also means the system is not simulating with true render-frame delta. A missile fired on a single logic frame will emit the same curve shape regardless of render rate, which matches the rest of the engine's simulation-rate visuals.

  • The renderer always issues the full indexed draw. ctx->DrawIndexed(MAX_PARTICLES * 6, 0, 0) runs whether 1 or 16,384 particles are alive. Dead particles are a shader concern, not a CPU culling concern, so cost scales with pool capacity rather than live count. This is cheaper than it sounds: the dead-particle branch exits the vertex shader early.

  • Not the sole visual for contrails. A note in D3D11Shims.cpp explains why: staging-cap drops and drift can misalign the volumetric smoke under large planes, so the ribbon path stays as a safety net. Both draw simultaneously in the default configuration.

  • Type discrimination is crude. Emit takes an int type (0 = missile exhaust, 1 = contrail; the else branch actually treats any nonzero value as a contrail). Adding a third type requires extending both the if/else in Emit and the compute shader's per-type simulation branch. There's no data-driven particle type table.
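
To make the best-effort emission quirk concrete, here is a small CPU model of the staging cap. StagingQueue, TryEmit, and the dropped counter are hypothetical: the text says the real system has no overflow counter, and one is added here only so the silent drop becomes measurable in the example.

```cpp
#include <cassert>
#include <cstdint>

// Model of the 256-entry staging cap. Not the real GPUParticleSystem.
struct StagingQueue {
    static constexpr uint32_t CAPACITY = 256;  // matches the cap in Emit

    uint32_t count   = 0;  // mirrors m_stagingCount
    uint32_t dropped = 0;  // hypothetical: absent from the real system

    // Returns false when the request is discarded, as Emit's early return does.
    bool TryEmit()
    {
        if (count >= CAPACITY) {
            ++dropped;  // the real code just returns here
            return false;
        }
        ++count;
        return true;
    }

    // Models the once-per-frame flush that empties the staging buffer.
    void Flush() { count = 0; }
};
```

If a frame requests 300 emissions, 256 are staged and 44 are discarded before the flush. In the real system those 44 simply never appear.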