We once had a service that looked perfect on dashboards. CPU steady. Memory within limits. Autoscaling behaving exactly as expected. Yet latency percentiles told a different story—p99 spikes appearing in clusters, every few minutes.
No obvious bottleneck. No slow database queries. No thread starvation.
Then we turned on detailed GC logs.
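On a modern JVM (JDK 9+), that means unified logging. A sketch of the kind of invocation involved; the log path and rotation settings here are illustrative, not what we actually ran:

```shell
# Unified GC logging (JDK 9+): every GC event, decorated with
# timestamps and tags, written to a rotating file for later analysis.
java -Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=20m \
     -jar app.jar
```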
What surfaced wasn’t catastrophic—no long full GCs, no OOMs. Instead, it was something more subtle and more dangerous: consistent micro-pauses. Each pause was small, but the frequency aligned perfectly with our latency spikes.
That was the turning point. GC wasn’t failing—it was doing exactly what it was designed to do. But our system design, allocation patterns, and collector choice were misaligned.
And that’s the uncomfortable truth: modern GC rarely “breaks” your system. It quietly shapes its behavior.
There’s a persistent myth that modern collectors have “solved” GC.
They haven’t. They’ve shifted the problem.
In today’s environments:

- heaps routinely span tens of gigabytes,
- services run in containers with hard memory limits,
- tail-latency SLOs make even millisecond pauses visible.
GC is now part of your latency model, not just your memory model.
You don’t tune GC to avoid failure. You tune it to:

- keep pauses inside your latency budget,
- spend throughput and CPU deliberately rather than accidentally,
- match your allocation patterns to your collector’s strengths.
At a conceptual level, the heap still looks familiar:
```mermaid
graph TD
    A[Young Gen] --> B[Eden]
    A --> C[Survivor S0]
    A --> D[Survivor S1]
    E[Old Gen] --> F[Tenured]
```
But modern collectors—especially G1, ZGC, and Shenandoah—don’t operate on contiguous generations. They operate on regions.
```mermaid
graph TD
    A[Heap] --> B[Region 1]
    A --> C[Region 2]
    A --> D[Region N]
    B --> E[Eden-like]
    C --> F[Survivor-like]
    D --> G[Old-like]
```
Each region can dynamically change roles. This gives collectors the flexibility to:

- collect the regions holding the most garbage first,
- resize the "generations" on the fly,
- compact the heap incrementally instead of all at once.
Despite architectural changes, one assumption still holds:
Most objects die young.
In high-throughput systems:

- most allocations are request-scoped and die within milliseconds,
- only a small fraction of objects survive long enough to be promoted,
- promotion and old-generation churn, not allocation itself, create most GC pressure.
Modern collectors either:

- exploit this directly with cheap, frequent young-generation collections, or
- approximate it by preferring regions that are mostly garbage.
Allocation is almost never the problem.
Thanks to TLABs (Thread Local Allocation Buffers), allocation is effectively a pointer bump inside a per-thread buffer:
```mermaid
sequenceDiagram
    participant Thread
    participant TLAB
    participant Heap
    Thread->>TLAB: Allocate object
    alt TLAB has space
        TLAB-->>Thread: Fast path
    else TLAB full
        Thread->>Heap: Refill TLAB
        Heap-->>Thread: New buffer
    end
```
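This fast path is why a hot loop like the following contrived sketch rarely shows allocation itself as a cost; each `new` is typically just a pointer bump inside the thread's TLAB:

```java
// Contrived allocation-heavy loop: each Point is short-lived and is
// almost always allocated via the TLAB fast path (a pointer bump).
public class TlabDemo {
    record Point(int x, int y) {}

    static long sumSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);  // dies young: cheap to reclaim in a minor GC
            total += (long) p.x() * p.x() + (long) p.y() * p.y();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumSquares(1_000));
    }
}
```

The cost shows up later, as the paragraph below explains: not in the allocation, but in how quickly the loop fills Eden and forces the next GC cycle.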
High allocation rate → more frequent GC cycles → more pressure on reclaim mechanisms.
That’s where things break down.
```mermaid
graph LR
    A[Eden Allocation] --> B[Minor GC]
    B --> C[Survivor]
    C --> D[Promotion]
    D --> E[Old Gen]
    E --> F[Major GC]
```
Modern collectors modify this flow:

- G1 collects young regions frequently and mixes old regions in incrementally,
- ZGC and Shenandoah relocate live objects concurrently with the application,
- stop-the-world major collections become a fallback rather than the norm.
G1 is neither the fastest collector nor the lowest-latency one. It is the most balanced.
```mermaid
graph TD
    A[Heap] --> B[Regions]
    B --> C[Young Regions]
    B --> D[Old Regions]
    B --> E[Humongous Regions]
```
Instead of collecting entire generations, G1:

- divides the heap into equal-sized regions,
- tracks how much garbage each region holds,
- collects the regions with the most garbage first (hence "Garbage First"),
- works toward a configurable pause-time target.
```mermaid
sequenceDiagram
    participant App
    participant G1
    App->>G1: Allocation pressure
    G1->>G1: Young GC
    G1->>G1: Mixed GC (young + old)
    G1-->>App: Controlled pause
```
Pros

- predictable, tunable pause times
- solid balance of throughput and latency
- the default collector since JDK 9, so it is broadly battle-tested

Cons

- humongous allocations can fragment the heap
- remembered-set maintenance costs memory and throughput
- the pause target is a goal, not a guarantee

G1 is ideal when:

- heaps range from a few gigabytes to tens of gigabytes,
- you need reasonable latency without specialist tuning,
- throughput and latency both matter.
ZGC was built with one goal: eliminate pause times as a concern.
ZGC encodes metadata in the pointers themselves (so-called colored pointers):
```mermaid
graph TD
    A[Reference] --> B[Marked]
    A --> C[Relocated]
    A --> D[Remapped]
```
This enables:

- marking and relocation that run concurrently with the application,
- load barriers that fix up stale references the moment they are used,
- pauses that stay in the sub-millisecond range regardless of heap size.
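A toy illustration of the idea. The bit positions here are simplified and are not ZGC's actual layout; the point is only that a few high bits of a 64-bit reference carry GC state, and a cheap mask recovers the address:

```java
// Toy model of a "colored pointer": a few high bits carry GC state,
// the low bits carry the address. Bit positions are illustrative only;
// real ZGC uses a different layout with multiple heap views.
public class ColoredPointer {
    static final long MARKED    = 1L << 62;
    static final long REMAPPED  = 1L << 61;
    static final long ADDR_MASK = (1L << 44) - 1;  // pretend 44-bit addresses

    static long color(long addr, long flags) { return (addr & ADDR_MASK) | flags; }
    static long address(long colored)        { return colored & ADDR_MASK; }
    static boolean isMarked(long colored)    { return (colored & MARKED) != 0; }

    public static void main(String[] args) {
        long p = color(0x1234L, MARKED);
        System.out.println(isMarked(p) + " " + Long.toHexString(address(p)));
    }
}
```

A load barrier is then a check of those bits on every reference load: if the color is stale, the barrier fixes the reference before the application uses it.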
```mermaid
sequenceDiagram
    participant App
    participant ZGC
    App->>ZGC: Allocate
    ZGC->>ZGC: Concurrent mark
    ZGC->>ZGC: Concurrent relocate
    ZGC-->>App: Pause < 1ms
```
Pros

- sub-millisecond pauses
- pause times independent of heap size
- scales to very large (multi-terabyte) heaps

Cons

- load barriers cost some raw throughput
- higher CPU usage during concurrent phases
- needs heap headroom so collection can keep pace with allocation

ZGC shines in:

- latency-critical services with strict tail-latency SLOs,
- very large heaps where pause time would otherwise dominate,
- workloads willing to trade some throughput for predictability.
Shenandoah takes a different route using Brooks pointers.
```mermaid
graph TD
    A[Object] --> B[Forwarding Pointer]
    B --> C[Data]
```
Every object has an indirection layer, allowing:

- objects to be moved while the application is running,
- reads and writes to be redirected through the forwarding pointer during relocation,
- compaction without long stop-the-world pauses.
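A minimal sketch of the indirection; field and method names here are illustrative, not JVM internals. Every access takes one extra hop through the forwarding pointer, which is exactly what lets the collector move the data and swing the pointer without stopping readers:

```java
// Conceptual Brooks-pointer sketch: 'forwardee' initially points at the
// object itself; after concurrent relocation it points at the new copy.
public class BrooksCell {
    BrooksCell forwardee = this;  // self-reference until relocated
    int payload;

    BrooksCell(int payload) { this.payload = payload; }

    // Read barrier: always dereference through the forwarding pointer.
    int read() { return forwardee.payload; }

    // Done by the collector, concurrently with the application.
    void relocateTo(BrooksCell copy) {
        copy.payload = this.payload;
        this.forwardee = copy;
    }
}
```

The cost of that extra hop on every access is also where Shenandoah's throughput tax comes from.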
Pros

- low pause times on moderate to large heaps
- concurrent compaction without pointer coloring
- available in most OpenJDK distributions

Cons

- the extra forwarding word costs memory on every object
- barrier overhead reduces raw throughput
- like ZGC, it needs CPU and heap headroom to keep up
```mermaid
graph TD
    A[GC Roots] --> B[Reachable Objects]
```
Roots include:

- thread stacks (local variables and operands),
- static fields,
- JNI references,
- JVM-internal structures such as class metadata.
Memory leaks are rarely “forgotten objects.”
They are reachable objects you didn’t expect to be reachable.
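A classic example (class and method names hypothetical): a static cache keeps every entry strongly reachable from a GC root, so nothing added to it is ever collected unless it is explicitly removed:

```java
import java.util.HashMap;
import java.util.Map;

// Not a "forgotten" object: every value is strongly reachable from the
// static map, which is reachable from a GC root via the class itself.
public class SessionCache {
    static final Map<String, byte[]> CACHE = new HashMap<>();

    static void remember(String key, byte[] data) { CACHE.put(key, data); }

    // Without an eviction path like this, the cache grows forever.
    static void forget(String key) { CACHE.remove(key); }
}
```

Bounded caches, weak references (`WeakHashMap`, `SoftReference`), or explicit eviction are the usual fixes; the collector cannot help, because by its rules nothing here is garbage.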
To support region-based collection, the JVM tracks cross-region references.
```mermaid
graph LR
    A[Region A] -->|Reference| B[Region B]
    B --> C[Remembered Set]
```
```java
obj.field = newValue;
```
Triggers a write barrier: a small piece of JIT-inserted code that records the cross-region reference in the destination region's remembered set, so a later collection can find it without scanning the entire heap.
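Conceptually, the barrier behaves like this simplified card-table sketch; the names, card size, and integer "addresses" are illustrative, not HotSpot's implementation:

```java
// Simplified card-table write barrier: every reference store marks the
// 512-byte "card" covering the written field as dirty, so the collector
// only re-scans dirty cards when looking for cross-region references.
public class CardTable {
    static final int CARD_SHIFT = 9;  // 2^9 = 512-byte cards
    final byte[] cards;

    CardTable(int heapBytes) { cards = new byte[heapBytes >> CARD_SHIFT]; }

    // Conceptually invoked after every 'obj.field = newValue' on a reference field.
    void writeBarrier(int fieldAddress) { cards[fieldAddress >> CARD_SHIFT] = 1; }

    boolean isDirty(int address) { return cards[address >> CARD_SHIFT] == 1; }
}
```

This is the hidden throughput cost of region-based collection: every reference store in your hottest code path carries a few extra instructions.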
Objects larger than half a region are treated specially: G1 allocates them directly into dedicated "humongous" regions (one or more contiguous regions), bypassing the young generation entirely.
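The threshold check itself is simple; the region size below is an assumption for illustration (G1 derives it from heap size, or from `-XX:G1HeapRegionSize`):

```java
// G1 places allocations larger than half a region into dedicated
// contiguous "humongous" regions. Region size here is an assumed value.
public class HumongousCheck {
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = 4L * 1024 * 1024;                        // assume 4 MB regions
        System.out.println(isHumongous(3L * 1024 * 1024, region)); // true
        System.out.println(isHumongous(1024L, region));            // false
    }
}
```

Large byte arrays and oversized buffers are the usual culprits; they fragment the heap because their regions must be contiguous.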
| GC | Latency | Throughput | Heap Size | Best For |
|---|---|---|---|---|
| Serial | High | Low | Small | Embedded |
| Parallel | High | High | Medium | Batch |
| G1 | Medium | Medium | Large | Default |
| ZGC | Very Low | Medium | Huge | Low latency |
| Shenandoah | Low | Medium | Large | Low latency alt |
```
-XX:+UseG1GC
-XX:+UseZGC
-XX:+UseShenandoahGC
```
```
-Xms8g
-Xmx8g
-XX:+UseZGC
-XX:ZUncommitDelay=300
```
```
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
```
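Putting the flags together, a latency-focused service on G1 might start like this; the heap size, pause target, and jar name are illustrative:

```shell
# Fixed heap to avoid resize pauses, G1 with a 200 ms pause target,
# and GC logging enabled for later analysis. Values are illustrative.
java -Xms8g -Xmx8g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -Xlog:gc \
     -jar service.jar
```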
```
-Xlog:gc
```

The JVM is moving toward:

- fully concurrent collection as the default expectation,
- generational variants of ZGC and Shenandoah,
- pauses that no longer scale with heap size.
But trade-offs remain:

- concurrency costs CPU that your application could otherwise use,
- barriers add a small tax to reference reads and writes,
- no collector eliminates the cost of allocation pressure itself.
What’s changing is not the existence of GC—but its role.
It’s no longer just memory management.
It’s a first-class performance characteristic.
And the engineers who understand that tend to be the ones debugging production issues while everyone else is still looking at CPU charts.