We once had a service that looked perfect on dashboards. CPU steady. Memory within limits. Autoscaling behaving exactly as expected. Yet latency percentiles told a different story—p99 spikes appearing in clusters, every few minutes.
No obvious bottleneck. No slow database queries. No thread starvation.
Then we turned on detailed GC logs.
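On a modern JVM (JDK 9+), that means unified logging. A sketch of the kind of invocation involved; the log path and rotation settings here are illustrative, not what we actually ran:

```shell
# Unified GC logging (JDK 9+): every GC event, decorated with
# timestamps and tags, written to a rotating file for later analysis.
java -Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=20m \
     -jar app.jar
```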
What surfaced wasn’t catastrophic—no long full GCs, no OOMs. Instead, it was something more subtle and more dangerous: consistent micro-pauses. Each pause was small, but the frequency aligned perfectly with our latency spikes.
That was the turning point. GC wasn’t failing—it was doing exactly what it was designed to do. But our system design, allocation patterns, and collector choice were misaligned.
And that’s the uncomfortable truth: modern GC rarely “breaks” your system. It quietly shapes its behavior.
There’s a persistent myth that modern collectors have “solved” GC.
They haven’t. They’ve shifted the problem.
In today’s environments:

- heaps routinely span tens of gigabytes,
- services run in containers with hard memory limits,
- tail-latency SLOs make even millisecond pauses visible.
GC is now part of your latency model, not just your memory model.
You don’t tune GC to avoid failure. You tune it to:

- keep pauses inside your latency budget,
- spend throughput and CPU deliberately rather than accidentally,
- match your allocation patterns to your collector’s strengths.
At a conceptual level, the heap still looks familiar:
```mermaid
graph TD
    A[Young Gen] --> B[Eden]
    A --> C[Survivor S0]
    A --> D[Survivor S1]
    E[Old Gen] --> F[Tenured]
```
But modern collectors—especially G1, ZGC, and Shenandoah—don’t operate on contiguous generations. They operate on regions.
```mermaid
graph TD
    A[Heap] --> B[Region 1]
    A --> C[Region 2]
    A --> D[Region N]
    B --> E[Eden-like]
    C --> F[Survivor-like]
    D --> G[Old-like]
```
Each region can dynamically change roles. This gives collectors the flexibility to:

- collect the regions holding the most garbage first,
- resize the "generations" on the fly,
- compact the heap incrementally instead of all at once.
Despite architectural changes, one assumption still holds:
Most objects die young.
In high-throughput systems:

- most allocations are request-scoped and die within milliseconds,
- only a small fraction of objects survive long enough to be promoted,
- promotion and old-generation churn, not allocation itself, create most GC pressure.
Modern collectors either:

- exploit this directly with cheap, frequent young-generation collections, or
- approximate it by preferring regions that are mostly garbage.
Allocation is almost never the problem.
Thanks to TLABs (Thread Local Allocation Buffers), allocation is effectively a pointer bump inside a per-thread buffer:
```mermaid
sequenceDiagram
    participant Thread
    participant TLAB
    participant Heap
    Thread->>TLAB: Allocate object
    alt TLAB has space
        TLAB-->>Thread: Fast path
    else TLAB full
        Thread->>Heap: Refill TLAB
        Heap-->>Thread: New buffer
    end
```
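This fast path is why a hot loop like the following contrived sketch rarely shows allocation itself as a cost; each `new` is typically just a pointer bump inside the thread's TLAB:

```java
// Contrived allocation-heavy loop: each Point is short-lived and is
// almost always allocated via the TLAB fast path (a pointer bump).
public class TlabDemo {
    record Point(int x, int y) {}

    static long sumSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);  // dies young: cheap to reclaim in a minor GC
            total += (long) p.x() * p.x() + (long) p.y() * p.y();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumSquares(1_000));
    }
}
```

The cost shows up later, as the paragraph below explains: not in the allocation, but in how quickly the loop fills Eden and forces the next GC cycle.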
High allocation rate → more frequent GC cycles → more pressure on reclaim mechanisms.
That’s where things break down.
```mermaid
graph LR
    A[Eden Allocation] --> B[Minor GC]
    B --> C[Survivor]
    C --> D[Promotion]
    D --> E[Old Gen]
    E --> F[Major GC]
```
Modern collectors modify this flow:

- G1 collects young regions frequently and mixes old regions in incrementally,
- ZGC and Shenandoah relocate live objects concurrently with the application,
- stop-the-world major collections become a fallback rather than the norm.
G1 is neither the fastest collector nor the lowest-latency one. It is the most balanced.
```mermaid
graph TD
    A[Heap] --> B[Regions]
    B --> C[Young Regions]
    B --> D[Old Regions]
    B --> E[Humongous Regions]
```
Instead of collecting entire generations, G1:

- divides the heap into equal-sized regions,
- tracks how much garbage each region holds,
- collects the regions with the most garbage first (hence "Garbage First"),
- works toward a configurable pause-time target.
```mermaid
sequenceDiagram
    participant App
    participant G1
    App->>G1: Allocation pressure
    G1->>G1: Young GC
    G1->>G1: Mixed GC (young + old)
    G1-->>App: Controlled pause
```
Pros

- predictable, tunable pause times
- solid balance of throughput and latency
- the default collector since JDK 9, so it is broadly battle-tested

Cons

- humongous allocations can fragment the heap
- remembered-set maintenance costs memory and throughput
- the pause target is a goal, not a guarantee

G1 is ideal when:

- heaps range from a few gigabytes to tens of gigabytes,
- you need reasonable latency without specialist tuning,
- throughput and latency both matter.
ZGC was built with one goal: eliminate pause times as a concern.
ZGC encodes metadata in the pointers themselves (so-called colored pointers):
```mermaid
graph TD
    A[Reference] --> B[Marked]
    A --> C[Relocated]
    A --> D[Remapped]
```
This enables:

- marking and relocation that run concurrently with the application,
- load barriers that fix up stale references the moment they are used,
- pauses that stay in the sub-millisecond range regardless of heap size.
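A toy illustration of the idea. The bit positions here are simplified and are not ZGC's actual layout; the point is only that a few high bits of a 64-bit reference carry GC state, and a cheap mask recovers the address:

```java
// Toy model of a "colored pointer": a few high bits carry GC state,
// the low bits carry the address. Bit positions are illustrative only;
// real ZGC uses a different layout with multiple heap views.
public class ColoredPointer {
    static final long MARKED    = 1L << 62;
    static final long REMAPPED  = 1L << 61;
    static final long ADDR_MASK = (1L << 44) - 1;  // pretend 44-bit addresses

    static long color(long addr, long flags) { return (addr & ADDR_MASK) | flags; }
    static long address(long colored)        { return colored & ADDR_MASK; }
    static boolean isMarked(long colored)    { return (colored & MARKED) != 0; }

    public static void main(String[] args) {
        long p = color(0x1234L, MARKED);
        System.out.println(isMarked(p) + " " + Long.toHexString(address(p)));
    }
}
```

A load barrier is then a check of those bits on every reference load: if the color is stale, the barrier fixes the reference before the application uses it.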
```mermaid
sequenceDiagram
    participant App
    participant ZGC
    App->>ZGC: Allocate
    ZGC->>ZGC: Concurrent mark
    ZGC->>ZGC: Concurrent relocate
    ZGC-->>App: Pause < 1ms
```
Pros

- sub-millisecond pauses
- pause times independent of heap size
- scales to very large (multi-terabyte) heaps

Cons

- load barriers cost some raw throughput
- higher CPU usage during concurrent phases
- needs heap headroom so collection can keep pace with allocation

ZGC shines in:

- latency-critical services with strict tail-latency SLOs,
- very large heaps where pause time would otherwise dominate,
- workloads willing to trade some throughput for predictability.
Shenandoah takes a different route using Brooks pointers.
```mermaid
graph TD
    A[Object] --> B[Forwarding Pointer]
    B --> C[Data]
```
Every object has an indirection layer, allowing:

- objects to be moved while the application is running,
- reads and writes to be redirected through the forwarding pointer during relocation,
- compaction without long stop-the-world pauses.
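A minimal sketch of the indirection; field and method names here are illustrative, not JVM internals. Every access takes one extra hop through the forwarding pointer, which is exactly what lets the collector move the data and swing the pointer without stopping readers:

```java
// Conceptual Brooks-pointer sketch: 'forwardee' initially points at the
// object itself; after concurrent relocation it points at the new copy.
public class BrooksCell {
    BrooksCell forwardee = this;  // self-reference until relocated
    int payload;

    BrooksCell(int payload) { this.payload = payload; }

    // Read barrier: always dereference through the forwarding pointer.
    int read() { return forwardee.payload; }

    // Done by the collector, concurrently with the application.
    void relocateTo(BrooksCell copy) {
        copy.payload = this.payload;
        this.forwardee = copy;
    }
}
```

The cost of that extra hop on every access is also where Shenandoah's throughput tax comes from.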
Pros

- low pause times on moderate to large heaps
- concurrent compaction without pointer coloring
- available in most OpenJDK distributions

Cons

- the extra forwarding word costs memory on every object
- barrier overhead reduces raw throughput
- like ZGC, it needs CPU and heap headroom to keep up
```mermaid
graph TD
    A[GC Roots] --> B[Reachable Objects]
```
Roots include:

- thread stacks (local variables and operands),
- static fields,
- JNI references,
- JVM-internal structures such as class metadata.
Memory leaks are rarely “forgotten objects.”
They are reachable objects you didn’t expect to be reachable.
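A classic example (class and method names hypothetical): a static cache keeps every entry strongly reachable from a GC root, so nothing added to it is ever collected unless it is explicitly removed:

```java
import java.util.HashMap;
import java.util.Map;

// Not a "forgotten" object: every value is strongly reachable from the
// static map, which is reachable from a GC root via the class itself.
public class SessionCache {
    static final Map<String, byte[]> CACHE = new HashMap<>();

    static void remember(String key, byte[] data) { CACHE.put(key, data); }

    // Without an eviction path like this, the cache grows forever.
    static void forget(String key) { CACHE.remove(key); }
}
```

Bounded caches, weak references (`WeakHashMap`, `SoftReference`), or explicit eviction are the usual fixes; the collector cannot help, because by its rules nothing here is garbage.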
To support region-based collection, the JVM tracks cross-region references.
```mermaid
graph LR
    A[Region A] -->|Reference| B[Region B]
    B --> C[Remembered Set]
```
```java
obj.field = newValue;
```
Triggers a write barrier: a small piece of JIT-inserted code that records the cross-region reference in the destination region's remembered set, so a later collection can find it without scanning the entire heap.
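Conceptually, the barrier behaves like this simplified card-table sketch; the names, card size, and integer "addresses" are illustrative, not HotSpot's implementation:

```java
// Simplified card-table write barrier: every reference store marks the
// 512-byte "card" covering the written field as dirty, so the collector
// only re-scans dirty cards when looking for cross-region references.
public class CardTable {
    static final int CARD_SHIFT = 9;  // 2^9 = 512-byte cards
    final byte[] cards;

    CardTable(int heapBytes) { cards = new byte[heapBytes >> CARD_SHIFT]; }

    // Conceptually invoked after every 'obj.field = newValue' on a reference field.
    void writeBarrier(int fieldAddress) { cards[fieldAddress >> CARD_SHIFT] = 1; }

    boolean isDirty(int address) { return cards[address >> CARD_SHIFT] == 1; }
}
```

This is the hidden throughput cost of region-based collection: every reference store in your hottest code path carries a few extra instructions.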
Objects larger than half a region are treated specially: G1 allocates them directly into dedicated "humongous" regions (one or more contiguous regions), bypassing the young generation entirely.
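The threshold check itself is simple; the region size below is an assumption for illustration (G1 derives it from heap size, or from `-XX:G1HeapRegionSize`):

```java
// G1 places allocations larger than half a region into dedicated
// contiguous "humongous" regions. Region size here is an assumed value.
public class HumongousCheck {
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = 4L * 1024 * 1024;                        // assume 4 MB regions
        System.out.println(isHumongous(3L * 1024 * 1024, region)); // true
        System.out.println(isHumongous(1024L, region));            // false
    }
}
```

Large byte arrays and oversized buffers are the usual culprits; they fragment the heap because their regions must be contiguous.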
| GC | Latency | Throughput | Heap Size | Best For |
|---|---|---|---|---|
| Serial | High | Low | Small | Embedded |
| Parallel | High | High | Medium | Batch |
| G1 | Medium | Medium | Large | Default |
| ZGC | Very Low | Medium | Huge | Low latency |
| Shenandoah | Low | Medium | Large | Low latency alt |
```
-XX:+UseG1GC
-XX:+UseZGC
-XX:+UseShenandoahGC
```
```
-Xms8g
-Xmx8g
-XX:+UseZGC
-XX:ZUncommitDelay=300
```
```
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
```
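Putting the flags together, a latency-focused service on G1 might start like this; the heap size, pause target, and jar name are illustrative:

```shell
# Fixed heap to avoid resize pauses, G1 with a 200 ms pause target,
# and GC logging enabled for later analysis. Values are illustrative.
java -Xms8g -Xmx8g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -Xlog:gc \
     -jar service.jar
```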
```
-Xlog:gc
```

The JVM is moving toward:

- fully concurrent collection as the default expectation,
- generational variants of ZGC and Shenandoah,
- pauses that no longer scale with heap size.
But trade-offs remain:

- concurrency costs CPU that your application could otherwise use,
- barriers add a small tax to reference reads and writes,
- no collector eliminates the cost of allocation pressure itself.
What’s changing is not the existence of GC—but its role.
It’s no longer just memory management.
It’s a first-class performance characteristic.
And the engineers who understand that tend to be the ones debugging production issues while everyone else is still looking at CPU charts.