A locality-first virtual thread scheduler for the JVM. Virtual threads that work together run on the same carrier thread, reducing context switches and improving cache locality.
You want locality. The default ForkJoinPool scatters virtual threads across carriers randomly. When a Netty handler spawns a VT for blocking work, it runs on a different thread; when it posts back, that's a wakeup and a cache miss. This scheduler keeps related work on the same carrier — fewer context switches, better cache hit rates, less CPU wasted on handoffs. See Netty with NIO or without Netty.
You need native transports. Kernel I/O calls (epoll_wait, io_uring_enter) pin the virtual thread to its carrier. On ForkJoinPool, each native poller permanently occupies a shared carrier thread. This scheduler gives each native poller its own dedicated carrier, so pinning doesn't starve the rest of the system. See Netty with native transport.
You want control. The scheduler exposes a simple API to register your own I/O pollers, get per-carrier thread factories, and decide exactly which virtual threads share a carrier. No black-box scheduling decisions — you control the topology. See writing a custom pinned poller.
This scheduler doesn't replace ForkJoinPool — it runs alongside it. Thread.ofVirtual() still creates virtual threads on the default FJP, and third-party libraries that create their own virtual threads are unaffected. You choose which work runs on which scheduler:
var group = new VirtualIoPollerEventLoopGroup(EpollIoHandler.newFactory());
var schedulerFactory = group.vThreadFactory();
// allSuccessfulOrThrow() is assumed to be statically imported from StructuredTaskScope.Joiner
schedulerFactory.newThread(() -> {
    // scope with our factory → forked tasks stay on the same carrier
    try (var scope = StructuredTaskScope.open(allSuccessfulOrThrow(),
            cf -> cf.withThreadFactory(schedulerFactory))) {
        scope.fork(() -> fetchFromCache());
        scope.join();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // join() is interruptible; restore the flag
        return;
    }
    // scope without factory → forked tasks run on default FJP
    try (var scope = StructuredTaskScope.open(allSuccessfulOrThrow())) {
        scope.fork(() -> callThirdPartyLibrary());
        scope.join();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}).start();

Split topology (default). 8 OS threads for 4 cores; every arrow is a handoff:
graph LR
subgraph Netty I/O threads
IO0[IO thread 0]
IO1[IO thread 1]
IO2[IO thread 2]
IO3[IO thread 3]
end
subgraph ForkJoinPool
FJ0[FJP carrier 0]
FJ1[FJP carrier 1]
FJ2[FJP carrier 2]
FJ3[FJP carrier 3]
end
IO0 -- handoff --> FJ0
IO1 -- handoff --> FJ1
IO2 -- handoff --> FJ2
IO3 -- handoff --> FJ3
FJ0 -. wakeup .-> IO0
FJ1 -. wakeup .-> IO1
FJ2 -. wakeup .-> IO2
FJ3 -. wakeup .-> IO3
Unified topology (this scheduler) — 4 OS threads, no cross-pool handoffs:
graph TB
subgraph EventLoopSchedulerGroup
subgraph Carrier 0
P0[I/O poller VT]
V0[handler VTs]
end
subgraph Carrier 1
P1[I/O poller VT]
V1[handler VTs]
end
subgraph Carrier 2
P2[I/O poller VT]
V2[handler VTs]
end
subgraph Carrier 3
P3[I/O poller VT]
V3[handler VTs]
end
end
This scheduler runs I/O and virtual threads on the same carrier threads. No oversubscription, no cross-pool handoffs, and native pollers get their own dedicated carrier.
For the full analysis with benchmarks, see the talk at Devoxx and slides.
- A carrier-affinity scheduler (EventLoopSchedulerGroup): a global pool of permanent carrier threads, each with its own MPSC queue. Virtual threads created from a carrier's factory have affinity to that carrier.
- Netty integration: drop-in event loop groups that run Netty I/O on the scheduler's carriers, so handler-spawned virtual threads stay on the same carrier as the event loop that received the request.
| Transport | Class | Pinned poller? |
|---|---|---|
| NIO / LOCAL | VirtualIoEventLoopGroup | No: NIO parks via Loom, carrier is freed |
| EPOLL / IO_URING | VirtualIoPollerEventLoopGroup | Yes: one per carrier, does kernel I/O |
| No Netty | EventLoopSchedulerGroup | Optional: registerPinnedPoller() |
EventLoopSchedulerGroup (global singleton, N carriers)
├── EventLoopScheduler[0]
│ ├── carrier thread (permanent, daemon)
│ ├── MPSC run queue
│ └── virtualThreadFactory() → VTs with affinity to this carrier
├── EventLoopScheduler[1]
│ └── ...
└── EventLoopScheduler[N-1]
Carrier threads run forever (like ForkJoinPool workers). They drain the run queue and execute virtual thread continuations. When idle, they park.
Virtual threads created via a scheduler's virtualThreadFactory() are scheduled on that carrier. When they block, they park (freeing the carrier); when they resume, the continuation goes back into the same carrier's run queue.
A carrier can optionally host a pinned poller — a long-running virtual thread dedicated to kernel I/O (epoll_wait, io_uring_enter). It runs as a VT (not directly on the carrier thread) to avoid deadlocking the carrier: all carrier-to-VT coordination goes through lock-free structures. The scheduler coordinates preemption: when external VTs have work queued, the poller yields; when nothing is pending, the poller can block in kernel I/O.
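To make the carrier loop described above concrete, here is a minimal sketch using only the JDK. It is not the project's implementation: `CarrierSketch` is a hypothetical name, a plain platform thread runs `Runnable` tasks instead of virtual thread continuations, and `ConcurrentLinkedQueue` stands in for the MPSC queue. It only illustrates the "drain the run queue, park when idle" shape.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical stand-in for one carrier: a permanent daemon thread draining its own run queue. */
final class CarrierSketch {
    // ConcurrentLinkedQueue stands in for the MPSC run queue (many producers, one consumer).
    private final Queue<Runnable> runQueue = new ConcurrentLinkedQueue<>();
    private final Thread carrier;
    private volatile boolean running = true;

    CarrierSketch(String name) {
        carrier = new Thread(this::loop, name);
        carrier.setDaemon(true);
        carrier.start();
    }

    /** Callable from any thread: enqueue first, then unpark so a parked carrier notices. */
    void submit(Runnable task) {
        runQueue.add(task);
        LockSupport.unpark(carrier);
    }

    void shutdown() {
        running = false;
        LockSupport.unpark(carrier);
    }

    private void loop() {
        while (running) {
            Runnable task = runQueue.poll();
            if (task != null) {
                task.run();             // the real scheduler would resume a continuation here
            } else {
                LockSupport.park(this); // idle: park until the next submit (or a spurious wakeup)
            }
        }
    }

    /** Runs the given number of tasks on one carrier and returns the names of the executing threads. */
    static Set<String> demo(int tasks) {
        CarrierSketch carrier = new CarrierSketch("carrier-0");
        Set<String> names = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            carrier.submit(() -> {
                names.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            carrier.shutdown();
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo(3)); // all tasks report the same carrier thread name
    }
}
```

Because `unpark` stores a permit, a submit that lands between the failed `poll()` and the `park()` call still wakes the carrier, which is why this simple loop does not lose tasks.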
- A Loom-enabled JDK (Java 27+, builds.shipilev.net or build from openjdk/loom)
- JVM flag: -Djdk.virtualThreadScheduler.implClass=io.netty.loom.spi.NettyScheduler
- Maven 3.6+
Use VirtualIoEventLoopGroup. NIO parks via Loom (carrier is freed), so no pinned poller is needed — the event loop just gets VT affinity to a carrier for locality.
var group = new VirtualIoEventLoopGroup(4, NioIoHandler.newFactory());
new ServerBootstrap()
    .group(group)
    .channel(NioServerSocketChannel.class)
    .childHandler(/* ... */)
    .bind(8080).sync();

Use VirtualIoPollerEventLoopGroup. It registers a pinned poller on each carrier: a virtual thread that runs the Netty event loop and does kernel I/O (epoll_wait, io_uring_enter) with affinity to that carrier.
var group = new VirtualIoPollerEventLoopGroup(EpollIoHandler.newFactory());
new ServerBootstrap()
    .group(group)
    .channel(EpollServerSocketChannel.class)
    .childHandler(new ChannelInitializer<SocketChannel>() {
        @Override
        protected void initChannel(SocketChannel ch) {
            ch.pipeline().addLast(new MyHandler(group));
        }
    })
    .bind(8080).sync();

Spawn virtual threads from a handler. They run on the same carrier as the event loop:
class MyHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
    private final VirtualIoPollerEventLoopGroup group;

    // the group is passed in by the ChannelInitializer above
    MyHandler(VirtualIoPollerEventLoopGroup group) {
        this.group = group;
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
        group.vThreadFactory().newThread(() -> {
            // blocking work here, on the same carrier as the event loop
            byte[] result = blockingHttpCall();
            // hop back to the event loop to write the response
            ctx.channel().eventLoop().execute(() -> writeResponse(ctx, result));
        }).start();
    }
}

Use the scheduler directly. No Netty dependency is needed, just the core module.
var group = EventLoopSchedulerGroup.instance();
var scheduler = group.scheduler(0);
// all VTs from this factory have affinity to carrier 0
ThreadFactory tf = scheduler.virtualThreadFactory();
tf.newThread(() -> {
    // runs on carrier 0
    doWork();
}).start();

Round-robin across all carriers:
var group = EventLoopSchedulerGroup.instance();
for (int i = 0; i < tasks; i++) {
    var scheduler = group.scheduler(i % group.size());
    scheduler.virtualThreadFactory().newThread(() -> doWork()).start();
}

Register your own I/O poller on a carrier via registerPinnedPoller. The poller runs as a virtual thread with affinity to the carrier (not directly on the carrier thread; this avoids deadlocks by keeping all coordination lock-free). The returned CompletionStage completes when the poller exits and the slot is freed.
var scheduler = EventLoopSchedulerGroup.instance().scheduler(0);
CompletionStage<Void> termination = scheduler.registerPinnedPoller(
    () -> { /* empty wakeup: the poller never blocks, so there is nothing to wake */ },
    () -> {
        while (!shutdown) {
            int events = doPollNonBlocking();
            scheduler.maybeYield();
            processTasks();
            scheduler.maybeYield();
        }
    }
);

That's the simplest correct poller: non-blocking poll, yield between phases, no wakeup coordination needed. The scheduler handles the rest.
A pinned poller has three responsibilities:
- Yield CPU time to the scheduler. Call maybeYield() between phases: it yields the carrier if external VTs have work queued, letting the carrier loop drain them before the poller resumes.
- Terminate. The poller slot is freed when the body Runnable returns (via try-finally internally). The returned CompletionStage completes after cleanup, so callers can wait for the slot to be available again.
- Never miss a wakeup, if you choose to block. The simple poller above never blocks, so it doesn't need wakeup coordination. But if you want to block in kernel I/O when idle (to save CPU), the blocking path introduces a coordination problem.
The blocking decision must be made inside your transport: the transport must advertise that it is about to sleep (store a flag) before checking canBlock() (a load), with a StoreLoad barrier between the two so the load cannot be reordered before the advertisement. This is the Seastar sleep/wakeup pattern: the symmetric store-barrier-load on both the producer and consumer sides ensures at least one side always sees the other's store.

This is how our Netty integration works: canBlock() is injected into the ManualIoEventLoop via override, the transport advertises sleep via a volatile write (pollerRunning.set(false)) before reading canBlock(), and the wakeup is the transport's own eventLoop::wakeup (see netty#15922):

// inject canBlock into the transport
var eventLoop = new ManualIoEventLoop(parent, null, handlerFactory) {
    @Override
    public boolean canBlock() {
        return scheduler.canBlock();
    }
};
scheduler.registerPinnedPoller(
    eventLoop::wakeup, // transport's own wakeup mechanism
    () -> {
        while (!eventLoop.isShuttingDown()) {
            eventLoop.run(); // transport checks canBlock() internally
            scheduler.maybeYield();
        }
    }
);

Between canBlock() returning true and the transport entering its blocking syscall, work may arrive and the scheduler calls wakeup(). That signal must not be lost. Two approaches:

- Permit-based (lock-free): The transport's wakeup is sticky. If it is called before the blocking call starts, the blocking call returns immediately. Examples: eventfd (stays readable until consumed), Selector.wakeup() (sets a flag), LockSupport.unpark() (stores a permit). This is what Netty's transports use.
- Lock-based (rendezvous): The canBlock() check and the blocking wait happen inside a lock shared with wakeup(). The signal cannot slip between the check and the wait. Note: Condition.signal() is not sticky. If it arrives before Condition.await(), it's lost. The queue-empty check must be inside the locked region.

For background on why this coordination is subtle, see:
- Seastar's memory barrier approach: the symmetric store-barrier-load pattern between producer and consumer
- Mechanical-sympathy discussion: why a Condition.signal() that arrives before Condition.await() is lost (leaving the waiter blocked), and why permit-based mechanisms (LockSupport.park/unpark) don't have this problem
- Viktor Klang's actor: the atomic-flag-with-recheck pattern
Additional constraints:
- canBlock() is a snapshot: it can go stale immediately. Never cache the result.
- One poller per carrier. registerPinnedPoller throws if a poller is already registered.
For a deeper look at the store-barrier-load protocol, JCStress proofs that the guard prevents missed wakeups (and that removing it causes 94% signal loss), and the BlockingPollGuard utility that encapsulates it, see concurrency-tests/README.md.
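The advertise-then-check protocol and the permit-based wakeup can be sketched with JDK primitives alone. The class below is an illustration under assumptions, not the BlockingPollGuard utility or the Netty integration: `SleepWakeupSketch`, `pending`, and `executed` are hypothetical names, a platform thread plays the poller, and `LockSupport.park` stands in for the blocking kernel call.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical blocking poller: advertise sleep, then check, with a sticky wakeup. */
final class SleepWakeupSketch {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pending = new AtomicInteger();           // read by canBlock()
    private final AtomicBoolean pollerRunning = new AtomicBoolean(true); // false = about to sleep
    private final AtomicInteger executed = new AtomicInteger();
    private volatile boolean shutdown;
    private final Thread poller = new Thread(this::pollerLoop, "poller");

    boolean canBlock() {
        return pending.get() == 0;
    }

    /** Producer side: store (publish work), then load (is the poller asleep?). */
    void submit(Runnable task) {
        pending.incrementAndGet();      // volatile store
        queue.add(task);
        if (!pollerRunning.get()) {     // volatile load, ordered after the store
            LockSupport.unpark(poller); // sticky: a later park() returns immediately
        }
    }

    /** Consumer side: store (advertise sleep), then load (is there work?). */
    private void pollerLoop() {
        while (!shutdown) {
            Runnable task;
            while ((task = queue.poll()) != null) {
                task.run();
                pending.decrementAndGet();
            }
            pollerRunning.set(false);        // 1. advertise sleep
            if (canBlock() && !shutdown) {   // 2. check after advertising
                LockSupport.park(this);      // 3. block (stand-in for epoll_wait)
            }
            pollerRunning.set(true);
        }
    }

    /** Submits tasks with small gaps so the poller goes idle in between; returns how many ran. */
    static int demo(int tasks) {
        SleepWakeupSketch s = new SleepWakeupSketch();
        s.poller.setDaemon(true);
        s.poller.start();
        for (int i = 0; i < tasks; i++) {
            s.submit(s.executed::incrementAndGet);
            LockSupport.parkNanos(100_000);
        }
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (s.executed.get() < tasks && System.nanoTime() < deadline) {
            LockSupport.parkNanos(1_000_000);
        }
        s.shutdown = true;
        LockSupport.unpark(s.poller);
        return s.executed.get();
    }

    /** A wakeup that arrives before the blocking call is not lost with a permit. */
    static boolean permitIsSticky() {
        LockSupport.unpark(Thread.currentThread());
        long start = System.nanoTime();
        LockSupport.parkNanos(2_000_000_000L); // returns at once because the permit is stored
        return System.nanoTime() - start < 1_000_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("executed=" + demo(50) + " sticky=" + permitIsSticky());
    }
}
```

Both sides perform a volatile store followed by a volatile load, so either the poller sees `pending > 0` and skips the park, or the producer sees `pollerRunning == false` and issues the unpark. Swapping the two steps on either side reopens the window for a missed wakeup.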
| Property | Default | Description |
|---|---|---|
| io.netty.loom.schedulers | availableProcessors() | Number of carrier threads |
| io.netty.loom.yield.us | 10 | Yield duration in microseconds |
| io.netty.loom.resumed.continuations | 1024 | Initial MPSC queue capacity |
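These are ordinary JVM system properties, so they are set with -D on the command line. As a rough illustration of how the documented names and defaults map to values, the sketch below reads them with the standard `Integer.getInteger` and `Long.getLong` helpers. `SchedulerConfigSketch` is a hypothetical class; how the library itself parses these properties is not shown here and may differ.

```java
/** Hypothetical reader for the properties in the table; the library's own parsing may differ. */
final class SchedulerConfigSketch {
    static int carriers() {
        return Integer.getInteger("io.netty.loom.schedulers",
                Runtime.getRuntime().availableProcessors());
    }

    static long yieldMicros() {
        return Long.getLong("io.netty.loom.yield.us", 10L);
    }

    static int queueCapacity() {
        return Integer.getInteger("io.netty.loom.resumed.continuations", 1024);
    }

    public static void main(String[] args) {
        // e.g. java -Dio.netty.loom.schedulers=4 -Dio.netty.loom.yield.us=20 SchedulerConfigSketch.java
        System.out.printf("carriers=%d yield=%dus queue=%d%n",
                carriers(), yieldMicros(), queueCapacity());
    }
}
```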
| Module | Description |
|---|---|
| netty-virtualthread-bootstrap | JDK-only shim (NettyScheduler + NettySchedulerSpi). Must be on the system classloader. |
| netty-virtualthread-core | Scheduler + Netty integration. Discovered via ServiceLoader (TCCL). |
The JVM loads the scheduler via the system classloader. Frameworks like Spring Boot use isolated classloaders. The bootstrap module must be visible to the system classloader; the core module is discovered via ServiceLoader through the TCCL.
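For readers unfamiliar with the TCCL lookup, the following sketch shows the general shape of a ServiceLoader query against the thread context classloader. `TcclLookupSketch` and `SchedulerProvider` are hypothetical names used for illustration; the actual SPI type and lookup code in the bootstrap module are not reproduced here.

```java
import java.util.Optional;
import java.util.ServiceLoader;

final class TcclLookupSketch {
    /** Hypothetical SPI; stands in for whatever interface the core module implements. */
    public interface SchedulerProvider {
        String name();
    }

    /** Looks up the first provider visible to the thread context classloader. */
    static <T> Optional<T> findViaTccl(Class<T> spi) {
        // In a fat-JAR framework the TCCL is the application classloader, which can see
        // JARs that the system classloader cannot.
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        return ServiceLoader.load(spi, tccl).findFirst();
    }

    public static void main(String[] args) {
        // No provider is registered for the hypothetical SPI, so the lookup comes back empty.
        System.out.println(findViaTccl(SchedulerProvider.class).isPresent());
    }
}
```

The split matters because a class loaded by the system classloader cannot name classes that live only in an isolated application classloader; going through the TCCL lets the shim find the implementation at run time.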
Use Multi-Release JAR entries to expose bootstrap classes to the system classloader:
<!-- Mark as Multi-Release -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <configuration>
        <archive>
            <manifestEntries>
                <Multi-Release>true</Multi-Release>
            </manifestEntries>
        </archive>
    </configuration>
</plugin>
<!-- Unpack bootstrap into META-INF/versions/27/ -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <executions>
        <execution>
            <id>unpack-bootstrap-scheduler</id>
            <phase>prepare-package</phase>
            <goals><goal>unpack</goal></goals>
            <configuration>
                <artifactItems>
                    <artifactItem>
                        <groupId>io.netty.loom</groupId>
                        <artifactId>netty-virtualthread-bootstrap</artifactId>
                        <version>${netty-loom.version}</version>
                        <type>jar</type>
                        <includes>io/netty/loom/spi/**</includes>
                        <outputDirectory>${project.build.outputDirectory}/META-INF/versions/27</outputDirectory>
                    </artifactItem>
                </artifactItems>
            </configuration>
        </execution>
    </executions>
</plugin>
<!-- Exclude bootstrap from BOOT-INF/lib/ -->
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <excludes>
            <exclude>
                <groupId>io.netty.loom</groupId>
                <artifactId>netty-virtualthread-bootstrap</artifactId>
            </exclude>
        </excludes>
    </configuration>
</plugin>

Place the bootstrap JAR on the system classpath or append it with -Xbootclasspath/a:. The core JAR stays inside the application deployment (WAR/EAR).
The easiest way to get started is the provided dev container (.devcontainer/), which uses shipilev/openjdk:loom.
Works with VS Code (Dev Containers extension) and IntelliJ IDEA (File > Remote Development > Dev Containers).
export JAVA_HOME=/path/to/loom-jdk
mvn clean install

Apache License 2.0. See LICENSE.
Credit: dreamlike-ocean for identifying and fixing the fat-JAR classloader issue.