Pinning: A pitfall to avoid when using virtual threads in Java
Virtual threads debuted as a preview feature in Java 19 and became generally available in Java 21. To fully grasp virtual threads and their potential pitfalls, we'll begin with an overview of the core threading concepts.
Kernel Threads vs. User Threads
Kernel threads are threads of execution that live within the operating system's kernel itself. Their key characteristics are:
- Kernel Domain: They reside entirely within the kernel’s address space, directly executing kernel code.
- Independent Scheduling: The kernel’s scheduler manages them autonomously, similar to regular processes.
- Specialized Creation: They're created with dedicated kernel functions, such as kthread_create() and kthread_run() in the Linux kernel.
User threads, also known as lightweight threads, take a more agile approach to multithreading: they live in the application's user space and run outside the kernel's direct control. Here's how they work:
- Application-Level Management: They’re created and orchestrated by thread libraries within the application’s address space, independent of the kernel.
- Custom Scheduling: Thread libraries implement their own scheduling algorithms to determine which user thread gets execution time on a kernel thread.
- Efficient Context Switching: Switching between user threads is typically faster than switching between kernel threads or processes, because it avoids the overhead of a transition into the kernel.
Goroutines in Go and virtual threads in Java are examples of user-level threads.
Platform Threads and Virtual Threads in Java
Platform threads and virtual threads embody distinct approaches to thread management in Java, each with its own characteristics:
Platform Threads: One-to-One with OS Threads
- Thin Wrappers: They are implemented as thin wrappers around underlying OS threads.
- Dedicated Resources: Each platform thread is mapped 1:1 to an OS thread and occupies it for its entire lifetime.
Virtual Threads: Agile and Resource-Efficient
- OS Thread Independence: They aren't bound to specific OS threads, so an application can run far more virtual threads than there are OS threads.
- Dynamic Reassignment: The Java runtime mounts a virtual thread on a platform thread, called its carrier thread, in order to run it, and can later resume it on a different carrier, optimizing resource utilization.
- Blocking I/O Gracefulness: When a virtual thread encounters a blocking I/O operation, it is unmounted from its carrier, freeing the OS thread to execute other virtual threads and reducing potential bottlenecks.
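Both kinds of thread can be created with the Thread.Builder API, which is final in JDK 21. The minimal sketch below (the class VirtualThreadBasics and its methods are illustrative names, not part of the original benchmark) starts one thread of each kind and checks which one is virtual. It then launches many virtual threads that each block in Thread.sleep(). Because an unpinned, sleeping virtual thread is unmounted from its carrier, the whole batch should finish in roughly one sleep period on a typical machine instead of many.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; names are hypothetical, not from the article's repository.
class VirtualThreadBasics {

    // Starts one platform and one virtual thread and reports isVirtual() for each.
    static boolean[] kinds() {
        Runnable noop = () -> { };
        Thread platform = Thread.ofPlatform().name("platform-demo").start(noop);
        Thread virtual = Thread.ofVirtual().name("virtual-demo").start(noop);
        try {
            platform.join();
            virtual.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new boolean[] {platform.isVirtual(), virtual.isVirtual()};
    }

    // Runs `tasks` virtual threads that each sleep, then returns elapsed milliseconds.
    // No lock is held while sleeping, so every virtual thread can unmount.
    static long sleepers(int tasks, long sleepMillis) {
        AtomicInteger done = new AtomicInteger();
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(sleepMillis));
                    done.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for every submitted task to complete
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        if (done.get() != tasks) {
            throw new IllegalStateException("not all tasks completed");
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        boolean[] k = kinds();
        System.out.println("platform isVirtual=" + k[0] + ", virtual isVirtual=" + k[1]);
        System.out.println("10,000 sleepers took " + sleepers(10_000, 1_000) + " ms");
    }
}
```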
Pinning in Virtual Threads
The documentation, however, describes a crucial scalability constraint known as pinning, which can hinder virtual thread performance in certain scenarios:
A virtual thread cannot be unmounted during blocking operations when it is pinned to its carrier. A virtual thread is pinned in the following situations:
- The virtual thread runs code inside a synchronized block or method
- The virtual thread runs a native method or a foreign function (see Foreign Function and Memory API)
Pinning does not make an application incorrect, but it might hinder its scalability. Try avoiding frequent and long-lived pinning by revising synchronized blocks or methods that run frequently and guarding potentially long I/O operations with java.util.concurrent.locks.ReentrantLock.
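Following that advice, the blocking call can be guarded by a ReentrantLock instead of a synchronized block. The sketch below assumes JDK 21; LockedTaskExecutor and its run() helper are hypothetical names, not part of the article's code. With four times as many tasks as cores, the batch should still finish in roughly one sleep period, because a virtual thread waiting while it holds a java.util.concurrent lock is not pinned. (JDK 24, through JEP 491, later removed pinning for most synchronized code, so newer JDKs may behave differently from the results below.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical variant of TaskExecutor that uses ReentrantLock instead of synchronized.
class LockedTaskExecutor {
    private final ReentrantLock lock = new ReentrantLock();

    void execute(long sleepMillis) throws InterruptedException {
        lock.lock();
        try {
            // Blocking while holding a ReentrantLock does not pin the virtual thread,
            // so it can unmount and its carrier can run other virtual threads.
            Thread.sleep(sleepMillis);
        } finally {
            lock.unlock();
        }
    }

    // Runs `tasks` virtual threads, each with its own executor (and therefore its own
    // lock), and returns the wall-clock time in milliseconds.
    static long run(int tasks, long sleepMillis) {
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                LockedTaskExecutor task = new LockedTaskExecutor();
                executor.submit(() -> {
                    task.execute(sleepMillis);
                    return null;
                });
            }
        } // close() waits for all tasks
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int tasks = 4 * Runtime.getRuntime().availableProcessors();
        System.out.println(tasks + " tasks took " + run(tasks, 1_000) + " ms");
    }
}
```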
To assess the severity of virtual thread pinning, I compared the performance of virtual threads with that of platform threads when the work runs inside synchronized blocks. Can virtual threads maintain their advantage, or do they become detrimental in such scenarios? Let's find out.
Testing for Pinning in Virtual Threads with Synchronization
To investigate this behavior, I wrote a program that spawns a varying number of concurrent threads, either platform or virtual, each executing a task inside a synchronized block. The experiment was repeated for both thread types across the following concurrent thread counts to observe the performance differences:
int[] threadsCountList = new int[] {2, 4, 8, 11, 13, 16, 23, 25, 32};
The task executed was a simple 4-second Thread.sleep() inside a synchronized block:
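public class TaskExecutor {
    public void execute() throws InterruptedException {
        // On JDK 21, a virtual thread that blocks while holding a monitor is
        // pinned: it cannot unmount, so its carrier thread stays occupied.
        synchronized (this) {
            Thread.sleep(4000); // block for 4 seconds while holding the lock
        }
    }
}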
Thread pool creation and task submission were handled by a TaskManager class, as shown below.
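import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Assumed definition: the enum is referenced but not shown in the original listing.
enum ThreadType { VIRTUAL, PLATFORM }

public class TaskManager {
    private final ThreadFactory threadFactory;
    private final int threads;

    public TaskManager(ThreadType threadType, int threads) {
        this.threads = threads;
        // Choose a factory that creates either virtual or platform threads.
        this.threadFactory = switch (threadType) {
            case VIRTUAL -> Thread.ofVirtual().factory();
            case PLATFORM -> Thread.ofPlatform().factory();
        };
    }

    public void start() {
        // One thread per task, so every task can start immediately.
        // Closing the executor (try-with-resources, Java 19+) waits for all tasks.
        try (ExecutorService executor =
                Executors.newFixedThreadPool(this.threads, this.threadFactory)) {
            for (int j = 0; j < this.threads; j++) {
                // A fresh TaskExecutor per task: each synchronized block locks its
                // own monitor, so the tasks never contend for the same lock.
                TaskExecutor taskExecutor = new TaskExecutor();
                executor.submit(
                        () -> {
                            try {
                                taskExecutor.execute();
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        });
            }
        }
    }
}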
The experiment was run on my laptop, which has 12 available cores.
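For readers who want to reproduce the measurement, here is a compact, self-contained sketch of a driver loop. The class name PinningBenchmark, the measure() helper and the configurable sleep are assumptions for illustration; the author's actual driver is in the linked repository. It mirrors TaskExecutor and TaskManager above and times each run end to end.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical re-creation of the benchmark loop; not the author's original driver.
class PinningBenchmark {
    enum ThreadType { PLATFORM, VIRTUAL }

    // Same shape as TaskExecutor: sleep while holding this instance's monitor.
    static final class SyncTask {
        void execute(long sleepMillis) throws InterruptedException {
            synchronized (this) {
                Thread.sleep(sleepMillis);
            }
        }
    }

    // Runs `threads` tasks on a fixed pool of that size; returns elapsed milliseconds.
    static long measure(ThreadType type, int threads, long sleepMillis) {
        ThreadFactory factory = switch (type) {
            case VIRTUAL -> Thread.ofVirtual().factory();
            case PLATFORM -> Thread.ofPlatform().factory();
        };
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newFixedThreadPool(threads, factory)) {
            for (int i = 0; i < threads; i++) {
                SyncTask task = new SyncTask();
                executor.submit(() -> {
                    task.execute(sleepMillis);
                    return null;
                });
            }
        } // close() blocks until every task has finished
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] threadsCountList = {2, 4, 8, 11, 13, 16, 23, 25, 32};
        System.out.println("Total Available Cores " + Runtime.getRuntime().availableProcessors());
        for (int threads : threadsCountList) {
            System.out.println("Thread Count = " + threads);
            for (ThreadType type : ThreadType.values()) {
                System.out.println("Thread Type: " + type + " Time: "
                        + measure(type, threads, 4_000) + " ms");
            }
        }
    }
}
```

A full run with the 4-second sleep takes a few minutes; passing a smaller sleepMillis to measure() keeps the same wave pattern while shortening each step.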
The Result and Its Interpretation
Total Available Cores 12
========================
Thread Count = 2
Thread Type: PLATFORM Time: 4012 ms
Thread Type: VIRTUAL Time: 4006 ms
------------------------
Thread Count = 4
Thread Type: PLATFORM Time: 4006 ms
Thread Type: VIRTUAL Time: 4005 ms
------------------------
Thread Count = 8
Thread Type: PLATFORM Time: 4002 ms
Thread Type: VIRTUAL Time: 4006 ms
------------------------
Thread Count = 11
Thread Type: PLATFORM Time: 4007 ms
Thread Type: VIRTUAL Time: 4006 ms
------------------------
Thread Count = 13
Thread Type: PLATFORM Time: 4002 ms
Thread Type: VIRTUAL Time: 8004 ms
------------------------
Thread Count = 16
Thread Type: PLATFORM Time: 4007 ms
Thread Type: VIRTUAL Time: 8004 ms
------------------------
Thread Count = 23
Thread Type: PLATFORM Time: 4004 ms
Thread Type: VIRTUAL Time: 8003 ms
------------------------
Thread Count = 25
Thread Type: PLATFORM Time: 4007 ms
Thread Type: VIRTUAL Time: 12009 ms
------------------------
Thread Count = 32
Thread Type: PLATFORM Time: 4008 ms
Thread Type: VIRTUAL Time: 12008 ms
------------------------
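Pinning of virtual threads in synchronized blocks significantly hinders performance once the number of concurrent threads exceeds the number of available cores. A virtual thread that blocks inside a synchronized block cannot be unmounted, so it keeps its carrier thread occupied for the entire sleep. By default, the virtual thread scheduler uses as many carrier threads as there are available cores (12 here), so at most 12 tasks can be sleeping at any moment while the rest wait for a carrier to free up. Since each task locks its own TaskExecutor instance, there is no lock contention; the slowdown comes entirely from the occupied carriers. The total execution time therefore grows in steps of the blocking duration each time the thread count passes another multiple of the core count: for virtual threads it became 8 seconds and 12 seconds once the number of concurrent threads exceeded 12 and 24 respectively, while the total execution time for platform threads remained constant at 4 seconds.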
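The measured steps fit a simple wave model: with P carrier threads and every task pinned for its full sleep of s milliseconds, n tasks run in ⌈n/P⌉ sequential waves, so the expected time is ⌈n/P⌉ × s. The sketch below encodes that arithmetic (PinnedWaveModel is a hypothetical name); it is a back-of-the-envelope model that ignores scheduling overhead, not a measurement.

```java
// Hypothetical model: pinned tasks run in waves of `carriers` at a time.
class PinnedWaveModel {
    static long expectedMillis(int threads, int carriers, long sleepMillis) {
        long waves = (threads + carriers - 1) / carriers; // integer ceil(threads / carriers)
        return waves * sleepMillis;
    }

    public static void main(String[] args) {
        int[] counts = {2, 4, 8, 11, 13, 16, 23, 25, 32};
        for (int n : counts) {
            // 12 carriers and a 4000 ms sleep, matching the experiment above
            System.out.println(n + " threads -> " + expectedMillis(n, 12, 4_000) + " ms");
        }
    }
}
```

With 12 carriers the model gives 4000 ms up to 12 threads, 8000 ms for 13 to 24 threads and 12000 ms for 25 to 36 threads, matching the virtual-thread timings in the results above.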
The same behavior is seen when a blocking network call is made inside the synchronized block.
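While virtual threads can offer performance benefits in high-throughput workloads involving long-running I/O requests, it's crucial to avoid blocking inside synchronized blocks or native methods in those workloads to prevent performance degradation. As the documentation suggests, long blocking operations can be guarded with java.util.concurrent.locks.ReentrantLock instead.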
The source code is available here for reference: https://github.com/Abhishekvrshny/java-synchronized-threads-benchmark