Pinning: A pitfall to avoid when using virtual threads in Java

Abhishek Varshney
5 min read · Dec 23, 2023

Virtual threads debuted as a preview feature in Java 19 and achieved General Availability with Java 21.

To fully grasp virtual threads and their potential pitfalls in Java, we’ll begin with an overview of the core threading concepts.

Kernel Threads vs. User Threads

Kernel threads operate as integral threads of execution within the operating system’s kernel itself, holding these key characteristics:

  • Kernel Domain: They reside entirely within the kernel’s address space, directly executing kernel code.
  • Independent Scheduling: The kernel’s scheduler manages them autonomously, similar to regular processes.
  • Specialized Creation: They’re brought to life using specific kernel functions or APIs, such as kthread_create() and kthread_run() in the Linux kernel.

User threads, also known as lightweight threads, offer a more agile approach to multithreading, residing within the application’s user space and operating outside the kernel’s direct control. Here’s how they work:

  • Application-Level Management: They’re created and orchestrated by thread libraries within the application’s address space, independent of the kernel.
  • Custom Scheduling: Thread libraries implement their own scheduling algorithms to determine which user thread gets execution time on a kernel thread.
  • Efficient Context Switching: Transitions between user threads are typically faster than those between kernel threads or processes, as they avoid the overhead of switching into the kernel.

Figure: Different models for mapping user-level threads to kernel threads

Goroutines in Go and virtual threads in Java are examples of user-level threads.

Platform Threads and Virtual Threads in Java

Platform threads and virtual threads embody distinct approaches to thread management in Java, each with unique characteristics:

Platform Threads: One-to-One with OS Threads

  • Thin Wrappers: They serve as lightweight proxies for underlying OS threads.
  • Dedicated Resources: Each platform thread is mapped 1:1 to an OS thread and occupies it for its entire lifetime.

Virtual Threads: Agile and Resource-Efficient

  • OS Thread Independence: They aren’t bound to specific OS threads, enabling a greater number of virtual threads than available OS threads.
  • Dynamic Reassignment: The Java runtime seamlessly reassigns virtual threads to available OS threads, optimizing resource utilization.
  • Blocking I/O Gracefulness: When a virtual thread encounters a blocking I/O operation, it’s suspended, freeing the OS thread to execute other virtual threads, reducing potential bottlenecks.
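The difference between the two kinds of threads can be seen with a small sketch. It assumes JDK 21 or later; the class name ThreadKindsDemo, the helper runVirtualThreads, and the 10 ms sleep are illustrative choices, not part of the original experiment. It starts far more virtual threads than a typical machine has cores, and each one blocks briefly without needing an OS thread of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadKindsDemo {

    // Starts `count` virtual threads that each block briefly; returns how many completed.
    static int runVirtualThreads(int count) {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(10); // while sleeping, the virtual thread is unmounted from its carrier
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                finished.incrementAndGet();
            }));
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        Thread platform = Thread.ofPlatform().unstarted(() -> {});
        Thread virtual = Thread.ofVirtual().unstarted(() -> {});
        System.out.println("platform.isVirtual() = " + platform.isVirtual()); // false
        System.out.println("virtual.isVirtual()  = " + virtual.isVirtual());  // true
        System.out.println("finished = " + runVirtualThreads(10_000));
    }
}
```

Starting the same number of platform threads would require one OS thread per task, which is why the thread-per-task style is usually paired with virtual threads.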

Pinning in Virtual Threads

The documentation, however, reveals a crucial scalability constraint known as pinning, which can hinder virtual thread performance in certain scenarios.

A virtual thread cannot be unmounted during blocking operations when it is pinned to its carrier. A virtual thread is pinned in the following situations:

The virtual thread runs code inside a synchronized block or method

The virtual thread runs a native method or a foreign function (see Foreign Function and Memory API)

Pinning does not make an application incorrect, but it might hinder its scalability. Try avoiding frequent and long-lived pinning by revising synchronized blocks or methods that run frequently and guarding potentially long I/O operations with java.util.concurrent.locks.ReentrantLock.
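As a rough illustration of the documentation's advice, the sketch below guards a blocking call with a ReentrantLock instead of a synchronized block. It assumes JDK 21 or later. The class name ReentrantLockTask, the runAll helper, and the task count and sleep duration are hypothetical, and it uses Executors.newVirtualThreadPerTaskExecutor() rather than the fixed pool used in the experiment later in this article.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class ReentrantLockTask {
    private final ReentrantLock lock = new ReentrantLock();

    // Same shape as a synchronized block, but a virtual thread that blocks
    // while holding a ReentrantLock can still unmount from its carrier.
    public void execute(long sleepMillis) throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(sleepMillis);
        } finally {
            lock.unlock(); // always release, even if the sleep is interrupted
        }
    }

    // Runs `tasks` independent tasks, one virtual thread each; returns elapsed milliseconds.
    static long runAll(int tasks, long sleepMillis) {
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                ReentrantLockTask task = new ReentrantLockTask(); // one lock per task, so no contention
                executor.submit(() -> {
                    task.execute(sleepMillis);
                    return null; // Callable form, so the checked exception needs no wrapper
                });
            }
        } // close() waits for all submitted tasks to finish
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("elapsed ms: " + runAll(64, 200));
    }
}
```

Because the sleeping threads are not pinned, the elapsed time should stay close to a single sleep duration even when the task count is well above the core count.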

To assess the severity of virtual thread pinning, I investigated how virtual threads perform compared to platform threads when both run synchronized blocks. Can virtual threads maintain their benefit, or do they become detrimental in such scenarios? Let’s find out.

Testing for Pinning in Virtual Threads with Synchronization

To investigate this behavior, I wrote a program that spawns a varying number of concurrent threads to execute tasks inside synchronized blocks.

The experiment was run for both platform and virtual threads, across the following range of concurrent thread counts.

int[] threadsCountList = new int[] {2, 4, 8, 11, 13, 16, 23, 25, 32};

The task executed was a simple Thread.sleep() of 4 seconds inside a synchronized block.

public class TaskExecutor {

    public void execute() throws InterruptedException {
        synchronized (this) {
            Thread.sleep(4000);
        }
    }
}

Thread pool creation and task execution were handled by a TaskManager class, shown below.

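import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// ThreadType is an enum with VIRTUAL and PLATFORM constants (not shown here).
public class TaskManager {
    private ThreadFactory threadFactory;
    private final int threads;

    public TaskManager(ThreadType threadType, int threads) {
        this.threads = threads;
        // Pick the factory that decides which kind of thread the pool creates.
        switch (threadType) {
            case VIRTUAL -> this.threadFactory = Thread.ofVirtual().factory();
            case PLATFORM -> this.threadFactory = Thread.ofPlatform().factory();
        }
    }

    public void start() {
        // The pool is sized to the task count, so every task gets its own thread.
        // Leaving the try block calls close(), which waits for all tasks to finish.
        try (ExecutorService executor =
                Executors.newFixedThreadPool(this.threads, this.threadFactory)) {
            for (int j = 0; j < this.threads; j++) {
                // Each task synchronizes on its own TaskExecutor, so there is no lock contention.
                TaskExecutor taskExecutor = new TaskExecutor();
                executor.submit(
                        () -> {
                            try {
                                taskExecutor.execute();
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        });
            }
        }
    }
}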

The experiment was run on my laptop, which has 12 available cores.

The Result and Its Interpretation

Total Available Cores 12
========================
Thread Count = 2
Thread Type: PLATFORM Time: 4012 ms
Thread Type: VIRTUAL Time: 4006 ms
------------------------
Thread Count = 4
Thread Type: PLATFORM Time: 4006 ms
Thread Type: VIRTUAL Time: 4005 ms
------------------------
Thread Count = 8
Thread Type: PLATFORM Time: 4002 ms
Thread Type: VIRTUAL Time: 4006 ms
------------------------
Thread Count = 11
Thread Type: PLATFORM Time: 4007 ms
Thread Type: VIRTUAL Time: 4006 ms
------------------------
Thread Count = 13
Thread Type: PLATFORM Time: 4002 ms
Thread Type: VIRTUAL Time: 8004 ms
------------------------
Thread Count = 16
Thread Type: PLATFORM Time: 4007 ms
Thread Type: VIRTUAL Time: 8004 ms
------------------------
Thread Count = 23
Thread Type: PLATFORM Time: 4004 ms
Thread Type: VIRTUAL Time: 8003 ms
------------------------
Thread Count = 25
Thread Type: PLATFORM Time: 4007 ms
Thread Type: VIRTUAL Time: 12009 ms
------------------------
Thread Count = 32
Thread Type: PLATFORM Time: 4008 ms
Thread Type: VIRTUAL Time: 12008 ms
------------------------

Pinning of virtual threads inside synchronized blocks significantly hurts performance once the number of concurrent threads exceeds the number of available cores. A pinned virtual thread cannot unmount from its carrier thread, so the carrier stays blocked for the whole sleep and cannot run any other virtual thread. By default the scheduler has as many carrier threads as there are available cores (12 here), so the virtual threads effectively run in batches of 12, and the total execution time grows with both the blocking duration and the number of batches. In this case, the total execution time for virtual threads rose to 8 seconds and then 12 seconds as the number of concurrent threads exceeded 12 and 24 respectively, while the total execution time for platform threads remained constant at about 4 seconds.
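The step pattern in the results can be approximated with simple arithmetic: if every pinned virtual thread holds a carrier for the full sleep, the threads run in batches the size of the carrier pool. The sketch below is a back-of-the-envelope model, not part of the benchmark. The class name PinnedBatchEstimate and the method expectedMillis are hypothetical, and the model assumes 12 carriers (one per core) and ignores scheduling overhead.

```java
public class PinnedBatchEstimate {

    // Estimated wall-clock time when each pinned thread occupies a carrier for the whole sleep.
    static long expectedMillis(int threads, int carriers, long sleepMillis) {
        long batches = (threads + carriers - 1) / carriers; // ceil(threads / carriers)
        return batches * sleepMillis;
    }

    public static void main(String[] args) {
        int carriers = 12;        // assumption: one carrier thread per available core
        long sleepMillis = 4000;  // sleep duration used in the experiment
        int[] threadsCountList = {2, 4, 8, 11, 13, 16, 23, 25, 32};
        for (int threads : threadsCountList) {
            System.out.println("Thread Count = " + threads
                    + " -> estimated " + expectedMillis(threads, carriers, sleepMillis) + " ms");
        }
    }
}
```

For the thread counts used here, the model gives 4000 ms up to 11 threads, 8000 ms for 13 to 23 threads, and 12000 ms for 25 and 32 threads, which lines up with the measured virtual thread timings to within a few milliseconds.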

The same behavior is seen if a blocking network call is made in the synchronized block.

While virtual threads can offer performance benefits in high-throughput workloads involving long-running I/O requests, it’s crucial to avoid blocking inside synchronized blocks or native methods on those threads to prevent performance degradation.

The source code is available here for reference: https://github.com/Abhishekvrshny/java-synchronized-threads-benchmark
