Saturday, February 4, 2023

Batching: Efficiency under load

In this post, I wanted to make a quick observation about batching as an underused technique in distributed systems. Systems folks have long used batching as an effective way to increase the throughput of the system. This works by amortizing the overhead of an expensive action (such as device IO or a syscall) over multiple operations. Yet, outside of non-realtime systems, batching is rarely used within distributed services.

The common reasoning is that while batching can increase throughput, it also increases latency. But that doesn't need to be true. A great example of having our cake and eating it too can be found within software defined networking systems. Throughput and latency are critically important for such services, so they are written in systems languages like C, C++, or Rust, rely on lock-free data structures, and tend to use a variety of kernel-bypass techniques to avoid the overhead of sys calls.

A common way such services are implemented is by polling the network device driver queue in an infinite loop (usually with the aid of a framework like DPDK) and then performing their business function on individual packets as they arrive. Most of the time the system is lightly loaded, and the queue depth never exceeds one. However, if the load on the system increases, the arrival rate of new packets can temporarily exceed the rate at which worker threads can handle them. When that happens, the bounded queue builds up a short backlog of packets waiting to be processed. This dynamic isn't that different from how web-servers that make up most of our distributed systems operate.

This is where networking services tend to take advantage of batching. Instead of dequeuing individual packets, the polling thread will try to dequeue multiple packets and submit them for processing as a single batch. Individual stages of the application are then written in a way that takes advantage of such batching, amortizing expensive lookups and operations over multiple packets. Most of the time, the system is lightly loaded, and each batch contains a single packet. However, if the system begins building up a backlog, the batch sizes will increase, improving both latency and efficiency.

This creates a dynamic where the system becomes more efficient as the load increases, and that's such a desirable property! I believe that the same type of opportunistic batching could benefit our distributed systems as well, by improving peak efficiency without sacrificing latency.