Debugging Performance Issues in Node.js Streams using Drain and Pause Events
Node.js Streams are a powerful tool for processing large amounts of data efficiently. However, streams can also introduce performance issues, particularly when working with large data sets. Sometimes even the most efficient stream process is slow, and code optimization doesn't seem to help in the slightest. Before you reach for multiple threads or spend extra memory on caching, it's worth checking whether your code is the actual bottleneck.
The first thing worth checking is whether your program is actually using all the CPU it has access to. Adding multiple threads when your code uses less than 50% of the CPU often won't help, and sometimes even makes the whole process slower. To see whether your code is using all the resources, run it with:
/usr/bin/time node yourscript.js
# 0.06user 0.00system 0:01.61elapsed 4%CPU (0avgtext+0avgdata 39932maxresident)k
The output shows clearly that you're only using 4% of the CPU - your code could process 25 times more data than it receives, so it's worth figuring out what's slowing it down.
Fortunately, the Node.js API provides two useful events, "drain" and "pause", that can help you identify and resolve performance problems in streams.
Drain Event
The "drain" event is emitted when a Writable stream has completed writing all of its buffered data to the underlying resource. It is an indication that the stream is ready to receive more data. This event can help you to prevent buffer overflow, which is a common performance issue in streams.
For example, when writing to a Writable stream, you can stop writing once write() returns false (meaning the internal buffer is full) and continue once the "drain" event is emitted. This ensures that the buffer does not grow without bounds and that data is written to the underlying resource in a controlled manner.
const fs = require('fs');

const stream = fs.createWriteStream('file.txt');

// Fires once the internal buffer has been flushed after write() returned false.
stream.on('drain', function() {
  console.log('Stream ready for more data');
});
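To put the two together, here's a minimal sketch of the canonical backpressure pattern: keep writing until write() returns false, then wait for "drain" before continuing. The file name and data source are placeholders:

const fs = require('fs');

const stream = fs.createWriteStream('file.txt');

// Hypothetical data source - replace with your own chunks.
const chunks = Array.from({ length: 10000 }, (_, i) => `line ${i}\n`);

function writeAll() {
  while (chunks.length) {
    // write() returns false once the internal buffer is full...
    if (!stream.write(chunks.shift())) {
      // ...so stop here and continue only after "drain" fires.
      stream.once('drain', writeAll);
      return;
    }
  }
  stream.end();
}

writeAll();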
A lot of "drain" events means there's still spare processing power and capacity to write the data to the destination faster. Improving writing speeds or the efficiency of your code will not speed up the process - the bottleneck is whatever feeds the stream.
Pause Event
The "pause" event is emitted when a Readable stream is paused. This event is useful to identify when a Readable stream is overwhelmed and needs to take a break. For example, when working with a Readable stream, you can pause the stream when the buffer is full and resume it when the buffer has emptied.
const fs = require('fs');

const stream = fs.createReadStream('file.txt');

// Fires whenever the flow of data is paused, e.g. by a piped destination.
stream.on('pause', function() {
  console.log('Stream paused');
});
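To see it in action, here's a minimal sketch that pauses the stream while each chunk is being handled and resumes it afterwards - processChunk is a hypothetical async operation:

const fs = require('fs');

const stream = fs.createReadStream('file.txt');

stream.on('data', (chunk) => {
  // Stop the flow while this chunk is being handled...
  stream.pause();
  // processChunk is a hypothetical async operation on the data.
  processChunk(chunk).then(() => {
    // ...and resume once we're ready for more.
    stream.resume();
  });
});

stream.on('pause', () => console.log('Stream paused'));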
A lot of "pause" events here means that the writing side is clogged up. The pauses usually come from the writable streams you piped your data into - improving your code is one option, but it's worth checking whether your destination isn't slowing you down.
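One quick way to check that - assuming socket is your destination stream and you're on Node.js 9.4 or later - is to sample how full its write buffer is; if writableLength keeps sitting at the high-water mark, the destination can't keep up:

// If the buffer stays pinned at the high-water mark, the destination
// is the bottleneck, not your code.
setInterval(() => {
  console.error(`buffered: ${socket.writableLength} / ${socket.writableHighWaterMark}`);
}, 1000);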
Counting Drain and Pause Events
The balance of "drain" and "pause" events can provide valuable insight into the performance of your stream. If the stream is emitting more "pause" events, the write side is blocked and unable to keep up. On the other hand, if the stream is emitting more "drain" events, the read side is unable to keep up and is starving the write side.
Let's assume that we have a couple of piped streams:
const fs = require('fs');
const net = require('net');

fs.createReadStream(pathToFile)                  // get data from disk - we don't know how fast
  .pipe(tsGetExtraDataFromDB)                    // fetch some extra info from the database
  .pipe(net.createConnection({ port: 2301 }));   // push the data to a TCP server
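tsGetExtraDataFromDB above is just a placeholder for a Transform stream; assuming a hypothetical db.lookup call, it could be sketched like this:

const { Transform } = require('stream');

// A sketch of the transform above - db.lookup is a hypothetical
// async call that returns extra data for a chunk.
const tsGetExtraDataFromDB = new Transform({
  transform(chunk, encoding, callback) {
    db.lookup(chunk.toString())
      .then((extra) => callback(null, chunk.toString() + extra))
      .catch(callback);
  }
});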
Knowing which of these streams holds you back is thankfully relatively easy - just add some simple counters:
let pauseCount = 0;
let drainCount = 0;

fs.createReadStream(pathToFile)
  // You can move this line around to measure different stages...
  .on("drain", () => drainCount++).on("pause", () => pauseCount++)
  .pipe(tsGetExtraDataFromDB)
  .pipe(net.createConnection({ port: 2301 }))
  .on("finish", () => console.error({ drainCount, pauseCount }));
The above code will print out some information like this: { drainCount: 100, pauseCount: 29034 }.
Balancing events = taking advantage
So what does that tell us? Well, that depends on where you put the event listeners and what the numbers say. In general there are three options:
drain > pause
If the drain count is higher, the streams above the line with the listeners are slower. In the example above this would mean that you can't read data from disk as fast as you can process it. In this case a faster SSD, using RAID, or perhaps just gzipping the contents of your files is going to improve performance without changing a line of your code. This makes sense when you're reading from slow disks, from S3 over the internet, or perhaps from an SD card.
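For instance, if you control the input files, reading gzipped data and decompressing it on the fly with the built-in zlib module trades a little CPU for a lot less disk I/O - a sketch:

const zlib = require('zlib');

// Fewer bytes read from disk, a little CPU spent on decompression -
// often a net win when the disk is the bottleneck.
fs.createReadStream(pathToFile + '.gz')
  .pipe(zlib.createGunzip())
  .pipe(tsGetExtraDataFromDB)
  .pipe(net.createConnection({ port: 2301 }));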
drain < pause
If the pause count is higher, that means you can improve the streams below the listeners. In the example above that could mean either of the two downstream streams - the transform fetching data from the database might be the issue, but writing to the socket could also be slow. In this case you can move the listener line down to see if the pauses happen on the next stream too - this will lead you to the slowest stream in your pipeline.
drain = pause
If the pause and drain counts don't differ in order of magnitude (so there isn't several times more of either), that means that, at least at the point where the listeners are attached, there's very little to be done and the code is doing more or less OK. Moving the listeners to other lines might be sensible to find bottlenecks, but sometimes - when you read from slow sources and write to slow endpoints - the performance issue has nothing to do with your code.
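Instead of moving a single listener line around, you can also instrument every stage at once. Here's a sketch using a hypothetical tap helper built on PassThrough (a Duplex stream, so it emits both events):

const { PassThrough } = require('stream');

const stats = {};

// Hypothetical helper: a pass-through stage that counts its own
// "drain" and "pause" events under the given label.
function tap(label) {
  const counters = (stats[label] = { drainCount: 0, pauseCount: 0 });
  const passThrough = new PassThrough();
  passThrough.on('drain', () => counters.drainCount++);
  passThrough.on('pause', () => counters.pauseCount++);
  return passThrough;
}

fs.createReadStream(pathToFile)
  .pipe(tap('after read'))
  .pipe(tsGetExtraDataFromDB)
  .pipe(tap('after DB'))
  .pipe(net.createConnection({ port: 2301 }))
  .on('finish', () => console.error(stats));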
Conclusion
In conclusion, the "drain" and "pause" events can be a valuable tool for debugging performance issues in Node.js streams. By using these events to control the flow of data and by balancing them to understand the performance of your stream, you can ensure that the stream operates efficiently and in a controlled manner, and resolve any performance problems that arise.
I hope this guide was a nice read - if so, do check out the other articles.
If you're interested in streaming topics, do check out our Scramjet Cloud Platform and its open source heart, the Transform Hub. Both of these can take away the burden of deploying long running stream based processors to the cloud or on your local infrastructure. You can be up and running in minutes - just send us your code and we'll keep it running while you shut your laptop and go grab a coffee.