I used to be responsible for several language agents at AppDynamics, including the Node.js agent. Here's what I learned. One caveat: these notes came together in 2016-2017, and the Node world moves very quickly! The community is always adding new sources of telemetry and higher-level abstractions that make it easier to write good Node code.
"Context" means the user request, route handler, middleware, 3rd-party module, helper, or callback where the problem occurs. (Many of the functions involved in a user request are anonymous, which makes it a challenge to narrow down the context any further.)
The "best" signal for a particular problem is the one that detects the problem and rules out the most other possibilities. For example, a long event loop max tick length can indicate several different problems, and for each of those problems there is a better signal that narrows it down more quickly.
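As a rough illustration of the event loop max tick length signal mentioned above, here is a minimal sketch that assumes a Node.js version that ships `perf_hooks.monitorEventLoopDelay` (added after these notes were written). The helper names `busyWait` and `measureMaxLoopDelay` are hypothetical and exist only for this example. The sketch shows that the signal fires when something blocks the loop, but it does not say which function did the blocking.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical helper: block the event loop synchronously for about `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin on purpose to simulate a CPU-intensive function
  }
}

// Hypothetical helper: sample event loop delay for `durationMs` milliseconds,
// run `work` once shortly after sampling starts, and resolve with the
// maximum observed delay in milliseconds.
function measureMaxLoopDelay(durationMs, work) {
  return new Promise((resolve) => {
    const histogram = monitorEventLoopDelay({ resolution: 10 });
    histogram.enable();
    setTimeout(work, 20);
    setTimeout(() => {
      histogram.disable();
      resolve(histogram.max / 1e6); // the histogram reports nanoseconds
    }, durationMs);
  });
}
```

Calling `measureMaxLoopDelay(300, () => busyWait(100))` should resolve with a maximum delay on the order of the 100 ms block. A memory-pressure pause or a saturated loop could produce a similar number, which is why this signal alone rarely identifies the problem.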
| Node.js performance problem | What is the "best" signal for this problem? | What to ask next if this signal is observed? |
|---|---|---|
| Long-running external calls/DB calls | | |
| Memory leak | | |
| CPU intensive functions | | |
| libuv threadpool saturation | | |
| Large queue of timers, close events, or setImmediate() callbacks (these prevent the event loop from reaching the poll, timer, or close phases) | | |
| Recursive calls to process.nextTick() (these prevent the event loop from reaching the poll phase) | | |
| Unused CPU cores | | |
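
The `process.nextTick()` row in the table can be sketched in a few lines. This is a minimal demonstration, not a diagnostic tool: `runStarvationDemo` is a hypothetical name, and a zero-delay timer stands in for any work that needs the event loop to advance to its next phase. Because the `nextTick` queue is drained completely before the loop moves on, the timer cannot fire until every recursive tick has run.

```javascript
// Hypothetical demo: schedule a timer, then recurse through process.nextTick()
// `depth` times, and record the order in which the callbacks actually run.
function runStarvationDemo(depth) {
  return new Promise((resolve) => {
    const events = [];

    // Due almost immediately, but it can only run once the loop reaches
    // the timers phase.
    setTimeout(() => {
      events.push('timer');
      resolve(events);
    }, 0);

    let remaining = depth;
    function recurse() {
      events.push('tick');
      remaining -= 1;
      if (remaining > 0) {
        // Each call re-queues itself before the loop is allowed to advance.
        process.nextTick(recurse);
      }
    }
    process.nextTick(recurse);
  });
}
```

With `runStarvationDemo(10000)`, the resolved array contains all 10000 `'tick'` entries before the single `'timer'` entry. An unbounded recursion would keep the timer, and any pending I/O callbacks, waiting indefinitely.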
- Bert Belder's keynote @ Node.js Interactive 2016: https://youtu.be/PNa9OMajw9w
- https://nodesource.com/blog/understanding-the-nodejs-event-loop/
- https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
- https://nodejs.org/api/fs.html
- https://nodejs.org/api/dns.html
- http://docs.libuv.org/en/v1.x/design.html
- https://github.com/libuv/libuv/blob/v1.x/src/unix/loop.c
- https://github.com/nodejs/node/blob/master/lib/internal/process/next_tick.js