Trace generation

During instrumented benchmark execution, we collect information about every function call. This information is then used to generate a flame graph, giving a more granular view of the performance of your code.

Prerequisites

FFI support

If you're using a Foreign Function Interface, typically calling C/C++/Rust code from Python or Node.js, make sure to generate debug symbols for the foreign functions so that they appear in the flame graph.
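For example, if the foreign code is a Rust library built with Cargo, debug symbols can be kept in optimized builds with a profile setting like the following (a common approach, not specific to any one tool):

```toml
# Cargo.toml: keep debug symbols in release builds so foreign
# function names can be resolved in the flame graph.
[profile.release]
debug = true
```

For C/C++ code, the equivalent is compiling with the `-g` flag.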

Reading Flame Graphs

Flame graphs are a visualization tool for profiling software. They provide a graphical representation of your program's execution, making it easier to understand the runtime complexities involved.

Let's start with an example:

Here, each rectangle represents a function. The width of the rectangle is proportional to the amount of time spent in that function: the wider the rectangle, the more time was spent there. The vertical axis represents the call depth (a.k.a. stack depth), i.e. the call hierarchy.

For this example, here is the call hierarchy:

  1. The root caller is the app function.
  2. app calls the init, handleRequest and terminate functions.
  3. handleRequest calls both authenticateUser and processData.
  4. processData calls foo which in turn calls bar.
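The hierarchy above can be written out as a runnable sketch, using the same (hypothetical) function names as the example flame graph. The bodies are stubs; only the call structure matters here:

```python
# Leaf and intermediate functions from the example flame graph.
def bar(): pass
def foo(): bar()

def init(): pass
def terminate(): pass
def authenticateUser(): pass
def processData(): foo()

def handleRequest():
    authenticateUser()
    processData()

# The root caller.
def app():
    init()
    handleRequest()
    terminate()

app()
```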
Horizontal ordering

The horizontal order of the blocks is not significant and does not represent the order of execution. The blocks are sorted alphabetically by function name.

Aggregated function calls

Function calls are aggregated: if a function is called multiple times, the time spent across all calls is merged into a single block.

Thus, the following code:

def bar():
    pass

def foo():
    bar()
    bar()

def main():
    foo()
    foo()

Will generate the following flame graph:

Here, the foo function is called twice and the bar function is called 4 times, but the time spent is aggregated into a single block for each function.
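Instrumenting the example with a simple counter (a sketch for illustration, not how the profiler works internally) confirms the call counts that get merged into each block:

```python
from collections import Counter

# Count how many times each function runs.
calls = Counter()

def bar():
    calls["bar"] += 1

def foo():
    calls["foo"] += 1
    bar()
    bar()

def main():
    foo()
    foo()

main()
# foo ran 2 times and bar ran 4 times, yet the flame graph
# shows a single aggregated block for each of them.
```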

Self-costs

In the previous example, we could see the global cost of each function call quite clearly. However, it can be tricky to find out how much time was spent within the function itself.

In the above illustration, we can see two types of self-costs:

  • (implicit) self-costs: the time spent in the function itself is the whole width of the rectangle since it doesn't call any other functions.
  • self-costs: the self-cost here is visible as the space not occupied by the children of the block.
Self-costs in interpreted languages

In Python or Node.js, the self-cost is the time spent in the function itself, but also the time spent by the interpreter. This means that a function will always have a self-cost, even if the function does nothing.
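The relationship can be sketched with toy numbers (an illustration of the concept, not CodSpeed's actual model): a block's self-cost is its total time minus the time covered by its children.

```python
def self_cost(total_ms, child_totals_ms):
    """Self-cost = total time of a block minus its children's totals."""
    return total_ms - sum(child_totals_ms)

# Hypothetical figures: a function spans 60ms and its two children
# cover 20ms + 30ms, leaving 10ms of self-cost (the visible gap).
gap = self_cost(60, [20, 30])

# A leaf function has no children: its self-cost is its whole width.
leaf = self_cost(25, [])
```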

If we change the example a bit, adding a lot of computation directly in the main function:

def main():
    for i in range(100):
        pass  # do some computation
    foo()
    foo()

Then the flame graph would look like this, with the self-cost of main being much bigger than before:

Viewing Flame Graphs

On the pull request page, you can access the flame graphs of a benchmark by expanding it.

Example of flame graphs on a pull request page

Three types of flame graphs are available:

  • Base: flame graph of the benchmark base run
  • Head: flame graph of the benchmark run from the latest commit of the pull request
  • Diff: difference between the head and the base flame graphs