
Understanding the Metrics

Getting a benchmark execution speed measurement

CodSpeed instruments your benchmarks to measure the performance of your code. A benchmark will be run only once and the CPU behavior will be simulated. This ensures that the measurement is as accurate as possible, taking into account not only the instructions executed but also the cache and memory access patterns. The simulation gives us an equivalent of the CPU cycles that includes cache and memory access.

Once we have the number of cycles for a benchmark, we transform it into an execution time measurement by using the following formula, where FREQUENCY is a constant set to the frequency (number of cycles per second) of a real CPU:

$$execution\_time = \frac{cycles}{FREQUENCY}$$

We then calculate the execution speed of the benchmark by taking the inverse of the execution time:

$$speed = \frac{1}{execution\_time}$$

This is the displayed metric in the CodSpeed reports.
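As an illustration of the two formulas above, here is a minimal Python sketch. The `FREQUENCY_HZ` value of 3 GHz and the function names are assumptions made for this example; the actual constant used by CodSpeed is not stated here.

```python
# Hypothetical constant: the text only says FREQUENCY is the frequency of a
# real CPU. 3.0 GHz is an illustrative value, not CodSpeed's actual setting.
FREQUENCY_HZ = 3.0e9


def execution_time(cycles: float, frequency_hz: float = FREQUENCY_HZ) -> float:
    """Convert a simulated cycle count into seconds: cycles / FREQUENCY."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return cycles / frequency_hz


def execution_speed(cycles: float, frequency_hz: float = FREQUENCY_HZ) -> float:
    """Return the inverse of the execution time (runs per second)."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return 1.0 / execution_time(cycles, frequency_hz)


if __name__ == "__main__":
    cycles = 6_000_000  # made-up cycle count, for illustration only
    print("execution time (s):", execution_time(cycles))
    print("execution speed (1/s):", execution_speed(cycles))
```

Because FREQUENCY is a constant, the speed is inversely proportional to the cycle count: halving the cycles doubles the speed.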

Why choose execution speed over execution time?

A performance improvement increases the execution speed of a benchmark, and a performance regression decreases it. If execution time were used instead, a performance improvement would show up as a decrease in the metric, which would be counter-intuitive.

Creating the Performance Impact Metric

Getting the baseline report to compare to

To compute a performance impact, we compare the execution speed of each benchmark against its execution speed in a baseline report. Which report serves as the baseline depends on the context of the run.

Pull Request

When triggering a CodSpeed run on a pull request, the baseline report will be the report of the base commit of the pull request.

tip

To update the base branch of the pull request, for example, to have C instead of B as the base commit, you can rebase feat-branch on C.

caution

Prefer rebasing over merging. This will ensure that the resulting report will be completely accurate.

Branch

When triggering a CodSpeed run following a push on a branch, the baseline report will be the report of the closest earlier commit of the branch that already has a report.

tip

In this example, a report already exists for the commit B of the main branch. A new commit C is pushed on the main branch. The baseline report for C will be the report of the commit B.

Benchmark performance impact

The performance impact denotes an improvement or regression in the performance of a benchmark. It is calculated by comparing the execution speed of the benchmark on the head commit (speed) with its execution speed on the base commit (baseSpeed).

$$impact = \frac{speed - baseSpeed}{baseSpeed}$$

A negative performance impact means that the benchmark is slower than on the base commit. The closer its value is to -1, the slower it is.

$$-1 \lt impact \lt 0$$

A positive performance impact means that the benchmark is faster than on the base commit. Its value can go up to +Infinity to denote massive speed improvements.

$$0 \lt impact \lt +\infty$$

Naturally, when the benchmark is as fast as on the base commit, the performance impact is 0.
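The impact formula and its bounds can be checked with a small Python sketch. The function names are hypothetical and chosen for this example; `impact_from_times` is only a restatement of the same formula using speed = 1 / execution_time.

```python
def performance_impact(speed: float, base_speed: float) -> float:
    """Relative change in execution speed versus the baseline."""
    if speed <= 0 or base_speed <= 0:
        raise ValueError("speeds must be positive")
    return (speed - base_speed) / base_speed


def impact_from_times(time: float, base_time: float) -> float:
    """Same impact expressed with execution times.

    Substituting speed = 1 / time into the formula above gives
    impact = base_time / time - 1.
    """
    if time <= 0 or base_time <= 0:
        raise ValueError("times must be positive")
    return base_time / time - 1.0


if __name__ == "__main__":
    print(performance_impact(200.0, 100.0))  # twice as fast as the baseline
    print(performance_impact(50.0, 100.0))   # twice as slow as the baseline
    print(impact_from_times(2.0, 1.0))       # time doubled, same as halved speed
```

The time-based form shows why the impact is bounded below by -1 but unbounded above: as the execution time grows, base_time / time approaches 0, and as it shrinks toward 0, the ratio grows without limit.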

Regression threshold

On the settings page of a project, you can set the threshold above which a performance change is reported as a regression. By default, this value is set to 10% (which is equivalent to 0.1). A project admin can set it to any value from 0% to 50%.

Commit performance impact

To get the overall performance impact of a commit, we aggregate the impacts of all its benchmarks. In the formulas below, n is the number of benchmarks.

Regression threshold exceeded

If there is a regression above the threshold, the overall commit impact will be the biggest regression impact.

$$commitImpact = \min_{0\leq i\lt n} impact[i]$$
example

With impacts: [0.1, 0, -0.3] and a threshold of 0.25, the overall commit impact will be -0.3.

Improvement threshold exceeded

If there is an improvement above the threshold, the overall commit impact will be the maximum improvement impact.

$$commitImpact = \max_{0\leq i\lt n} impact[i]$$
example

With impacts: [0.1, 0.3, -0.2] and a threshold of 0.25, the overall commit impact will be 0.3.

No threshold exceeded

Finally, in the remaining cases, a geometric mean is calculated from all the benchmark performance impacts.

$$commitImpact = \left(\prod_{i=0}^{n-1} (1+impact[i])\right)^{\frac{1}{n}} - 1$$
example

With impacts: [0.1, 0.3, -0.2] and a threshold of 0.5, the overall commit impact will be approximately 0.0459.

info

A geometric mean will give more relevant results than an arithmetic mean for this kind of measure as it will be less sensitive to outliers.
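The three aggregation cases can be sketched in Python as follows. This is an illustration, not CodSpeed's implementation: the function name is hypothetical, and two points are assumptions inferred from the order of the cases rather than stated rules, namely that "above the threshold" is a strict comparison and that a regression takes precedence when both a regression and an improvement exceed the threshold.

```python
import math
from typing import Sequence


def commit_impact(impacts: Sequence[float], threshold: float = 0.1) -> float:
    """Aggregate per-benchmark impacts into a single commit impact."""
    if not impacts:
        raise ValueError("at least one benchmark impact is required")

    worst = min(impacts)
    best = max(impacts)

    # Assumption: regressions are checked first, with a strict comparison.
    if worst < -threshold:
        return worst  # biggest regression
    if best > threshold:
        return best  # biggest improvement

    # Remaining cases: geometric mean of the speed ratios (1 + impact),
    # converted back to an impact by subtracting 1.
    product = math.prod(1.0 + impact for impact in impacts)
    return product ** (1.0 / len(impacts)) - 1.0


if __name__ == "__main__":
    # The three examples given in the text above.
    print(commit_impact([0.1, 0, -0.3], threshold=0.25))
    print(commit_impact([0.1, 0.3, -0.2], threshold=0.25))
    print(round(commit_impact([0.1, 0.3, -0.2], threshold=0.5), 4))
```

The default `threshold=0.1` mirrors the 10% default mentioned in the regression threshold section.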

Performance impact Gauge

The performance impact gauge is a visual representation of the performance impact, displayed in multiple places of the CodSpeed UI.

Some examples of the gauge with their corresponding impact values:

[Gauge images for the impact values -0.75, -0.2, 0, 0.3, and 1.5]
info

To make it easier to spot regressions and improvements, the mapping between the actual performance impact value and the gauge is not linear.