mirror of
https://github.com/RIOT-OS/RIOT.git
synced 2024-12-29 04:50:03 +01:00
341 lines
15 KiB
Markdown
341 lines
15 KiB
Markdown
|
# Benchmark test for RIOT timers
|
||
|
|
||
|
This test is intended to collect statistics about the runtime delays in the
|
||
|
periph_timer implementation. This tool is mainly intended to detect problems in
|
||
|
the low level driver implementation which can be difficult to detect from
|
||
|
higher level systems such as xtimer.
|
||
|
|
||
|
## Testing periph/timer
|
||
|
|
||
|
The basic idea of the test is to generate a wide variety of calls to the
|
||
|
periph/timer API, in order to detect problems with the software implementation.
|
||
|
The test consists of a single thread of execution which will set random
|
||
|
timeouts on a periph_timer device. A reference timer is used to measure the
|
||
|
time it takes until the callback is called. The difference between the expected
|
||
|
time and the measured time is computed, and the mean and variance of the
|
||
|
recorded values are calculated. The results are printed as a table on stdout
|
||
|
every 30 seconds. All of the test scenarios used in this application are
|
||
|
based on experience from real world bugs encountered during the development of
|
||
|
RIOT and other systems.
|
||
|
|
||
|
### periph/timer functions tested
|
||
|
|
||
|
Both timer_set and timer_set_absolute calls are mixed in a random order, to
|
||
|
ensure that both functions are working correctly.
|
||
|
|
||
|
### Rescheduling
|
||
|
|
||
|
Some bugs may occur only when the application attempts to replace an already
|
||
|
set timer by calling timer_set again while the timer is running. This is dubbed
|
||
|
"resched" in this application. The application will issue rescheduling timer
|
||
|
calls in the normal operation of the test to catch this class of problems.
|
||
|
|
||
|
### Setting stopped timers
|
||
|
|
||
|
Another class of bugs is the kind where a timer is not behaving correctly if it
|
||
|
was stopped before setting a timer target. The timer should set the new target,
|
||
|
but not begin counting towards the target until timer_start is called. This is
|
||
|
covered by this application under the "stopped" category of test inputs.
|
||
|
|
||
|
## General benchmark behaviour
|
||
|
|
||
|
The benchmark application uses some techniques to improve the quality of the
|
||
|
tests.
|
||
|
|
||
|
### Avoiding phase lock
|
||
|
|
||
|
The CPU time used for generation and setting of the next timer target in
|
||
|
software is approximately constant across iterations. When the application flow
|
||
|
is driven by the timer under test, via a mutex which is unlocked from the timer
|
||
|
callback, the application will be "phase locked" to the timer under test. This
|
||
|
peculiarity may hide certain implementation race conditions which only occur at
|
||
|
specific phases of the timer ticks. In the test application, this problem is
|
||
|
avoided by inserting random CPU delay loops at key locations:
|
||
|
|
||
|
- After timer_stop
|
||
|
- Before timer_set/timer_set_absolute
|
||
|
- Before timer_start
|
||
|
|
||
|
### Estimating benchmark CPU overhead
|
||
|
|
||
|
An estimation of the overhead resulting from the CPU processing delays inside
|
||
|
the benchmarking code is performed during benchmark initialization. The
|
||
|
estimation algorithm attempts to perform the exact same actions that the real
|
||
|
benchmark will perform, but without setting any timers. The measured delay is
|
||
|
assumed to originate in the benchmark code and will be subtracted when
|
||
|
computing the difference between expected and actual values. A warning is
|
||
|
printed when the variance of the estimated CPU overhead is too high, this can
|
||
|
be a sign that some other process is running on the CPU and disrupting the
|
||
|
estimation, or a sign of serious problems inside the timer_read function.
|
||
|
|
||
|
## Testing xtimer
|
||
|
|
||
|
The benchmark application can be reconfigured for using xtimer instead of the
|
||
|
low level periph/timer interface. Use the Makefile target test-xtimer to build
|
||
|
a test for xtimer.
|
||
|
|
||
|
### xtimer functions tested
|
||
|
|
||
|
The application will mix calls to the following xtimer functions:
|
||
|
|
||
|
- `_xtimer_set`
|
||
|
- `_xtimer_periodic_wakeup`
|
||
|
- `_xtimer_spin`
|
||
|
|
||
|
These functions cover almost all uses of xtimer. The higher level functions
|
||
|
such as `xtimer_usleep` and `xtimer_set_msg` all use these functions internally
|
||
|
in the implementations.
|
||
|
|
||
|
## Results
|
||
|
|
||
|
When the test has run for a certain amount of time, the current results will be
|
||
|
presented as a table on stdout. To assist with finding the source of any
|
||
|
discrepancies, the results are split according to three parameters:
|
||
|
|
||
|
- Function used: timer_set, timer_set_absolute
|
||
|
- Rescheduled: yes/no
|
||
|
- Stopped: yes/no
|
||
|
|
||
|
### Expected results
|
||
|
|
||
|
It is difficult to set a general expected result which applies to all
|
||
|
platforms. However, the mean and variance of the differences should be small,
|
||
|
in relation to the tick duration of the timer under test. What is "small" also
|
||
|
depends the CPU core frequency, because of CPU processing time spent in the
|
||
|
driver code and in the test code. For example, when testing a 1 MHz timer and
|
||
|
comparing with another 1 MHz timer, the mean difference should likely be in
|
||
|
single digits on a Cortex-M CPU. Testing a 32768 Hz timer and referencing a 1
|
||
|
MHz timer on the other hand should have a mean difference in double digits, the
|
||
|
variance will also be greater because of the quantization errors and rounding
|
||
|
in the tick conversion routine. If the mean or variance of the difference is
|
||
|
exceptionally large, the row will be marked with "SIC!" to draw attention to
|
||
|
the fact.
|
||
|
|
||
|
### Interpreting timer_set_xxx statistics
|
||
|
|
||
|
The first part of the results compares the reference time elapsed against the
|
||
|
expected time. A negative value means that the timer alarm is triggered too
|
||
|
soon, compared to the expected time. A positive value means that the timer
|
||
|
alarm is triggered late, compared to the expected time.
|
||
|
|
||
|
#### Expected variance
|
||
|
|
||
|
The variance of the results should generally be as small as possible, except
|
||
|
for the case where the reference timer is running at a higher frequency than
|
||
|
the timer under test. There will always be a truncation error when setting a
|
||
|
timer target because of the discrete timer counts that can be set. The
|
||
|
benchmark will compute limits for the expected variance and show a message if
|
||
|
the variance is less than or greater than the expected variance.
|
||
|
|
||
|
##### Variance too low
|
||
|
|
||
|
A variance lower than expected indicates that there is a phase correlation
|
||
|
between the reference timer and the timer under test. This can only be
|
||
|
explained by a software bug or a hardware failure. The primary suspect will be
|
||
|
the reference timer.
|
||
|
|
||
|
##### Variance too high
|
||
|
|
||
|
A variance greater than expected indicates that there is some extra randomness
|
||
|
in the timeout length, which can be caused by other processes interfering with
|
||
|
the test, a software bug in the timer under test, or hardware failure.
|
||
|
|
||
|
##### Mathematical explanation
|
||
|
|
||
|
Let `X0` be the exact time that a timer_set call occurs, `X0` is a real number.
|
||
|
Let `T` be the desired timeout, `T` is a positive integer because the timer can
|
||
|
only count integer number of ticks. Let `x0 = floor(X0)` be the count that the
|
||
|
timer is currently on when the timer is set, `x0` is an integer. If `X0` is
|
||
|
uniformly random, then the truncation error `X0' = (X0 - x0)` will be a sample
|
||
|
from a continuous uniform distribution in the interval `[0,1)` ticks. The timer
|
||
|
target, `x1`, in the timer under test can be expressed as `x1 = x0 + T`. Let
|
||
|
`k` be a real number that satisfies `Y = kx`, where `x` is an interval in the
|
||
|
unit used by the timer under test, and `Y` is an interval in the unit used by
|
||
|
the reference timer, `Y` is a real number. The time measured by the reference
|
||
|
timer can be expressed as
|
||
|
`Y = k(x1 - X0) = k(x0 + T - X0) = k(X0 - X0' + T - X0) = k(T - X0')`.
|
||
|
`Y` will be a sample from a uniform random distribution in the interval
|
||
|
`[kT, k(T + 1)]`. The variance of `Y` is defined as
|
||
|
`Var(Y) = (k(T + 1) - kT)^2 / 12`.
|
||
|
|
||
|
### Interpreting timer_read statistics
|
||
|
|
||
|
A separate table is displayed for statistics on timer_read. This table compares
|
||
|
the expected timer under test time after a timer has triggered, against the
|
||
|
actual reported value from `timer_read(TIM_TEST_DEV)`. A positive value means
|
||
|
that the TUT time has passed the timer target time. A negative value means
|
||
|
that, according to the timer under test, the alarm target time has not yet been
|
||
|
reached, even though the timer callback has been executed.
|
||
|
|
||
|
### A note on BOARD=native
|
||
|
|
||
|
When running on native, the application will be running as a normal process in
|
||
|
a multi process system which means that most results will have a much greater
|
||
|
variance than expected and the resulting differences will also be much larger
|
||
|
than on a bare metal system. Be careful when drawing conclusions on results
|
||
|
from native.
|
||
|
|
||
|
## Configuration
|
||
|
|
||
|
The timer under test and the reference timer can be chosen at compile time by
|
||
|
defining TIM_TEST_DEV and TIM_REF_DEV, respectively. The frequencies for
|
||
|
these timers can be configured through TIM_DEV_FREQ and TIM_REF_FREQ. For
|
||
|
example, to compare timer device 2 running at 32768 Hz against a reference timer
|
||
|
on timer device 0 (default) running at 1 MHz (default) on Mulle:
|
||
|
|
||
|
CFLAGS='-DTIM_TEST_DEV=TIMER_DEV\(2\) -DTIM_TEST_FREQ=32768' BOARD=mulle make flash
|
||
|
|
||
|
### Default configuration
|
||
|
|
||
|
reference timer: TIMER_DEV(0)
|
||
|
timer under test: TIMER_DEV(1)
|
||
|
timer frequency: 1 MHz
|
||
|
|
||
|
Define USE_REFERENCE=0 to compare a timer against itself. Be aware that this
|
||
|
may hide a class of errors where the timer is losing ticks in certain
|
||
|
situations, which a separate reference timer would be able to detect.
|
||
|
|
||
|
### Example output
|
||
|
|
||
|
Below is a short sample of a test of a 32768 Hz timer with a 1 MHz reference:
|
||
|
|
||
|
------------- BEGIN STATISTICS --------------
|
||
|
Limits: mean: [-10, 41], variance: [58, 99]
|
||
|
Target error (actual trigger time - expected trigger time), in reference timer ticks
|
||
|
positive: timer is late, negative: timer is early
|
||
|
=== timer_set running ===
|
||
|
interval count sum sum_sq min max mean variance
|
||
|
1 - 2: 25705 455646 2002373 2 35 17 77
|
||
|
3 - 4: 25533 457326 1973191 2 35 17 77
|
||
|
5 - 8: 25412 451046 1970014 2 35 17 77
|
||
|
9 - 16: 25699 458563 2017198 2 36 17 78
|
||
|
17 - 32: 25832 457768 2018909 1 35 17 78
|
||
|
33 - 64: 25758 440454 1978830 1 35 17 76
|
||
|
65 - 128: 25335 414320 1942196 0 34 16 76
|
||
|
129 - 256: 25254 367689 1979616 -3 32 14 78
|
||
|
257 - 512: 25677 285822 2020736 -7 31 11 78
|
||
|
TOTAL 230205 3788634 18459378 -7 36 16 80
|
||
|
...
|
||
|
|
||
|
The statistics above show that the timer implementation introduces a small
|
||
|
delay on short timers. Note however that it is not clear whether this delay is
|
||
|
in timer_set, or timer_read, but the combined effect of all error sources make
|
||
|
the timer overshoot its target by on average 17 microseconds. The difference
|
||
|
between min and max is close to the expected difference of approximately
|
||
|
1/32768 s = 30.51 µs. The lower min and mean for the longer timer intervals are
|
||
|
most likely caused by a drift between the 1 MHz clock and the 32.768 kHz clocks
|
||
|
inside the CPU, and is expected when using two separate clock sources for the
|
||
|
timers being compared.
|
||
|
|
||
|
Below is a short sample of a test where there is something wrong with the timer
|
||
|
implementation, again a 32768 Hz timer tested with a 1 MHz reference:
|
||
|
|
||
|
=== timer_set_absolute resched ===
|
||
|
interval count sum sum_sq min max mean variance
|
||
|
1 - 2: 152217 82324291 13339518497 38 1071 540 87635 <=== SIC!
|
||
|
3 - 4: 152199 82254909 13301794832 38 1071 540 87397 <=== SIC!
|
||
|
5 - 8: 152023 82085454 13264873203 38 1071 539 87256 <=== SIC!
|
||
|
9 - 16: 152156 82236539 13265351119 38 1071 540 87183 <=== SIC!
|
||
|
17 - 32: 152639 82606815 13267666812 38 1071 541 86922 <=== SIC!
|
||
|
33 - 64: 151883 81836155 13268661551 37 1070 538 87361 <=== SIC!
|
||
|
65 - 128: 152114 82151495 13228100922 36 1070 540 86962 <=== SIC!
|
||
|
129 - 256: 152363 81806042 13278055359 34 1069 536 87148 <=== SIC!
|
||
|
257 - 512: 152360 81235185 13303811951 29 1066 533 87318 <=== SIC!
|
||
|
TOTAL 1369954 738536885 119197349719 29 1071 539 87008 <=== SIC!
|
||
|
|
||
|
We can see that the variance, the maximum error, and the mean are very large.
|
||
|
This particular timer implementation needs some work on its timer_set_absolute
|
||
|
implementation.
|
||
|
|
||
|
## Configuration details
|
||
|
|
||
|
Configuration macros used by the application are described below
|
||
|
|
||
|
### Settings related to timer hardware
|
||
|
|
||
|
#### TIM_TEST_DEV
|
||
|
|
||
|
Timer under test. Default: `TIMER_DEV(1)`
|
||
|
|
||
|
#### TIM_REF_DEV
|
||
|
|
||
|
Reference timer used for measuring the callback time error. Default: `TIMER_DEV(0)`
|
||
|
|
||
|
#### TIM_TEST_FREQ
|
||
|
|
||
|
Frequency of the timer under test, used when calling timer_init during the
|
||
|
application startup. Default: `1000000`, 1 MHz
|
||
|
|
||
|
#### TIM_REF_FREQ
|
||
|
|
||
|
Frequency of the reference timer, used when calling timer_init during the
|
||
|
application startup. Default: `1000000`, 1 MHz
|
||
|
|
||
|
#### TIM_TEST_CHAN
|
||
|
|
||
|
Timer channel used on the timer under test. Default: `0`
|
||
|
|
||
|
#### USE_REFERENCE
|
||
|
|
||
|
Shorthand for setting the reference timer to the same as the timer under test.
|
||
|
Default: `0`
|
||
|
|
||
|
### Settings related to result processing
|
||
|
|
||
|
|
||
|
#### DETAILED_STATS
|
||
|
|
||
|
Keep statistics per timer target offset value if set to 1, otherwise keep
|
||
|
statistics only per scenario. Default: `1`
|
||
|
|
||
|
#### LOG2_STATS
|
||
|
|
||
|
Only used when `DETAILED_STATS == 1`. Statistics are grouped according to
|
||
|
2-logarithms of the timer offset value, e.g. 1-2, 3-4, 5-8, 9-16 etc. This
|
||
|
reduces memory consumption and creates a shorter result table for easier
|
||
|
overview. Default: `1`
|
||
|
|
||
|
#### TEST_PRINT_INTERVAL_TICKS
|
||
|
|
||
|
The result table will be printed to standard output when this many reference
|
||
|
timer ticks have passed since the last printout. Default: `((TIM_REF_FREQ) * 30)`
|
||
|
|
||
|
### Settings related to timer input generation
|
||
|
|
||
|
#### TEST_MIN
|
||
|
|
||
|
Minimum timer offset tested for `timer_set_absolute`, in timer under test
|
||
|
ticks. Default: `1` for timers < 1 MHz, `16` otherwise, to prevent setting a
|
||
|
timer target in the past
|
||
|
|
||
|
#### TEST_MIN_REL
|
||
|
|
||
|
Minimum timer offset tested for `timer_set`, in timer under test ticks.
|
||
|
Default: `0`
|
||
|
|
||
|
#### TEST_MAX
|
||
|
|
||
|
Maximum timer offset tested, in timer under test ticks. Default: `128`
|
||
|
|
||
|
#### SPIN_MAX_TARGET
|
||
|
|
||
|
The CPU busy wait loop will be calibrated during application start up so that
|
||
|
the maximum random spin length is approximately equal to this many timer under
|
||
|
test ticks. Default: `16`
|
||
|
|
||
|
#### TEST_UNEXPECTED_STDDEV
|
||
|
|
||
|
Mark the output row if the computed variance implies a standard deviation which
|
||
|
is greater than this value. This value will be automatically recomputed to
|
||
|
compensate for the variance resulting from the truncation in the tick
|
||
|
conversion if the reference timer is running at a higher frequency than the
|
||
|
timer under test. Default: `4`
|
||
|
|
||
|
#### TEST_UNEXPECTED_MEAN
|
||
|
|
||
|
Mark the output row if the computed mean absolute difference is greater than this
|
||
|
number of timer under test ticks. This value will be automatically recomputed
|
||
|
to compensate for the error resulting from the truncation in the tick
|
||
|
conversion if the reference timer is running at a higher frequency than the
|
||
|
timer under test. Default: `2`
|