- Use static inline function for `mutex_try_lock()`
- The implementation is trivial enough with the inline-able IRQ API to just
always be inline-ed
- Rename `_mutex_lock()` to `mutex_lock()` and drop the blocking parameter
- This was possible to the stand-alone `mutex_try_lock()` implementation
- This yields a measurable performance bump
Currently it is not possible to check if a message was sent over a bus
or if it was send the usual way using `msg_send()`.
This adds a flag to the `sender_pid` if the message was sent over a bus.
`MAXTHREADS` is currently set to 32, so there is still plenty of room in
the PID space. (`kernel_pid_t` is `int16_t`)
The message type for bus message type is already accessed through a getter
function, so it's just consistent to do the same for sender_pid.
Verified that each warning generated by -Wcast-align is indeed a false positive
and used an (intermediate) cast to `uintptr_t` to silence the warnings.
container_of() is safe to use in regard to alignment requirements, when used
correctly. Using `uintptr_t` instead of `char *` for applying the offset results
in -Wcast-align not complaining.
Separate thread names from DEVELHELP so thread names can be
enabled in non-development/debug builds when required/desired.
THREAD_NAMES will be enabled by default then DEVELHELP is set to 1.
Allocate and initialize a thread-local block for each thread at the
top of the stack.
Set the tls base when switching to a new thread.
Add tdata/tbss linker instructions to cortex_m and risc-v scripts.
Signed-off-by: Keith Packard <keithp@keithp.com>
---
v2:
Squash fixes
v3:
Replace tabs with spaces
v4:
Add tbss to fe310 linker script
- Add `byteorder_bebuftohll()` to read an 64 bit value from a big endian buffer
- Add `byteorder_htobebufll()` to write an 64 bit value into a big endian buffer
This commit reverses the runqueue_cache bit order when the architecture
has a CLZ (count leading zeros) instruction. When the architecture
supports CLZ, it is faster to determine the most significant set bit of
a word than to determine the least significant bit set. Unfortunately
when the instruction is not available, it is more efficient to determine
the least significant bit set.
Reversing the bit order shaves off another 4 cycles on the same54-xpro.
From 147 to 143 ticks when testing with tests/bench_sched_nop.
Architectures where no CLZ instruction is available are not affected.
Apparently clang doesn't like static variables / functions being accessed called
from inline function (-Wstatic-in-inline). This commit results in the same
binary being generated while making clang happy.
Replace accesses to `sched_active_thread`, `sched_active_pid`, and
`sched_threads` with `thread_get_active()`, `thread_get_active_pid()`, and
`thread_get_unchecked()` where sensible.
- Add `thread_get_active()` to access the TCB
- Add `thread_get_unchecked()` as fast alternative to `thread_get()`
- Drop `volatile` qualifier in `thread_get()`
- Right now every caller of this function does this. It is better to
contain this undefined behavior to at least one place in code
The `clz` instruction pretty much implements getting the most significant bit
in hardware, so use it instead of the software implementation.
This reults in both a reduction in code size as in a speedup:
master:
text data bss dec hex filename
14816 136 2424 17376 43e0 tests/bitarithm_timings/bin/same54-xpro/tests_bitarithm_timings.elf
+ bitarithm_msb: 3529411 iterations per second
this patch:
text data bss dec hex filename
14768 136 2424 17328 43b0 tests/bitarithm_timings/bin/same54-xpro/tests_bitarithm_timings.elf
+ bitarithm_msb: 9230761 iterations per second
Big endian buffers on big endian systems are already in big endian byte order,
so no byte shuffling is needed. However, byte buffers might be unaligned, so
copy operations that are safe with unaligned memory accesses need to be
used.
It can be desirable to not have the boot message printed each time
(e.g. logs are transferred over a wireless link on battery) while
still retaining the ability to receive INFO level logs.
This adds the option to disable the boot-up message (and also to customize
it if that is desireable).
An interrupt serviced during the idle sleep can re-request a context
switch while the scheduler is already going to switch contexts after the
idle sleep. Thi sched_context_switch_request should thus be cleared
after the idle sleep and not before where it could be modified during
the idle sleep and get out of sync.
In the case that the no_thread_idle feature is active, the
runqueue_bitcache is checked twice in the case no thread is available to
schedule. This changes the inner while loop to a do-while loop to save
one check from the initial loop iteration, saving a cycle or so in the
idle case.
If a board was flashed via USB bootloader, a crash means the user has to
perform a procedure to manually enter the bootloader again for recovery.
To allow for easier recovery, automatically lauch the bootloader on crash
if `DEVELHELP` is enabled.
Those macros are all about convenience. However, always using 64 bit makes casts
nececcary that goes against the idea of having a convenience macro.
E.g. when printing a frequency in KHZ one might want to do
printf("freq: %lu kHz\n", freq / KHZ(1));
leads to an error
> error: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'long long unsigned int'
Now we would have to cast - `%llu` is not available with newlib-nano and wholly
uneccecary.
Only use 64 bit artithmetic where necessary (GHZ, GiB), not for smaller units.
Receiving threads must not modify the contents of the message as this
is racy by definition.
Also make `msg_bus_post()` accept `const void*` instead of `char *`.
irq_% are not inlined by the compiler which leads to it branching
to a function that actually implement a single machine instruction.
Inlining these functions makes the call more efficient as well as
saving some bytes in ROM.
This commit removes the #error from the thread_flags header.
This #error makes the usage of
if(IS_USED(MODULE_THAT_DEPENDS_ON_THREAD_FLAGS)) pattern harder,
because the error is triggered each time the header is included.
If a module uses any thread_flags function it will fail in link time
anyway.
From the ARMv7-M ARM section B3.5.3:
Where there is an overlap between two regions, the register with
the highest region number takes priority.
We want to make sure the mpu_noexec_ram region has the lowest
priority to allow the mpu_stack_guard region to overwrite the first N
bytes of it.
This change fixes using mpu_noexec_ram and mpu_stack_guard together.
bitarithm.h is not needed for the interface of shed but may cause conflicts
due to different definitions of SETBIT and CLRBIT
common implementations are: (value, offset) xor (value, mask) bitarithm
implements the later
frac.c and nrf52/usbdev.c use bitarithm.h but where missing the include
sam0/rtt.c defined a bit using mask from bitarithm,
changed that to the soulution used in sam0/rtc.c