19074: cpu/esp8266: build the SDK bootloader from source r=benpicco a=gschorcht
### Contribution description
This PR is a takeover of PR #17043, which is rebased to the current master and includes some corrections that became necessary after rebasing.
**Copied from description of PR #17043:**
We had four versions of pre-built bootloaders for the esp8266 with different settings of logging and color logging. These bootloaders were manually built from the SDK and shipped with the RIOT-OS source code. However, more settings affect the bootloader build and are relevant to the app or final board that uses this bootloader. In particular, flash size and flash speed are important for the bootloader to be able to load an app from a large partition table at the fastest speed supported by the board layout and flash chip.
Another example is the UART baudrate of the logging output from the bootloader. The boot ROM will normally start at a baud rate of 74880 (depending on the crystal installed), so it might make sense to keep the UART output at the same speed so we can debug boot modes and bootloader with the same terminal.
This patch builds the `bootloader.bin` file from the ESP8266 SDK source code. The code is built as a module (`esp8266_bootloader`) which at the moment doesn't generate any object code for the application and only produces a `bootloader.bin` file set to the `BOOTLOADER_BIN` make variable for the `esptool.inc.mk` to flash.
The code needs to be compiled and linked with custom rules defined in the module's Makefile since the `bootloader.bin` is its own separate application.
The `BOOTLOADER_BIN` variable is changed from a path relative to the `$(RIOTCPU)/$(CPU)/bin/` directory to a full path. This makes it easier for applications or boards to provide their own bootloader binary if needed.
As a result of building the bootloader from source we fixed the issue of having a large partition table.
### Testing procedure
Use the following command to flash the application with a STDIO UART baudrate of 115200 baud.
```
BAUD=74880 USEMODULE=esp_log_startup make -C tests/shell BOARD=esp8266-esp-12x flash
```
Connect with a terminal program of your choice (unfortunately `picocom` and `socat` don't support a baudrate close to 74880), for example:
```
python -m serial.tools.miniterm /dev/ttyUSB0 74880
```
On reset, the `esp8266-esp-12x` node shows the ROM bootloader log output
```
ets Jan 8 2013,rst cause:2, boot mode:(3,7)
load 0x40100000, len 6152, room 16
tail 8
chksum 0x6f
load 0x3ffe8008, len 24, room 0
tail 8
chksum 0x86
load 0x3ffe8020, len 3408, room 0
tail 0
chksum 0x79
```
as well as the second-stage bootloader built by this PR (`ESP-IDF v3.1-51-g913a06a9ac3`) at 74880 baudrate.
```
I (42) boot: ESP-IDF v3.1-51-g913a06a9ac3 2nd stage bootloader
I (42) boot: compile time 11:25:03
I (42) boot: SPI Speed : 26.7MHz
...
I (151) boot: Loaded app from partition at offset 0x10000
```
The application output is seen as garbage since the `esp8266-esp-12x` uses 115200 as baudrate by default.
To see all output at a baudrate of 74880 baud, you can use the following command:
```
CFLAGS='-DSTDIO_UART_BAUDRATE=74880' BAUD=74880 USEMODULE=esp_log_startup make -C tests/shell BOARD=esp8266-esp-12x flash
```
If the application is built without options, the ROM bootloader output will be at 74880 baud and the second-stage bootloader and application output will be at 115200 baud.
### Issues/PRs references
Fixes issue #16402
Co-authored-by: iosabi <iosabi@protonmail.com>
Co-authored-by: Gunar Schorcht <gunar@schorcht.net>
Instead of using a fixed position of the image file in the flash, the variable `FLASHFILE_POS` is used, which allows overriding the default position of the image in the flash at 0x10000.
The parameters for parity and stop bits were swapped, resulting in
the following compilation error with GCC 12.2.0:
/home/maribu/Repos/software/RIOT/cpu/esp_common/periph/uart.c: In function '_uart_config':
/home/maribu/Repos/software/RIOT/cpu/esp_common/periph/uart.c:394:61: error: implicit conversion from 'uart_stop_bits_t' to 'uart_parity_t' [-Werror=enum-conversion]
394 | if (_uart_set_mode(uart, _uarts[uart].data, _uarts[uart].stop,
| ~~~~~~~~~~~~^~~~~
/home/maribu/Repos/software/RIOT/cpu/esp_common/periph/uart.c:395:42: error: implicit conversion from 'uart_parity_t' to 'uart_stop_bits_t' [-Werror=enum-conversion]
395 | _uarts[uart].parity) != UART_OK) {
| ~~~~~~~~~~~~^~~~~~~
cc1: all warnings being treated as errors
This swaps the parameters.
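The following is a minimal sketch of the kind of mix-up described above, not the actual `uart.c` code. All `demo_*` types and functions are hypothetical stand-ins for the RIOT UART types; the only assumption taken from the error message is that the mode setter expects the parity argument before the stop bits argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UART configuration enums. */
typedef enum { DEMO_DATA_BITS_8 } demo_data_bits_t;
typedef enum { DEMO_PARITY_NONE, DEMO_PARITY_EVEN, DEMO_PARITY_ODD } demo_parity_t;
typedef enum { DEMO_STOP_BITS_1, DEMO_STOP_BITS_2 } demo_stop_bits_t;

typedef struct {
    demo_data_bits_t data;
    demo_parity_t parity;
    demo_stop_bits_t stop;
} demo_uart_cfg_t;

/* Records what was last applied so the effect can be inspected. */
static demo_uart_cfg_t demo_applied;

/* Parameter order: data bits, parity, stop bits. */
static int demo_uart_set_mode(demo_data_bits_t data, demo_parity_t parity,
                              demo_stop_bits_t stop)
{
    demo_applied.data = data;
    demo_applied.parity = parity;
    demo_applied.stop = stop;
    return 0;
}

int demo_uart_config(const demo_uart_cfg_t *cfg)
{
    assert(cfg != NULL);
    /* Swapped variant that -Werror=enum-conversion rejects:
     *     demo_uart_set_mode(cfg->data, cfg->stop, cfg->parity);
     * Corrected call with arguments in declaration order: */
    return demo_uart_set_mode(cfg->data, cfg->parity, cfg->stop);
}
```

Because both enums are plain integers underneath, older compilers accept the swapped call silently; the distinct enum types are what let newer GCC versions flag it.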
If module `core_mutex_priority_inheritance` is enabled, the scheduling has to be active to lock/unlock the mutex/rmutex used by FreeRTOS semaphores. If scheduling is not active, FreeRTOS semaphore functions always succeed.
For ESP32x, the operations on recursive locking variables have to be guarded by disabling interrupts to prevent unintended context switches. For ESP8266, interrupts must not be disabled, otherwise the intended context switch doesn't work when trying to lock a rmutex that is already locked by another thread.
Dynamic allocation and initialization of the mutex used by a newlib locking variable must not be interrupted. Since a thread context switch can occur on exit from an ISR, the allocation and initialization of the mutex must be guarded by disabling interrupts. The same must be done for the release of such a locking variable.
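As a rough illustration of the guarding described above, the sketch below allocates and releases a lock lazily inside a critical section. It is not the RIOT implementation: `demo_irq_disable()` and `demo_irq_restore()` are hypothetical stubs that only count nesting, and `demo_mutex_t` is a placeholder type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stub: tracks how many "interrupts disabled" sections are open. */
static unsigned demo_irq_depth;

static unsigned demo_irq_disable(void)
{
    return demo_irq_depth++;   /* return the previous state */
}

static void demo_irq_restore(unsigned state)
{
    demo_irq_depth = state;    /* go back to the saved state */
}

typedef struct { int locked; } demo_mutex_t;   /* placeholder mutex type */
typedef demo_mutex_t *demo_lock_t;             /* locking variable is a pointer */

/* Allocate and initialize the mutex without being interrupted. */
void demo_lock_init(demo_lock_t *lock)
{
    assert(lock != NULL);
    unsigned state = demo_irq_disable();
    if (*lock == NULL) {
        *lock = calloc(1, sizeof(demo_mutex_t));
    }
    demo_irq_restore(state);
}

/* Release the locking variable under the same guard. */
void demo_lock_close(demo_lock_t *lock)
{
    assert(lock != NULL);
    unsigned state = demo_irq_disable();
    free(*lock);
    *lock = NULL;
    demo_irq_restore(state);
}
```

The check and the allocation sit in the same critical section so that a context switch on ISR exit cannot let two threads both see `NULL` and allocate twice.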
When FreeRTOS semaphores, as required by ESP-IDF, are used together with `gnrc_netif`, RIOT may crash if `STATUS_RECEIVE_BLOCKED` is used as a blocking mechanism in the FreeRTOS adaptation layer. The reason for this is that `gnrc_netif` uses thread flags since PR #16748. If the `gnrc_netif` thread is blocked because of a FreeRTOS semaphore, and is thus in `STATUS_RECEIVE_BLOCKED` state, the `_msg_send` function will cause a crash because it then assumes that `target->wait_data` contains a pointer to a message of type `msg_t`, but by using thread flags it contains the flag mask. This situation can happen if the ESP hardware is used while another thread is sending something timer controlled to the `gnrc_netif` thread.
To solve this problem, `STATUS_MUTEX_LOCKED` is used instead of `STATUS_RECEIVE_BLOCKED` and `STATUS_SEND_BLOCKED`.
To reduce the required RAM in default configuration, the BLE interface is used as netdev_default instead of ESP-NOW. Further network interfaces can be enabled with the modules `esp_now`, `esp_wifi` or `esp_eth`.
If `netdev_driver_t::confirm_send()` is provided, the netdev implements
the new netdev API. However, detecting the API at runtime and handling
both API styles comes at a cost. This can be optimized in case only
new or only old style netdevs are in use.
To do so, this adds the pseudo modules `netdev_legacy_api` and
`netdev_new_api`. As right now no netdev actually implements the new
API, all netdevs pull in `netdev_legacy_api`. If `netdev_legacy_api` is
in use but `netdev_new_api` is not, we can safely assume at compile
time that only legacy netdevs are in use. Similarly, if only
`netdev_new_api` is used, only support for the new API is needed. Only
when both are in use, run time checks are needed.
This provides two helper functions to check whether the netdev
corresponding to a netif implements the old or the new API. (With one
being the inverse of the other.) They are suitable for constant folding
when only new or only legacy devices are in use. Consequently, dead
branches should be eliminated by the optimizer.
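A minimal sketch of how such helpers can fold to constants is shown below. It does not reproduce the actual RIOT helpers: the `DEMO_*` macros stand in for the module selection, and `demo_netdev_driver_t` is a hypothetical reduction of the driver struct to the single `confirm_send` member that the text says distinguishes the two APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time switches standing in for the two pseudo modules.
 * Both default to 1 here so that the run-time check is exercised. */
#ifndef DEMO_NETDEV_LEGACY_API
#define DEMO_NETDEV_LEGACY_API 1
#endif
#ifndef DEMO_NETDEV_NEW_API
#define DEMO_NETDEV_NEW_API 1
#endif

/* Hypothetical driver struct reduced to the member that matters here. */
typedef struct {
    int (*confirm_send)(void *dev);
} demo_netdev_driver_t;

/* Example new-style callback used only for illustration. */
int demo_confirm_send(void *dev)
{
    (void)dev;
    return 0;
}

static inline bool demo_netdev_is_legacy(const demo_netdev_driver_t *drv)
{
    assert(drv != NULL);
    if (!DEMO_NETDEV_NEW_API) {
        return true;    /* only legacy devices exist: compile-time constant */
    }
    if (!DEMO_NETDEV_LEGACY_API) {
        return false;   /* only new devices exist: compile-time constant */
    }
    /* Both styles are in use: decide at run time. */
    return drv->confirm_send == NULL;
}

static inline bool demo_netdev_is_new(const demo_netdev_driver_t *drv)
{
    return !demo_netdev_is_legacy(drv);
}
```

When one of the two macros is defined as 0, the corresponding early return makes the result a constant, so a caller's `if (demo_netdev_is_new(drv)) { ... }` branch can be dropped by the optimizer.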
The former FLASH_MODE_{DOUT,DIO,QOUT,QIO} defines are replaced by the corresponding CONFIG_FLASHMODE_{DOUT,DIO,QOUT,QIO} and CONFIG_ESPTOOLPY_FLASHMODE_{DOUT,DIO,QOUT,QIO} as used by the ESP-IDF. This is also needed for the migration of defining flash mode in Kconfig.
ESP32x SoCs use either Xtensa cores or RISC-V cores. The Xtensa vendor code has to be compiled only for ESP32x SoCs that are Xtensa-based. Therefore, MODULE_XTENSA has to depend on HAS_ARCH_ESP_XTENSA instead of HAS_ARCH_ESP.
Fixed delay values are replaced by calculated delays measured in CPU cycles in the I2C software implementation. The advantage is that for each ESP SoC only a clock calibration offset has to be specified. The delays measured in CPU cycles are then derived from the current CPU frequency for the given bus speed. The disadvantage is that the calculated delays are not as precise as the predefined fixed delays.
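The idea can be sketched as follows, assuming the delay of interest is half of one bus clock period and that the calibration offset is subtracted as a fixed number of cycles. The function name, the half-period choice and the way the offset is applied are assumptions for illustration and may differ from the actual driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of CPU cycles to busy-wait for half of one
 * I2C clock period, reduced by a per-SoC calibration offset that accounts
 * for the overhead of the bit-banging code itself. */
uint32_t demo_i2c_half_period_cycles(uint32_t cpu_hz, uint32_t bus_hz,
                                     uint32_t calib_cycles)
{
    assert(bus_hz != 0);
    /* Integer division truncates, which is one source of imprecision. */
    uint32_t cycles = cpu_hz / (2 * bus_hz);
    /* Clamp at zero if the overhead already exceeds the half period. */
    return (cycles > calib_cycles) ? (cycles - calib_cycles) : 0;
}
```

For example, with an 80 MHz CPU clock, a 100 kHz bus and an assumed offset of 50 cycles, the half period is 400 cycles and the resulting delay is 350 cycles.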