To prevent reordering of accesses to the interrupt control register when link
time optimization (LTO) is enabled, memory barriers are needed. Without LTO
calls to the external functions irq_disable(), irq_restore(), irq_enable() and
irq_is_in() have the same affect as compiler barriers, as the compiler is unable
to prove that reordering of memory accesses is safe (from a single-threaded
point of view). With LTO the compiler can easily prove that reordering is safe
from a single-threaded point of view: Thus, the compiler may move memory
accesses wrapped in irq_disable(), irq_restore() across those calls.
The memory barriers will have no effect on non-LTO builds.
Citing the doc of irq_enable():
@return Previous value of status register. [...]
On atmega however the new value of the status register is returned, not the one
prior to enabling interrupts.