This implementation is optimized for small code and data size, not for
speed. IMO the code is also more readable than the reference
implementation.
The biggest advantage of ChaCha over other stream ciphers is its very
small memory footprint, with only 64 bytes of context (sixteen 32-bit
words), combined with good encryption speed.
This PR also adds a pseudo-random number generator that simply
returns the keystream of a randomly initialized ChaCha context.
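To illustrate both points, here is a minimal C sketch. It is not the code from this PR. It shows a 64-byte ChaCha20 context, assuming the RFC 8439 layout (32-byte key, 32-bit block counter, 96-bit nonce) and 20 rounds, plus a PRNG that hands out the raw keystream. The names `chacha_ctx`, `chacha_init`, `chacha_block`, `chacha_prng`, `chacha_prng_seed` and `chacha_prng_bytes` are hypothetical, as is the 44-byte seed format (key followed by nonce). The seed is assumed to come from a real entropy source such as a hardware RNG; that part is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The entire ChaCha state: a 4x4 matrix of 32-bit words = 64 bytes. */
typedef struct {
    uint32_t state[16];
} chacha_ctx;

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ARX quarter round: only 32-bit add, xor and rotate, no lookup tables. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
    *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
}

static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumed RFC 8439 layout: constants | 256-bit key | 32-bit counter | 96-bit nonce. */
void chacha_init(chacha_ctx *ctx, const uint8_t key[32],
                 const uint8_t nonce[12], uint32_t counter)
{
    static const uint32_t sigma[4] = {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574};
    for (int i = 0; i < 4; i++) ctx->state[i] = sigma[i];
    for (int i = 0; i < 8; i++) ctx->state[4 + i] = load32_le(key + 4 * i);
    ctx->state[12] = counter;
    for (int i = 0; i < 3; i++) ctx->state[13 + i] = load32_le(nonce + 4 * i);
}

/* Writes one 64-byte keystream block and advances the block counter. */
void chacha_block(chacha_ctx *ctx, uint8_t out[64])
{
    uint32_t x[16];
    memcpy(x, ctx->state, sizeof x);
    for (int i = 0; i < 10; i++) { /* 10 double rounds = ChaCha20 */
        chacha_quarter_round(&x[0], &x[4], &x[8],  &x[12]); /* columns */
        chacha_quarter_round(&x[1], &x[5], &x[9],  &x[13]);
        chacha_quarter_round(&x[2], &x[6], &x[10], &x[14]);
        chacha_quarter_round(&x[3], &x[7], &x[11], &x[15]);
        chacha_quarter_round(&x[0], &x[5], &x[10], &x[15]); /* diagonals */
        chacha_quarter_round(&x[1], &x[6], &x[11], &x[12]);
        chacha_quarter_round(&x[2], &x[7], &x[8],  &x[13]);
        chacha_quarter_round(&x[3], &x[4], &x[9],  &x[14]);
    }
    for (int i = 0; i < 16; i++) { /* add input state, serialize little-endian */
        uint32_t v = x[i] + ctx->state[i];
        out[4 * i]     = (uint8_t)v;
        out[4 * i + 1] = (uint8_t)(v >> 8);
        out[4 * i + 2] = (uint8_t)(v >> 16);
        out[4 * i + 3] = (uint8_t)(v >> 24);
    }
    ctx->state[12]++; /* wraps after 2^32 blocks; reseed before that */
}

/* Hypothetical PRNG: returns the keystream of a seeded ChaCha context. */
typedef struct {
    chacha_ctx cipher;
    uint8_t buf[64]; /* current keystream block */
    size_t pos;      /* index of the next unread byte in buf */
} chacha_prng;

/* seed: 32-byte key then 12-byte nonce, assumed to come from an entropy source. */
void chacha_prng_seed(chacha_prng *rng, const uint8_t seed[44])
{
    chacha_init(&rng->cipher, seed, seed + 32, 0);
    rng->pos = sizeof rng->buf; /* buffer empty: first request generates a block */
}

void chacha_prng_bytes(chacha_prng *rng, uint8_t *out, size_t len)
{
    while (len--) {
        if (rng->pos == sizeof rng->buf) {
            chacha_block(&rng->cipher, rng->buf);
            rng->pos = 0;
        }
        *out++ = rng->buf[rng->pos++];
    }
}
```

The keystream is simply the cipher output for an all-zero plaintext, so the generator needs nothing beyond the ChaCha context itself. In this sketch the 64-byte `buf` is an added convenience for byte-sized requests; a variant that only hands out whole blocks could drop it and stay at the 64-byte context.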