12 February 2018
The bit-babbler 0.8 software release mostly brings improvements to interfacing with external systems. There were again no significant changes or needed bug fixes to the core functionality, just some more polishing to support a broader range of uses and environments.
A simple but significant tweak enables the
and QEMU hook to support any valid libvirt guest domain name, not just
those which are also valid as shell variable names. This change will not
break any existing configuration; it's a pure extension which allows you
to use something like
DOMAIN_NAME_foo="Fußball" to declare
that the configuration options for identifier
apply to the libvirt domain
Fußball instead of to one named
foo. This has become more important with libvirt itself actively
fixing bugs that prevented it from being able to reliably use such names.
Note that if you had previously installed the libvirt QEMU hook from an earlier version of this package, and you need this functionality, then you will need to manually update it to use the new revision from this release. Because libvirt supports only a single QEMU hook file, which must serve everything that the local admin might want to do in it, we can't safely do that automatically as part of the normal install process for the rest of the software.
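Purely to illustrate what that extension means, here is a hypothetical Python model of resolving a libvirt domain name to its configuration identifier. The function name, and the assumption that a domain whose name is already a valid shell variable name can still serve as its own identifier, are ours for this sketch. It is not the hook's actual implementation.

```python
import re
from typing import Mapping, Optional

# Shell variable names: ASCII letters, digits and '_', not starting with a digit.
SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

PREFIX = "DOMAIN_NAME_"


def config_id_for_domain(domain: str,
                         settings: Mapping[str, str]) -> Optional[str]:
    """Hypothetical lookup of the config identifier for a libvirt domain."""
    # An explicit DOMAIN_NAME_<id>="<domain>" declaration takes precedence.
    for key, value in settings.items():
        if key.startswith(PREFIX) and value == domain:
            return key[len(PREFIX):]
    # Otherwise the domain name itself can only act as the identifier
    # if it is also a valid shell variable name.
    return domain if SHELL_NAME.fullmatch(domain) else None
```

With `{"DOMAIN_NAME_foo": "Fußball"}` the domain Fußball resolves to the identifier foo, while without that declaration it has no usable identifier at all, since ß can never appear in a shell variable name.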
The larger change in this release is that
seedd can now
also read its configuration options from a file instead of having to
pass them all explicitly on the command line. Originally we'd elected
to avoid having configuration files for this, partly to keep the code as
simple and easy to audit as possible, but also because the more options
that something provides to choose from, the more probable it becomes
that someone might choose poorly, or accidentally make some mistake they
hadn't intended. So we'd aimed to keep the number of options to a
minimum overall too. But the reality, on two fronts, is that we now have
enough options which can genuinely be needed to cater for real use cases
that defining them in a configuration file has become a convenient and
less error-prone alternative to repeatedly typing them on a command line.
And for users who will run this as a daemon under systemd, it seems wrong
to require them to jump through the hoops needed to safely modify its
configuration just to tweak some options which
seedd is run with (let alone needing to
jump through different hoops to do the same thing on different systems).
While there are some people who might
be comfortable doing that, it seems like the sort of FAQ-bait which we
long ago learned to always try hard to avoid. Particularly for ordinary
users who may not be familiar with the too-often surprising interactions
that can occur between the
unit which normal users should almost never need to modify, and to keep
our own configuration supplied by something simpler and more tightly
focussed on portably doing the one job that it is required to perform,
regardless of what process starts and manages the daemons that are
running on your system.
We're not going to get tangled up in the question of whether or not you should be using systemd. There are intelligent people (yeah, and sadly also some bozos…) with strong opinions at both extremes of that spectrum, and so, like any other issue of portability, our job here is to support all of them as equally as we possibly can.
For the people who… let's go with
enamoured with systemd, all you really need to know about it here is
that we don't depend on it, we won't depend on it, we aren't going to
force you to have it on your system, and nobody is going to be missing
any functionality from our software as a result of their own choices
regarding that. We still ship a SysV init script in the Debian
packaging, though since it isn't portable, other platforms still do need
to provide their own solution for that, as they always have before now.
And we will still welcome tested patches to optimally support other
platforms, also as we always have. In the same respectful way that we
are now providing fuller support for systemd to the people who do fall
somewhere on the other side of that spectrum, wherever they might stand
between using it and zealously evangelising it. There's no build or
runtime configuration option needed to turn it on or off. If
seedd is started by systemd, it will Work As Expected, and
if it is started by something other than systemd, it will still Work As
Expected, no matter what is providing your init process at the time.
Nobody gets a short straw here, and nobody has to ride in second class
due to this.
But now we are going to talk about systemd support for a bit. Because it is new with this release, at least as part of the package that you can download from us here. And there are a few special snowflake things too which are out of the ordinary enough to note, and overall things are a bit different now with respect to it, so anybody who had already added a unit locally or for their own distro packaging deserves the courtesy of some pointers to what has changed, before we accidentally stomp on each other's toes in that space.
The 0.8 release installs two service unit files by default, but it does not try to enable either of them. That task is left to the local admin, or to the policy set by the distro packaging and its environment when the packaged version is installed.
The seedd.service unit is responsible for
starting the seedd daemon as early as possible in the boot
sequence, using configuration options read from the
/etc/bit-babbler/seedd.conf file. It will not try to start
it if that file does not exist (and it will fail to start if there is an
error in that configuration file). Typically it will actually be started
so early that
udev may not have announced the USB devices
to the system yet, but it will be ready to make use of them immediately
at the soonest opportunity when they do become available. The Debian
packages will enable this unit by default when systemd is playing the
role of init, and it is the equivalent of the existing SysV init script
which will do the same job when it is not. If
is set in the environment (as it will be when started by systemd with
seedd will send start up and
shut down progress notifications to that socket. It will limit the
seedd to the minimum required for normal
functioning when it is feeding seed entropy to the OS kernel and
providing a control socket for querying QA results (which is a superset
of those needed for all its other functionality).
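For anyone curious what that notification traffic looks like, this is a minimal Python sketch of the client side of systemd's sd_notify protocol. systemd passes the socket address in the NOTIFY_SOCKET environment variable, and each notification is a single datagram of newline-separated KEY=VALUE assignments such as READY=1 or STOPPING=1. It is not seedd's own code, and it only handles AF_UNIX addresses (including the Linux abstract namespace form written with a leading '@'), not the other address types newer systemd versions may use.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one sd_notify-style message such as "READY=1".

    Returns False, doing nothing, when NOTIFY_SOCKET is not set, so the same
    code path is harmless when started by any other init system.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # '@' denotes a Linux abstract-namespace socket, whose real
        # address begins with a NUL byte instead.
        target = b"\0" + addr[1:].encode()
    else:
        target = addr.encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), target)
    return True
```

A daemon would typically send READY=1 once it is up and STOPPING=1 when it begins shutting down, and may use STATUS=... for free-form progress text. The early return is what lets the same binary behave identically whether or not systemd started it.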
If the udp-out option is used, or if the control socket
is provided on a TCP address rather than as a Unix Domain socket, then
you'll probably also want to enable the
since it will now also typically be started before any network
interfaces come up. It is not an error for there to be no devices
available at all when the service is started – in the same way
that the SysV init script always has, the daemon will happily still idle
and wait for them to be hotplugged. However changes to network
interfaces are not actively monitored and responded to in this code, and
these sockets are not managed by systemd (because they are not a trigger
for it to be
activated, they are secondary service interfaces
that it provides).
The ordering dependencies will ensure that the daemon is
started before all ordinary services, and before as many things as
possible which may need good seed entropy (it will be started at the
same point in the boot sequence where systemd's own
systemd-random-seed attempts to load the seed entropy that
it saved to disk at the last clean shutdown), but the
seedd.service unit itself will not prevent the boot from
continuing (otherwise) normally if
seedd fails to start, or
if it can find no (properly working) device to obtain QA-verified fresh
entropy from. This is what most people would normally want to happen if
they are expecting a machine will only have a BitBabbler device plugged
into it occasionally, or if they do still expect to be able to perform
other ordinary tasks on it at any time when one is not, even if the
supporting software for them is still installed and remains configured
and running, waiting for devices to be hotplugged.
The seedd-wait.service unit is provided for
people who really do need stronger guarantees of good seed entropy at
every boot. This one should not normally be enabled by default in
distro packaging that is intended for general use. It provides a
pass/fail sequence point which can be used to delay or prevent other
services, or even the whole machine, from starting normally until good
fresh seed entropy is able to be acquired.
The softest guarantee that it provides is obtained by simply enabling
it. This will potentially pause the boot sequence, delaying everything
which has been scheduled to start after
(which is most services other than
udev), until the OS
kernel pool has been seeded with a minimum quantity of QA checked fresh
entropy from the available BitBabbler device(s). It is still just a
soft guarantee, because it will only wait for up to 30 seconds to obtain
the needed seed entropy, after which it will timeout and fail, but then
allow the rest of the boot sequence to proceed normally. This means
that in normal operation, services which need good entropy will not be
started until after the kernel has definitely been seeded – but
that failure or absence of a BitBabbler device at boot will not entirely
prevent the machine from booting, it will just delay it until the
timeout expires. If you normally expect that a BitBabbler device will
always be attached, and have things which do benefit from having good
entropy immediately as soon as the system is booted, but which don't
depend on it so critically that they should not run without new seed
material obtained from the TRNG, then this is a reasonable balance to
meet those requirements. The extra time that it may add to booting when
a device is normally available should be minimal (and can be monitored
for each boot using
systemd-analyze plot). Mostly it
will depend on how quickly your system makes the USB devices available.
A hard guarantee can be obtained for individual services or groups of
services by declaring them to have a hard dependency on
seedd-wait.service, which will prevent them from
starting at all if the initial seed entropy is not obtained. Or if the
entire system really should be prevented from booting normally if this
fails, then a failure of this service unit can be used to divert the
boot to an alternative to the normal default target. For example, by
using something like
OnFailure=emergency.target, the system
will boot into
single user mode if this test should fail for any reason.
If you do plan to use the
seedd-wait.service in any of
the stronger configurations than just enabling it, then you probably are
still going to need to carefully read and understand at least some of
systemd's 240+ manual pages, for the version of systemd running on your
system, to properly get your head around all the subtleties of how its
dependencies are really calculated and intertwined in practice. And you
should carefully test the success, failure, and recovery restart
behaviour of what you do declare, because it can be surprisingly easy
(even for people who are familiar with all this) to accidentally create
dependency loops which systemd won't warn you about until it just
decides to not start something critical,
like your network. And there can be conditions where a service will
still be started even when something that it requires has
failed, if there isn't also a direct and explicit ordering relationship
linking them too.
My personal favourite to date was inspecting a system which had
switched to the
single user mode
– from the comfort of the shells of the multiple remote users that
it had allowed to remain logged in after the switch… It turns
out that ending up in that state is a fairly trivial thing to do:
anything which reverts to the
emergency.target after early
boot can enable it to happen (and this isn't a simple bug as such, but
an emergent feature of other design decisions). There are
lots of options here for ways you can configure your system to behave in
the event of a failure to obtain guaranteed good initial seed entropy
– but if it is Mission Critical, just be careful to test that
everything you do really does what you actually expect it to in all
circumstances, if you haven't already retuned your natural intuition to
precisely match how systemd actually behaves when satisfying its
calculated dependencies based on what you declared to it.
And of course, as we promised above, none of this extra functionality
that we've made easy to implement is actually systemd specific. The
seedd-wait.service is just a oneshot service wrapper around
making a call to
bbctl --waitfor, which is what does
the actual work of waiting for QA checked entropy to be provided to the
OS kernel. It can be used in the same ways we've described above from
any other init or service supervision system if desired. It's easy to
talk about using it with systemd here, because in theory what we've
described above should work the same way on any system where it is the
init process – but if people have recipes for other systems which
they think are worth sharing, or other requirements than what is already
possible to do with this now, then we'll be happy to include those for
other users too. For most people though, the appropriate thing to do is
going to largely depend on the exact requirements of the system that
they are doing it on anyway, but the overview here should give you some
ideas for what is possible if and when you need it.
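As a rough illustration of using such a wait from something other than systemd, here is a hypothetical Python gate that a supervisor could run before starting dependent services. The wait_cmd argument stands in for a bbctl --waitfor invocation; its exact arguments are left to the bbctl documentation rather than guessed at here. The hard flag models the difference between the soft guarantee (report the failure, carry on) and the hard one (refuse to start the dependents).

```python
import subprocess
import sys
from typing import Sequence


def wait_gate(wait_cmd: Sequence[str], timeout: float, hard: bool) -> bool:
    """Decide whether dependent services may be started.

    wait_cmd is assumed to exit with status 0 once QA-checked seed entropy
    has been delivered to the kernel, and non-zero otherwise.
    """
    try:
        ok = subprocess.run(list(wait_cmd), timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        ok = False  # subprocess.run has already killed the waiting command
    if not ok:
        print(f"seed entropy not confirmed within {timeout}s", file=sys.stderr)
    # Soft guarantee: proceed regardless. Hard guarantee: block dependents.
    return ok or not hard
```

A soft gate matching the 30 second limit described above would be `wait_gate(cmd, 30, hard=False)`, which only ever delays start-up, while `hard=True` is the equivalent of a hard dependency on seedd-wait.service.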
24 July 2017
The bit-babbler 0.7 software release is now officially tagged and uploaded. For most existing users there is no strongly compelling urgency to update to this one, once again it's mostly portability improvements for new platforms and for new releases of previously supported platforms.
This one brings confirmed support for MacOS (tested on El Capitan
and Sierra), OpenBSD (tested on 6.1), and for the FreeBSD 11 release.
It also fixes a corner case seen by RHEL/CentOS 6 users which occurs if
they use a later version of
libusb than what was shipped
by default, with the default kernel (which contains partial backports
of later usbdevfs API functionality). And there are workarounds for a
systemd update which hit the shuffle button on filesystem
locations of system utilities, and for a quirk in
doesn't always really mean
Existing FreeBSD 10 users will probably notice the most significant
changes if/when they update to FreeBSD 11, since it added USB hotplug
support and so we also now support it on that platform too, via their
own implementation of the
libusb interfaces. USB hotplug
is not perfect there yet – there are notable delays (of around 4
seconds) before device notifications are sent by it, and again when
trying to close our connection to receiving them, and unplugging any
device which is actively in use can result in a deadlock occurring in
their own (apparently not threadsafe) implementation of
libusb_bulk_transfer(), but we have workarounds in our
code which should limit the effects of that from annoying users too
much until these problems do get fixed in FreeBSD itself.
We also had to disable a few of the optimisations which are normally
enabled by -O2 on FreeBSD 11, since with its version of
gcc they appear to miscompile some seemingly-harmless
constructs in a way that breaks proper stack unwinding and exception
handling. [If there are any FreeBSD developers reading this, we can
gladly provide all the gory details of these things to anyone interested
in fixing bugs in future releases of that OS. We've not seen this with
the default toolchains on any other OS release. For anyone casually
interested, there are more details about all this in the package
The OpenBSD 6.1 release likewise had a few system-specific bugs and
quirks we needed to work around. Most were just normal platform
variability, or related to its limited support for locales other than
en_US.UTF8, but we did discover a
significant bug in its
vfprintf() implementation, which
is also apparently not actually threadsafe in practice. POSIX says that
this function may be a thread cancellation point, and on OpenBSD 6.1 it
is implemented as one. However if a thread does get cancelled there,
then it can result in this leaving its internal
_thread_flockfile mutex locked, which means any future
call to it (or any other system function requiring that lock) will deadlock.
Since we can't really test for the presence of that bug in any way useful to us, the current workaround for it is to simply always disable thread cancellation explicitly before calls to that function and instead test for cancellation requests ourselves outside of it. That at least is a complete, and otherwise future-safe, mitigation for this one. But it's still a bug in OpenBSD that other code could also hit. [So if any of its developers are reading this and need more information than that to find it, we'll be here for you too!]
Many thanks again to all the people who made requests for their preferred platforms and diligently tested the release candidates on them and reported any issues they saw. And for your patience while we shook out all the issues we could find on them before actually pushing this one out more widely as a formal and public release.
19 June 2017
Can I use the BitBabbler to generate random numbers between 1 and
N, for some value of N, is a question we've been asked often enough
now that it should probably get an entry in the FAQ.
But many people who see that question would immediately think
it can, it's a random number generator, and not actually read the
answer. And since the correct answer is a little more detailed than
that, and since historically a lot of people really have done this
poorly, sometimes with quite significant consequences, it seemed worth
putting something a little more detailed up about it, which people who
do need to do this, and are thinking about it possibly for the first
time, might be able to find, and so dodge an easy trap to fall into.
The BitBabbler itself just generates a potentially infinite stream of random bits. Since that equates to a potentially infinitely large number, then clearly it can generate random numbers in any arbitrary range that you like. Where it gets a bit more tricky is if you want all of those numbers to have an equal probability of being selected.
If the range of numbers you want is a perfect power of 2 in size, then this really is very trivial: you just take the number of bits you need directly from the BitBabbler's output. If you want a number between 0 and 7, or between 1 and 8, or between 3 and 10, you can just grab 3 bits from the BitBabbler, add the starting value of your range to that, and you're done. Every number in that range will have an equal probability of being selected.
By far the more common case when people ask this question though, is that they want some range of numbers which isn't a perfect power of 2. And a statistically significant number of the people who try to solve that problem for the first time will immediately reach for the obvious, easy, and wrong, solution of using the modulus function.
That's almost equally trivial to do: you just take the random numbers that you have, in whatever range they might be (so long as it is larger than the range you want), and wrap them into the range that you do want with the modulus. And at first glance that works perfectly. Except for the bit about them all being equally probable. The problem there should be quickly obvious if we take a more careful look at what that really does.
Say you want numbers in the range 0 to 4 (or 1 to 5, it's the same problem here). You'll need at least 3 bits from the BitBabbler to obtain at least that many values, and if you reduce those 3-bit values with a modulus of 5, then what you get looks like this:
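| BitBabbler value | Value mod 5 | Probability |
|---|---|---|
| 0 or 5 | 0 | 0.25 |
| 1 or 6 | 1 | 0.25 |
| 2 or 7 | 2 | 0.25 |
| 3 | 3 | 0.125 |
| 4 | 4 | 0.125 |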
You correctly get output only in the range desired, but the values 3 and 4 have only a 1 in 8 chance of being selected, while the other values instead all have a 2 in 8 probability. None of them have the 1 in 5 chance (probability 0.2) which would normally be expected. So it's clearly not the ideal distribution of outcomes for most purposes where random numbers are needed.
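Since there are only 8 equally likely 3-bit inputs, the table can be checked exhaustively rather than by sampling. This small Python sketch computes the exact probability of each result with fractions:

```python
from collections import Counter
from fractions import Fraction
from typing import Dict


def modulo_distribution(bits: int, modulus: int) -> Dict[int, Fraction]:
    """Exact probability of each value of (v % modulus), where v is a
    uniformly random `bits`-bit number, by enumerating all 2**bits inputs."""
    total = 2 ** bits
    counts = Counter(v % modulus for v in range(total))
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}


for r, p in modulo_distribution(3, 5).items():
    print(r, p, float(p))
# 0 1/4 0.25
# 1 1/4 0.25
# 2 1/4 0.25
# 3 1/8 0.125
# 4 1/8 0.125
```

Taking more bits shrinks the bias but never removes it for a range of 5: with 4 bits, 0 comes up with probability 4/16 and each of 1 to 4 with 3/16.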
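There are a few ways to avoid this problem when you need uniformly distributed random numbers in an arbitrary range, each with its own set of pros and cons. In the case at hand though, where you have a plentiful supply of entropy available, possibly the simplest option, which is easy to implement correctly and has a trivial proof of its correctness, is to do the following: take the smallest number of bits which can represent every value in the range you want; if the number they give falls outside that range, throw it away and take a fresh set of bits; otherwise add the starting value of your range to it, and you're done.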
Using the same example as above, this would give you a 3 in 8 chance of needing to retry each attempt, but every value returned will have the same 1 in 5 chance of being any given number in the desired range.
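A minimal Python sketch of that retry procedure follows. getbits is an assumed callable returning k uniformly random bits as an integer; secrets.randbits is used as the default only so the example runs, and real use would feed it bits read from the BitBabbler instead. This is not the doc/examples/random_int.pl script from our release.

```python
import secrets
from fractions import Fraction
from typing import Callable


def bits_needed(n: int) -> int:
    """Smallest k with 2**k >= n, for a range of n >= 1 values."""
    return (n - 1).bit_length()


def random_int(lo: int, hi: int,
               getbits: Callable[[int], int] = secrets.randbits) -> int:
    """Return a uniformly distributed integer in [lo, hi] by rejection.

    getbits(k) must return k uniformly random bits as an int in [0, 2**k).
    The secrets.randbits default is only a stand-in for BitBabbler output.
    """
    if hi < lo:
        raise ValueError("empty range")
    n = hi - lo + 1
    if n == 1:
        return lo  # nothing to choose, so no bits are needed
    k = bits_needed(n)
    while True:
        v = getbits(k)
        if v < n:
            return lo + v  # in range: every accepted v is equally likely
        # v is out of range: discard it and draw a fresh set of bits


def reject_probability(n: int) -> Fraction:
    """Chance that a single k-bit draw is thrown away for a range of n values."""
    return 1 - Fraction(n, 2 ** bits_needed(n))
```

For the range 0 to 4, `reject_probability(5)` is 3/8, matching the retry chance above. When the range size is an exact power of 2, no draw is ever rejected and this reduces to the trivial case of taking the bits directly.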
There are other ways to do this which
waste less of the raw
entropy to make each selection, but if that's really a major concern
for some particular use case, then you almost surely have other
considerations which must be taken into special account too. So we'll
leave those as an exercise for the reader to research. This one is
only slightly less intuitive than using a modulus, while being
comparably simple to implement without programming error, and roughly
as fast to execute.
And for the benefit of people who just want a simple program to run
which will do exactly this for them, we've added a new example to our
software which does it in a general way for any selected range of
numbers, and which includes a self-test mode to both test the actual
implementation there and to reassure anyone that the results really do
look like they would expect them to for the range they want. You'll
find that in
doc/examples/random_int.pl of our next release.
6 December 2016
Supporting any platform where people needed entropy has always been an important part of this for us. The software was written with easy portability in mind – but as always for anything non-trivial, especially where interfacing to hardware is involved, we know this still means that some tweaking can be needed, both for new platforms and as existing ones gain new functionality (like USB hotplug support which still isn't as widely available on every OS yet as you might reasonably have expected it would be). We've mostly been doing that on demand, as people have requested support for their preferred system(s). In some cases though, we aren't always able to test those ourselves, and we do need those people to help by reporting what they see if things don't already Just Work out of the box for them.
MacOS has been one of the latter variety. We'd had a few queries about whether it was known to work there – but the last Apple device I'd owned was an Apple][ (which with some irony was what set me on the path of All Things Open Source from an early age – it being such an awesome enabler to have the computer's full schematics and the source to its ROM available in the back of its manual), and nobody else here currently owns a Mac either. It wasn't until recently that a few of those people did in fact step up to report on some real testing with it.
So thanks to their help, we can now confirm this really is known to work under MacOSX. The needed changes to the software were mostly minor; we did already have it working on FreeBSD and the differences to that weren't major. Just some small things missing or done slightly differently.
One thing MacOS did have which FreeBSD 10 didn't, was a
documented interface for feeding entropy to the kernel. Apple's
manual page described this and the SecurityServer daemon which used it.
Adding support for that was simple enough, but what was missing from
it was a documented way to know when its kernel actually wanted more
entropy. Fortunately, the source for the kernel is freely available,
so the next step was to grab that and see if there was in fact an
undocumented way to do it, since it seemed probable that any sensible
implementation of Apple's own SecurityServer would also want to have
something like that available to it …
But what we instead found in there was a very different kind of revelation which caught all of us by surprise. It turned out that things didn't actually work in the way which Apple's documentation had indicated they would at all. And the more that we looked, the wider that gap quite evidently was.
If you examine the kernel source for the 10.12 Sierra release
(xnu-3789.1.32), you'll find that in the file
bsd/dev/random/randomdev.c there is indeed code to handle
a write to the
random_write() function takes chunks of up to 256
bytes at a time from what we give it and then passes them off to the
write_random(). That function is found in the
osfmk/prng/random.c where we see it has the following
Oops. We'd been noting since the beginning that there was little point in feeding bits to the Windows CryptoAPI, since with it being a black box there was no way to know what if anything it would do with them, and that (only half-jokingly) it was quite possible it would simply throw them away and do nothing at all with them. But it was still a lot more stunning to really find ourselves looking at an auditable implementation which was in fact doing exactly that. And almost equally surprising that nobody else had already been pointing at this anywhere else that we could see.
So we got curious for some kind of explanation as to how and when this came to be the apparently Forgotten Work In Progress that we were now staring at in dissipating disbelief.
An archaeological dig into the publicly available record would seem
to show that
random_write was added in xnu-201 (MacOS 10.1),
along with using Yarrow to replace the
MINSTD LCG which in
10.0 was the only source of kernel
random numbers. Things then
remained that way, with writes to
/dev/random being the
only source of kernel entropy (aside from initial seeding using the
system clock at boot), until xnu-2782.1.97 (MacOS 10.10) when the
kernel Yarrow implementation was moved to
refactored to be used as a pluggable PRNG
In that release, the initial random seed is now taken from a
device-tree property set up by the boot loader, mixed with an initial
clock timestamp again. This was the first release of the MacOS kernel
to harvest entropy directly from system interrupts, and it does so by
mixing in the lower 32 bits of the TSC cycle counter, reading it at the
time they are handled. It does this indiscriminately for all interrupts
that are raised on the
master CPU. It presumably then also got
rid of the SecurityServer daemon, since this release also added the
write_random that we see now.
From there, things seem to be essentially unchanged right up to the present time (the 10.12 Sierra release) in the publicly available source.
So basically, unless you patch your kernel, there really isn't any point to feeding bits directly to it on MacOS either. Like the users on Windows systems, if you need strong entropy, you'll want to take it directly from the BitBabbler device itself rather than filtering that through the OS kernel's own pool.
But we can at least now definitely confirm that the BitBabbler devices and our supporting software have been tested and verified on the El Capitan and Sierra releases of MacOS, and that all other modes of operation are indeed working reliably and exactly as they are expected to.
The code to feed a MacOS kernel will still be included in our software, for the benefit of anyone who is keen to experiment with patching their kernel to make use of it.
It's probably worth noting that if you do wish to do that, then it
isn't enough to just uncomment the disabled
code that already exists in
write_random. If that wasn't
already obvious, it would be quickly enough when it failed to compile.
What is currently in there looks a lot like a quick cut'n'paste sketch
of what it ought to do, which was then commented out to get it all to
compile, and ostensibly forgotten about again when a release deadline loomed.
From an untested, by-eye analysis though, it should be enough to just
buffer. For future versions
of the kernel it may be desirable to wrap the call to
ccdrbg_reseed with the
PRNG_CCDRBG macro, as
is done in the
Reseed() function, but so long as this is
still using the (deprecated) Yarrow PRNG, that macro is a noop anyway.
Similarly, checking the return value from
isn't strictly needed there, since calls to the
yarrow_reseed() function will always return
CCDRBG_STATUS_OK, but some future implementation could
possibly fail to perform a reseed request and return a real error.
It does seem fairly clear from what we've seen that the MacOS kernel would benefit from having a more reliable source of good entropy than just the CPU clock count when interrupts occur. Especially since, in some cases at least, the number of relatively predictable periodic interrupts could easily dominate any more randomly occurring ones.
But for people who really do want or need the strongest guarantees,
this may still not be enough. While trying to confirm that
write_random could reasonably be patched, we started to
notice a few more things in the MacOS kernel which didn't look quite
right at second glance either …
Two things are certain: Death and Taxonomy errors.
It is commonly claimed, and seemingly accepted as gospel truth, that
uses the Yarrow PRNG algorithm. And indeed, if you take
a casual look at its kernel source, you will find things that are
referred to there as
Yarrow. On a closer inspection though, it
quickly becomes obvious that almost all of the significant components
that are described as essential parts of the design in the
appear to be missing from it.
write_random, which is either deliberately disabled, or simply doesn't actually do anything at all in practice.
What is actually implemented instead is more like this:
RESEED_BYTES) have been output by the generator, a reseed is forced using the current content of that circular buffer (without consideration as to whether any new entropy at all has been added to it since the last reseed). The circular buffer is then churned by XORing each 32bit word in it with the one before it in the buffer.
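To make the weakness in that scheme easier to see, here is a toy Python model of the reseed behaviour as described above. It is not the xnu code: the buffer size, the reseed_bytes threshold (standing in for RESEED_BYTES), the way interrupt samples are mixed into the buffer, and the exact order of the churn pass are all assumptions. What it does show is that a reseed is triggered purely by output volume, so two consecutive reseeds can happen with no new input at all, the second using only a deterministic function of the first.

```python
from typing import List

MASK32 = 0xFFFFFFFF


class ChurnModel:
    """Toy model of the forced-reseed scheme described above (NOT xnu code)."""

    def __init__(self, words: int, reseed_bytes: int) -> None:
        self.buf: List[int] = [0] * words
        self.pos = 0
        self.reseed_bytes = reseed_bytes  # stand-in for RESEED_BYTES
        self.output_count = 0
        self.reseeds: List[List[int]] = []  # buffer snapshots used to reseed

    def add_interrupt_sample(self, tsc: int) -> None:
        # Assumption: the low 32 bits of the cycle counter are XORed into
        # successive slots of the circular buffer.
        self.buf[self.pos] ^= tsc & MASK32
        self.pos = (self.pos + 1) % len(self.buf)

    def generate(self, nbytes: int) -> None:
        self.output_count += nbytes
        if self.output_count >= self.reseed_bytes:
            # Forced reseed: no check of whether any new sample arrived.
            self.reseeds.append(list(self.buf))
            self.output_count = 0
            self._churn()

    def _churn(self) -> None:
        # Assumption: one in-place forward pass, each word XORed with the
        # (already churned) word before it, word 0 wrapping to the last.
        for i in range(len(self.buf)):
            self.buf[i] ^= self.buf[i - 1]
```

Feeding a single sample and then generating enough output for two reseeds yields snapshots [5, 0, 0, 0] and [5, 5, 5, 5]: the second reseed contains no entropy that the first did not.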
So how did this happen? How did something claiming to implement Yarrow manage to look almost nothing at all like the published algorithm aside from using SHA1 in its entropy accumulator stage? To answer that, we needed to dig deeper into the origins of Yarrow itself.
page, we find not only the Yarrow paper, but also a link to some source
code that is noted to implement
an older version of Yarrow, not the
one specified in the paper. And a quick look at that makes all the
pieces of this puzzle start to rapidly fall into place.
Although MacOS switched from an LCG to
Yarrow well after the
formal paper defining it was published, they instead built their
implementation of it on a copy of this experimental prerelease source
from 1998, which was being used to test and refine the ideas behind
what would ultimately become the Yarrow CSPRNG.
Even today, the core functions of what they are shipping are still mostly verbatim from what was in the Counterpane 0.8.71 source, but there are a few notable differences, beyond simply relicensing that public domain software under their own terms. Changes which mutate this into something significantly different again from the design that Counterpane had been initially experimenting with …
While the original was designed to take entropy input conservatively
from multiple sources, the MacOS version only provides a single entropy
source. The MacOS version then also disables the entropy estimation
functionality (since it couldn't run the embedded
code in kernel space). It disables the
slow poll sourcing of
entropy. It disables the checking of whether there is sufficient
entropy in the pool to safely perform a reseed (and instead always
forces them to occur arbitrarily). And it seemingly ignores the
documented warning noted in the original:
The biggest concern in the current design is the frequency with which reseed will be possible. For the suggested threshold value of 100 bits, only 12 bytes of output are guaranteed to be absolutely secure under this system. If this much entropy can not be acquired quickly enough (remembering that we are using a very conservative estimate of our entropy), the outputted keys,hashes,etc. could possibly be attacked more efficiently by brute-force cracking the generator state.
The current assumption (read: hope) is that those who are demanding values from the PRNG at a high rate are also producing entropy at a similar rate, or will be willing to wait longer for their values and allow a slow poll. This will need to be examined in light of the results of the testing of the quality of our current entropy sources, which is still underway (more details upon request).
Given the changes made to strengthen the design of Yarrow that are in the published paper (having separate fast and slow pools, using a strong block cipher for the output generator, emphasising the importance of not optimistically estimating entropy) it would seem that at least some of their remaining concerns could not be confidently dismissed.
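For contrast with the forced reseed described earlier, here is a minimal sketch of the kind of estimate-gated reseed that the quoted notes describe: entropy estimates are accumulated, and a reseed is only permitted once they reach the threshold. The class and function names are hypothetical, and the per-input estimates are simply taken on trust, which is exactly the hard part that was disabled in the MacOS version. The helper also shows where the quoted "12 bytes" comes from: a 100-bit threshold covers only 100 // 8 = 12 whole bytes of output.

```python
class ReseedGate:
    """Hypothetical Yarrow-style reseed gate driven by entropy estimates."""

    def __init__(self, threshold_bits=100):  # the suggested threshold quoted above
        self.threshold_bits = threshold_bits
        self.estimated_bits = 0

    def add_input(self, estimated_bits):
        # A conservative estimator should under-count here, never over-count.
        if estimated_bits < 0:
            raise ValueError("entropy estimate cannot be negative")
        self.estimated_bits += estimated_bits

    def ready(self):
        return self.estimated_bits >= self.threshold_bits

    def reseed(self):
        if not self.ready():
            raise RuntimeError("not enough estimated entropy to reseed safely")
        self.estimated_bits = 0  # the accumulated entropy is consumed by the reseed


def guaranteed_output_bytes(threshold_bits):
    """Whole bytes of output fully covered by one reseed's worth of entropy."""
    return threshold_bits // 8


def forced_reseed_due(bytes_since_reseed, reseed_bytes):
    """The forced rule: reseed after a fixed amount of output, whatever the input."""
    return bytes_since_reseed >= reseed_bytes
```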
Either way, we are still well short of demonstrating a trivial
starvation attack (or any other) on the MacOS
device at this point, but there's certainly plenty of low hanging fruit
for anyone who did want to pursue a more detailed analysis of it. I'd
certainly be interested in seeing if XORing the TSC with itself as the
only source of entropy amplifies any real correlation which may occur
with that in practice under some conditions. Analysing a raw dump of
that could be an easy and interesting thing to explore. But what we do
know without doubt is that there are now decades of new research to draw
from since any of this was even anywhere near being anything like best
practice.
And I'd certainly be newly cautious of the advice commonly given that,
unlike Linux, it's safe to just read as much entropy as you
want from this device. An easy fix here is simple and obvious
though. Fortuna was published in 2003 as a replacement for Yarrow that
eliminated even more of the concerns its authors had with reliably
doing this securely. And that was before
SHA1 itself fell under a
more strongly proven cloud of suspicion too.
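For readers unfamiliar with Fortuna, the key idea that lets it drop entropy estimation altogether is its pool schedule: incoming events are spread across 32 pools, and reseed number r uses pool i only if 2^i divides r. Higher pools are drained ever more rarely, so at least one of them eventually accumulates enough entropy to recover from a compromised state, even if an attacker knows or injects some of the inputs, without anyone having to guess how much entropy each input carried. This is a minimal sketch of just that selection rule, not a full Fortuna implementation:

```python
NUM_POOLS = 32  # Fortuna uses 32 entropy pools


def pools_for_reseed(reseed_count):
    """Return the pool indices drained on the given reseed (counting from 1).

    Pool i is included only when 2**i divides reseed_count, so pool 0 is used
    on every reseed, pool 1 on every second one, pool 2 on every fourth, ...
    """
    if reseed_count < 1:
        raise ValueError("reseed_count starts at 1")
    pools = []
    for i in range(NUM_POOLS):
        if reseed_count % (1 << i) != 0:
            break
        pools.append(i)
    return pools


if __name__ == "__main__":
    for r in range(1, 9):
        print(r, pools_for_reseed(r))
```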
I can't say for certain why Apple have not yet cared to give more of their attention to this, but it's possible that a shouty comment found in their source could be a clue:
We've since learned that the
random(4) manpage was
actually patched in Sierra, removing references to the SecurityServer
and to the ability to write to
/dev/random to contribute
more entropy to the system. It also dropped the warning about the
quality of its output being dependent on a sufficient supply of good
entropy being available (a condition which hasn't actually changed).
And it adds a curious new recommendation … that the
arc4random(3) function should be preferred instead.
Which can only be described as an interesting suggestion to make in
late 2016 given that
RFC 7465 was
published in early 2015, and that known weaknesses in RC4 date back
to at least 1995.
The obvious initial hope upon reading this was that, like
OpenBSD 5.5, they had in fact replaced the actual implementation
of it with something a bit stronger than RC4 (OpenBSD switched theirs
to ChaCha20) – however if you believe
and Apple's online manual pages again, then
as of February 2015
at least, they were indeed still using RC4.
So I'm going to stop looking now. I think I am sufficiently convinced by exploring this that disaster fatigue is definitely a Real Thing. If anyone has some better news for Mac users about any of this, I'd certainly be glad to share that here as a further update on it in the future.
22 November 2016
The bit-babbler 0.6 software release is now available. This one mainly contains portability fixes for more platforms and for systems still using older versions of udev. For existing users where the previous releases have been working fine, the only possibly interesting changes in this one are a fix for the normalisation of QA statistics when processing large amounts of entropy on 32-bit systems, and a fix to the framing sanity check that is needed if the device is plugged into a USB 1.x port, for anyone who happens to still actually have one of those.
If you're using this on a 32-bit system and are likely to pull more than 2GB out of the device between restarts of the software, then we do recommend you update to this release, but otherwise you're unlikely to notice any real difference.
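The notes above don't describe the bug itself, so what follows only illustrates the general hazard, using hypothetical names. A byte count held in a signed 32-bit integer tops out at 2^31 - 1, just under 2 GiB. If it wraps (as it typically will in practice, though signed overflow is formally undefined behaviour in C), then any statistic normalised by it becomes nonsense. Python integers don't overflow, so the sketch simulates the wrap explicitly:

```python
def wrap_signed(value, bits):
    """Simulate storing value in a two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    if value >> (bits - 1):  # sign bit set
        value -= 1 << bits
    return value


def byte_frequency(count_of_value, total_bytes, counter_bits):
    """Normalise a per-value count by a total held in a counter_bits-wide counter."""
    return count_of_value / wrap_signed(total_bytes, counter_bits)


if __name__ == "__main__":
    total = 3 * 2**30     # 3 GiB of input
    count = total // 256  # an ideal count for one of the 256 byte values
    print(byte_frequency(count, total, 64))  # 0.00390625, i.e. 1/256
    print(byte_frequency(count, total, 32))  # negative: the total has wrapped
```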
Many thanks to all the people who tested this on different systems and gave us good feedback on their needs and experiences. It's nice to have so many people share our interest in checking this all over as diligently as possible.
18 January 2016
We've grown a lot of love for using virtual machines over the last few years. The number of things that they make easier and better is far too long to list here. But dealing directly with hardware is not yet one of them. And gathering good entropy in them has notable issues too.
Part of the trouble with obtaining real entropy in a virtualised environment is that a VM is usually deliberately isolated from the hardware on the host. Which means that most of the physical sources of unpredictable events which the kernel will normally try to collect entropy from are now all mediated by separately scheduled software, posing a big question about just how unpredictable they really still might be. And that's before we wonder what sort of correlations might occur with other VM guests that are running on the same host. The generally accepted solution is that one way or another we need a defined mechanism to import unique entropy from outside the software running the VM, and there are a few ways we can do that.
Things have surely improved since we last looked at it (the Linux
kernel commit logs certainly say they have), but our initial attempts
at using virtio-rng as a way to import entropy from the host
machine into guests left a fair bit to be desired. Like, not crashing
the system would have been nice. As would not greedily draining the
host completely dry even when the guest was ostensibly idle. But lots
of things related to this are a work in progress, and I don't have much
of a right to grumble there, since we didn't dig deeply into debugging
it further, or report it beyond commenting on it in IRC and not seeing
much interest in more details. Which isn't ideal, but fixes to the
kernel or QEMU would take time to become widely available, and we needed
an answer which could work with what we had there and then. It's an
unfortunate truth that we won't live long enough to chase every bug we
see in someone else's code, and we can't just tell all of our users
"you need to be on a bleeding-edge kernel", so we need to
pick our pursuits wisely,
and be diligent at doing our bit to keep our own house clean of them
on all of the systems that our users really do need to support.
But the free software model works if everyone has the right measure of patience with others and scratching their own itches, so we needed a different plan to tide us over until that was ready for more general use. And we had a related itch we'd already started scratching at.
Since we'd already started experiments on what would become the
BitBabbler hardware, the best answer there for us also already seemed
clear. We just needed to be able to use them directly inside guest
machines too. Which of course then made the
VMs dealing directly
with hardware problem become very much our problem too (though we
already had an existing interest in that for our
telephony hardware and other
things as well).
I'm not going to go into too much detail on that here either, mainly because it's a Long Story, and if you really want to hear it (or even if you don't!) you'll find it in the documentation of how to set this up in the software package. And because this time I do plan to find the time to open another discussion about how we can improve this with the libvirt developers, so I don't want to get sick of repeating it before that has fully run its course.
The short version is, we now have a pretty close approximation to full USB hotplug functionality in libvirt managed KVM/QEMU virtual machines. It's not exactly what I'd call pretty on the inside, but it is easy to use and administer, and more robust and reliable than the previous set of hacks which we were using for this, and it makes BitBabbler devices assigned to guest machines behave just like you'd expect them to when using them from the host. No matter when you plug them in or remove them, or when you start or stop the guest.
The next step now is to try and get the missing functionality that we need supported more directly by libvirt, and we at least have a clear demonstration of why it's needed there and what sort of awful things people need to do if it isn't, which hopefully will help with that. Or at the very least we have something people can point at when explaining how silly we've been to miss the obvious easy answer that we should have been using instead. Then we can fix that and everyone wins from refining an example of current best practice.
But in the meantime, if this is something you need too, install the
bit-babbler 0.5 release, and have
a look at the
bbvirt(1) man page and/or the
virtual_machines document (in the
directory of the source, or
directory that the Debian binary packages install). You'll find the
longer story there, and a quick-start guide to getting it up and
running as painlessly as possible.
6 January 2016
Evolution. It's life's unrolling game where either you grow into your environment, or it grows all over you. Where even the rocks end up different to how they started – regardless of whether they'd ever gathered any moss or not.
Our starting point was wanting a cryptographically secure,
high-quality entropy source that wouldn't starve and stall the system
under heavy demand. So that naturally shaped our initial assumptions
in both the supporting software and the hardware. The early focus in
the software was largely on keeping throughput up, latency down, and
having a regular supply of fresh entropy still being drawn from the
hardware, analysed for anomalies, and mixed into the pools, even when
it wasn't all being consumed from them (since otherwise, it would just
be going to waste).
But there's also another species of important uses here too, where
the primary interest is avoiding a different kind of waste. Wasted
power. And I don't mean in the
What have the Romans ever done for
It's not like we actually draw a lot of it, even under peak usage, but in commercial data centers small numbers can have large multipliers, and there's also a growing interest in very low power home servers and in optimising them to be as efficient as they possibly can be. Where even if the current drawn by the device itself is low, waking the CPU to read from it when the system would otherwise be completely idle is still a cost that some people would, quite reasonably, like to avoid.
The good news is, doing a good job of catering for that type of use too really isn't a very big stretch from where we already were. The BitBabbler hardware itself has support for being idled into a very low power consumption mode (on the scale of microamps). Kernel support for suspending USB devices and controllers at runtime is a thing. And the frequency at which we opportunistically refresh the entropy pools when they aren't being drained was already being set by internally configurable options. So mostly we just needed to expose some more knobs to let people select the desired behaviour that most suits their own use case.
And this is exactly what the first set of changes in the
bit-babbler 0.5 software release
add. If you install it using the Debian packages, there are new
udev rules which will enable the kernel autosuspend mode
for BitBabbler devices, and if you pass the
option to the daemon it will be much more conservative about reading
from the devices when there isn't demand for entropy, and release them
when they are idle so that the OS can suspend them (along with any
controllers or hubs they are connected to).
If you want more direct control, the options which that is an alias for are all individually configurable too. There are still a few caveats to using it: some USB controllers don't handle being suspended as well as they probably should, and if you're doing this with the devices connected to an XHCI (USB3) port, then using a recent kernel is advisable. But it's working well enough to push out for broader testing.
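As a rough illustration of that trade-off (not seedd's actual logic, and with entirely hypothetical names and default values), an idle-friendly policy might back off the opportunistic refresh interval exponentially while the pools stay full, snap back to frequent reads as soon as they are drained, and release the device after a period without reads so that kernel autosuspend can take effect:

```python
from dataclasses import dataclass


@dataclass
class IdlePolicy:
    # Hypothetical knobs; seedd's real options may differ in name and meaning.
    min_refresh_s: float = 1.0     # refresh interval while entropy is in demand
    max_refresh_s: float = 3600.0  # upper bound on the idle back-off
    release_after_s: float = 10.0  # idle time before the device is released


def next_refresh_interval(policy, current_s, pool_was_drained):
    """Double the refresh interval while idle; reset it once demand returns."""
    if pool_was_drained:
        return policy.min_refresh_s
    return min(current_s * 2, policy.max_refresh_s)


def should_release_device(policy, seconds_since_last_read):
    """Release the device (letting the OS suspend it) after enough idle time."""
    return seconds_since_last_read >= policy.release_after_s


if __name__ == "__main__":
    policy = IdlePolicy()
    interval = policy.min_refresh_s
    for _ in range(5):
        interval = next_refresh_interval(policy, interval, pool_was_drained=False)
    print(interval)  # 32.0 after five idle doublings from 1 s
```

Exponential back-off is only one plausible choice here: it keeps the number of wake-ups during long idle periods small while still putting an upper bound on how stale the pools can get.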
22 December 2015
The bit-babbler 0.4 software release is now tagged and uploaded. If you're using the packages which ship with Debian, it should be available from the mirrors for Sid by the time you read this, and should migrate to Stretch in about 5 days' time if nobody finds something silly we missed.
This one was originally planned to be just a few minor tweaks to get it building for the BSDs (and built for the Debian kFreeBSD port), but the best laid plans and all that … Getting it to build was easy, getting it to work proved to be a somewhat more involved task. The kFreeBSD port had packages for libftdi built with libusb-0.1, but if you'd assumed, like we did, that this implied they actually worked there, then you'd have been about as surprised as we were when they simply didn't at all. Everything builds and runs, it just can't see any USB devices – which isn't much use to any of us.
It turns out that FreeBSD is one of the few targets which the libusb source that most platforms use isn't directly ported to. Mostly because the FreeBSD developers have their own USB library (which is not-confusingly-at-all also named libusb), but which fortunately does also provide an API that is compatible with the libusb in use elsewhere.
So after some gnashing of teeth, a quick trip through all the stages of grief, and a hasty rescheduling of all the other things I had planned for that week, we decided to bite the bullet and switch to using the libusb-1.0 API directly, and the platform native implementation of it.
With the benefit of hindsight now, there seems to be no doubt that this was time well spent. By taking direct control over the device ourselves rather than going through the libftdi abstraction, we've been able to simplify things considerably, improve the error reporting and handling if things go wrong, be more efficient with getting data out of the device so CPU usage is reduced and maximum throughput is increased by notable margins, and we've further minimised the barriers to porting this to new platforms now too.
By using libusb-1.0 we can support some features more widely that were previously only available when built with libudev, like being able to identify devices by their physical address on the USB bus, and having hotplug support – and we get better support and lots of bugfixes for platforms other than Linux. So this unanticipated cake turned out to have plenty of delicious icing on it.
Of course a major refactoring like this isn't entirely without risk and this new code hasn't yet had as much time in long term testing as the previous releases did, but it's been running on all the servers here for a few weeks now without obvious trouble, and it makes building this for Windows users much easier, makes using it on BSD possible, and fixes a few minor issues on some of the more obscure architectures that the Debian buildds shook out, so we think it's ready to get some broader testing by more people and on more of the platforms that they want to use.
18 December 2015
The idea of there being a test, which when run just once doesn't actually give you a right or wrong, pass or fail result; where any single result that it outputs could be an indication of a good or bad outcome; and where the only way to know which is which is to run the test many times, and then run it again on its own results … isn't something that's necessarily intuitive to people who haven't seen that sort of thing before and had time to think a bit about why that's how it works.
So as we've had more people taking an interest in digging deeper into the details of this, it's become apparent that this was something we'd probably touched on a bit too briefly for anyone who isn't already familiar with the nature of statistical testing methods. And since it seems like a fundamentally important detail which warrants more than just a footnote on the FAQ page, we've instead added a better introduction to this to the description of the tests from the ENT suite, though it's not specific to only the Chi-square test included there.
If you already know how goodness of fit and significance testing works, then there's probably not a lot we've said there which will be news for you. We have tried to keep it as simple and accessible as possible, though if you'd like to proofread it and point out anything that we can say more clearly / better / less wrongly, that would be a welcome contribution too! It's tricky to explain this briefly in a way that's both still readable to people who it's new for, and formally correct for people skilled in the arts, so there might be some loose language there we could still tighten up or improve on. But as a starting point it should at least give people some extra clues to run with and search for if they do want to learn more about that.
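To make the idea concrete, here is a small Python sketch. Rather than the ENT Chi-square test, it uses the simplest test we could pick: the frequency (monobit) test from NIST SP 800-22, which yields one p-value per sequence. A single p-value on its own says very little, since for a good source p-values are uniformly distributed on [0, 1], and one below 0.01 will turn up about one time in a hundred. The second-level check bins many p-values and applies a Chi-square goodness-of-fit test to the bin counts. The sample sizes and the 1% significance level are illustrative choices, not recommendations.

```python
import math
import random

BINS = 10
CHI2_CRITICAL_9DOF = 21.666  # upper 1% point of the Chi-square distribution, 9 dof


def monobit_p_value(ones, n):
    """NIST SP 800-22 frequency (monobit) p-value for n bits containing `ones` ones."""
    s = 2 * ones - n  # sum of the bits mapped to +1 / -1
    return math.erfc(abs(s) / math.sqrt(2 * n))


def uniformity_chi2(p_values, bins=BINS):
    """Chi-square statistic for how evenly the p-values fill equal-width bins."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1  # p == 1.0 goes in the top bin
    expected = len(p_values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def looks_uniform(p_values):
    return uniformity_chi2(p_values) < CHI2_CRITICAL_9DOF


def p_values_from(source, runs=1000, n=100_000):
    """Run the monobit test `runs` times on n-bit integers drawn from source(n)."""
    return [monobit_p_value(bin(source(n)).count("1"), n) for _ in range(runs)]


if __name__ == "__main__":
    rng = random.Random(2015)
    good = p_values_from(rng.getrandbits)
    # OR of two random words: each bit is 1 with probability 0.75.
    biased = p_values_from(lambda n: rng.getrandbits(n) | rng.getrandbits(n))
    print("one p-value from the good source:", good[0])
    print("good source p-values uniform?", looks_uniform(good))
    print("biased source p-values uniform?", looks_uniform(biased))
```

Note that even the good source is expected to fail the second-level check about 1% of the time at this significance level, which is the same point again, one level up.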
25 November 2015
Adrenaline. It's such a simple molecule, but it puts caffeine to shame in the effect it can have on our minds and bodies, and sometimes you don't even need to get up out of your seat to make it.
And apparently it doesn't matter how much of your life you've spent
overdosing on it by putting your body into places and situations that
people without a taste for it would rather avoid, and teaching your mind
to deal with that. It can still consume you with its effects almost
completely whenever your mind says, quietly or otherwise,
might be dangerous and what you're about to do next could get you into
some trouble that you don't want to have …
Are you sure you really want to do this? There's still time to
just turn around and back away quietly. If you think that would be
better … Is that what you really think?
The nagging voice of self-doubt. It can be both good for you or bad for you depending on when you listen to it. And adrenaline amplifies it from quiet nagging to insistent urging that won't be ignored, however you try.
For some strange reason, publishing new software still often does
that to me. Not always, but if it's something critical, where a mistake
could cause real data loss, or where it's used in
Failure Is Not An
Option situations or other deployments with real consequences for
not achieving that goal, or even just something that's completely new,
then definitely more often than not. And we have plenty of those sort
of users for our telephony gear, and for some of the other software
I've authored or maintain, so it's not like I don't get enough practice
at doing this.
It's strange because you can put me in the open door of an
aircraft at 15,000 feet, about to step out of it, alone or with a group
of some of the most completely crazy (and fun!) people that you're ever
likely to meet, and it's easy to be totally calm and controlled about
what I expect to happen next and what I need to do to make that all
happen according to plan.
But put my finger over the button that is about to upload a piece of software, that I'm responsible for making, with the potential for ill consequences to other people – and I might as well be locked in a cage with a sleeping tiger. I know what I need to do is get out of there alive, but is what I'm going to do next going to wake it up, grumpy, startled, or hungry, before I do?
The difference between those two situations probably is almost that simple. In one case I (think) I know everything that is going to happen next, and whether it does or not will be determined largely by what I do.
In the other there is a far more pure uncertainty.
No matter how careful I am, no matter how careful I've been, what's going to happen next is a function of what happens in other people's minds. Not my own. And nothing in the world is really as scary as that. No matter how many times I put out some new piece of software, and no matter how many times it's either well received or simply goes almost entirely unnoticed, and nothing terrible actually happens, it's still an act of stepping, irrevocably, into a great new unknown. A world of fresh surprises and problems to learn how to avoid.
And I still love the rush that comes with that, whatever it is that brings it on. It never gets old.
Which is all perhaps just a long winded way of saying bit-babbler 0.3 is now in the incoming queue for Debian Stretch. But I'm writing under the influence of adrenaline and I know it well. My heart is racing. My mind is flitting wildly through every possible thing we might have forgotten. And we now await your judgement. Whenever and however it comes. At the time and mood of your choosing.
We've done all the preparation for this that we can reasonably think of doing. And now we've stepped out of the door with it into the airflow. There's no going back. All that remains is to see if we really can land it safely, and not hurt anyone else.
16 November 2015
Just a quick heads-up for the people using USB passthrough to make
their devices available inside libvirt managed virtual machines. If
you're doing this on a system with
cgroups enabled, or
have updated your host machine to one (like if
is now your init system), then you'll need to make sure you've also
added the devices you are passing through to the
cgroup_device_acl array which is defined in
wherever that is done on your system).
There's more detail about all of that in the documentation for
configuring virtual machines in the software package – but if
you're wondering why passthrough suddenly stopped working after you
updated your host machine, this is probably the reason. The USB
devices aren't in the default set that the VMs are granted access to
when cgroup access control is enabled. At least not until
we get the extra support discussed here
included in libvirt (at which point it should manage the needed
cgroup ACL itself).
Update: If you're using the
release or later, and
bbvirt to manage the passthrough,
then you don't need to do this anymore. The needed permission will be
managed automatically by libvirt.
5 November 2015
It's a lovely aphorism, for things that aren't mission-critical and for developers who prefer to outsource testing to their end users rather than spend their own precious time on boring things like that – but when it comes to hardware it's really more of a euphemism for wasting a lot of time and money on product recall and replacement.
So we've been rather publicity-shy, until we'd convinced ourselves that we'd checked, and checked the checks, and stuck our own fingers into all the places where something might lurk that would bite them – because the prize for being the first to have a massive embarrassing recall has long ago been won, and there really is nothing more tedious than having to rework large numbers of units because you'd made a stupid mistake that would have been easy to find and avoid if you'd actually tested it. We've been there in years gone by with other hardware, and it's not a place we ever want to go back to, even for a quick visit.
But we're well into testing the new production sized run of White devices that we did recently, and the results we're seeing for those are so far nicely consistent with what we saw from the prototype run. We've had some really good feedback from an excellent and diverse group of early adopters who already found us and wanted devices for their own, and so we're starting to feel a bit more comfortable that if there's something we've still missed, it's not going to be an instant show-stopper that will be a major pain to remedy.
The software is performing well, and we're not getting any requests that hint at it needing some sort of major redesign to really be useful for a wide range of people and applications. The sort of wish list things that are currently on the horizon should all fit into it quite well without disruption to anyone who has already deployed it if they update.
complaint we're now getting is along the lines of
why isn't this actually in Debian yet? Which is a good sign that
it really is time to fix that very shortly now. It's not that hard to
build your own packages of it, but it's still a lot more convenient to
not have to. Thanks to everyone who has been patient with us over that
and has given us good feedback on the version 0.2 snapshots. When we
get through the last of what's still pending for that (which isn't much
now), we'll tag version 0.3 and push that one out for inclusion in the
Debian Stretch release.
15 October 2015
It looks like we're going to need external power for the USB hubs if we want to run more than about 60 devices in most of the machines that we presently have set up for testing them – which wasn't an entirely unexpected limit with all things considered. The good news is, the hubs we bought have a socket for external power. The bad news is, after looking inside them, that socket is connected directly to the VBUS rail coming from the host motherboard, with no isolation for either it or the data line pull-up when they are running self-powered.
And where by
running, I mean
for the brief instant between
when you plug it in and when things probably go badly downhill from
there. Who lets these people design and build things for others to
use … Don't cross the streams isn't a hard design rule to
The happy news, is we can run 60 BitBabbler White devices, all streaming random bits out at the default maximum rate, in the same machine as we have four Octal ISDN cards running high load callgen testing. That's about 5.5 million phone calls a day on 960 telephony channels, 1.2 terabytes a day of audio processed, and 30 gigabytes an hour of raw entropy, all happily purring away together on a fairly cheap consumer-grade motherboard that we bought off the shelf a few years ago from the local computer store.
We like to use low-end hardware for routine stress testing, because if it all works peachy there, then scaling things up further, to be Carrier-grade and Web-scale like all the cool kids are, is just a matter of throwing as many dollars at the problem as it takes to feel like you are. We know our system won't sweat it.
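As a quick sanity check of those figures, the arithmetic can be reproduced under a few assumptions that the post doesn't spell out: that each Octal card has eight E1 primary rate ports of 30 B-channels, that each channel carries 64 kbit/s of audio in each direction, and that the terabyte figure is in binary units (TiB). These are back-of-envelope calculations, not measurements:

```python
CARDS = 4
PORTS_PER_CARD = 8             # "Octal" ISDN cards
B_CHANNELS_PER_PORT = 30       # assumption: E1 primary rate interfaces
AUDIO_BYTES_PER_S = 8_000      # assumption: 64 kbit/s per channel, per direction
SECONDS_PER_DAY = 86_400
CALLS_PER_DAY = 5_500_000
ENTROPY_BYTES_PER_HOUR = 30e9  # "30 gigabytes an hour", taken as decimal GB
DEVICES = 60

channels = CARDS * PORTS_PER_CARD * B_CHANNELS_PER_PORT
# Audio in both directions on every channel, all day, in TiB (2**40 bytes).
audio_tib_per_day = channels * AUDIO_BYTES_PER_S * 2 * SECONDS_PER_DAY / 2**40
# Upper bound on mean channel time per call, if every channel is always busy.
max_mean_call_s = channels * SECONDS_PER_DAY / CALLS_PER_DAY
# Average raw entropy rate each device must sustain.
entropy_bytes_per_s_per_device = ENTROPY_BYTES_PER_HOUR / 3600 / DEVICES

if __name__ == "__main__":
    print(channels)                                                # 960
    print(round(audio_tib_per_day, 2))                             # 1.21
    print(round(max_mean_call_s, 1))                               # 15.1
    print(round(entropy_bytes_per_s_per_device * 8 / 1e6, 2))      # 1.11 (Mbit/s)
```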
28 September 2015
So the long term testing of the first hundred units we made has still been looking really good. Beyond what I'd even dared to hope for in fact. When doing the initial tuning to see just how fast we could clock bits out of these, we found some devices could be pushed notably harder than others before the quality of their output would start to degrade. But even with the fairly conservative defaults that we settled on, I'd been expecting that a few of the devices at the opposite end of that spectrum might eventually show some sign of weakness in their output if we just let them run for long enough and accumulated enough trillions of bits for that to finally become statistically significant in one or more of the QA tests.
But that hasn't been the case. We've been monitoring them continuously, graphing statistics on the QA tests with munin, and so far we've had exactly zero devices fail the long term testing. Which would normally be a really surprising result for just about any hardware project – until I'm reminded exactly how many prototypes we did actually build before doing this run, and the hell we put them all through before settling on a design to sample in larger quantities. So yeah, maybe not quite so surprising, but still a very pleasing result for the first batch.
This of course leaves us with only one sensible course of action. Build and test more of them! We've had lots of interest in the White devices, so we've ordered the parts to do another run of 500, and ordered a stack of extra USB hubs to fill with them once they come back from our fab. The next big test will be to see just how many of them we can cram into each machine in the rack before smoke starts pouring out of something or circuit breakers start tripping.
15 September 2015
And if we had any lingering doubt whatsoever in the eternal truth of that, it would have been quickly dispelled once more Windows users came along who actually did want to use the native build there!
Having not owned or had to write software for Windows now for … well, let's not think about how many years ago that was now; and since wine support for USB devices is still basically non-existent, it wasn't really much of a shock that there were a few teething problems still to sort out with that. But it was pleasantly close, and with the help of a very patient user who relayed details of what did and didn't work on their system we got this actually tested and confirmed working as expected there.
I'd almost forgotten just how many
amusing idiosyncrasies it
has with respect to otherwise standard functions, and either I really
have forgotten or it appears to have grown even more of them since I
wrote code for it last, but so far it appears to be working well and
we haven't had any new reports of trouble there yet.
28 July 2015
Well that didn't take long. Sorry BSD people, but the Windows users asked us to support their platform before you did. Which surprised me a little, but the squeaky wheel gets the grease, and so the first round of portability tweaks goes to them.
The response we've had from people so far has actually been rather awesome, thanks to all of you for the kind words of appreciation about the effort we've put into this and the suggestions for things it would also be useful to support. It caught us a bit off guard really, we'd barely had the website up for a week, and hadn't really told anybody about it except for our bank and shipping company (who wanted to see it before they'd talk to us about using the BitBabbler name with our accounts), when the first few people already started emailing us asking if we still had any we could sell. So getting the website completed, and posting updates here, has sort of played second fiddle to improving the software further in response to plenty of new user feedback.
The first major change was adding the ability to obtain entropy directly
from the device output pool via a UDP socket too. Having only options
to send it to
stdout or to the kernel was fine for our own
needs, but neither of those were going to be much use for anyone wanting
to use this on Windows. This is also useful for more than just those
people though, since it means you now don't have to choose between using
a device to feed entropy to the kernel or reading raw bits from it
directly, you can just timeshare it to do both simultaneously if you
ever need that.
It also means you don't actually have to run it on Windows to use
the entropy from it in Windows applications. And so the first round of
porting this to Windows stopped without it ever actually being tested
there, with the BitBabbler instead running on a small ARM board, with a
minimal install of Debian on it, feeding entropy to Windows applications
over a private network segment. But the architectural changes that were
needed for that got done, and it was successfully building with the
mingw-w64 toolchain from Debian Stretch.
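As a sketch of what a client for that UDP interface might look like, here is a minimal Python version. The wire format is an assumption made for illustration, not taken from the bit-babbler documentation: we assume that a client sends a single datagram holding the number of bytes wanted as a 16-bit big-endian integer, and that the daemon replies with one datagram of that many random bytes. The request cap is likewise our own choice. Check the seedd documentation for the real protocol, address options and request limits before relying on anything like this.

```python
import socket
import struct

MAX_REQUEST = 32768  # assumption: keep each reply comfortably inside one datagram


def request_entropy(host, port, nbytes, timeout=2.0):
    """Request nbytes of entropy from a UDP entropy server.

    Assumed (hypothetical) protocol: send the requested length as a 16-bit
    big-endian integer, receive a single datagram of random bytes in reply.
    """
    if not 0 < nbytes <= MAX_REQUEST:
        raise ValueError(f"nbytes must be between 1 and {MAX_REQUEST}")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # UDP is unreliable, so never block forever
        sock.sendto(struct.pack("!H", nbytes), (host, port))
        data, _ = sock.recvfrom(65535)
    if len(data) != nbytes:
        raise IOError(f"expected {nbytes} bytes, got {len(data)}")
    return data
```

Because each request is independent, clients on other machines (including Windows ones) could share a device this way while the same daemon also feeds the kernel, which is the timesharing described above.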
We've now added a
udev rule and a system group to the
Debian package, so that normal users without elevated privilege (other
than being placed in the
bit-babbler group) can access the
device directly. We got a lot of requests from people who wanted good
random numbers for purposes other than feeding entropy to the kernel,
so this will make things a bit more flexible and user-friendly for them