Simulations, games, encryption, and statistical analyses all need random numbers. But just how random is your random number generator? DieHarder can help.

The Universe seems to be a pretty random place. It should therefore come as no surprise that *random numbers* are very useful to people performing computations to describe, study, simulate, or otherwise muck about with Universes, real or imagined.

Unfortunately, *real* random numbers aren’t terribly easy to come by. Computers, after all, compute. No matter the problem, a computer executes an *algorithm* that takes some sort of input (numbers of some form) and performs *completely deterministic things* to make new numbers. There is nothing *random* about any output generated by a computer program. (Even *hardware* RNGs often are measurably non-random.)

The notion of a software *random number generator* (RNG) is therefore a logical *oxymoron*. In fact, software RNGs are often called “pseudo-random number generators” to remind us of that fact.

Sometimes the “pseudo” part doesn’t matter much. For example, random numbers are used in computer games to ensure that each playing experience is unique. Yet the random numbers need not be perfect. As long as the numbers are random “enough” that players cannot observe any systematic pattern and exploit it in game play, everybody’s happy. There are other applications that are relatively tolerant of poor quality RNGs. In many cases, fast and relatively unpredictable are sufficient.

There are many other places, however, where “unpredictable” per se is *not* enough. Physicists, statisticians, and mathematicians studying processes using Monte Carlo simulation or generating Markov Chains of one sort or another need a supply of numbers that are as close to *theoretically* random as they can be. Otherwise, the work may yield *the wrong answers!* *Encryption* is another place where randomness is crucial. According to information theory, an encrypted document should appear to be perfectly random, utterly lacking in meaningful *patterns* a would-be cracker might exploit to crack the code.

So fine, you’re writing an application where the quality of random numbers used matters. Perhaps the speed of generation matters, too. How do you select the *best* RNG available and *test* it to verify its quality and speed? *DieHarder* to the rescue!

Testing Random Numbers and the Null Hypothesis

Before jumping in to DieHarder, let’s take a moment to understand how a RNG tester works. Suppose you have a “candidate” RNG that produces (say) unsigned integers that you want to test. Intuitively, if you take a whole sequence of those integers and assemble them into one long bitstring *N* bits long, you’d expect to find roughly as many 1s as 0s. The actual frequencies of 1s and 0s can be used as a *quantitative measure* of the quality of the generator.

In the case of 0s and 1s this is pretty simple. According to the *Central Limit Theorem*, if *N* is large, we expect the number of 1s (say) to be *normally distributed* with a mean of *N/2* and a standard deviation of \sqrt{N}/2 (the bits are a binomial process with p = 1/2). Without going into all the details, suppose you generate 10,000 bits from a candidate RNG to test, and then just count up the 1s. The standard deviation is then \sqrt{10000}/2 = 50.

If you get anywhere from 4,900 to 5,100 1s, you wouldn’t be too surprised. A deviation as large as 100 (two standard deviations) from the “expected” value of 5,000 happens in roughly one in twenty such samples. On the other hand, if you got 4,850 or 5,150, you’d be a bit concerned. Such a three-standard-deviation result happens roughly three times in a thousand, so it would be a bit surprising to get it on the very first try. If you got (say) 4,000 1s, you’d almost certainly conclude that the RNG being tested is *biased* and produces more 0s than 1s.

This is all quite understandable. If you think of the 0s and 1s as being heads and tails of a supposedly “random” coin flip, well, you could get rich from a coin that produced 4,000 heads for every 6,000 tails, provided you knew it and your opponent didn’t.

Now that you understand the basic idea of *quantitative* testing, let’s summarize it as an algorithm. (If you are confused by any of the terms used below, remember, *Wikipedia* is your friend.) You:

**Make the *null hypothesis*** and assume that the RNG to be tested is a perfect RNG.
**Pick a statistic** that can be generated using a stream of presumed random numbers, such as the expected (mean) number of 1s in the example above. One needs to be able to compute the probability of obtaining any *deviation* of this statistic given the null hypothesis, using the properties of a normal distribution or a chi-squared distribution or even more sophisticated methodology.
**Run a trial.** Generate a string of presumed random numbers, use them to evaluate the test statistic, and convert the result to a “*p*-value,” the probability of getting this particular result given the null hypothesis.
**Iterate** to accumulate many independent *p*-values from the RNG stream. The distribution of these *p*-values should be *uniform*.
**Perform a Kolmogorov-Smirnov-Kuiper** (KSK) test on these *p*-values to determine how uniformly distributed they in fact are. This too yields a *p*-value for the overall test run.
**Accept or reject** the null hypothesis.

A large number of such tests can be constructed, and each may discover distinct flaws among RNGs such that no single test is likely to be the best for all cases. That’s all there is to it!

DieHard and the NIST STS

Random number testing is hardly new. In fact, it predates modern computers: the general algorithm given above was described in papers by Kendall and Babington-Smith back in 1938, along with a number of target statistics. The tests were applied by hand to candidate sequences of digits. Random number generation and testing is, unsurprisingly, also discussed extensively by (who else?) Donald Knuth in *Volume 2, Seminumerical Algorithms*, of the famous *The Art of Computer Programming* series, where the full history of random number generation and testing is explored in more detail than can be presented here.

However, the man who is widely viewed as the father of serious random number testing for computational purposes is George Marsaglia (a man who clearly has a sense of humor). Marsaglia wrote, and named, the *DieHard Battery of Random Number Tests* (http://www.csis.hku.hk/~diehard), a suite of tests that can bring many software RNGs, and a few hardware RNGs, to their figurative knees. DieHard is considered the “gold standard” of computational RNG testers.

However, DieHard has several flaws that have kept it from being as widely used as it might be:

- It was originally written in monolithic Fortran.
- In some cases, its target statistics were themselves numerically evaluated via simulation, with the best computers and RNGs available fifteen to twenty years ago.
- DieHard lacks a copyright notice or license in its otherwise openly published code.
- Input is limited to a fixed-size file, which must contain 10 million bytes’ worth of (presumably) random bits for all the tests to run.
- There are no user-selectable parameters to study *more* than 2.5 million unsigned integers at a time or otherwise increase the “resolution” of the tests.
- The final pass-or-fail decision is made inconsistently: on the basis of many independent runs for one test, but a single run for another.

To be frank, selecting *files* of candidate random numbers on the basis of “passing DieHard” (the use to which DieHard was apparently engineered) *guarantees* that the files of random numbers thus selected are *not* random in aggregate. You can correctly study and select *RNGs* with a tool like this and then accept the output of the best RNG you find, whatever that output might be. You accept or reject a particular *sequence* output by any RNG (good or bad) only at great peril of introducing bias.

In addition to DieHard, RNGs can be tested using NIST’s *Statistical Test Suite* (STS, http://csrc.nist.gov/rng). It uses the same general test algorithm described above, but focuses more on bit-level randomness for *cryptographic* applications. The STS is fully open source, too, but it also has flaws that have prevented it from being widely adopted.

At the time of this writing, there are easily a hundred software RNGs available that have been implemented in open source code or described in the literature. There are over sixty of them in the *GNU Scientific Library* (GSL) alone. A modern CPU and RNG can generate well over a billion “random” bits per second.

DieHard’s standard input file size for an *entire test run* represents less than 0.1 second’s worth of production on pretty much any system likely to be used in a numerical computation such as a simulation. A simulation on a modern compute *cluster* can consume as many as 10^18 random numbers, around a trillion times as many as are represented in a file this size. There might well be important failures in RNGs that are required to produce 10^18 random unsigned integers that are not revealed by any test conducted on only 2.5 \times 10^6 unsigned integers.

For a RNG tester to provide meaningful guidance to researchers choosing among a wide range of RNGs that *all “pass” DieHard*, it must be possible to *increase* the power of the tests and to test *much longer* sequences of random numbers.

What’s needed is a fully open source, parametrically variable RNG tester, one designed to test *RNG* s, not *particular sequences of numbers in a file*, one that would make weak RNGs that still “passed DieHard” die *harder*.

DieHarder

Here are the initial design criteria for DieHarder:

- It must (eventually) re-implement the tests in DieHard and the STS, permitting the “resolving power” to be parametrically increased.
- It must be *extensible* in a variety of ways. In particular, it should be easy to add *new* tests within the common and consistently implemented general test framework.
- It must *integrate* with the software RNGs being tested wherever possible, so that one could run a test for a full day if there was ever any point in doing so. It should be extensible here as well: easy to add new RNGs.
- It must be written in the *C* programming language and developed on a Linux-based platform. It must be *good* C code: modular in design, packageable, well documented, portable, version controlled, and so on.
- It must be fully GPL, publicly available code, version controlled from the very beginning to document the entire development process and to facilitate community participation.

This plan was actually fairly easy to accomplish, given a decent toolset and templates. The GSL RNG framework “instantly” provided a consistent interface to sixty-plus RNGs, all ready to test and time. It’s fairly easy to add RNGs of your own to the GSL at runtime and call them using its standard interface, so DieHarder includes some additional, home-grown RNGs.

The code also includes 23 functional tests. Currently, all the tests from DieHard, two tests from the STS, and two new tests (one of which is a fiendish and powerful extension of an STS test that promises to one day “eat” a half-dozen or so of the existing tests and make them obsolete) are implemented, with more under development. There is even an all-important *timing* test, since somebody interested in using a GSL RNG in an actual computation is likely to be as interested in how *fast* the RNG is as in how *good* it is: given the choice of two equally good RNGs, the faster one is the obvious choice.

The entire test suite can be run with a set of default parameters against any of the RNGs, with the proviso that if one uses the GSL-compatible file interface to input file-based random numbers, it is up to the user to be sure that the file is large enough not to run out! Individual tests applied to individual RNGs can be *parametrically varied* (where and how the statistic being generated permits it). In particular, one can always increase the number of *p*-samples from the default of 100 (insufficient in many cases to consistently discover a failure) to (say) 1,000, where failure might be unambiguous and consistent instead of marginal and occasional.

How Random is Random?

Let’s look at the output from testing two RNGs from the GSL: *mt19937_1999* (an “excellent” one) and *ran0* (a simple linear congruential generator, known to have hyperplanar correlations). Let’s test both with the “Minimum Distance” DieHard test, which more or less directly looks for hyperplanar correlations in two dimensions.

Because `ran0` often *passes* the DieHard test when run with the default parameters (sometimes with a weak final value of *p*, but not one *so* weak that one would generally reject the null hypothesis), let’s run it to collect 500 *p*-values instead of the DieHard default of 100. This illustrates the increased power of DieHarder to identify weak generators unambiguously. *Figure One* has the results.

*FIGURE ONE: The results of testing the ran0 random number generator with DieHarder*

rgb@metatron|B:1071$ dieharder -g 16 -d 12 -p 500
#=============================================================
# Run Details
# Random number generator tested: ran0
# Samples per test pvalue = 8000 (test default is 8000)
# P-values in final KS test = 500 (test default is 100)
#=============================================================
# Histogram of p-values
# Counting histogram bins, binscale = 0.100000
# 120| | | | | | | | | | |
# | | | | | | | | | | |
# 108| | | |****| | | | | | |
# | | | |****| | | | | | |
# 96| | | |****| | | | | | |
# | | | |****| | | | | | |
# 84| | | |****| | | | | | |
# | | | |****| | | | | | |
# 72| | | |****| | | |****| | |
# |****| | |****| | | |****| | |
# 60|****| | |****| | | |****| | |
# |****| | |****| | | |****| |****|
# 48|****| | |****| | | |****| |****|
# |****| | |****| |****| |****| |****|
# 36|****| | |****|****|****| |****|****|****|
# |****| | |****|****|****| |****|****|****|
# 24|****|****| |****|****|****| |****|****|****|
# |****|****|****|****|****|****| |****|****|****|
# 12|****|****|****|****|****|****|****|****|****|****|
# |****|****|****|****|****|****|****|****|****|****|
# |--------------------------------------------------
# | 0.1| 0.2| 0.3| 0.4| 0.5| 0.6| 0.7| 0.8| 0.9| 1.0|
#=============================================================
# Results
Kuiper KS: p = 0.000000
Assessment: FAILED at < 0.01% for Diehard Minimum Distance (2d Circle) Test

You can see in the histogram in *Figure One* that *p* is not terribly uniform. The final result says that a “good” RNG would have produced the observed string of numbers produced by *ran0* no more often than one time in a million! This more than justifies rejecting the null hypothesis, and *ran0* itself, at least in any code where its weakness might matter, especially given that *mt19937_1999* is almost as fast and *much* better, as demonstrated in *Figure Two*.

*FIGURE TWO: The results of testing mt19937_1999 with DieHarder*

#=============================================================
# Run Details
# Random number generator tested: mt19937_1999
# Samples per test pvalue = 8000 (test default is 8000)
# P-values in final KS test = 500 (test default is 100)
#============================================================
# Histogram of p-values
# Counting histogram bins, binscale = 0.100000
# 100| | | | | | | | | | |
# | | | | | | | | | | |
# 90| | | | | | | | | | |
# | | | | | | | | | | |
# 80| | | | | | | | | | |
# | | | | | | | | | | |
# 70| | | | | | | | | | |
# | | | |****| | | | | | |
# 60| | | |****| | | | | | |
# | | | |****|****| | | | | |
# 50| |****| |****|****|****| | | | |
# |****|****| |****|****|****|****| |****| |
# 40|****|****|****|****|****|****|****|****|****|****|
# |****|****|****|****|****|****|****|****|****|****|
# 30|****|****|****|****|****|****|****|****|****|****|
# |****|****|****|****|****|****|****|****|****|****|
# 20|****|****|****|****|****|****|****|****|****|****|
# |****|****|****|****|****|****|****|****|****|****|
# 10|****|****|****|****|****|****|****|****|****|****|
# |****|****|****|****|****|****|****|****|****|****|
# |--------------------------------------------------
# | 0.1| 0.2| 0.3| 0.4| 0.5| 0.6| 0.7| 0.8| 0.9| 1.0|
#=============================================================
# Results
Kuiper KS: p = 0.125826
Assessment: PASSED at > 5% for Diehard Minimum Distance (2d Circle) Test

This is a very comfortable margin of “passing,” by the way. It is a mistake to conclude that “bigger *p* is better.” *p* should be distributed *uniformly*, so if you run the test 100 times with a “perfect” RNG, you should expect to get a final *p* less than 0.01 about one time in the hundred.

RDieHarder

One advantage of an open source project is that you pick up friends along the way. People start to use the tool you’ve built. Some of them discover problems and report them, helping you to improve it. Others see how the tool could be changed and made useful in a whole new way.

This was what happened when Dirk Eddelbuettel, the Debian maintainer for *R* and the GSL, came across DieHarder while looking for a tester for his *R* interface to the “true random” numbers produced by `http://www.random.org`, which uses atmospheric noise as a source of entropy.

After trying it out and reporting various bugs and so on, he asked for a packaged version of DieHarder for Debian and an interface for *R*. After all, he reasoned, *R* has a slew of powerful statistical tools that one could use to analyze the output of a RNG tester beyond what was already written, but has no integrated RNG tester. Thus, *libdieharder* came into being, along with a Debian dieharder package. Independent RPMs for both libdieharder and dieharder appeared at about the same time (and should hopefully make it into Fedora Core sometime this summer).

Indeed, Dirk produced RDieHarder, and it works pretty much perfectly! You can read a Learned Paper on the result (with Dirk doing most of the writing) that Dirk may present at the inaugural North American useR! meeting. One can generate DieHarder RNG test results directly into *R* data structures and plot them or analyze them further using *R*’s powerful command set.

Conclusion

People who consume a lot of random numbers in their code, and are inspired to test them and benchmark comparative speed at the same time, take note: DieHarder exists, works pretty well already, and is getting better all the time. In a matter of a few months one should be able to install it directly in any major distribution via *apt-get* or *yum*, although you may prefer to build it from source to be able to monkey around with its guts a bit. DieHarder provides a consistent and user-modifiable interface that you can use to test RNGs for randomness. It’s not perfect, but it’s not bad, and it is easy to augment or change.

The more interesting conclusion for the rest of you may be the unsurprising observation that *the open source development process works!* A GPL license *does* matter, especially for tools that are intended to support research, and indeed to be objects of or vehicles for research in their own right. There are tens to hundreds of at least occasional DieHarder users around the world already, and quite a number of them have said that they, too, were made uneasy by the lack of a license on the original DieHard program, and in any event found its monolithic source too unreadable and disorganized to be easily modified.

In contrast, the modularity and open source process associated with DieHarder has enabled what is the *real* object of the exercise to begin. For the first time, it is possible for mathematicians and statisticians to *freely experiment with, criticize and contribute to* a single RNG tester with modular test sources, a uniform call harness for tests, and the full range of statistical and numerical functions available within the GSL immediately at hand.

Instead of writing an entire program to apply a potential new RNG test against at most a handful of RNGs, one can simply add a test to DieHarder and consistently apply it seconds later to *any one of* seventy-plus (and growing!) RNGs. If one has written a new candidate RNG, instead of writing an entire program to generate a handful of short files to feed to DieHard, one can wrap the algorithm in a GSL RNG interface (examples are provided in the DieHarder interface sources) and *immediately* apply twenty-plus (and growing!) tests, with full control over all aspects of each test that can be controlled. In all cases, one can contribute successful new code (tests or RNGs or user interfaces) back to the DieHarder project and/or the GSL for others to use as well.

An open source project like this may never be finished, and the developers can always use help. The literature abounds with tests that have been proposed and never implemented, or implemented a single time by a single graduate student for a single thesis and then forgotten. There is a real opportunity for (say) a graduate student in statistics to participate and learn a *lot* about RNGs, Markov processes, and RNG tests, while maybe putting their name in front of every future user of DieHarder.
