Creating a Pseudorandom Number Generator: A Step-by-Step Guide
Written on
Chapter 1 Understanding Pseudorandom Number Generation
In a previous article titled "How to Derive Randomness from an Unbalanced Source," I discussed techniques to extract randomness from various origins. Now, we will focus on constructing a pseudorandom number generator (PRNG) from the ground up. But first, why is randomness essential?
Random numbers are not significant in isolation; rather, their value lies in their generation method. Modern systems—ranging from communication protocols to cryptography and gaming—depend heavily on these numbers for security and unpredictability.
Imagine a game where a virtual coin is flipped, and players wager real money on the outcome. If the coin is genuinely fair, players can trust the game. However, if players can predict the results with high accuracy based on previous flips, the integrity of the game is compromised.
Natural Randomness Sources
While nature provides some randomness sources, utilizing them can be prohibitively expensive. Macroscopic randomness can stem from climate fluctuations or cosmic microwave background variations, but acquiring a system to measure these changes can cost hundreds of millions of dollars.
On a microscopic level, the chaotic nature of particles offers randomness, yet managing such sources is equally costly and complex. Thus, it is often more feasible to create a PRNG on our devices—computers, smartphones, etc. Having a compact method to generate random sequences is beneficial, as external factors could manipulate or replicate any randomness derived from environmental conditions.
A computer operates through a CPU that runs instructions, which are inherently deterministic. This raises the question: how can we generate randomness in an environment devoid of it?
Defining a PRNG
To answer this, we must first define what a pseudorandom number generator is. A PRNG, when initialized with a seed, generates a sequence of bits that is statistically indistinguishable from a sequence produced by a true random source.
Indistinguishable means that no polynomial-time algorithm can determine whether a given sequence was generated randomly or calculated. Therefore, a PRNG is essentially an algorithm executed in polynomial time on a deterministic Turing machine, producing a function G where:
- l is a monotonically increasing function, ensuring the output length exceeds the input (seed).
The theory behind PRNGs suggests that the likelihood of distinguishing between a real random source and a PRNG diminishes as the seed length increases. Thus, a PRNG takes a seed as input and outputs a longer string, making it challenging to discern its origin.
Constructing the G Function
To create an effective PRNG, we don't need to define a different function G for each possible length. Instead, we can start with a simple PRNG, H, and build any required PRNG in the form of G. By iterating the function H multiple times, we can generate a longer bit sequence effectively.
The H function should be a one-way permutation, making it hard to reverse-engineer. A widely recognized one-way permutation is modular exponentiation. Given a prime number p and an integer x, finding such x involves solving the discrete logarithm problem—an unsolved computational challenge, which forms the basis of many public-key cryptography systems.
The Last Bit
We’ve established a function that takes k bits and outputs k+1 bits, where the extra bit serves as a hardcore bit for a function f. This hardcore bit is easily computable if x is known but challenging to derive if only f(x) is available.
By defining a new operation between two k-bit strings, we can derive a hardcore bit for the g function. Thus, our PRNG H is formally defined with the ability to generate indistinguishable sequences.
Building the PRNG
Now that we have the H and G functions defined, we can construct a reliable PRNG. The main function G will output a longer bit string, and the length of the initial seed must be even to facilitate the division into two halves.
Keep in mind that a longer seed produces a less predictable sequence of bits.
A Python Implementation
To simplify bit manipulation, the PRNG can be implemented in Python. This version operates on strings for better readability but can be optimized for bit operations.
Using modular exponentiation as the one-way permutation and defining l(k) as a polynomial function allows us to generate output sequences.
The only input required is the initial seed, which must be a binary string of limited length. While larger seeds can enhance output quality, it’s vital to redefine the polynomial function l(·) and select a suitable generator.
This first video, "Pseudo-Random Number Generator From Scratch in Python," provides a detailed walkthrough of constructing a PRNG in Python.
The second video, "Build a Pseudo Random Number Generator that Follows a Specific Sequence in Ruby," explores creating a PRNG with specific output patterns.