Random procedural generation

11 Feb 2021

Randomly generating procedural content

Procedurally generated content is often seen in games, but also in movies, simulations and design. Generating content procedurally means letting algorithms produce content, and introducing randomness into those algorithms results in unpredictable or unique content. When randomness is used, it is important to use it in a controlled manner. This article goes into some techniques that can be used to randomize content properly, and interactive plots are used to demonstrate the methods.

The challenge

When randomly generating content, it can be challenging to not make things too random. Consider the example of generating random colors for an environment; when colors are completely random, things will not look pretty. The same goes for placing content, there must be boundaries to prevent the randomizer from generating things that make no sense.

Random versus weighed sampling
Figure 1: A weighed distribution of plant species on the left, versus a completely random distribution of plant species on the right.

To illustrate some of the challenges of procedurally generating content, this article goes into several techniques that I've used to generate content for the game Koi Farm; everything in this game is procedurally generated, so it serves as a good example.

Randomizers

The basis of randomly generated content is a randomizer, or more specifically, a pseudorandom number generator, since true random does not typically exist within the realm of computer programming. A pseudorandom number generator generates numbers that are indistinguishable from random numbers to a human, which is good enough for most applications.

Randomizers differ in quality, meaning that some randomizers generate sets of numbers that are less predictable than others. For procedurally generating game content, high quality randomizers are typically not needed, but a randomizer that determines the winners of a lottery for example shouldn't be too predictable to prevent abuse. An example of a very good (unpredictable) randomizer is the Mersenne Twister; an example of a low quality but very fast randomizer is a linear congruential generator.

Controls
Function$y=x^n$

A practical example: frequency

A practical example of using random numbers is to model occurrence frequency. Consider a video game where items can be found; some are very common, others a bit rarer, and some are very hard to find. A randomizer can be used to determine which item should appear in a chest, taking the occurrence frequency of items into account.

The graph on the right visualizes a way to do this:

  • The horizontal axis is the output of a randomizer. When the randomizer runs, it picks any linearly distributed point on the horizontal axis.
  • The vertical axis is divided into segments, where each segment stands for a certain item.
  • A line is plotted through the graph. When "rolling the dice", the $y$ value of the function falls inside one of the vertical categories. The item for that category is then created.

This function is simply $y=x^n$, where the exponent $n$ can be changed to change the occurrence frequency distribution. Increasing $n$ makes the common items more common, and the rare ones rarer. The plot shows the chance a certain category is chosen as a percentage.

Figure 1 shows another application of occurrence frequency: three types of plants exists (grass, cattails and leafy plants), the right half of the scene distributes those plants randomly. The left half of the image changes the occurrence frequency of the plants, making grass the most common plant by far. Making the more complex plants rare makes the scene much more interesting and logical to look at.

Controls
Function$y=\begin{cases}{\frac{1}{2}(2x)^n}&\text{for }x<\frac{1}{2}\\{1-\frac{1}{2}(2-2x)^n}&\text{otherwise}\end{cases}$

Mixing

Another use case is mixing things to create something new. I've used this method for determining the outcome of crossbreeding koi in Koi Farm: when two fish breed, offspring in most cases looks like either the mother or the father. In rarer cases though, a mutation occurs that looks like something in between. This sigmoid function shows the distribution of probabilities in this case. When a random number is plotted on the horizontal axis, the $y$ value falls in the first or last category in most cases, and there is a small chance it falls in any of the categories between them.

Controls
Function$y=(4(x-\frac{1}{2})^3*(1-sin(\pi x))^n+\frac{1}{2})^{\log_{0.5}p}$

Plateau shaped distributions

Another interesting shape for distributions is a plateau shaped function. In this case, most $y$ values lie on a preset plateau value, and rarer $y$ values are below or above the plateau. The interactive example shows such a function; the $p$ parameter determines the height of the plateau, while the $n$ parameter can be increased to make the plateau area wider.

A fitting application for this function is the distribution of objects in a procedurally generated forest, where every category on the $y$ axis stands for a type of plant, and larger objects have a larger $y$ value. The biggest category can be grass for example. The categories under the plateau can then encode even smaller plants, like moss and small flowers, while the categories above the plateau stand for bushes and trees. Grass will be the most common plant in the environment, while several bigger and smaller plants are rarer.

The source code for these plots can be found on GitHub under the MIT license.

Two dimensional distributions

The three examples before are one dimensional, they apply a single random number to a distribution. In two dimensional applications, it is also useful to transform random values to improve their quality. The example on the right shows a distribution plot, where a number of points are randomly scattered in an environment. Randomly scattered points can be used in several ways:

Controls
  • An environment can be populated according to these points, where items are created at their positions.
  • If the approximate shape or color of an object needs to be known, but it would be too computationally expensive too go through all the available data, it can be probed at random points to make an educated guess. It is important to use a high quality distribution here to prevent reading the same point multiple times, or skipping over big areas.

The plot shows points that are placed completely random by default. This means that every time a point is added, its position is randomized without checking where other points are. This results in a very unpredictable distribution with both dense clusters and empty areas. When placing items in an environment, this is not ideal; it's better to populate the environment completely without leaving empty spaces.

Random points in a grid

The interactive example has a "Grid" option. This is an alternative way to distribute points, which leaves less open areas. This distribution is created by dividing the area up into grid cells, after which a point is placed in a random position inside each grid cell. This guarantees that there are no big empty spaces. It does however not prevent clusters from forming: if points are created in neighboring cell corners, they still cluster.

Random versus Poisson sampling
Figure 2: Plant placement according to Poisson sampling on the left, versus completely random placement on the right.

Poisson sampling

To prevent clustering while also preventing the formation of open spaces, Poisson sampling can be used. This is also an option in the interactive example. In short, Poisson sampling ensures that points are always placed at a minimum distance from each other, while every new point is never placed further away from an existing point than twice that distance.

Figure 2 shows a comparison of Poisson sampling and completely random placement of plants in Koi Farm: Poisson sampling is used on the left half of the image, while random placement is used on the right. Both halves contain the same amount of plants, but many empty spaces can be seen on the right. Poisson sampling clearly distributes the plants better than a random distribution does. In addition to better looks, better distributions can also increase the performance of games that use it for content placement. After all, more effectively placed content fills the environment more effectively, so less content is required for an environment to look full.

The source code for the spatial distributions can be found on GitHub under the MIT license.

Conclusion

Randomization can be very useful in procedural generation if the randomness is controlled. Implementing occurrence frequency for created objects and carefully defining how they are distributed can lead to great results. A nice analogy for random generation is sculpting: the randomizer is a bland block of marble, but beautiful things can be created by taking some of the randomness away and guiding the randomizer towards more interesting outcomes.