A Mental Model for Video

Understanding the underlying principles of how video works can help you troubleshoot issues, gain better control over the look of the video you shoot, and achieve the results you are looking for. Video, in the broadest technical sense, is a long list of numbers that change over time. Changing any aspect of a video means performing mathematical operations on that long list of numbers. This article will attempt to break this idea down into some reasonable components.

Video can be thought of as a grid of pixels. The size of the grid is usually described by its width and height in pixels, which people call resolution. The more pixels, the higher the resolution. Video is commonly presented in a 16:9 aspect ratio, meaning that for every 16 horizontal pixels there are 9 vertical pixels. More and more, we are exposed to other aspect ratios, since video on mobile phones isn't constrained to specific resolutions. A single grid of pixels is called a frame, and a video is made up of many frames.

A grid of pixels
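To make resolution and aspect ratio concrete, here is a minimal Python sketch that counts the pixels in a frame and reduces its width and height to the simplest ratio. The helper name `describe_frame` and the example resolutions are illustrative choices, not part of any particular library.

```python
from math import gcd


def describe_frame(width: int, height: int) -> tuple[int, str]:
    """Return the pixel count and the reduced width:height aspect ratio of a frame."""
    divisor = gcd(width, height)  # largest number that divides both sides evenly
    aspect = f"{width // divisor}:{height // divisor}"
    return width * height, aspect


# Two common widescreen resolutions and one vertical (phone-style) frame.
for w, h in [(1920, 1080), (3840, 2160), (1080, 1920)]:
    pixels, aspect = describe_frame(w, h)
    print(f"{w}x{h}: {pixels:,} pixels, aspect ratio {aspect}")
```

Running it reports 2,073,600 pixels at 16:9 for 1920x1080, 8,294,400 pixels at 16:9 for 3840x2160, and 2,073,600 pixels at 9:16 for the vertical 1080x1920 frame.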

These pixels are made up of numbers that represent the color of the pixel. In black and white video, a single number represents how bright the pixel is. In color video, there are three numbers that, when combined, create a color. That color can be thought of as having two properties: its luminance and its chrominance. Luminance describes how bright the pixel is; chrominance describes which color the pixel has and how much of it.
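As one illustration of splitting a color into brightness and color, the sketch below converts a normalized RGB pixel into luma (Y, the video-system stand-in for luminance) and two color-difference values (Cb and Cr) using the ITU-R BT.709 weights. The article doesn't name a specific color space, so the choice of BT.709 here is an assumption; other standards use different weights, and `rgb_to_ycbcr_709` is a hypothetical helper name.

```python
def rgb_to_ycbcr_709(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Split a normalized RGB pixel (each channel 0.0-1.0) into luma and chroma.

    Assumes ITU-R BT.709 weights. Y runs 0.0-1.0; Cb and Cr run -0.5 to 0.5,
    where 0.0 means "no color" along that axis.
    """
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # brightness: green counts most, blue least
    cb = (b - y) / 1.8556  # blue-difference chroma, scaled into -0.5..0.5
    cr = (r - y) / 1.5748  # red-difference chroma, scaled into -0.5..0.5
    return y, cb, cr


print(rgb_to_ycbcr_709(0.5, 0.5, 0.5))  # neutral grey: chroma is (essentially) zero
print(rgb_to_ycbcr_709(1.0, 0.0, 0.0))  # pure red: low luma, maximum red-difference
```

A neutral grey comes out with Cb and Cr at (essentially) zero, which is the sense in which chrominance measures how much color a pixel has, separately from how bright it is.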

Treated as numbers, pixel values are generally between 0 and 255. This means that in black and white video, each pixel has 256 possible brightness levels (zero counts as a value). Depending on the color space and range convention the video uses, the number of values actually available to represent a pixel varies: some systems use only the values 16 to 235, while others use the whole range, 0 through 255.
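The 16 to 235 range is commonly called limited (or video) range, as opposed to full range. A minimal sketch of squeezing an 8 bit full-range luma value into limited range is below. It covers luma only (in 8 bit limited-range video, the chroma components use 16 to 240), and `full_to_limited` is a hypothetical name used for illustration.

```python
def full_to_limited(value: int) -> int:
    """Map an 8-bit full-range luma value (0-255) into limited range (16-235)."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value between 0 and 255")
    # Limited range has 219 steps (16..235); scale the 255 full-range steps onto them.
    return round(16 + value * 219 / 255)


for v in (0, 128, 255):
    print(v, "->", full_to_limited(v))
```

Black (0) lands on 16 and white (255) lands on 235, so a limited-range signal has fewer distinct brightness levels to work with than a full-range one at the same bit depth.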

The number 255 may seem arbitrary if you don't know how computers represent numbers. Computers use series of bits to represent numbers. A bit, as you likely know, is a single value, either 1 or 0; 1 generally means the bit is "on" and 0 means it is "off". The more bits used to represent a number, the larger the numbers that can be represented. This article won't dive into binary, the base 2 number system, as plenty has been written about it elsewhere on the internet.

Most video systems today use 8 bit numbers for pixel values. 2 to the power of 8 is 256, so an 8 bit number can hold 256 values, 0 through 255. We are starting to see 10 bit systems emerge, but these tend to be on the acquisition side rather than the delivery side.

Pixels and their values
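The bit arithmetic behind 255 can be checked with a few lines of Python. This is a small sketch; `value_range` is a hypothetical helper, and the binary column simply shows that the largest value is all bits "on".

```python
def value_range(bits: int) -> tuple[int, int]:
    """Return (number of distinct values, largest value) for an unsigned `bits`-bit number."""
    count = 2 ** bits  # each added bit doubles the number of combinations
    return count, count - 1  # counting starts at zero, so the maximum is count - 1


for bits in (8, 10):
    count, largest = value_range(bits)
    print(f"{bits} bits: {count} values, 0 to {largest} (binary {largest:b})")
```

For 8 bits this reports 256 values from 0 to 255 (binary 11111111); for 10 bits, 1024 values from 0 to 1023.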

To understand how more bits make for higher quality video, consider a gradient going from black to white. In the real world, there would be an infinite number of shades of grey between black and white. In the digital world, we can't represent all of these shades. We have to sample the gradient at regular intervals and represent the shade at each point. Effectively, we have to break that infinite gradient down into steps, and the more bits we have, the more steps we can use.

Using more bits to represent luminance
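A small sketch of that stepping, assuming a linear black-to-white ramp across a 3840 pixel wide frame (the width is an illustrative choice), counts how many distinct shades survive at 8 and 10 bits. When each shade covers a wide band of pixels, the edges between bands can become visible as banding. The function names are hypothetical.

```python
def quantize(level: float, bits: int) -> int:
    """Snap a continuous brightness (0.0 = black, 1.0 = white) to the nearest code value."""
    max_code = 2 ** bits - 1
    return round(level * max_code)


def gradient_steps(width: int, bits: int) -> int:
    """Count the distinct shades left after quantizing a linear black-to-white ramp."""
    codes = {quantize(x / (width - 1), bits) for x in range(width)}
    return len(codes)


for bits in (8, 10):
    steps = gradient_steps(3840, bits)
    print(f"{bits}-bit: {steps} shades, each band about {3840 / steps:.2f} pixels wide")
```

At 8 bits the ramp collapses into 256 shades, each roughly 15 pixels wide; at 10 bits there are 1024 shades, each under 4 pixels wide, so the steps are much harder to see.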

Color video is a bit more complex to think about. While a black and white pixel can be represented by a single number, a color pixel is often represented by 3 numbers. This is a simplified way of thinking about a color pixel, but it will suffice for now.

With 8 bit video, a pixel is represented by three 8 bit numbers, so each pixel needs 24 bits. In RGB (red, green, blue) video, one number represents how much red is in the pixel, one how much green and one how much blue. Combined, these three components create a color.

How three numbers combine to make a pixel: the center of the overlapping area is the resulting color. Note that the color may render differently on your device.
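To show how three 8 bit numbers fit into 24 bits, the sketch below packs red, green and blue into a single integer and unpacks them again. The 0xRRGGBB layout is one common convention (the same ordering used in hex color codes such as #FF8000); real video formats may order or store the components differently, and the helper names are hypothetical.

```python
def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channel values into one 24-bit integer laid out as 0xRRGGBB."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be an 8-bit value (0-255)")
    # Red occupies the top 8 bits, green the middle 8, blue the bottom 8.
    return (r << 16) | (g << 8) | b


def unpack_rgb(pixel: int) -> tuple[int, int, int]:
    """Recover the red, green and blue components from a 0xRRGGBB integer."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


orange = pack_rgb(255, 128, 0)
print(f"{orange:06X}", orange.bit_length())  # FF8000 24
print(unpack_rgb(orange))  # (255, 128, 0)
```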

With three 8 bit numbers, we can represent 16,777,216 colors. With three 10 bit numbers, we can represent 1,073,741,824 colors, 64 times as many as an 8 bit system. The human visual system is estimated to be able to distinguish between 10,000,000 and 100,000,000 different colors, so there isn't much need for more than 10 bit video systems.
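The arithmetic behind those figures is short: each channel offers 2 to the power of its bit depth, and the three channel counts multiply together. Two extra bits on each of three channels is six extra bits in total, which multiplies the number of colors by 2 to the power of 6, or 64. A minimal sketch, with `color_count` as a hypothetical helper:

```python
def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors when every channel has `bits_per_channel` bits."""
    return (2 ** bits_per_channel) ** channels


eight_bit = color_count(8)
ten_bit = color_count(10)
print(f"{eight_bit:,}")  # 16,777,216
print(f"{ten_bit:,}")  # 1,073,741,824
print(ten_bit // eight_bit)  # 64
```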

As with luminance, chrominance has to be broken down into individual steps. We can't represent every shade of color in between two shades.

A higher bit depth for color removes gradient artifacts

Video, of course, is not static; it changes over time. Each frame of video is one of these grids of pixels. The number of frames shown per second goes by various names, such as frame rate, frames per second or, for displays, refresh rate, and is often expressed in hertz. Different frame rates serve different purposes. Broadcast video is nominally 60 frames per second in the United States and 50 frames per second in much of the rest of the world. The frame rate of broadcast video generally matches the frequency, in hertz, of the country's power grid.
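Frame rate can also be read the other way around: as how long each frame stays on screen. A minimal sketch, with `frame_duration_ms` as a hypothetical helper and the cinema and broadcast rates from the text as examples:

```python
def frame_duration_ms(frames_per_second: float) -> float:
    """How long each frame stays on screen, in milliseconds."""
    return 1000 / frames_per_second


for fps in (24, 50, 60):
    print(f"{fps} fps: each frame lasts {frame_duration_ms(fps):.2f} ms")
```

At 24 frames per second each frame lasts about 41.67 ms, at 50 exactly 20 ms, and at 60 about 16.67 ms.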

Different frame rates look and feel very different. Film was commonly shot at 24 frames per second, which is now the standard for cinematic video. Broadcast systems run at more than double that, at 50 or 60 frames per second. Many phones have 60 hertz screens, and gaming computers often have refresh rates of over 120 hertz.
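Putting resolution, bit depth and frame rate together shows just how long that list of numbers gets. The sketch below computes the uncompressed data rate of 8 bit RGB video at 1920x1080 and 60 frames per second. It assumes three full-resolution channels per pixel and ignores chroma subsampling, audio and compression, which is why delivered video is far smaller; `raw_bytes_per_second` is a hypothetical helper.

```python
def raw_bytes_per_second(width: int, height: int, bits_per_channel: int,
                         channels: int, frames_per_second: float) -> float:
    """Uncompressed data rate: every channel of every pixel of every frame."""
    bits_per_frame = width * height * channels * bits_per_channel
    return bits_per_frame * frames_per_second / 8  # 8 bits per byte


rate = raw_bytes_per_second(1920, 1080, 8, 3, 60)
print(f"{rate / 1_000_000:.1f} MB per second")  # 373.2 MB per second
```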

Below is an example of the difference between 15, 30 and 60 frames per second motion. The higher the frame rate, the smoother the motion.

From top to bottom, the waves refresh at 15, 30 and 60 frames per second respectively.
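One way to see why higher frame rates look smoother: an object moving at a steady speed has to jump further between frames when there are fewer frames per second. The sketch assumes a hypothetical object crossing the screen at 600 pixels per second; `pixels_per_frame` is likewise an illustrative name.

```python
def pixels_per_frame(speed_px_per_second: float, frames_per_second: float) -> float:
    """How far a moving object jumps between consecutive frames."""
    return speed_px_per_second / frames_per_second


for fps in (15, 30, 60):
    print(f"{fps} fps: jumps {pixels_per_frame(600, fps):.0f} pixels per frame")
```

The object jumps 40 pixels per frame at 15 frames per second, 20 at 30, and 10 at 60; smaller jumps read as smoother motion.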

It's worth noting, though, that faster frame rates are not always better. The overall look and feel of video is heavily influenced by the frame rate. Movies have their distinct look because they are shot at 24 frames per second; soap operas have theirs because they are shot at 60 hertz. Like the way color is represented, frame rate plays a big role in the overall style of a video.

While this article is limited in scope, video is a complex and often difficult topic. Ultimately, video is a representation of light by way of numbers. Knowing how those numbers are arranged and what they represent makes it easier to work efficiently with video.