Movies are arguably the most synthetic medium for storytelling, combining music, visual design, dialogue, performance, and plot into one package. Each one of these elements might be considered a platform with which to create a narrative, but it is the combination of individual components which ultimately separates the great movies from the merely mediocre.
It is for this reason that I was excited when I saw Spotmaps (based on an idea by Brendan Dawes). Spotmaps are a way to represent the visual content of a movie as a series of colors. Each cell in a spotmap represents the mean color over 1 second’s worth of frames. One reads a spotmap sequentially, from the top left to the bottom right. Essentially, a spotmap condenses a movie to a simple progression of dominant colors.
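To make the idea of a cell concrete, here is a minimal sketch of how one second of frames could be averaged into a single color. It is not how Spotmaps themselves are generated (I don't know their pipeline); the frame representation (a list of (r, g, b) pixels with values between 0 and 1), the `fps` parameter, and the function names are all assumptions made for illustration.

```python
def mean_color(pixels):
    """Average a list of (r, g, b) pixels channel by channel."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def spotmap_cells(frames, fps):
    """Collapse each run of `fps` frames (one second) into one mean color.

    `frames` is a hypothetical list of frames, each a list of (r, g, b) pixels.
    """
    cells = []
    for start in range(0, len(frames), fps):
        second = frames[start:start + fps]
        pixels = [p for frame in second for p in frame]
        cells.append(mean_color(pixels))
    return cells


# A toy "movie": 2 frames per second, 2 pixels per frame, 2 seconds long.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
toy_cells = spotmap_cells(toy_frames, fps=2)
```

The first toy second is mostly white, so its cell comes out light grey; the second is all black.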
Something about these spotmaps grabbed my attention. One of the principal advantages of this encoding of a movie is that it is computationally tractable: while a naïve analysis of a movie would have substantial difficulty in distinguishing characters from their backgrounds or understanding the significance of dialogue, reducing a movie to its colors compresses all the visual information into a simple matrix. I decided to use these spotmaps to analyze movies quantitatively.
From a spotmap to a matrix
I’ll briefly explain how I turn a spotmap into numbers.
Colors can be visualized as inhabiting a three-dimensional space, with axes defined by the three primary colors (red, green, and blue). Each cell in the spotmap is turned into three numbers, representing the red, green, and blue content of that cell. So, for instance, black represents the absence of color; it lies on one corner of the color space, while white falls on the opposite corner.
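As a rough illustration of the color space described above, the sketch below maps a color to a point in a unit cube and measures the distance between the black and white corners. It assumes 8-bit channel values (0 to 255) scaled to the 0 to 1 range used in the appendix; the function names are mine, chosen for this example.

```python
import math


def rgb8_to_unit(r, g, b):
    """Map 8-bit channel values (0-255) to a point in the unit color cube."""
    return (r / 255.0, g / 255.0, b / 255.0)


def color_distance(p, q):
    """Euclidean distance between two points in RGB space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


black = rgb8_to_unit(0, 0, 0)        # one corner of the cube
white = rgb8_to_unit(255, 255, 255)  # the opposite corner
diagonal = color_distance(black, white)  # the cube's diagonal, sqrt(3)
```

Every other color falls somewhere inside this cube, so no two colors can be farther apart than that diagonal (about 1.73).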
Here’s an example of what I’m talking about, but scaled up, so as to represent the color content of a whole spotmap (in this case, WALL-E) in three-dimensional space:
I like this representation of the color palette, because it shows the spatial relationships between colors. Take a look at this spotmap of the excellent anime Ghost in the Shell.
There’s this spur of green jutting out from the main color axis, completely distinct from most of the other colors. Turns out, upon visual inspection of the spotmap, that the green segments are contiguous within the movie, forming islands of bright color.
But it gets better. Noticing this segmentation in terms of the colors, I went back and re-watched the movie. Immediately, I saw an interesting pattern emerge: the very first scene in which the main character (and principal badass), Major Kusanagi, envisions the world is tinted neon green. Without delving too far into the sci-fi nerdery that is my fondness for Ghost in the Shell, Kusanagi is a cyborg, with access to a more complex set of visual information than a human. In this case, the animators are using the color shift to convey a difference in perspective and underscore the novel way in which Kusanagi sees the world.
Ghost in the Shell is a pretty colorful film, overall. By this I mean that its average color value is relatively high. I went ahead and computed this number, which I call “color intensity” (but which might also be called color brightness or lightness; see appendix for details), for a few movies, as shown below:
You can see that Ghost in the Shell groups pretty well with other animated and action movies, whereas Cabin in the Woods brings up the low end of the spectrum, perhaps as expected (given that it’s a horror movie).
Another measurement I was interested in looking at in these movies is what I call the color speed (see appendix for mathematical definition). Intuitively, one can think of this as the amount by which the color changes between any two sequential cells (that is, from one second to the next). Mathematically, it is computed as the distance in color space between sequential points. I reasoned that this statistic would capture some of the “pace” of a movie: that is, the speed with which scenes and camera angles change (at least insofar as those changes also change the mean color).
If you compute the color speed for a few of my favorite movies, it looks like this:
Now you might notice a few interesting trends here. One is that action movies tend to have a much faster color speed than other types of movies (outliers notwithstanding). Intuitively, this finding makes a lot of sense to me: consider the abundance of ‘splosions in action movies, which appear in spotmaps as abrupt transitions from grey background to orange/red. Look at this fight scene from Die Hard 2: notice the rapid shifts from the whiteness of the airplane to the darkness of the night, mediated by moving between cameras, as John McClane heroically beats a couple of guys up and then, per action movie cliche, destroys an airplane with a fuel trail and a lighter.
One of the biggest outliers on this list is the underrated Man on Fire. It has far and away the highest color speed of any spotmap I’ve analyzed yet. Intriguingly, this matches both my own subjective observations and the critics’ reviews. While I enjoy the movie for its kinetic pace and interesting cinematography, a large contingent of reviewers couldn’t stand its “bleariness” and “ADD-fueled insanity”. This quantitative measurement indicates that Man on Fire really was a good bit quicker than other movies, faster even than most action movies. I suppose, though, that there is no accounting for taste: whereas I saw the pace and rapid jump-cuts of the movie as reflecting the manic vengefulness of its hero, John Creasy, most critics labeled it a cheap trick (or in the words of one Rex Reed, “hyperthyroidal”).
Another statistic I tried out was related to how many different colors inhabited the palette of a given movie. More colorful movies, let’s say WALL-E, will have higher color spreads; at the far end of the spectrum, a black and white movie would have very little. In colorspace, color spread is a function of how much of the space is occupied.
Here’s what I found.
Nothing mindblowing here. Movies like Tangled and WALL-E are quite colorful, presenting interesting mixtures of blues and reds, while So I Married an Axe Murderer stakes its claim in a rather more restricted region of colorspace. One obvious fact is that animated films tend to be much more colorful than any other movies. No doubt this pattern reflects differences in both technical limitations and intended audience. Another property worth mentioning is that this graph is qualitatively similar to that shown above, for mean color intensity. This also makes some sense: the movies with the highest color intensity also tend to have the widest variety of colors (i.e. color spread).
So after computing these statistics for several movies, I began to appreciate that there are some substantial differences between movies of different genres. Action movies tend to be reasonably colorful, but very quick; dramas tend to be dark, and moderately paced; and finally, comedies have plenty of color, but are slow as mud. Each of these differences makes sense intuitively, although I wouldn’t necessarily have picked comedies to be so slow.
I wanted to push it further though. I wanted to know how informative the differences between genres are: to what extent can you predict a movie’s genre based SOLELY on a few simple statistics computed from its color profile? After computing the statistics for a sample of 30 movies in three different genres (action, drama, comedy), I trained a machine learning algorithm called a Support Vector Machine to identify movies’ genres based on these statistics alone (randomly separating my dataset into 25 training, 5 testing). The results are pretty neat:
The first column represents the accuracy using color intensity alone; the second represents spread alone; the third is speed alone; and the fourth is all three variables (intensity+spread+speed) together. While each individual statistic contributes a little to the accuracy, the combination of all of the variables together is able to call a movie’s genre correctly a staggering 68% of the time. Given three categories, one would expect a classification accuracy at random of only 33%. That the algorithm is able to call genre accurately with only three clumsily-defined variables speaks to the idea that the spotmaps’ colors really are representing core aspects of the movies they are drawn from.
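The random 25/5 split and the chance baseline mentioned above can be sketched as follows. This covers only the data split, not the Support Vector Machine itself; the function name, the seed, and the placeholder movie names are assumptions for illustration, not my actual dataset.

```python
import random


def split_train_test(items, n_test=5, seed=None):
    """Shuffle a copy of `items` and return (training items, testing items)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder names standing in for the 30 movies (hypothetical).
movies = [f"movie_{i:02d}" for i in range(30)]
train, test = split_train_test(movies, n_test=5, seed=42)

# With three equally likely genres, guessing at random is right 1 time in 3.
chance_accuracy = 1 / 3
```

Fixing the seed makes a particular split reproducible; leaving it as `None` gives a fresh random split on every run.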
This is a start
I am surprised by the extent to which these quantitative differences in color statistics, computed independently of any human viewing, can recapitulate certain core characteristics of each movie. As I noted early on, cinema is a rich and complex art form, made of sound and speech and shape and color. Using only one aspect of this tableau, color, we can discover interesting facts about films’ pace, characterization, and genre.
Going forward, I view this as a sort of proof of concept for further analyses. There are a great many directions to be taken, in terms of analyzing more genres, more movies, and deeper questions. I have often wondered what separates a finely crafted film from a clunker; or what particular spectrum of differences there are between movies from the 80’s and the 90’s. No doubt one can articulate subjective answers to these questions, but this represents an opportunity to gather together a bit of empirical evidence and bring it to bear on art.
Appendix: Mathematical definitions of color statistics.
For a spotmap matrix X, there are three columns R, G, B, which correspond to the intensity of each primary color (red, green, and blue). There are a number of rows equal to the number of seconds in the movie’s runtime. Example 3-second movie:
R G B
0 .5 0
1 1 1
0 0 0
(The 1st second’s color would be a forest green, like the color of the third bar in the bargraph on machine learning accuracy (above); the 2nd second’s color would be pure white; and the 3rd would be pure black.)
Every cell can thus be specified as Xij, with a time in seconds (i) and a primary color (j).
For this matrix, we can compute the color intensity as the mean of all values in the matrix, i.e. the sum of all Xij divided by the number of cells in the matrix. For the example spotmap, it would be (0+.5+0+1+1+1+0+0+0)/9 = .389. Values for most movies are usually in the range of .1-.25.
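The intensity calculation above can be checked with a few lines of Python. This is a minimal sketch that assumes the spotmap is stored as a list of (R, G, B) rows with values between 0 and 1; the function name is made up for this example.

```python
def color_intensity(spotmap):
    """Mean of every value in the spotmap matrix (all rows, all channels)."""
    values = [v for row in spotmap for v in row]
    return sum(values) / len(values)


# The example 3-second movie: forest green, white, black.
example = [
    (0.0, 0.5, 0.0),
    (1.0, 1.0, 1.0),
    (0.0, 0.0, 0.0),
]
intensity = color_intensity(example)
print(round(intensity, 3))  # 0.389
```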
The spread is computed from the Euclidean distances between 1000 randomly selected pairs of rows in the matrix (think of Pythagoras’ theorem, but with three numbers instead of two). The example matrix is too small for 1000 different random selections, but I can compute it for the 1st and 3rd row: the value would be .5.
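Here is one way the spread could be sketched in Python. Averaging the sampled distances is my assumption for this sketch, by analogy with color speed; the function name, the `n_pairs` and `seed` parameters, and the choice to sample two distinct rows per pair are likewise illustrative.

```python
import math
import random


def color_spread(spotmap, n_pairs=1000, seed=0):
    """Mean Euclidean distance over randomly chosen pairs of rows.

    Assumption: the sampled distances are averaged, and each pair
    consists of two distinct rows.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.sample(spotmap, 2)  # two distinct rows
        total += math.dist(a, b)
    return total / n_pairs


# Only the 1st and 3rd rows of the example movie, as in the text.
first_and_third = [(0.0, 0.5, 0.0), (0.0, 0.0, 0.0)]
pair_spread = color_spread(first_and_third, n_pairs=10)
print(pair_spread)  # 0.5
```

With only two rows, every sampled pair is the same pair, so the result is exactly the single distance of .5.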
Finally, color speed is equal to the mean Euclidean distance between successive rows in the spotmap matrix. For a matrix with n rows, there are thus n-1 Euclidean distances between successive rows; in this case, I would compute the speed as the average of the distance between rows 1 and 2 (which is 1.5) and the distance between rows 2 and 3 (1.73). The color speed of this matrix would thus be (1.5+1.73)/2 = 1.62. Typical values of color speed are between .02 and .04.
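The same worked example, as a short Python sketch. It assumes the spotmap is a list of (R, G, B) rows in time order; the function name is mine.

```python
import math


def color_speed(spotmap):
    """Mean Euclidean distance between each row and the row after it."""
    steps = [math.dist(a, b) for a, b in zip(spotmap, spotmap[1:])]
    return sum(steps) / len(steps)


# The example 3-second movie: forest green, white, black.
example = [
    (0.0, 0.5, 0.0),
    (1.0, 1.0, 1.0),
    (0.0, 0.0, 0.0),
]
speed = color_speed(example)
print(round(speed, 2))  # 1.62
```

The toy value is far above the typical range because the example jumps between nearly opposite corners of the color space every second, which real movies rarely do.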