Movies are arguably the most synthetic medium for storytelling, combining music, visual design, dialogue, performance, and plot into one package. Each one of these elements might be considered a platform with which to create a narrative, but it is the combination of individual components which ultimately separates the great movies from the merely mediocre.

It is for this reason that I was excited when I saw Spotmaps (based on an idea by Brendan Dawes). Spotmaps are a way to represent the visual content of a movie as a series of colors. Each cell in a spotmap represents the mean color over 1 second’s worth of frames. One reads a spotmap sequentially, from the top left to the bottom right. Essentially, a spotmap condenses a movie to a simple progression of dominant colors.
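To make the idea of a cell concrete, here is a minimal sketch of how one second of footage could be collapsed into a single mean color. The `mean_color` function and the representation of a frame as a flat list of `(r, g, b)` tuples scaled 0 to 1 are assumptions for illustration, not the way Spotmaps itself is implemented.

```python
from statistics import fmean

def mean_color(frames):
    """Average the RGB values of every pixel in every frame of one second.

    `frames` is a list of frames; each frame is a flat list of (r, g, b) tuples.
    """
    pixels = [pixel for frame in frames for pixel in frame]
    # zip(*pixels) regroups the pixels into three per-channel sequences.
    return tuple(fmean(channel) for channel in zip(*pixels))

# Two tiny 2-pixel "frames": one all red, one all blue.
second = [
    [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
]
cell = mean_color(second)
print(cell)  # (0.5, 0.0, 0.5)
```

Note that averaging red and blue gives a dull purple that appears in neither frame, which is one way a mean color can hide what is actually on screen.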

Something about these spotmaps grabbed my attention. One of the principal advantages of this encoding of a movie is that it is computationally tractable: while a naïve analysis of a movie would have substantial difficulty in distinguishing characters from their backgrounds or understanding the significance of dialogue, reducing a movie to its colors compresses all the visual information into a simple matrix. I decided to use these spotmaps to analyze movies quantitatively.

**From a spotmap to a matrix**

I’ll briefly explain how I turn a spotmap into numbers.

Colors can be visualized as inhabiting a three-dimensional space, with axes defined by the three primary colors (red, green, and blue). Each cell in the spotmap is turned into three numbers, representing the red, green, and blue content of that cell. So, for instance, black represents the absence of color; it lies on one corner of the color space, while white falls on the opposite corner.
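As a small illustration of that color space, the sketch below places black and white at opposite corners of a unit cube and measures the diagonal between them. It assumes the cells start out as 8-bit values (0 to 255) and are rescaled to the 0 to 1 range used in the appendix; `to_unit_rgb` is a hypothetical helper name.

```python
import math

def to_unit_rgb(r, g, b):
    """Scale 8-bit channel values (0-255) into the range 0-1."""
    return (r / 255, g / 255, b / 255)

black = to_unit_rgb(0, 0, 0)        # one corner of the color cube
white = to_unit_rgb(255, 255, 255)  # the opposite corner

# The longest possible distance between two colors is the cube's diagonal.
diagonal = math.dist(black, white)
print(round(diagonal, 3))  # 1.732
```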

Here’s an example of what I’m talking about, but scaled up, so as to represent the color content of a whole spotmap (in this case, *WALL-E*) in three-dimensional space:

I like this representation of the color palette, because it shows the spatial relationships between colors. Take a look at this spotmap of the excellent anime *Ghost in the Shell*.

There’s this spur of green jutting out from the main color axis, completely distinct from most of the other colors. Turns out, upon visual inspection of the spotmap, that the green segments are contiguous within the movie, forming islands of bright color.

But it gets better. Noticing this segmentation in terms of the colors, I went back and re-watched the movie. Immediately, I saw an interesting pattern emerge: the very first scene in which the main character (and principal badass), Major Kusanagi, envisions the world is tinted neon green. Without delving too far into the sci-fi nerdery that is my fondness for *Ghost in the Shell*, Kusanagi is an android, with access to a more complex set of visual information than a human. In this case, the animators are using the color shift to drive a difference in perspective and underscore the novel way in which Kusanagi sees the world.

*Ghost in the Shell* is a pretty colorful film, overall. By this I mean that its average color value per cell is relatively high. I went ahead and computed this number, which I call “color intensity” (but which might also be called color brightness or lightness; see appendix for details), for a few movies, as shown below:

You can see that *Ghost in the Shell* groups pretty well with other animated and action movies, whereas *Cabin in the Woods* brings up the low end of the spectrum, perhaps as expected (given that it’s a horror movie).

**Color Speed**

Another measurement I was interested in looking at in these movies is what I call the color speed (see appendix for mathematical definition). Intuitively, one can think of this as the amount by which the color changes between any two sequential cells of the spotmap. Mathematically, it is computed as the distance in color space between sequential points. I reasoned that this statistic would capture some of the “pace” of a movie: that is, the speed with which scenes and camera angles change (at least insofar as those changes also change the mean color).

If you compute the color speed for a few of my favorite movies, it looks like this:

Now you might notice a few interesting trends here. One is that action movies tend to have a much faster color speed than other types of movies (outliers notwithstanding). Intuitively, this finding makes a lot of sense to me: consider the abundance of ‘splosions in action movies, which appear in spotmaps as abrupt transitions from grey background to orange/red. Look at this fight scene from *Die Hard 2*: notice the rapid shifts from the whiteness of the airplane to the darkness of the night, mediated by moving between cameras, as John McClane heroically beats a couple of guys up and then, per action movie cliche, destroys an airplane with a fuel trail and a lighter.

One of the biggest outliers on this list is the underrated *Man on Fire*. It has far and away the highest color speed of any spotmap I’ve analyzed yet. Intriguingly, this matches both my own subjective observations and the critics’ reviews. While I enjoy the movie for its kinetic pace and interesting cinematography, a large contingent of reviewers couldn’t stand its “bleariness” and “ADD-fueled insanity”. This quantitative measurement indicates that *Man on Fire* really was a good bit quicker than other movies, faster even than most action movies. I suppose, though, that there is no accounting for taste: whereas I saw the pace and rapid jump-cuts of the movie as reflecting the manic vengefulness of its hero, John Creasy, most critics labeled it a cheap trick (or in the words of one Rex Reed, “hyperthyroidal”).

**Color spread**

Another statistic I tried out was related to how many different colors inhabited the palette of a given movie. More colorful movies, let’s say *WALL-E*, will have higher color spreads; at the far end of the spectrum, a black and white movie would have very little. In colorspace, color spread is a function of how much of the space is occupied.

Here’s what I found.

Nothing mindblowing here. Movies like *Tangled* and *WALL-E* are quite colorful, presenting interesting mixtures of blues and reds, while *So I Married an Axe Murderer* stakes its claim in a rather more restricted region of colorspace. One obvious fact is that animated films tend to be much more colorful than other movies. No doubt this pattern reflects differences in both technical limitations and intended audience. Another property worth mentioning is that this graph is qualitatively similar to the one shown above for mean color intensity. This also makes some sense: the movies with the most intense colors also tend to have the most different kinds of colors (i.e. color spread).

**Machines alone**

So after computing these statistics for several movies, I began to appreciate that there are some substantial differences between movies of different genres. Action movies tend to be reasonably colorful, but very quick; dramas tend to be dark, and moderately paced; and finally, comedies have plenty of color, but are slow as mud. Each of these differences makes sense intuitively, although I wouldn’t necessarily have picked comedies to be so slow.

I wanted to push it further though. I wanted to know how informative the differences between genres are: to what extent can you predict a movie’s genre based SOLELY on a few simple statistics computed from its color profile? After computing the statistics for a sample of 30 movies in three different genres (action, drama, comedy), I trained a machine learning algorithm called a Support Vector Machine to identify movies’ genres based on these statistics alone (randomly separating my dataset into 25 training, 5 testing). The results are pretty neat:
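The evaluation protocol (hold out 5 of 30 movies, train on the other 25, score the held-out ones) can be sketched in a few lines. To keep the example free of third-party dependencies, it swaps in a nearest-centroid classifier as a stand-in; it is not the Support Vector Machine I actually used. The data are placeholders too: the feature values below are arbitrary made-up points, not the statistics computed from real spotmaps, and all function names are hypothetical.

```python
import math
import random
from statistics import fmean

GENRES = ("action", "drama", "comedy")

def fit_centroids(train):
    """Average the feature vectors of each genre in the training set."""
    by_genre = {}
    for features, genre in train:
        by_genre.setdefault(genre, []).append(features)
    return {g: tuple(fmean(axis) for axis in zip(*rows)) for g, rows in by_genre.items()}

def predict(centroids, features):
    """Pick the genre whose centroid is closest to the given features."""
    return min(centroids, key=lambda g: math.dist(features, centroids[g]))

def holdout_accuracy(movies, n_test=5, seed=0):
    """Shuffle, hold out `n_test` movies, train on the rest, score the held-out ones."""
    rng = random.Random(seed)
    shuffled = list(movies)
    rng.shuffle(shuffled)
    test, train = shuffled[:n_test], shuffled[n_test:]
    centroids = fit_centroids(train)
    hits = sum(predict(centroids, features) == genre for features, genre in test)
    return hits / n_test

# Placeholder data only: 10 made-up (intensity, spread, speed) points per genre,
# scattered tightly around arbitrary centres. These are NOT measured values.
rng = random.Random(1)
centres = {"action": (0.2, 0.3, 0.9), "drama": (0.5, 0.6, 0.1), "comedy": (0.8, 0.2, 0.5)}
movies = [
    (tuple(c + rng.uniform(-0.01, 0.01) for c in centres[g]), g)
    for g in GENRES
    for _ in range(10)
]
accuracy = holdout_accuracy(movies)
```

With only 5 test movies, a single split can only score 0%, 20%, 40% and so on, so in practice one would repeat the split with different seeds and average the accuracies. Real features also live on very different scales (see the typical values in the appendix), so they would likely need rescaling before any distance-based classifier is applied.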

The first column represents the accuracy using color intensity alone; the second represents spread alone; the third is speed alone; and the fourth is all three variables (intensity+spread+speed) together. While each individual statistic contributes a little to the accuracy, the combination of all of the variables together is able to call a movie’s genre correctly a staggering 68% of the time. Given three categories, one would expect a classification accuracy at random of only 33%. That the algorithm is able to call genre accurately with only three clumsily-defined variables speaks to the idea that the spotmaps’ colors really are representing core aspects of the movies they are drawn from.

**This is a start**

The extent to which these quantitative differences in color statistics, computed independently of any human viewing, can recapitulate certain core characteristics of each movie, surprises me. As I noted early on, cinema is a rich and complex art form, made of sound and speech and shape and color. Using only one aspect of this tableau, color, we can discover interesting facts about films’ pace, characterization, and genre.

Going forward, I view this as a sort of proof of concept for further analyses. There are a great many directions to be taken, in terms of analyzing more genres, more movies, and deeper questions. I have often wondered what separates a finely crafted film from a clunker; or what particular spectrum of differences there are between movies from the 80’s and the 90’s. No doubt one can articulate subjective answers to these questions, but this represents an opportunity to gather together a bit of empirical evidence and bring it to bear on art.

**Appendix: Mathematical definitions of color statistics**

For a spotmap matrix X, there are three columns R, G, B, which correspond to the intensity of each primary color (red, green, and blue). There are a number of rows equal to the number of seconds in the movie’s runtime. Example 3-second movie:

| R | G | B |
|---|---|---|
| 0 | .5 | 0 |
| 1 | 1 | 1 |
| 0 | 0 | 0 |

(The 1st second’s color would be a forest green, like the color of the third bar in the bargraph on machine learning accuracy (above); the 2nd second’s color would be pure white; and the 3rd would be pure black.)

Every cell can thus be specified as Xij, with a time in seconds (i) and a primary color (j).

For this matrix, we can compute the color intensity as the mean of all values in the matrix, i.e. the sum of all Xij / number of all the cells in the matrix. For the example spotmap, it would be (0+.5+0+1+1+1+0+0+0)/9 = .389. Values for most movies are usually in the range of .1-.25.
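A minimal sketch of this calculation on the example matrix, with the spotmap stored as a list of `(R, G, B)` rows (the `color_intensity` name and this representation are just for illustration):

```python
from statistics import fmean

# The 3-second example spotmap: one (R, G, B) row per second.
spotmap = [
    (0.0, 0.5, 0.0),
    (1.0, 1.0, 1.0),
    (0.0, 0.0, 0.0),
]

def color_intensity(rows):
    """Mean of every value in the matrix (all rows, all three channels)."""
    return fmean(value for row in rows for value in row)

intensity = color_intensity(spotmap)
print(round(intensity, 3))  # 0.389
```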

The spread is equal to the Euclidean distance between 1000 randomly selected pairs of rows in the matrix (think of Pythagoras’ theorem, but with three numbers instead of two). The example matrix is too small for 1000 different random selections, but I can compute it for the 1st and 3rd row: the value would be .5.
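Here is a rough sketch of that sampling procedure. It assumes the 1000 pairwise distances are averaged into a single number and that each pair consists of two different rows; the `color_spread` name, the `n_pairs` parameter, and the fixed seed are illustrative choices.

```python
import math
import random

spotmap = [
    (0.0, 0.5, 0.0),
    (1.0, 1.0, 1.0),
    (0.0, 0.0, 0.0),
]

def color_spread(rows, n_pairs=1000, seed=0):
    """Mean Euclidean distance between randomly chosen pairs of distinct rows."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.sample(rows, 2)  # two different rows, chosen at random
        total += math.dist(a, b)
    return total / n_pairs

# The single pair worked out above: 1st row versus 3rd row.
print(math.dist(spotmap[0], spotmap[2]))  # 0.5
spread = color_spread(spotmap)
```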

Finally, color speed is equal to the mean Euclidean distance between successive rows in the spotmap matrix. For a matrix with n rows, there are thus n-1 Euclidean distances between rows; in this case, I would compute the speed as the average of (the distance between row 1 and row 2 [which is 1.5], and between row 2 and row 3 [1.73]). The color speed of this matrix would thus be (1.5+1.73)/2 = 1.62. Typical values of color speed are between .02 and .04.
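The same worked example as a short sketch (the `color_speed` name is just for illustration):

```python
import math
from statistics import fmean

spotmap = [
    (0.0, 0.5, 0.0),
    (1.0, 1.0, 1.0),
    (0.0, 0.0, 0.0),
]

def color_speed(rows):
    """Mean Euclidean distance between each row and the row that follows it."""
    # zip(rows, rows[1:]) yields the n-1 successive pairs (row 1, row 2), (row 2, row 3), ...
    return fmean(math.dist(a, b) for a, b in zip(rows, rows[1:]))

speed = color_speed(spotmap)
print(round(speed, 2))  # 1.62
```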

Hey Rob, enjoyed the post. I wanted to make a few comments and suggestions, because this is actually quite a good idea. There are some similarities to a lot of modern research in spectroscopy, since statistics on a color space are not all that different from analysis on any other sort of spectral space. Of course, in quantum mechanics, the spectrum is quantized and defined on an infinite-dimensional Hilbert space. Analyzing colourspace is clearly a continuous technique, at least within the region of Euclidean 3-space that is defined for colors (I guess the cube with one corner at the origin and edges running out to (0,0,255), (0,255,0) and (255,0,0)). But of course all we have to do in a quantized spectrum is let the spacing between the quantized levels go to zero, and then we’re in exactly the same regime.

Anyway, I digress. There’s this concept in spectroscopy where we can correlate a spectrum with itself: basically, take two spectra of the form (E, I(E)), where E is energy (or wavelength, color, whatever) and I(E) is the intensity of the spectral feature at energy E, correlate them in a time-dependent fashion, and see how a given point (E,I(E)) correlates with another point (E’,I(E’)). In spectroscopy, we would excite a system to obtain a spectrum S and then, before the system reaches thermal equilibrium (and the spectrum “goes away”), we excite again in a specific way to form a new spectrum S’. Then we can make a matrix with elements (E,E’) whose values are somehow proportional to the Pearson correlation between E and E’; I guess we’ll call it corr(E,E’). How this is done is complicated, and the actual time-dependent probability flow from (E,I(E)) to (E’,I(E’)) is rather chaotic in a real system, though there are ways we can represent this flow as a time-dependent linear vector on the unit sphere.

Anyway, this is basically just saying Pearson correlation matrices are useful. And of course, this is a square matrix (in the case of color, we can basically say the indices run along ROYGBV or whatever color basis set we please). And with sufficiently high tolerance, it is likely fairly sparse: there will be a number of colors that are uncorrelated with other colors.

So let’s consider this algorithm:

1) Let’s define our data set (in this case, a movie) as a set S containing x MxN matrices, where x is the number of steps you want to take to sample the colorspace of the film over time, and MxN is the image size (basically the value of element (S_i)_m,n = (R,G,B), so the RGB triple of a given pixel in the ith frame).

2) Now, (1) generates a TON of data, so let’s break it down a bit. Let’s instead stochastically choose a small sample of pixels in the ith matrix in S, and record their RGB values. I’m not sure how small this sample should be to be statistically significant, but I’m sure there’s an easy iterative way to find out! Let’s say we choose J pixels randomly.

3) Let’s define our tolerance so we have, say, N different colors in our RGB basis, so the total correlation matrix will be NxN. Therefore, for each one of the sampled pixels in the (i)th matrix, we can compute which one of the N colors in the basis this pixel is.

4) Now, let’s pick two colors out of the N colors in the basis; we’ll call the pair (X,Y). Now calculate this: if X is present in a given frame S_i’s sample of J pixels, how often do we see Y? Well, we can form a data set with one row per frame in S that has the values (x,y), where x is the “intensity” of color X in a given frame S_i, and y is the “intensity” of color Y in that same frame S_i (basically we can calculate the intensity by summing up the number of pixels in the sample of a given frame that have this color). Then we can just calculate the Pearson correlation coefficient for the complete set of (x,y) values, which runs from -1 to 1.

5) In our output correlation matrix, at element (X,Y), we put the calculated Pearson coefficient as the value, and we do this for all the ordered pairs in the basis.
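Steps 1 to 5 could be sketched roughly as follows. Everything here is an assumption made for illustration: the four-color `BASIS`, the representation of a frame as a flat list of `(r, g, b)` tuples scaled 0 to 1, assigning each pixel to its nearest basis color, sampling with replacement, and all of the function names.

```python
import math
import random

# Hypothetical basis of N = 4 colors; the choice of basis is left open above.
BASIS = {
    "black": (0.0, 0.0, 0.0),
    "white": (1.0, 1.0, 1.0),
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
}

def nearest_basis_color(pixel):
    """Step 3: assign a pixel to the closest basis color in RGB space."""
    return min(BASIS, key=lambda name: math.dist(pixel, BASIS[name]))

def color_counts(frame, n_samples, rng):
    """Steps 2-3: sample J pixels at random and count them per basis color."""
    counts = dict.fromkeys(BASIS, 0)
    for pixel in rng.choices(frame, k=n_samples):  # sampling with replacement
        counts[nearest_basis_color(pixel)] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation; NaN when either series never varies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)

def correlation_matrix(frames, n_samples=50, seed=0):
    """Steps 4-5: correlate per-frame counts for every ordered pair of basis colors."""
    rng = random.Random(seed)
    per_frame = [color_counts(frame, n_samples, rng) for frame in frames]
    series = {name: [counts[name] for counts in per_frame] for name in BASIS}
    return {(x, y): pearson(series[x], series[y]) for x in BASIS for y in BASIS}

# Toy "movie": frames alternate between reddish and near-black.
red_frame = [(0.9, 0.1, 0.1)] * 100
dark_frame = [(0.1, 0.1, 0.1)] * 100
matrix = correlation_matrix([red_frame, dark_frame, red_frame, dark_frame])
```

In this toy input red and black never share a frame, so `matrix[("red", "black")]` comes out at -1. Colors that never appear (white and green here) have constant counts, so their correlations are undefined and are reported as NaN rather than as 0 or 1.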

We can predict a few things: the diagonal elements (X,X) are all 1; a correlation near 1 implies the two colors are often seen together in the film, while a correlation near -1 implies that one tends to show up when the other is absent.

Now here’s the open question: What does this imply? Do we see trends in color correlation in given genres? Or does it say something more interesting about the MOOD or TONE of the film (this could definitely be correlated with genre, too!)

We could easily gauge the kinetics of a film by plotting the correlation over time for a given (X,Y) from frame to frame. This can likely be done using a density matrix approach. Note that since we are quantizing our color theory by choosing a limited basis of N (hence truncating an infinite color space matrix into an effective finite-dimensional one), there’s a lot of simple techniques with this kind of formalism (or some kind of decomposition of film’s color flow as a time dependent problem into a frequency/color-dependent problem with the use of Fourier analysis). I am not any kind of expert, but these are the kinds of things that often appear in spectroscopic analysis, and I think they apply here too in useful fashion.

Well it took me a few re-reads, but I think I’ve got the idea now.

To clarify, it involves reading the whole color content of a movie’s frames, as opposed to the average color (as it is now), which is an awesome idea. In your scheme, you would randomly sample pixels from each frame. I think you’d get more mileage out of a non-random scheme though. For instance, imagine you can only take 4 pixels’ colors per second: rather than randomly choosing those 4, take the central pixel from each quadrant of the picture. In this way you could gather additional information about how colors are apportioned in the space of each frame. (I’m thinking here of Roger Deakins’ cinematography in True Grit [http://www.youtube.com/watch?v=KPY6FAnnALE], which emphasizes these horizontally divided shots wherein foreground and background are very different colors. That would be detectable with fixed sampling.)
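A quick sketch of what I mean by fixed sampling, assuming a frame is stored as a list of rows of `(r, g, b)` tuples; the function names and the integer-division choice of "central pixel" are just illustrative.

```python
def quadrant_centers(width, height):
    """(x, y) coordinates of the central pixel of each quadrant of a frame."""
    xs = (width // 4, (3 * width) // 4)
    ys = (height // 4, (3 * height) // 4)
    # Order: top-left, top-right, bottom-left, bottom-right.
    return [(x, y) for y in ys for x in xs]

def sample_quadrants(frame):
    """Read the four quadrant-centre pixels from a frame stored as rows of (r, g, b)."""
    height, width = len(frame), len(frame[0])
    return [frame[y][x] for x, y in quadrant_centers(width, height)]

# A 4x4 toy frame split horizontally: pale sky above, dark ground below.
sky, ground = (0.6, 0.8, 1.0), (0.3, 0.2, 0.1)
frame = [[sky] * 4, [sky] * 4, [ground] * 4, [ground] * 4]
samples = sample_quadrants(frame)
print(samples == [sky, sky, ground, ground])  # True
```

Because the two top samples and the two bottom samples always come from the same places, a horizontally divided shot shows up directly as a difference between the first and last pairs, which random sampling would blur.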

With regards to the idea of this matrix of color correlations, that’s neat. I could imagine horror movies having pairings of red and black, for instance. It would also be really neat to look at temporal recurrence of sets of colors: it could provide an easy way to read “settings” and scene changes from the color data alone (which is potentially doable from the data I have now, as well, but probably less accurately).
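As a rough idea of how scene changes might be read from the data I have now, one could flag every second where the mean color jumps by more than some cutoff. The `scene_changes` name and the threshold of 0.2 are arbitrary assumptions, and the four rows below are made-up values rather than data from a real spotmap.

```python
import math

def scene_changes(rows, threshold=0.2):
    """Indices i where the color jump from row i-1 to row i exceeds `threshold`."""
    return [
        i
        for i in range(1, len(rows))
        if math.dist(rows[i - 1], rows[i]) > threshold
    ]

# Made-up spotmap rows: two dark seconds followed by two orange ones.
rows = [
    (0.10, 0.10, 0.10),
    (0.12, 0.10, 0.10),
    (0.80, 0.40, 0.10),
    (0.80, 0.42, 0.10),
]
cuts = scene_changes(rows)
print(cuts)  # [2]
```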