Movies into Data

A new pipeline

In previous entries, I discussed the idea of analyzing color data from movies in order to get some insight into how the content of movies is reflected in their color schemes.  I found that movies are relatively easy to distinguish by genre, and that if movies are any guide, the future will be electric blue.  It was all good fun.

But then the data ran out.  Spotmaps ceased to be made, and despite my best efforts to contact the party or parties responsible for their production in order to demand more, no more Spotmaps were forthcoming.  So I had to make my own data.


Building a better spotmap

In creating new Spotmaps, I took the opportunity to improve on the previous formula.  The original idea was that each cell in a spotmap captured the mean color of an entire frame.  A frame is the whole image, and frames were captured about once per second, so a typical movie yielded 6000 or more separate colors.

One of the problems I noticed with this formulation, however, was that colors tended to be washed-out, rather pastel mixtures of all of the colors in a frame, rather than representing any single color within the frame.  Consider an example here:

A frame from a movie. The upper left boxed color is the mean coloration of the whole frame (except the black bars above and below, which are removed prior to any processing).

This is one frame of a movie, and in the upper left is the average color of said frame: but note that despite the tremendous variation of colors in the frame, the “mean color” isn’t actually found anywhere in the picture.  That’s because it represents a weighted combination of all of the colors, instead of any particular color within the frame.  Furthermore, coding the whole frame as a single color doesn’t allow me to explore how color is distributed throughout the screen, that is, the composition of a frame (I’ll come back to this momentarily).

So, when I made my spotmap capturing software, I programmed it to collect 144 different colors per frame, representing 144 regions (16 * 9) laid on the image in a grid.  I call each region a “pseudopixel” (but I’m open to better suggestions), because it is effectively a combination of all of the pixels in a region.  Here’s an image with some of the boxes drawn in:

Red lines represent the demarcations between pseudopixels.

Ideally, it would be nice to capture every single pixel and analyze all pixels simultaneously, but the math quickly becomes dizzying when you consider how much information that really is (a standard 1920 * 1080 HD movie frame contains a staggering 2 million pixels, each of which has three color values: R, G, and B).  When I actually collect the data, then, the pseudopixels look like this:

A pseudopixel image. Note that this is from a 30 pixel square capture, whereas the real algorithm captures rectangles 120*90.

Recognize the image?  I’m cheating a little here, because the above image is actually capturing more pseudopixels per frame than I want to analyze, but I include this picture as a proof of principle to show that my scripts work.
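
For readers who want to see the pseudopixel idea in concrete terms, here is a minimal sketch. It assumes a frame is already decoded into a list of pixel rows, each pixel an (r, g, b) tuple, and that the black bars have been cropped. The function name `pseudopixels` and its parameters are illustrative, not taken from the actual capture scripts.

```python
def pseudopixels(frame, cols=16, rows=9):
    """Reduce a frame to a rows x cols grid of mean colors.

    `frame` is a list of pixel rows; each pixel is an (r, g, b) tuple.
    The frame must be at least `rows` pixels tall and `cols` pixels wide.
    """
    height, width = len(frame), len(frame[0])
    grid = []
    for gy in range(rows):
        # Integer boundaries of this band of pixel rows.
        y0, y1 = gy * height // rows, (gy + 1) * height // rows
        grid_row = []
        for gx in range(cols):
            x0, x1 = gx * width // cols, (gx + 1) * width // cols
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(block)
            # Average each channel over every pixel in the region.
            grid_row.append(tuple(sum(p[c] for p in block) / n for c in range(3)))
        grid.append(grid_row)
    return grid
```

Calling `pseudopixels(frame)` on one frame returns 9 rows of 16 colors, i.e. the 144 values per frame described above.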



I took my new spotmap creating pipeline for a spin on three different movies:  The Dark Knight; Samsara; and Ratatouille.  These were chosen precisely because they are quality representatives of three wholly distinct genres (and also, movies I watched recently): The Dark Knight, a dark and brooding action film; Samsara, a visually-astounding documentary; and Ratatouille, a Pixar-animated comedy.  I can’t claim that these three movies form any sort of robust sample, or that they cover the wide breadth of possible movies, but they are quite diverse nonetheless.  They’ll have to suffice for a test run.

A little background on each movie: The Dark Knight is the best Batman movie ever made, and if you haven’t heard of it, you have probably been trapped underneath Antarctic ice for the last 10 years or something.  Ratatouille is one of my favorite Pixar movies, and it’s about a talking rat named Remy who wants to transcend his humble, murine beginnings to become the greatest chef in France.  Finally, Samsara is this amazing documentary about life on Earth.  It has no words, just images, and it is incredibly beautiful.  It was made by the guy who did Koyaanisqatsi and Baraka, so if you’re into that sort of thing, and you can stand a movie with no particular narrative, I would highly recommend it.


Recapitulating the old

Using the data collected in the algorithm, I can build a spotmap (in this case, of Ratatouille):

Each square represents the mean color of a single frame; time goes from left to right.

Unfortunately, the original website didn’t collect data on these particular movies, so I have no point of comparison to prove that my algorithm works as well as theirs.  I’ll pick some movies with original spotmaps to compare to in the future, however.

I can also look at the three core characteristics I examined in my original post: color lightness (formerly intensity), spread, and speed.


Color lightness:

  • Dark Knight: 0.1159368
  • Samsara: 0.259
  • Ratatouille: 0.203

Color spread:

  • Dark Knight: 0.28617
  • Samsara: 0.4436807
  • Ratatouille: 0.3743

Color speed:

  • Dark Knight: 26.5
  • Samsara: 19.7
  • Ratatouille: 38.9

These scores mostly confirm my intuition.  Summarizing: the Dark Knight is, well, a rather dark and gritty movie, so it has low color lightness.  On the other hand, it’s an action movie with high color speed, including the single quickest frame transition I’ve ever observed (spoiler alert: it’s an explosion.  It’s about as extreme as any frame transition can be.).  Ratatouille has the highest average color speed, and is relatively bright, as befits a kids’ movie.  Samsara is the slowest of the trio, but shows the greatest color spread (unsurprisingly).  Thus, the new pipeline largely recapitulates the results of the old analysis so far, but there’s more that can be done by sampling multiple colors per frame.


New modes of analysis

The simplest new analysis to be done is to compute the color speed on a per-pseudopixel basis, so that’s what I did.  Results are displayed as heatmaps.


Heatmap of Samsara’s color speed. Each cell represents a pseudopixel, whose color scales with the mean color speed in that pseudopixel. Hotter colors mean that colors change faster in that region of the frame.
Same as above, but for the Dark Knight.
Same as above, but for Ratatouille.

For color speed, there are three distinct patterns for the three movies: Samsara spreads the color speed around in a sort of u-shape at the bottom of the frame; the Dark Knight clusters the color changes at the top of the frame; and Ratatouille keeps it towards the bottom, but with a seemingly more even distribution.  Notably, all of the movies seem to have more of the action (color speed) happening in the center of the frame, as opposed to the corners.  That makes sense.
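
The per-pseudopixel color speed behind these heatmaps can be sketched as follows. This assumes that color speed is the Euclidean distance in RGB space between the same pseudopixel in consecutive frames, averaged over the movie; the function name `speed_heatmap` and the input layout (a list of frames, each a grid of (r, g, b) tuples) are illustrative choices.

```python
import math


def speed_heatmap(grids):
    """Mean color distance between consecutive frames, per pseudopixel.

    `grids` is a list of frames; each frame is a list of rows of
    (r, g, b) tuples, and every frame has the same dimensions.
    """
    rows, cols = len(grids[0]), len(grids[0][0])
    steps = len(grids) - 1  # number of frame-to-frame transitions
    heat = [[0.0] * cols for _ in range(rows)]
    for prev, curr in zip(grids, grids[1:]):
        for y in range(rows):
            for x in range(cols):
                # Accumulate the running mean of the per-cell distances.
                heat[y][x] += math.dist(prev[y][x], curr[y][x]) / steps
    return heat
```

The returned grid has one number per pseudopixel, which is what a heatmap plots; larger numbers correspond to hotter colors.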


Composition Effect

One last thing I tried examining was whether there’s any evidence for composition, in the visual artistic sense.  Consider this beautiful frame from Samsara:

Still frame from Samsara. Temples in Bagan, Burma.

Notice how the darkness of the forest gives way to jutting crimson and gold peaks of the temples, which in turn give way to grey-blue and white of the mountains and then the sky.  I would argue that the image is beautiful at least in part because of how the brightness of the image varies with the height, so that one’s eye is drawn from the forest upwards through the temples and to the sky.  To get all philosophical on this pretty picture, I think there’s a deep metaphor here relating to how mankind seeks divinity out of nature, but let’s set that aside for a moment.  This image was obviously composed, that is, set up so that the brightness increased with the y-axis.  Do cinematographers prefer this sort of composition?  Using my pseudopixels, I asked whether color lightness of each pseudopixel tended to correlate with the height along the y-axis for 500 random frames of Ratatouille.  The same idea was also applied to the x-axis, that is, do colors increase in brightness moving left-to-right across an image?  I compared that data to an empirical null, namely a set of completely random pseudopixels drawn from the movie.

Histogram of composition indices. Red is y-axis composition, green is x-axis composition, and blue is an empirical null distribution.

We see that there is strong evidence that the cinematographers (or, actually, since it’s Ratatouille, animators) chose to structure their frames so that color brightness either increased or decreased with the y- and/or x-axis.  There are many more extreme values of correlation, both positive and negative, in the real data than in the permuted null.  There’s much more analysis here to be done, but it’s good to note that there is evidence of composition in these movies.
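
One way to make the composition index concrete is sketched below. It assumes that the index is a Pearson correlation between each pseudopixel's lightness (taken here as the mean of its R, G, and B values) and its row or column position, and that the null is built by filling a grid with pseudopixels drawn at random from the whole movie. All function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def composition_index(grid, axis="y"):
    """Correlate pseudopixel lightness with position along one axis."""
    rows = len(grid)
    positions, values = [], []
    for y, row in enumerate(grid):
        for x, color in enumerate(row):
            # Flip y so that larger values mean higher on screen.
            positions.append(rows - 1 - y if axis == "y" else x)
            values.append(sum(color) / 3)
    return pearson(positions, values)


def null_index(grids, rows, cols, axis="y", rng=random):
    """Composition index of a grid of pseudopixels drawn at random."""
    pool = [color for grid in grids for row in grid for color in row]
    sample = [rng.choice(pool) for _ in range(rows * cols)]
    fake = [sample[r * cols:(r + 1) * cols] for r in range(rows)]
    return composition_index(fake, axis)
```

Computing `composition_index` for each sampled frame and `null_index` the same number of times gives the two sets of values that a histogram like the one above compares.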



Can we draw any grand conclusions about genre and color and motion and composition?  Unfortunately, not yet, but rest assured that as I digitize more movies, such questions will become answerable.  Building a better pipeline for movie digitization ought to pay dividends in the long run, as I will soon cease to be data-limited.  Combined with the many inquiries relating to frame color composition which I can now investigate, I have a solid foundation to launch a more certain quantitative analysis of movie colors.




Have any suggestions for further analyses?  Want to take a look at my code?  Contact me below.

The Future Is Electric Blue

… and other insights from the color spectrum of science fiction movies

Science fiction has never been precisely defined.  It can encompass all manner of settings (past, present, and future) and can accommodate any genre (from romance to neo-noir).  It can range from the dorkiest of hard sci-fi space opera, in which every plot turn is dictated by imagined, yet rigorous, physics, to an otherwise classic story with a single, technological plot element.

Despite the diversity, science fiction is a well-defined category in practice: like obscenity and ducks, we know it when we see it.  It stands to reason that science fiction might be defined by a certain, diffuse set of attributes, perhaps none of which is sufficient to identify a work of sci-fi on its own, but which, together, drive an overall perception of sci-fi-ness.  I sought to look for a few of these attributes using my movie dataset.  As before, I was looking for certain aspects of the color spectrum in science fiction movies which distinguish those movies from non-science fiction.  Geared up for what I thought would be an arduous and difficult attempt at data-mining, I was promptly confronted with a rather obvious set of differences between sci-fi and non-sci-fi.

Science fiction can be identified by its color spectrum

Using a similar approach to that employed in my last post, I endeavored to classify movies using a machine learning algorithm according to their sci-fi status.  The dataset itself consists of mean color profiles for each of 79 movies (after filtering out the black & white movies, and those for which I could not determine the genre).  Briefly, I encoded each movie as being science fiction or not, and then computed a set of  summary statistics on the color spectrum in each movie.  Importantly, sci-fi and non-sci-fi movies did not differ in their distributions of genre (p = .12).  After running them through a repeated, randomized, training/testing cycle, I began to get an appreciation for just how accurately science fiction could be identified.

Science fiction can be accurately classified with only two variables.

Surprisingly, it’s quite easy to achieve greater accuracy than chance (50%; the green line), and using only two variables: average hue, and the variance in hue.  I’ll come back and explain what those variables mean in real, visual terms, but I want to note that it is possible to achieve ~70-75% accuracy in identifying science fiction movies from non-science fiction movies.  This fact suggests that there really are some properties which distinguish the color spectrum of sci-fi films.

Science fiction has lower average hue, more hue variance

The two variables which so accurately classify science fiction are both related to hue, and hue is a weird concept.  It’s basically a compound measurement of color which doesn’t take into account the brightness of a color, only its relative red/green/blue-ness.  This is better pictured than explained, so here’s 1000 random colors, plotted according to their hues (on the y-axis):

Hue is a compound metric of color.

You can see that blue falls towards the low end of hue (towards -2), red at the top (+2), and green somewhere in the middle.  Various concoctions derived from mixtures of these colors occur in bands between the primary colors. Note that colors of a given hue can be bright or dark or anywhere in between.

Science fiction movies are characterized by two factors: they have lower average hue, so they tend to be more blue/green than red; and greater variance in hue in the same movie (so more different kinds of colors).  Both of these differences are statistically significant in my dataset, if you’re into that sort of thing.  Consider this plot of average hue versus hue variance, where science fiction movies are blue and everything else is red:

Average hue and variance in hue accurately discriminate between science fiction and non-science fiction.

One can clearly see that sci-fi movies tend towards the left side of the plot, characterized by bluer and greener average hues, and are also higher in the plot, connoting more variance in hues.  This plot provides a powerful insight into how the classifier algorithm works: if one were to simply draw a line at an average hue of -0.5, classifying those to the left of the line as science fiction, and those to the right as non-science fiction, one would probably achieve greater-than-random classification accuracy.
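
The single-line rule described above can be written down directly. This is only a sketch of that thought experiment, not the classifier actually used; the function names are hypothetical, and the hue values in the usage note are made-up placeholders rather than measurements from the dataset.

```python
def classify_by_hue(average_hue, cutoff=-0.5):
    """Return True (science fiction) when the average hue is left of the cutoff."""
    return average_hue < cutoff


def accuracy(movies, cutoff=-0.5):
    """Fraction of (average_hue, is_scifi) pairs the rule labels correctly."""
    hits = sum(classify_by_hue(hue, cutoff) == is_scifi for hue, is_scifi in movies)
    return hits / len(movies)
```

For instance, `accuracy([(-1.2, True), (0.3, False), (-0.6, False)])` scores the rule on three placeholder movies and gets two of the three right.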

It’s also worth considering some of the science fiction movies which don’t follow the expected pattern.  I have labelled a few in the plot, including Buckaroo Banzai (labeled BB8), the original Star Trek movie, and Soylent Green.  These are noteworthy because they are a bit older than most of the movies in my dataset.  It stands to reason that the signature of science fiction might change over time, leading the classifier to misidentify them.  The idea of changes in color patterns over decades is definitely worth investigating, and something I plan to do more of as the dataset expands; but for now it will have to stand as only a hypothesis.

The last movie I want to draw your attention to is eXistenZ, which is not very old (1999), and yet still out of character for a science fiction movie.  Why?

The Cronenberg Effect

eXistenZ and another movie, Scanners (1981), are interesting because they are both frequently misclassified by the algorithm.  In fact, these movies are so atypical that they shift the entire machine learning algorithm towards lower accuracy: examine the 4th column in the first graphic–note that simply by removing those 2 movies, constituting only 2.5% of the dataset, I am able to increase the algorithm’s accuracy by ~5%.  In addition to their atypical nature, they share another fascinating commonality.

Both movies were directed by David Cronenberg, a critically acclaimed but commercially unsuccessful director.  (I have seen relatively little of his work–neither eXistenZ nor Scanners–but I thought his movie Eastern Promises was fantastic).  Incidentally, Scanners is not only an atypical science fiction movie, it’s an atypical horror movie as well.  While most horror movies are dark and relatively plodding in pace, Scanners is bright and quick paced, more like an action movie.  Therefore in both cases we see Cronenberg rejecting genre norms of color palette to pursue atypical visual styles in his movies–an idea I shall term the Cronenberg effect.

I speculate that it may be this blatant disregard for the mores of visual style which makes Cronenberg’s films both critical darlings and commercial bombs.  I can only offer a little pop psychology to back this idea up: movie-goers expecting to see a certain kind of movie will be put off when they are confronted with a color palette not befitting that genre, while critics may consider the juxtaposition of classic horror with non-genre standard colors as refreshing.  With only two movies of this nature in my dataset, it is too early to make a rigorous test of this hypothesis, but it’s something I look forward to pursuing as I gather more films.

The future is electric blue

All of this business about average hue and hue variance is fine, but I wanted to pin down something more concrete and visually obvious that separates sci-fi and non-sci-fi.  In order to do this, I set about looking for specific colors that are enriched in the science fiction movies relative to the non-science fiction.  I reasoned that these colors might function as effective markers of science fiction: somewhat subliminal hints to the audience of the kind of movie they are watching.  Without describing the methods in too much detail (see Appendix if interested), consider this set of the top ~150 colors which are enriched in sci-fi:

Representative set of colors enriched in science fiction vs. non-science fiction movies.

Naming colors is an inherently subjective exercise, but I submit to you that there are generally two kinds of colors which are present in this set: electric blues, and forest greens.

In fact, of the 159 colors which show enrichment in sci-fi movies, fully 92% are some shade of green or blue.  Leaving aside questions of nomenclature (and my [hopefully] pithy title), I think we can draw three general conclusions from this plot:

1) Red is not much favored in science fiction movies, at least relative to non-science fiction movies.

2) Bright blues, regardless of what you call them, are substantially over-represented.

3) Similarly with dark greens.

So there you have it, a testable prediction: insofar as sci-fi movies represent visions of the future, the future will be characterized by beautiful electric blues and deep, lush, greens.

In more seriousness, and after having thought about the dataset a bit more, I wonder if these colors are basically serving as “signs” (in the semiotic sense) of science fiction: essentially distinguishing marks which inform the audience that they are watching a science fiction movie, and prime them for appropriate shenanigans (spaceships and lasers and time travel, oh my!).  Directors, and/or cinematographers, may choose them in order to confer a feel of futurity which enables the suspension of disbelief necessary for sci-fi plot elements to work.  Conversely, creators of non-science fiction movies might avoid these colors in order to establish the authenticity of certain “present-day” or historical settings.  Alternatively, the choice of bright blues and dark greens may not represent any vision of the future so much as the conscious choice to find colors not frequently used in non-science fiction cinema.

Naturally, caveats apply.  First and foremost, it’s worth noting that 79 samples, while a sizable dataset for much scientific inference, represents only a vanishingly small subset of all the movies ever made (and is likely non-random in many respects).  It could therefore be the case that electric blue is only a sci-fi diagnostic character in this limited sample of movies.  This is a difficulty worth noting, but only ameliorated through more and better sampling. Secondly, as already noted, these characteristics of science fiction may only apply for movies which were created in the recent past; it is possible that older movies may have employed other signifiers of science fiction.

Finally, science fiction is a fluid and complex categorization, and some science fiction movies may disregard these colors entirely (as per the Cronenberg effect).  Nevertheless, I find it very interesting that these colors are not artificial.  What I mean by that is that these colors are not difficult to produce on camera without special effects.  If you had asked me before writing this to guess the top two colors which characterized science fiction, I might have guessed deep black (space!) or some kind of bright-orange/red (danger! lasers! suns!).  But these colors are, in fact, somewhat natural: blues the color of the evening sky, or a lake; green the color of trees.  What that says about the collective unconscious’s vision of the future, I don’t know (and probably nothing); but for my money, I profoundly hope that the future gets a little more electric blue.




Color enrichment analysis:

First, I quantized all of the colors by putting each continuous color measurement (red, green, and blue), which normally range from 0 to 1, into one of twenty bins (0-.05; .05-.1; and so on).  I then counted the number of colors in each bin for 1) all of the non-science fiction movies; and 2) all of the science fiction movies.  After that, I subtracted the color abundances for the non-science fiction movies from those for the science fiction movies.  I considered any color bin more abundant in the science fiction films as “enriched” in science fiction–there were 159 such colors (plotted above).
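
The procedure above can be sketched in a few lines. This version assumes raw counts are compared directly, as described; if the two groups contain very different numbers of colors, the counts would presumably need to be normalized first. The function names `quantize` and `enriched_colors` are illustrative.

```python
from collections import Counter


def quantize(color, bins=20):
    """Map an (r, g, b) color with channels in [0, 1] to a tuple of bin indices."""
    # min() keeps a channel value of exactly 1.0 in the top bin.
    return tuple(min(int(c * bins), bins - 1) for c in color)


def enriched_colors(scifi_colors, other_colors, bins=20):
    """Return the bins that are more abundant in sci-fi, with the surplus count."""
    scifi = Counter(quantize(c, bins) for c in scifi_colors)
    other = Counter(quantize(c, bins) for c in other_colors)
    # A Counter returns 0 for bins that never occur in the other group.
    return {b: scifi[b] - other[b] for b in scifi if scifi[b] > other[b]}
```

Each key of the result is a triple of bin indices (red, green, blue), so with twenty bins per channel there are 8000 possible colors to test for enrichment.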


Cinema and Colorspace

Movies are arguably the most synthetic medium for storytelling, combining music, visual design, dialogue, performance, and plot into one package.  Each one of these elements might be considered a platform with which to create a narrative, but it is the combination of individual components which ultimately separates the great movies from the merely mediocre.

It is for this reason that I was excited when I saw Spotmaps (based on an idea by Brendan Dawes).  Spotmaps are a way to represent the visual content of a movie as a series of colors.  Each cell in a spotmap represents the mean color over 1 second’s worth of frames.  One reads a spotmap sequentially, from the top left to the bottom right.  Essentially, a spotmap condenses a movie to a simple progression of dominant colors.

Something about these spotmaps grabbed my attention.  One of the principal advantages of this encoding of a movie is that it is computationally tractable: while a naïve analysis of a movie would have substantial difficulty in identifying characters from their backgrounds or understanding the significance of dialogue, reducing a movie to its colors compresses all the visual information into a simple matrix.  I decided to use these spotmaps to analyze the movies, quantitatively.


From a spotmap to a matrix

I’ll briefly explain the means by which I turn this into numbers.

Colors can be visualized as inhabiting a three-dimensional space, with axes defined by the three primary colors (red, green, and blue).  Each cell in the spotmap is turned into three numbers, representing the red, green, and blue content of that cell.  So, for instance, black represents the absence of color; it lies on one corner of the color space, while white falls on the opposite corner.

Here’s an example of what I’m talking about, but scaled up, so as to represent the color content of a whole spotmap (in this case, WALL-E) in three-dimensional space:

WALL-E’s spotmap, plotted in the cube of RGB colorspace.

I like this representation of the color palette, because it shows the spatial relationships between colors.  Take a look at this spotmap of the excellent anime Ghost in the Shell.

Each axis represents one of the 3 primary colors: red, green, and blue.

There’s this spur of green jutting out from the main color axis, completely distinct from most of the other colors.  Turns out, upon visual inspection of the spotmap, that the green segments are contiguous within the movie, forming islands of bright color.

But it gets better.  Noticing this segmentation in terms of the colors, I went back and re-watched the movie.  Immediately, I saw an interesting pattern emerge: the very first scene in which the main character (and principal badass), Major Kusanagi, envisions the world is tinted neon green.  Without delving too far into the sci-fi nerdery that is my fondness for Ghost in the Shell, Kusanagi is an android, with access to a more complex set of visual information than a human.  In this case, the animators are using the color shift to drive a difference in perspective and underscore the novel way in which Kusanagi sees the world.

Ghost in the Shell is a pretty colorful film, overall.  By this I mean that the average color per frame is relatively high.  I went ahead and computed this number, which I call “color intensity” (but which might also be called color brightness or lightness; see appendix for details), for a few movies, as shown below:


The y-axis represents the mean color intensity. Bars are colored for aesthetic value, not for any reason pertaining to each movie.

You can see that Ghost in the Shell groups pretty well with other animated and action movies, whereas Cabin in the Woods brings up the low end of the spectrum, perhaps as expected (given that it’s a horror movie).


Color Speed

Another measurement I was interested in looking at in these movies is what I call the color speed (see appendix for mathematical definition).  Intuitively, one can think of this as the amount by which any two sequential frames change color.  Mathematically, it is computed as the distance in color space between sequential points.  I reasoned that this statistic would capture some of the “pace” of a movie: that is, the speed with which scenes and camera angles change (at least insofar as those changes also change the mean color).

If you compute the color speed for a few of my favorite movies, it looks like this:

Y-axis is the color speed (defined above). Note Man on Fire, about 50% faster than any other movie.


Now you might notice a few interesting trends here.  One is that action movies tend to have a much faster color speed than other types of movies (outliers notwithstanding).  Intuitively, this finding makes a lot of sense to me: consider the abundance of ‘splosions in action movies, which appear in spotmaps as abrupt transitions from grey background to orange/red.  Look at this fight scene from Die Hard 2: notice the rapid shifts from the whiteness of the airplane to the darkness of the night, mediated by moving between cameras, as John McClane heroically beats a couple of guys up and then,  per action movie cliche, destroys an airplane with a fuel trail and a lighter.

One of the biggest outliers on this list is the underrated Man on Fire.  It has far and away the highest color speed of any spotmap I’ve analyzed yet.  Intriguingly, this matches both my own subjective observations and the critics’ reviews.  While I enjoy the movie for its kinetic pace and interesting cinematography, a large contingent of reviewers couldn’t stand its “bleariness” and “ADD-fueled insanity”.  This quantitative measurement indicates that Man on Fire really was a good bit quicker than other movies, faster even than most action movies.  I suppose, though, that there is no accounting for taste: whereas I saw the pace and rapid jump-cuts of the movie as reflecting the manic vengefulness of its hero, John Creasy, most critics labeled it a cheap trick (or in the words of one Rex Reed, “hyperthyroidal”).


Color spread

Another statistic I tried out was related to how many different colors inhabited the palette of a given movie.  More colorful movies, let’s say WALL-E, will have higher color spreads; at the far end of the spectrum, a black and white movie would have very little.  In colorspace, color spread is a function of how much of the space is occupied.

Here’s what I found.


Y-axis is the color spread, as defined above. Bar colors are arbitrarily selected for aesthetic value.


Nothing mindblowing here.  Movies like Tangled and WALL-E are quite colorful, presenting interesting mixtures of blues and reds, while So I Married an Axe Murderer stakes its claim in a rather more restricted region of colorspace.  One obvious fact is that animated films tend to be much more colorful than any other movies.  No doubt this pattern is reflective of both differences in technical limitations as well as the intended audience.  Another property worth mentioning is that this graph is qualitatively similar to that shown above, for mean color intensity.  This also makes some sense: the most colorful movies also tend to have the most different kinds of colors (i.e. color spread).


Machines alone

So after computing these statistics for several movies, I began to appreciate that there are some substantial differences between movies of different genres.  Action movies tend to be reasonably colorful, but very quick; dramas tend to be dark, and moderately paced; and finally, comedies have plenty of color, but are slow as mud.  Each of these differences makes sense intuitively, although I wouldn’t necessarily have picked comedies to be so slow.

I wanted to push it further though.  I wanted to know how informative the differences between genres are: to what extent can you predict a movie’s genre based SOLELY on a few simple statistics computed from its color profile?  After computing the statistics for a sample of 30 movies in three different genres (action, drama, comedy), I trained a machine learning algorithm called a Support Vector Machine to identify movies’ genres based on these statistics alone (randomly separating my dataset into 25 training, 5 testing).  The results are pretty neat:

Y-axis is the classification accuracy, based on 100 repetitions. 33% (the red line) is the expected “random” accuracy.

The first column represents the accuracy using color intensity alone; the second represents spread alone; the third is speed alone; and the fourth is all three variables (intensity+spread+speed) together.  While each individual statistic contributes a little to the accuracy, the combination of all of the variables together is able to call a movie’s genre correctly a staggering 68% of the time.  Given three categories, one would expect a classification accuracy at random of only 33%.  That the algorithm is able to call genre accurately with only three clumsily-defined variables speaks to the idea that the spotmaps’ colors really are representing core aspects of the movies they are drawn from.
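
The repeated train/test procedure can be sketched as below. To keep the example dependency-free, it swaps in a nearest-centroid classifier as a stand-in for the Support Vector Machine, so it illustrates the evaluation loop rather than the exact model. The function names and the default of 5 test movies per repetition (matching the 25/5 split described above) are assumptions of this sketch.

```python
import math
import random


def centroid(rows):
    """Mean feature vector of a list of equal-length feature rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_accuracy(features, labels, n_test=5, repeats=100, seed=0):
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(features)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        # Group the training rows by genre and average each group.
        by_label = {}
        for i in train:
            by_label.setdefault(labels[i], []).append(features[i])
        centers = {lab: centroid(rows) for lab, rows in by_label.items()}
        hits = 0
        for i in test:
            # Predict the genre whose centroid is closest to this movie.
            guess = min(centers, key=lambda lab: math.dist(centers[lab], features[i]))
            hits += guess == labels[i]
        scores.append(hits / n_test)
    return sum(scores) / repeats
```

Here `features` would hold one [intensity, spread, speed] row per movie and `labels` the matching genres; passing a single column instead reproduces the one-variable comparisons.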


This is a start

The extent to which these quantitative differences in color statistics, computed independently of any human viewing, can recapitulate certain core characteristics of each movie, surprises me.  As I noted early on, cinema is a rich and complex art form, made of sound and speech and shape and color.  Using only one aspect of this tableau, color, we can discover interesting facts about films’ pace, characterization, and genre.

Going forward, I view this as a sort of proof of concept for further analyses.  There’s a great many directions to be taken, in terms of analyzing more genres, more movies, and deeper questions.  I have often wondered what separates a finely crafted film from a clunker; or what particular spectrum of differences there are between movies from the 80’s and the 90’s.  No doubt one can articulate subjective answers to these questions, but here represents an opportunity to gather together a bit of empirical evidence and bring it to bear on art.



Appendix: Mathematical definitions of color statistics.

For a spotmap matrix X, there are three columns R, G, B, which correspond to the intensity of each primary color (red, green, and blue).  There are a number of rows equal to the number of seconds in the movie’s runtime.  Example 3-second movie:

    R     G     B
    0     .5    0
    1     1     1
    0     0     0

(The 1st second’s color would be a forest green, like the color of the third bar in the bargraph on machine learning accuracy (above); the 2nd second’s  color would be pure white; and the 3rd would be pure black.)

Every cell can thus be specified as Xij, with a time in seconds (i) and a primary color (j).

For this matrix, we can compute the color intensity as the mean of all values in the matrix, i.e. the sum of all Xij / number of all the cells in the matrix.  For the example spotmap, it would be (0+.5+0+1+1+1+0+0+0)/9 = .389.  Values for most movies are usually in the range of .1-.25.

The spread is equal to the mean Euclidean distance between 1000 randomly selected pairs of rows in the matrix (think of Pythagoras’ theorem, but with three numbers instead of two).  The example matrix is too small for 1000 different random selections, but I can compute the distance for the 1st and 3rd row: the value would be .5.

Finally, color speed is equal to the mean Euclidean distance between successive rows in the spotmap matrix.  For a matrix with n rows, there are thus n-1 Euclidean distances between rows; in this case, I would compute the speed as the average of (the distance between row 1 and row 2 [which is 1.5], and between row 2 and row 3 [1.73]).  The color speed of this matrix would thus be (1.5+1.73)/2 = 1.62.  Typical values of color speed are between .02 and .04.
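
The three definitions above translate into a short sketch, checked against the example 3-second movie. The function names are illustrative, and the spread function assumes that each random pair consists of two different rows, which the text does not specify.

```python
import math
import random


def color_intensity(rows):
    """Mean of every R, G, and B value in the spotmap matrix."""
    return sum(sum(row) for row in rows) / (3 * len(rows))


def color_speed(rows):
    """Mean Euclidean distance between successive rows."""
    steps = [math.dist(a, b) for a, b in zip(rows, rows[1:])]
    return sum(steps) / len(steps)


def color_spread(rows, n_pairs=1000, seed=0):
    """Mean Euclidean distance between randomly chosen pairs of rows."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.sample(rows, 2)  # two different rows (an assumption)
        total += math.dist(a, b)
    return total / n_pairs


example = [(0, 0.5, 0), (1, 1, 1), (0, 0, 0)]
print(round(color_intensity(example), 3))  # 0.389
print(round(color_speed(example), 2))      # 1.62
```

With only three rows, `color_spread(example)` simply averages the three possible pairwise distances (.5, 1.5, and 1.73) in proportions set by the random draws.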