How I got my job

I get asked one question in particular more than any other: how did you get your job at FiveThirtyEight? It’s a reasonable thing to ask. Data journalism is still so new and rapidly developing that I don’t think there’s any standard path into a position like mine. To whatever extent there is such a path, it probably follows the same routes as traditional media jobs, either via J-school or from outlet to outlet.

That’s not the path I took. I started by getting my PhD in evolutionary genetics. I had a long-term ambition (since I was a kid) to get my PhD in something, and I felt passionate about understanding evolution in particular. I had the idea (along with many other people) to combine genomics/systems biology methods with evolutionary questions, and so went about finding an opportunity to do that.

I loved the first half of grad school. The first two years of most PhD programs are focused on learning the skills and theory within your discipline, before applying them later on to a research question. My program was an incredible intellectual environment, and I was able to test out ideas among a varied, brilliant group of students and professors.

At the same time, science can be stifling. Setting aside matters of intellectual curiosity, graduate school is also about getting you a job and launching you into (most frequently) an academic career. To that end, a great deal of it is devoted to the messy everyday business of doing science: publishing papers, applying for grants, going to conferences, making the appropriate contacts, and so on. Much of that everyday work isn’t about science at all. I know that academia isn’t unique in this. As in many careers, you have to grit your teeth and accomplish certain goals in a prescribed manner, even sometimes (for me, often) to the detriment of your broader intellectual mission.

About halfway through graduate school, I became increasingly frustrated with that side of my job. I started looking around for a more creative outlet, one where I could ask interesting, data-centric questions without needing the payoff of a full-fledged academic paper to justify my efforts. I started a blog, this blog, and forced myself to do about one piece every two weeks on any topic that interested me. (That pace was calculated to be difficult and uncomfortable, but achievable.)

Perhaps the hardest thing about any regular, frequent writing assignment is finding enough material to sustain it. In search of topics for my blog, I turned to baseball, which has an abundance of available and well-curated data. I mixed a few baseball topics into my rotation, typically doing fairly simple modeling work.

I had neither expectation nor plan that this writing would lead to anything, but about a year into writing my blog, I received an email from Ben Lindbergh, who was then the Editor in Chief of Baseball Prospectus. He asked if I’d like to write for them. I said yes, reasoning that I’d be doing largely the same thing but getting paid a small amount for it.

I wanted to keep the same bi-weekly to weekly schedule, but again, I had no particular ambition to make a career out of my baseball writing. Frankly, I figured I’d be a spectacular failure, which is how I enter a lot of situations. Imagining that I’d be belly-flopping anyway, I decided to be bold in doing so, and try to take on topics too big or complex for others to attempt. I asked Ben if I could name my column Moonshot, partially as a sarcastic joke on myself, and partially in reference to the other, baseball meaning of the word.

I found myself loving the work at Baseball Prospectus. Instead of largely speaking into the echo chamber of my blog (or the broader, but still depressingly empty world of science), I was getting feedback—not only from Ben and the other wonderful writer/researchers at BP, but from the internet at large. (A paradox of internet writing tends to be that the fewer the people reading your work, the greater the percentage of the feedback that is positive.)

After a handful of articles, I was contacted by an MLB team about doing consulting, an opportunity at which I jumped. That seemed to legitimize my efforts, and so for the first time I started considering baseball-writing related careers, instead of just the default academic science path. With that said, I explored team-related opportunities and found them wanting. And with no immediate prospects at other media outlets, I tabled that prospect.

One year into my work at Baseball Prospectus, Nate Silver contacted me about the open baseball writer position at FiveThirtyEight. Apparently Ben, who had recently joined Grantland, had recommended me. I didn’t apply; I hadn’t considered myself good enough to have a shot. But following a few phone conversations, I had a contract offer from ESPN for part-time work that would be about the same time commitment as what I managed at BP.

At this point, I started to consider the idea of a career as a writer more seriously. The contract offer arrived during the last year of my PhD, as I was finishing it. Those of you who have survived graduate school will recognize this chapter as the toughest time. (The difficulty was further compounded by working two jobs, as well as some changes in my personal life.) For me and for most people, this period is mostly about the part of graduate school I liked the least: finishing papers, pleasing faculty members, and lining up some kind of post-graduate opportunity to show that you are ready to receive your PhD.

Partially as a result of the misery of finishing my PhD and partially because it was the natural continuation of a longer arc in my life, I started scheming toward a career in journalism instead of science. I succeeded in getting a temporary postdoctoral fellowship while I made up my mind and surveyed my options. By the time I had to make a decision about going on to another fellowship, I was completely certain and ready for a change. I renewed my contract with FiveThirtyEight, quit my postdoc, and set about freelancing to build a fuller journalistic resume.

I don’t know that there are any major lessons to take from my career path except: be extremely lucky. I certainly was: I owe my whole career to Ben Lindbergh finding my site on a list of Google results. What are the odds?

To the extent that there is a lesson, I’d say it’s that you should start a blog. This is the advice, puny as it is, that I give to nearly everyone. When I was still blogging anonymously, I thought of each piece as a lottery ticket. As with the lottery, the expected payoff on any given ticket is negative, but you can maximize your chances by getting more tickets. Most of my blog posts went basically unvisited, or were read by perhaps a handful of Facebook friends. Once I got to the front page of Reddit, but it never went any further than that.

The winning ticket, so to speak, was a random piece on baseball player injuries that happened to be posted around the same time that Ben was researching one of his articles on the same subject. I doubt it was the best piece on the blog at the time; I’m ashamed of its quality now. But it was Good Enough to lead to the next opportunity, which took me to the next opportunity, and so on. I can’t claim much credit for the progression except insofar as I was doggedly persistent in continuing to write on a regular schedule. The rest was good fortune.


The Future is Electric Blue, Part II

What do Luke’s Lightsaber, the main deflector dish of the USS Enterprise-D, and J.J. Abrams’ ubiquitous lens flare have in common?

What characteristic do the engine of the Millennium Falcon, Dr. Manhattan of Watchmen, and the diva of The Fifth Element share?

I’ll make it easier. What color unites the Na’vi of Avatar, the light-cycles of Tron: Legacy, and the cover art of Prometheus?

The answer to the riddle: All of these things are electric blue.  I’ve written in the past about how electric blue is a kind of signal of science fiction, one that shows up in a statistically robust way when performing an unbiased analysis of the color spectrum of movies.  Electric blue can be found in your major sci-fi franchises:




And in the box art of minor box-office bombs:


In your kitschy classics:



Towards Heuristics of Heuristics

The world is complicated, nearly immeasurably so. Even the workings of the smallest bits of matter in the universe are incredibly complex, and so to make progress in understanding the world, we have to neglect some of this vast complexity. We must impose simplifications on our mental models in order to comprehend the world at all.

Following the definition of William Wimsatt, these simplifications can be called heuristics. By his reckoning, not only are heuristics necessary, but the choice of particular heuristics guides the path of our gaining of knowledge.

Heuristics are necessary because human beings are finite. In the old Enlightenment-era schemes, the world could be perceived as Laplacian in nature, as a finite series of determinate computations. Now that we are aware of the true scope of the universe, the notion of a determinate universe, while perhaps theoretically interesting, lacks utility. Even if we were to possess perfect information, the scale of the universe would not yield to any accurate calculation.

These arguments apply just as well when scaled down to the level of realistic scientific inference. There are too many permutations of gene regulatory networks to model them as the interactions of molecules. Combinatorial explosions abound in dealing with genomes of even a few thousand genes. We must idealize them.
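To give a rough sense of that scale, here is a minimal sketch in Python. It rests on two simplifying assumptions that are mine, not part of the argument above: each gene is treated as simply on or off, and each ordered pair of genes either has a regulatory link or does not. The figure of 3,000 genes is only an illustrative stand-in for "a few thousand."

```python
def count_expression_states(n_genes: int) -> int:
    """Number of on/off expression patterns, assuming every gene is binary."""
    return 2 ** n_genes


def count_wiring_diagrams(n_genes: int) -> int:
    """Number of directed networks, assuming each ordered pair of distinct
    genes either has a regulatory link or does not."""
    return 2 ** (n_genes * (n_genes - 1))


if __name__ == "__main__":
    n = 3000  # hypothetical genome size, chosen for illustration
    states = count_expression_states(n)
    # Report the size of the number rather than the number itself.
    print(f"{n} binary genes: a state count {len(str(states))} digits long")
```

Even under these crude assumptions, the count of possible states runs to hundreds of digits, which is why enumerating the possibilities molecule by molecule is not a realistic option.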

So we require simplifications, and a change in the framework of our thought. Absent a single, unifying conception of the universe, we are allowed a choice of heuristics. For example, one choice of heuristics for gene regulatory networks is to consider the network in the context of graph theory (like Davidson).

You end up with something that looks like this.

Davidson-type wiring diagram of a gene regulatory network. Labelled transcription cartoons (e.g. FoxA) are genes, lines connecting them represent regulatory interactions. From

In addition to being (sometimes) visually appealing, this heuristic frames the problem in a space of mathematics which seems to apply well to gene regulation. That space is network theory. By considering regulatory interactions (Gene X activates Gene Y) as edges and genes as nodes, we can perhaps gain some useful knowledge by applying the well-understood axioms and practices of network theory to this new, empirical problem of gene regulation. This perspective is intriguing, and also simplifying: under the hood, regulatory interactions are not simple mathematical relationships; they are incredibly complex processes performed by elaborate molecular machinery. Perhaps, however, these incredibly complex machines perform their operations in ways that are similar enough to those simple mathematical relationships that we can, for a moment, neglect the considerable intricacy of those molecular machines.
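As a concrete illustration of the genes-as-nodes, interactions-as-edges idea, here is a minimal sketch in Python. The gene names and interactions are placeholders I made up, not taken from any real wiring diagram, and the breadth-first search is just one standard graph operation one might borrow.

```python
from collections import deque

# Hypothetical regulatory interactions: regulator -> genes it regulates.
REGULATES = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneD"],
    "geneD": [],
}


def downstream_genes(network, start):
    """Return every gene reachable from `start` by following regulatory edges."""
    seen = set()
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        for target in network.get(gene, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    print(sorted(downstream_genes(REGULATES, "geneA")))
```

Nothing in this code knows anything about transcription factors or binding sites. That is the simplification at work: the molecular machinery is reduced to a lookup table of who regulates whom.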

Supposing that the network theory heuristic is an interesting and useful one, I now want to broaden the scope of this post to a larger, more significant question. If some heuristics are useful, and others aren’t, can we come up with a general theory for which heuristics apply in which situations?

I had this thought while re-reading William Wimsatt’s excellent tome on heuristics. He advocates for them, and for an escape from physics-style thinking (in terms of Laplacian demons and absolute rule-sets). I agree on both scores, but the trouble with heuristics is that they are frighteningly* arbitrary.

For instance, I could choose to use one set of heuristics to simplify gene regulation, and you could choose a different set.  Our two sets would differ in terms of their simplifying assumptions, and, if we advanced the study of each of them sufficiently far, they might produce different predictions about the behavior of some particular gene regulatory systems.  How would we know which one was right?

This scenario is something of a false problem, because we could always do experiments to test the predictions of each, and then disregard (or assimilate) the one whose predictions turned out to be correct less often.  Even so, as long as there were two seemingly correct systems of thinking about gene regulatory networks, there would be tension between them.  And one could envision scenarios in which people proposed wildly inappropriate heuristics to tackle gene regulatory problem-agendas, producing false results but being unprovably wrong (for at least a short time).

In the long run, to avoid these kinds of problems, it would be desirable to have a guidebook of sorts which prescribed the kind of heuristics which are most likely to be useful in each situation.  Wimsatt doesn’t provide that guidebook, although he seems to hint at its utility (and perhaps, its future existence).

The Book of Heuristics

In thinking about heuristics, I always tend to come back to the concept of statistical models which, under my reading of Wimsatt, are perfect examples of heuristics. A statistical model describes the way that two or more variables relate to each other. To work properly, it has to assume some structure between the two or more variables. When we then apply the model to actual data, we fit the data into the structure, and in so doing, perhaps learn something about the problem at hand.

For example, linear models assume that two variables are linear functions of each other, that is:

y = ax + b

That equation should be familiar to the reader, as it’s a fairly straightforward relationship. Verbally, it means that for every one unit increase in x, y increases or decreases by some steady amount, a. When we fit a linear model to data, we take a series of y’s and x’s and, using some algorithms, estimate the values of a and b. By applying this heuristic, we are able to learn something, we think, about the underlying link between x and y (e.g. each unit of x is worth a units of y).
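For readers who want to see the fitting step spelled out, here is a minimal sketch in Python. It assumes ordinary least squares as the algorithm, which is one common choice among several, and the data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a (slope) and b (intercept) in y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, relative to how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: chosen so the line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Made-up data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(f"a = {a}, b = {b}")
```

Here the fit recovers a slope of 2 and an intercept of 1, so in the language above each unit of x is worth two units of y.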

Linear models work great in all sorts of places, but they also fail when applied to some datasets.  If one were to attempt to model the amount of sunlight as a function of the hour of the day (with a 10-year dataset of each) using a linear model, one would get something like a flat line. Moreover, the fit of the data would be terrible. We know that there isn’t a linear relationship between these two variables, and so to assume that there is defies common sense and good statistical practice.
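The sunlight example can be sketched in a few lines of Python. Everything specific here is an assumption of mine: the daylight curve is an idealized sine-shaped hump between 6:00 and 18:00, and a single idealized day stands in for the 10-year dataset. The line is fit by ordinary least squares.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the line y = ax + b."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def toy_sunlight(hour):
    """Assumed daylight curve: zero at night, a sine hump from 6:00 to 18:00."""
    return max(0.0, math.sin(math.pi * (hour - 6) / 12))


hours = [h + 0.5 for h in range(24)]  # midpoint of each hour of the day
light = [toy_sunlight(h) for h in hours]
a, b = fit_line(hours, light)
r2 = r_squared(hours, light, a, b)
print(f"slope = {a:.6f}, R^2 = {r2:.6f}")
```

Because the toy curve is symmetric around noon, the fitted slope comes out essentially zero and the line explains essentially none of the variation, even though the pattern in the data is obvious to the eye.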

The answer to the question I posed earlier (could we build a guidebook for heuristics?) seems to hinge on whether we could identify, in advance, whether a certain heuristic would be likely to fail when applied to a given problem.

To a certain degree maybe that objective is possible, in that we can let intuition and the “eye test” guide us. If, for example, we were to plot hour of the day vs. amount of sunlight, it would be very plain that the dataset is not amenable to a linear fit, although there is clearly a pattern present. But this “eye test” really seems to be just mental model fitting.

I wonder if we couldn’t do better still, and without gathering the data beforehand. As a parting quandary, I pose the following question. Is it possible that, by leveraging our intuition about the architecture of a system, we could rule out certain heuristics as being likely to apply poorly to that system?

I think the answer is yes, and I think this kind of heuristic choice is exercised successfully all the time (albeit without a firm theoretical basis for its use). For example, imagine some highly complex system, in which different parts of the system are strongly interconnected. The system functions in such a fashion that changes to any single part ripple outwards in unpredictable ways, sometimes being buffered by the other parts of the system, sometimes causing catastrophic deviations from the system’s normal functioning.

Intuitively, when I envisage such a system, I think of it as being a poor fit for any simple heuristic like a linear model, at least for most questions. Because of the strong interdependence between parts of the system, and the overall intricacy of the structure as a whole, I imagine most perturbations to such a system are unlikely to result in linear effects on any appreciable scale. Again, as a matter of intuition, I would prescribe the use of (for example) more sophisticated statistical models, those with fewer constraints and more flexibility, in order to better agree with the inherent characteristics of the system.

If my intuition is correct (perhaps it’s not!), then it suggests that there could yet be a guidebook of heuristics, a way to tell in advance which heuristic approaches are likely to be most fruitful. In this way, we could build a set of heuristics of heuristics, i.e. meta-heuristics, which shaped not only the particulars of the simplifications we applied to models of complex inquiry, but the kinds of simplifications.


*Frightening only to some. I think we shouldn’t be frightened of their inconsistency with each other; as the quote goes, “A foolish consistency is the hobgoblin of little minds.” Instead, I think it’s one of the universe’s most forgiving and helpful properties that multiple schemes of thinking about a problem can converge on the same correct answer. We should embrace and enjoy the chaotic, diverse world of heuristics, and set about determining which ones are the best and why.