The Criminal Topography of Chicago

Lantern slide of the street and boulevard system, "present and proposed," from the Plan of Chicago

A Map of Chicago, in Crimes

I recently stumbled upon this website, which is a huge repository of the City of Chicago’s formidable data-gathering.  This page was put together as part of the city’s new commitment to greater transparency, and includes everything from CTA bus records to Tax-Increment Financing districts to registers of public employees and their salaries.  One of the files was a listing of every crime report filled out in the city for the last 13 years.

Let me note here that such a file is a staggering achievement.  For each crime, there is recorded the location, the time, the address, the kind of crime, and a few other bits of miscellaneous information.  In total, the file encompasses 4 million crime reports spread over 13 years: an average of more than 300,000 per year.

This spreadsheet is colossal, not only in its sheer size (1 GB in total) but in its implications.  How many questions can now be asked of this data that were previously impossible to answer, or accessible to only a handful of academics?


This picture is a map of the City of Chicago, in crimes.  Each dot here is a single crime.  The darker green bits are where multiple crimes have piled up in roughly the same place.  You can see that it recapitulates perfectly the known geography of the city.  Where there are people, there are crimes.

There are also some gaps in the map.  Some of these blank spaces are parks; presumably a report from a park is assigned to the nearest street location rather than to the spot where the crime actually occurred.  Other gaps are rivers or industrial zones, where presumably little crime is committed.

Different Neighborhoods, Different Crimes

Yet not all areas have the same crimes.  I separated the data by offense and noticed immediately that some crimes were disproportionately committed in certain areas (I’m sure you can imagine some hypotheses).  Here’s an example of that phenomenon, in which I’ve focused on three kinds of crimes: battery (as in assault and battery), deceptive practices, and narcotics.
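For anyone who wants to try the same split, here is a minimal sketch of grouping reports by offense.  It is not the code behind the maps in this post: the column names ("Primary Type", "Latitude", "Longitude") and the five sample rows are assumptions made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature extract. The column names and every row below
# are illustrative assumptions, not data taken from the real file.
SAMPLE_CSV = """Primary Type,Latitude,Longitude
BATTERY,41.78,-87.66
NARCOTICS,41.88,-87.72
DECEPTIVE PRACTICE,41.88,-87.63
BATTERY,41.75,-87.65
NARCOTICS,42.01,-87.67
"""


def points_by_offense(csv_text):
    """Group (longitude, latitude) points by offense type."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        point = (float(row["Longitude"]), float(row["Latitude"]))
        groups[row["Primary Type"]].append(point)
    return dict(groups)


groups = points_by_offense(SAMPLE_CSV)
for offense, points in sorted(groups.items()):
    print(offense, len(points))
```

Each group is then just a list of points, which can be handed to whatever plotting tool you prefer, one color per offense.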


You can still see the outline of the city here, and its idiosyncratic borders.  But you’ll note that battery reports are found mostly on the southwest side of the city, along with many of the narcotics reports.  There are pockets of narcotics reports on the north side, in Rogers Park and other neighborhoods, but narcotics is primarily a southwestern offense.

In strong contrast to the narcotics pattern is the grouping for “Deceptive Practices.” Wikipedia (behold, my complete ignorance of the law) informs me that Deceptive Practices includes things like fraud, false advertising, and misrepresentation.  Such offenses are most often the province of businesses, and so the Loop is home to the greatest cluster of Deceptive Practices reports.  You can see rays of Deceptive Practices reports emanating from the Loop along the major thoroughfares (Milwaukee Avenue, for instance).  This pattern presents a corollary to the above law: where there are businesses, there are deceptive practices.

The Inequality of Arrests

Another variable recorded in the dataset is whether an arrest occurred or not.  For some crimes, the arrest rates are near 100%, but in others, police have discretion as to whether to arrest (in still other cases, there may be no perpetrator to arrest).  An example of a crime in which there is discretion as to whether to arrest is low-level marijuana possession (<15g), which, in 2012, became a ticketable offense.  I decided to model arrest probability for marijuana possession as a function of location.  (Note here that ‘no arrest’ does not necessarily mean a ticket was issued; see below for details.)
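As a rough illustration of how arrest rates can be compared across kinds of crime, here is a small sketch.  The records are invented for the example, and the (offense, arrested) layout is an assumption rather than the dataset’s actual schema.

```python
from collections import Counter

# Hypothetical records: (offense type, whether an arrest was made).
# These rows are invented; they are not drawn from the real dataset.
reports = [
    ("NARCOTICS", True),
    ("NARCOTICS", True),
    ("NARCOTICS", False),
    ("THEFT", False),
    ("THEFT", True),
    ("THEFT", False),
    ("THEFT", False),
]


def arrest_rates(records):
    """Return the share of reports ending in an arrest, per offense type."""
    totals, arrests = Counter(), Counter()
    for offense, arrested in records:
        totals[offense] += 1
        arrests[offense] += int(arrested)
    return {offense: arrests[offense] / totals[offense] for offense in totals}


rates = arrest_rates(reports)
for offense, rate in sorted(rates.items()):
    print(f"{offense}: {rate:.0%} of reports ended in an arrest")
```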

I subset the data by the kind of crime and year (2013).  I selected narcotics reports, specifically those with less than 30g of cannabis (as described in the report; this was the lowest level).  I fit a logistic regression model incorporating latitude, longitude, and an interaction term between them.  There are fancier and cleverer ways of doing this modelling; I am not striving for mathematical precision but rather a rough overview.  Here are the fitted probabilities of arrest, depending on where the crime took place.
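To make the shape of that model concrete, here is a minimal sketch of a logistic regression on latitude, longitude, and their interaction.  It is not the code used for this post.  The data are simulated, the coordinates are rescaled to run from -1 to 1, and the fitting is done by plain gradient ascent; all of those are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def features(lat, lon):
    """Intercept, latitude, longitude, and their interaction."""
    return [1.0, lat, lon, lat * lon]


def fit_logistic(rows, labels, steps=1000, rate=1.0):
    """Fit coefficients by gradient ascent on the mean log-likelihood."""
    coefs = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(coefs)
        for x, y in zip(rows, labels):
            err = y - sigmoid(sum(c * v for c, v in zip(coefs, x)))
            for j, v in enumerate(x):
                grad[j] += err * v
        coefs = [c + rate * g / n for c, g in zip(coefs, grad)]
    return coefs


def predict(coefs, lat, lon):
    """Fitted probability of arrest at a (rescaled) location."""
    return sigmoid(sum(c * v for c, v in zip(coefs, features(lat, lon))))


# Synthetic data only: arrests are simulated to be more likely toward
# the south (low latitude) and west (low longitude). Nothing here comes
# from the real crime reports.
rng = random.Random(0)
rows, labels = [], []
for _ in range(400):
    lat, lon = rng.uniform(-1, 1), rng.uniform(-1, 1)
    p_true = sigmoid(2.0 - lat - lon)
    rows.append(features(lat, lon))
    labels.append(1 if rng.random() < p_true else 0)

coefs = fit_logistic(rows, labels)
southwest = predict(coefs, -1.0, -1.0)
northeast = predict(coefs, 1.0, 1.0)
print("southwest corner:", round(southwest, 2))
print("northeast corner:", round(northeast, 2))
```

Evaluating predict over a grid of locations gives a surface of fitted probabilities.  With real coordinates, centering and rescaling latitude and longitude first (as assumed here) keeps the interaction term from being dominated by the raw magnitudes.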

arrest probability

I built a fully 3-dimensional version of this graph, which can be rotated and zoomed, here.  You should go play with it.  If you view the visualization head on, with longitude as the x-axis and latitude as the y, you’ll see a map very similar to the first graph on this page.  If you tilt it slightly, you’ll see that this graph can be thought of as a criminal topography of Chicago (as above), but warped or deformed by the probability of an arrest.

The takeaways of the model are largely as expected.  There are significant effects of latitude, longitude, and their interaction, such that the farther south or west a crime occurs, the more likely it is to end in an arrest.  The difference is not overwhelming, but it is stark nonetheless: on the northeast side of the city, the probability of arrest falls to ~80%.  Anywhere in the southern or western sides of the city, it is close to 100%.  That’s probably not an artifact or an accident: we know that African-Americans and Latinos are much more likely to be arrested, overall, and the southern and western parts of the city are where many African-American and Latino people live.

Even so, I ought to note that I haven’t controlled (and can’t control) for all the necessary covariates.  As always, the task is more complex than it initially appears.  Police must arrest when the offender is under 17, so it may be that more arrests occur on the south and west sides of the city because more young people are committing the crime there.  Also, the police reports do not offer enough granularity to know whether the amounts of weed involved are higher in the southern and western quadrants: the lowest level is simply less than 30g.  Since the technical cutoff for a ticket is only 15g, perhaps there are simply more offenders in the 15-30g window on the south and west sides.

I doubt that’s the case, though.  According to a good deal of published research, Black and white people use marijuana at similar rates.  So we are left with other, more uncomfortable reasons for the observed differences in arrest rates.

Whatever the cause, the pattern is clear, and serves as an example of the sort of discovery this data can give rise to.  I am usually skeptical of the notion of “big data” as any kind of transformative phenomenon, on the theory that big data is really just lots of small data put together.  But to the extent that governments embrace it, there may yet be some sea change, in that big data reduces the inherent information asymmetry between the people and their representatives.  The government of Chicago has a thousand maps of the city at its disposal at any moment, some flattering, some despicable; the thought that all of those maps might be laid bare and examined is exciting.



In the first draft of this post, I assumed that many of the marijuana crime reports (<30g) without arrests corresponded to citations that were issued instead.  I made this assumption based on three facts: 1) the marijuana reports without arrests increased suddenly in the crime logs in the same year the decriminalization measure was passed; 2) the number of no-arrest low-level marijuana reports was quite similar to the reported number of tickets issued; and 3) I couldn’t think of a reason why there would be crime reports filed for such small amounts of weed unless there was also an offender present to prosecute.

Having now found a database of some of those (~300) tickets, I can see that the assumption appears to be wrong: the issued tickets do not seem to be recorded in the crime database, meaning that the no-arrest marijuana reports are potentially something different.  This conclusion raises the question: what, then, are the no-arrest marijuana reports?  As of right now, I don’t know.  I have emailed the appropriate contact at the Chicago data portal to find out.

The fact remains, however, that whatever the no-arrest marijuana reports are, they are distributed non-randomly throughout the city.  I have changed some language in the post to make clear that these no-arrest reports are not necessarily tickets, and I will update again when I find out what the reports actually are.
