Your open data visualizations need context; my last crime map lacked any

To have an impact, programmers and journalists need to add context to their data visualizations.  If you just throw stuff up on a map, it may look neat, but it won’t carry much weight and will soon be forgotten.

My regional rail stop crime heat map sucks at telling a story
[image] Heat map of crime around Septa regional rail stops

Last weekend I hacked together a heat map showing counts of violent crimes committed around Septa regional rail stops between 2007 and 2013.  It’s not even accurate: for some reason, it doesn’t show anything around the regional rail stops at the airport.  The point was to demonstrate the open source technology that lets you mash up crime locations (or any locations) with other locations and draw inferences.  If you had cancer data and nuclear sites, or brownfield sites, or high-tension electric lines, you could apply the same technique.

I didn’t think the map would get much play, since it doesn’t do much, but it did.  Philebrity posted about it, tongue in cheek: “There’s no indication how many of the crimes were against SEPTA passengers, what time of day they occurred, or how many were of which crime. But hey, now you know.”  In other words, “this doesn’t say much, but there are red parts of Philadelphia and we have post quotas.”  They were in on the joke.  But one commenter called my map out on its lack of context:

This is not a helpful map because it doesn’t account for population density or passenger volume or some other proxy to explain why there is obviously going to be more crime when there is a very high density of people in a particular area. I think a quick glance tells you that you are probably safer at 17th and Market than you are at 21st and Allegheny but this map doesn’t reflect that at all, in fact it tells you the opposite. This map should be dealing in the relative RATES of crime, rather than total number of crimes.

That’s dead on.  Of course Center City rail locations have more crime; there are more people there.  And just because the heat map shows red doesn’t mean you’re in more danger of being a victim of violent crime.  I’m currently searching for ways to heat map crime rates, to add context to crime near transit stops.
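
To make the commenter’s point concrete, here is a minimal Python sketch of ranking stops by rate instead of by total. The stop names, crime counts, and daily boardings are hypothetical placeholders, not real Septa or police data, and daily boardings is only one possible denominator; resident population near each stop would slot in the same way.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    name: str
    crime_count: int      # crimes counted near the stop (hypothetical)
    daily_boardings: int  # assumed denominator: average daily boardings


def crimes_per_1000_boardings(stop: Stop) -> float:
    """Turn a raw crime count into a rate relative to passenger volume."""
    if stop.daily_boardings <= 0:
        raise ValueError(f"{stop.name} has no ridership to normalize by")
    return 1000 * stop.crime_count / stop.daily_boardings


# Hypothetical numbers for illustration only.
stops = [
    Stop("Stop A (busy downtown)", crime_count=120, daily_boardings=20000),
    Stop("Stop B (quiet neighborhood)", crime_count=30, daily_boardings=1500),
]

by_count = sorted(stops, key=lambda s: s.crime_count, reverse=True)
by_rate = sorted(stops, key=crimes_per_1000_boardings, reverse=True)

for s in stops:
    print(f"{s.name}: {s.crime_count} crimes, "
          f"{crimes_per_1000_boardings(s):.1f} per 1,000 daily boardings")
print("Most crimes: ", by_count[0].name)
print("Highest rate:", by_rate[0].name)
```

With these made-up numbers, Stop A has four times as many crimes as Stop B but less than a third of its rate, which is exactly the inversion the commenter described.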

Philly crime visualizations so far

It’s been a year since the city released its crime data to civic hackers.  And though there have been a few very stylish visualizations, enough for the Atlantic Cities blog to take note, I’m not seeing much context associated with them.  There are 15 homicides in the triangle.  So?  It looks like there are more murders in Frankford than in Mayfair.  So?

There is a spot-on quote in the Technically Philly post announcing the crime data release:

“…in private conversations with IT leads within the city and representatives elsewhere, there is real political concern about the impact of near real-time, automated crime mapping. Wouldn’t that make some neighborhoods look bad and create winners and losers, one council member once asked …”

I really thought that was going to happen.  I was picturing a new age of digital redlining where the middle class would look at heat maps and stay away from those areas.  That never happened.  The reason, I think, is that the visualizations lacked context.

Use PostGIS on open data to get stuff near other stuff

Heat maps are awesome.  You can tell great stories with them, like Axis Philly’s crime change map.  You have x crimes in this neighborhood and y crimes in that neighborhood, and if you have a shapefile of the city’s neighborhoods, you can build a map like that with relative ease.
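
Under the hood, the shapefile approach boils down to a point-in-polygon test: for each incident, find the neighborhood polygon that contains it and add one to that neighborhood’s tally. Below is a minimal Python sketch using the even-odd (ray-casting) rule. The two square “neighborhoods” and the incident points are hypothetical; a real project would read the boundaries from the shapefile and let a GIS library, or PostGIS’s ST_Contains, do the test.

```python
Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd rule: cast a ray to the right of pt and count edge crossings."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross the ray.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(incidents: list[Point],
                          neighborhoods: dict[str, list[Point]]) -> dict[str, int]:
    """Tally each incident into the first neighborhood polygon that contains it."""
    counts = {name: 0 for name in neighborhoods}
    for pt in incidents:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(pt, polygon):
                counts[name] += 1
                break  # assumes neighborhoods do not overlap
    return counts


# Hypothetical unit-square "neighborhoods" and incidents, for illustration only.
neighborhoods = {
    "West": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "East": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
incidents = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (3.0, 3.0)]

print(count_by_neighborhood(incidents, neighborhoods))
```

The incident at (3.0, 3.0) falls outside both polygons, so it is simply not counted.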


But I think a lot of developers working in open data stop there.  I think the neatest work is yet to be done beyond using shapefiles to define your areas.  Last year, during a fight over a methadone clinic opening in Northeast Philly, someone was quoted saying there’s more crime around 7-Elevens than around methadone dispensaries.  What a great challenge.  I want to play with this stuff, so I started by heat mapping crimes near Septa transit stops.  I threw both the Philly crime data and the Septa data into PostgreSQL and used the PostGIS extension to pull out crime counts near each regional rail stop to get this:

[image] Heat map of violent crime counts around Septa regional rail stops

With a latitude, longitude, and total count for each station, I used LeafletJS and the heatmap.js plugin to build the map.  Nice and easy, with no need for shapefiles or GeoJSON.
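
heatmap.js’s Leaflet plugin takes its data as an object with a `max` value and a `data` array of points; in the plugin’s documented example, each point has `lat`, `lng`, and `count` fields, mapped through the `latField`, `lngField`, and `valueField` config options. Here is a minimal Python sketch that reshapes per-station rows into that form. It assumes each row is a dict with `stop_lat`, `stop_lon`, and `count` keys (the key names and the sample coordinates and counts are assumptions, not the project’s actual output).

```python
import json


def to_heatmap_payload(rows):
    """Convert per-station rows into heatmap.js's {max, data} shape.

    Each row is assumed to be a dict with stop_lat, stop_lon and count keys,
    e.g. one parsed row_to_json result per rail stop.
    """
    data = [
        {"lat": row["stop_lat"], "lng": row["stop_lon"], "count": row["count"]}
        for row in rows
    ]
    # max sets the top of the color scale; default to 0 for an empty list
    max_count = max((point["count"] for point in data), default=0)
    return {"max": max_count, "data": data}


# Hypothetical rows for two stops; not real coordinates or counts.
rows = [
    {"stop_lat": 39.95, "stop_lon": -75.16, "count": 12},
    {"stop_lat": 40.02, "stop_lon": -75.10, "count": 3},
]

payload = to_heatmap_payload(rows)
print(json.dumps(payload))
```

The resulting JSON can be served to the page and handed to the heatmap layer’s `setData` call; it is worth checking the field names against the heatmap.js version you use.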

The project is hosted on GitHub, but the SQL that pulls the data is, I think, noteworthy:

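SELECT row_to_json( ROW ) FROM (
   SELECT *, (
      -- count violent incidents within .0015 (degrees, since points are lon/lat) of this stop
      -- ST_MakePoint sets no SRID; wrap it in ST_SetSRID( ..., 4326 ) if incident.point has one
      SELECT COUNT(*)
      FROM incident
      WHERE ST_DWithin( ST_MakePoint( rail_stops.stop_lon, rail_stops.stop_lat ), incident.point, .0015 )
      AND (
         incident.text_general_code LIKE '%Robbery%'
         OR incident.text_general_code LIKE '%Assault%'
         OR incident.text_general_code LIKE '%Rape%'
         OR incident.text_general_code LIKE '%Homicide%'
      )
   ) AS count
   FROM rail_stops
) ROW;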

That SQL asks PostGIS to pull the selected violent crime incidents and count them around each regional rail stop.  The juice of it is the function ST_DWithin, which tests whether each incident lies within .0015 of the stop’s location (in degrees, since the points are plain longitude/latitude).  In effect, it draws a circle around the rail stop and asks the database to count all crimes of the desired type inside that circle.
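
To see what the query does without a database, here is a plain-Python sketch of the same logic: an inclusive planar distance test in raw longitude/latitude degrees (what ST_DWithin computes on unprojected geometries), combined with the same case-sensitive substring match as the LIKE clauses. The stop, incident coordinates, and crime codes are illustrative only. The last helper converts the .0015-degree radius to rough ground distance using the common approximation of 111,320 meters per degree of latitude.

```python
import math

VIOLENT_TYPES = ("Robbery", "Assault", "Rape", "Homicide")
RADIUS_DEG = 0.0015            # same threshold as the SQL, in degrees
METERS_PER_DEG_LAT = 111_320   # common approximation


def within(stop, incident, radius=RADIUS_DEG):
    """Planar distance in raw lon/lat degrees, as ST_DWithin measures it on
    lon/lat geometries; the comparison is inclusive."""
    return math.hypot(stop[0] - incident[0], stop[1] - incident[1]) <= radius


def is_violent(code):
    """Case-sensitive substring match, like the LIKE '%...%' clauses."""
    return any(t in code for t in VIOLENT_TYPES)


def count_violent_near(stop, incidents, radius=RADIUS_DEG):
    """Count (lon, lat, code) incidents that are violent and inside the radius."""
    return sum(
        1
        for lon, lat, code in incidents
        if is_violent(code) and within(stop, (lon, lat), radius)
    )


def radius_in_meters(radius_deg, lat_deg):
    """Rough ground size of a degree radius: (north-south, east-west) meters."""
    north_south = radius_deg * METERS_PER_DEG_LAT
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# Hypothetical stop and incidents as (lon, lat[, code]); not real data.
stop = (-75.1600, 39.9500)
incidents = [
    (-75.1605, 39.9504, "Robbery No Firearm"),          # close, violent
    (-75.1610, 39.9500, "Thefts"),                      # close, not violent
    (-75.1700, 39.9500, "Aggravated Assault Firearm"),  # violent, too far
    (-75.1600, 39.9510, "Other Assaults"),              # close, violent
]

print(count_violent_near(stop, incidents))
print(radius_in_meters(RADIUS_DEG, 39.95))
```

At Philadelphia’s latitude (about 39.95° N) that radius works out to roughly 167 meters north-south but only about 128 meters east-west, so the “circle” is really an ellipse on the ground. In PostGIS itself, casting both points to geography makes ST_DWithin take its distance in meters instead.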

[link] Violent Crime Heat Map of Septa’s Regional Rail Stops

[link] Project page on Github – Crime counts near Septa regional rail stops