Skip to content

Transit to Everywhere

December 30, 2011

Data courtesy MapQuest and OpenStreetMap CC-BY-SA, the City and County of San Francisco, and Bay Area Rapid Transit

This is an overlay of the transit and walking trip plans generated by OpenTripPlanner from Powell and Market to every other intersection in San Francisco, after Eric Fischer’s map of walking routes to every intersection in San Francisco. It brings out the transit routes but also shows well-used walking routes. The lines do not vary in width (don’t let Market Street fool you, it’s actually several lines—BART, MUNI rail in 2 directions, Muni bus, walking—very near each other).  The lines fade where there are fewer routes using them, because they are rendered as black set at 10% opacity. Where there are more lines overlapping, the lines become darker, in what I believe is a log (or log-like) scale. It ended up just mostly being a map of San Francisco, with transit routes emphasized. It doesn’t show potential utilization of the transit system, because the routes are not weighted (it would probably be wise to weight the routes by the density of the block they terminate in and by their service area; i.e., estimate the number of people within the Thiessen polygon of each intersection and weight the route by that). Also, I had difficulty finding an opacity level where the usage of transit routes fades towards the end (as it clearly should) but still shows the streets that walked down by just one or two trip plans.

I think the data I used to make this map could possibly be better utilized to make a cartogram of San Francisco transit times (like another of Eric Fischer’s maps, but including transfers and walking times).

I’d also like to make a companion map using the OTP bike router. I think it could look really interesting in San Francisco, because the router will try to avoid hills.


I set up an instance of OpenTripPlanner using a graph built from OpenStreetMap data for the San Francisco area, as well as GTFS data from BART and San Francisco Muni. I used the pre-built binaries of OTP. I then used a Python script to request directions from Market and Powell to every other intersection in San Francisco, as defined in the StIntersections dataset from here. I stored the directions in a PostGIS database. I used one machine as the OTP server, and ran the script and PostGIS on another machine, but I see no reason why they couldn’t be on the same machine. I used QGIS to render the map. For what it’s worth, I’ve open-sourced the script I wrote. It may provide a good example of how to use the OTP JSON API in Python.

GitHub Image Diffs

December 23, 2011
tags: , ,

As you may have gathered, I like Git and GitHub. Today, I ran across a GitHub feature this is really cool and above and beyond the call of duty: not only do they produce and display diffs on text files, but also on image files! You can see an example in one of my repositories. Added points if you can figure out where the map tile is from!

Shapefiles in OpenLayers

December 13, 2011

Update 2011-12-14: It seems that a lot of people are coming here from web searches with phrases like “shapefile openlayers.” If all you want to do is display your data in OpenLayers, I’d highly recommend using a program like Quantum GIS to convert your Shapefile to a more web-friendly format like KML or GeoJSON. Both of these formats can be read by OpenLayers directly, and you’ll see faster performance and more browser compatibility than if you were to load your Shapefiles directly.

Over the last few days, I’ve been using Tom Carden’s shapefile-js library that reads ESRI Shapefiles in JavaScript, which I found via a post on the Prodevelop blog. The library is quite incredible, but his samples use a simple canvas for display. I thought it would be really cool if this could be integrated with OpenLayers, so I created a bit of JavaScript to do so. You can give it a test drive here, and look at the code on GitHub.

Basically, the library does all the heavy lifting. My code converts the shapefile shapes to WKT, which is passed to OpenLayers. Ultimately, I’d like to see an OpenLayers plugin so that you can use Shapefiles directly (i.e., an OpenLayers.Format.Shapefile). The main issue I see is that there needs to be a new strategy as well as a new format, because a) Shapefiles are made up of multiple pieces and b) we need to use the BinaryAjax loader since Shapefiles are binary.

My code seems to work well with points, lines and polygons, including the donut polygon case (to see for yourself, look at South Africa). (I did not test the donut polygon case, but I think it should work). More eyes are of course welcome! Also, the shapefile-js library can only handle pretty small Shapefiles. If I integrate this into OpenLayers, I think, long term, using a Web Worker thread to parse the Shapefile would be wise (which is another challenge to direct OpenLayers integration).

EDIT 2011-12-13 22:36 -0800: I tested the donut polygon case.

Another LA Metro Visualization

November 12, 2011

Here’s another visualization of the data used in the previous post; I made the lines a lot finer, so the noise is less visible. It’s easier than ever to see the Silver Line. I classed the data manually this time.

Making Transit Travel Speed Maps with Open Source GIS

November 12, 2011

Update 2011-11-12 8:21 -0800: I just posted a visualization I like better.

The Internet has been abuzz the past week regarding transit speed maps. It seems to have been spurred by a post on Bostongraphy, which was inspired by many of the amazing visualizations produced by Eric Fischer, especially this one. Indeed, this blog has gotten a fair bit of traffic itself, because Andy Woodruff of Bostonography used my avl2postgis project to retrieve the data.

Most people who have created these maps have used home-made solutions for the cartography, but I thought you should be able to do this with just stock SQL and QGIS. Using QGIS for the cartography allows you to bring in lots of useful tools, things like classification and ColorBrewer ramps.

The main trick is converting the point data that is retrieved from NextBus into line data for mapping (more about the cartographic considerations of line and point data below, for now I’ll focus on the technical aspects). After much whaling and gnashing of teeth, I figured out the spatial SQL to do this:

SELECT loc_a.oid, loc_a.vehicle, loc_a.route, loc_a.direction, transform(ST_MakeLine(loc_a.the_geom, loc_b.the_geom), 26945) AS the_geom,
	(ST_Length(transform(ST_MakeLine(loc_a.the_geom, loc_b.the_geom), 26945))/
	(EXTRACT(EPOCH FROM loc_b.time) - EXTRACT(EPOCH FROM loc_a.time))) *
	2.23693629 AS mph, loc_a.time AS starttime, loc_b.time AS endtime
  INTO acrt.lametrolines
    (SELECT *, ROW_NUMBER() OVER (ORDER BY vehicle, time) AS num FROM acrt.nextbus) AS loc_a
    (SELECT *, ROW_NUMBER() OVER (ORDER BY vehicle, time) AS num FROM acrt.nextbus) AS loc_b
      ON (loc_a.vehicle = loc_b.vehicle AND
        loc_a.route = loc_b.route AND
        loc_a.direction = loc_b.direction AND
        (loc_a.num + 1) = loc_b.num)
 WHERE loc_a.time <> loc_b.time;
ALTER TABLE acrt.lametrolines ADD COLUMN traversal int2;
UPDATE acrt.lametrolines SET traversal = EXTRACT(EPOCH FROM endtime - starttime);
(if you’re not using a timestamp column for the time column in your table (older versions of avl2postgis recommended varchar(19), you’ll want to run this script to convert them to timestamp columns).

The trick here is the window function ROW_NUMBER, which allows us to relate each row to the next row from that same vehicle. You’ll want to change the spatial reference from EPSG:26945 (State Plane California Zone 5) to something that is appropriate to your region. If it uses a unit other than meters, you’ll want to also change the conversion factor (2.2 for m/s to mph).

I added the traversal column afterwards; you could also do it in the original query. I used the traversal column (which is the time between position reports) to filter out segments in QGIS that took more than 3 minutes, so that coarse data is removed. I also filtered out segments with mph > 80, since they are probably caused by GPS noise.

I created a view that sorted by traversal descending—I believe that causes the segments with the most frequent reporting to display on top.  I messed with the symbology a lot to get the maximum amount of data to display; I ended up with 20 equal-interval stops between 0 and 80 mph, and a red-yellow-blue color ramp (admittedly lifted from the Bostonography post), with saturated red at 0 mph, bright yellow around 40 mph and blue at 80 mph. Most of the map is yellow-orange since it falls between 0 and 40 mph, and the degree of redness or yellowness indicates how slow or fast it is.

Comments or questions about how I did it or what the results were are more than welcome, either using the comments (preferably) or the contact link above.

I then symbolized based on the mph attribute. There are all kinds of things you can do with the symbology in QGIS—vary the ramps, the classification, and many other things. Also, since it’s just an SQL database, it would be trivial to make maps that showed, for instance, just Metro Rapid routes, &c.

The coolest thing about this map is how you can see the Orange Line (up north) and the Silver Line (extending east and south from downtown) as thick blue lines (hidden a bit by some of the other lines)—kudos to LA Metro for speeding bus service on these lines! I suspect the rail lines would show the same thing, but this map only shows bus service.

There are a few limitations that one should be aware of when using this map. One is that there are basically two classes of service for most agencies: slower, local-stop service, and fast express service (like the Silver and Orange line in this image); there isn’t much in between (there is Metro Rapid in LA, which somewhat bridges this gap). This means that most of any classification range won’t be used. I can’t wait to hear about innovative ways to solve this, but in the meantime the map still shows some neat things, and is also really pretty. In any case, I fiddled with the symbology a lot but wasn’t really happy with the results. I think a manually defined color ramp might be the way to go eventually, with detail around 10-25mph and less detail elsewhere. I didn’t want to change too much because I think one of the strongest things about this map is the amount of service above 40mph on the busways.

Another issue is drawing order. There are over 300,000 line segments in this map, so some of them draw on top of each other. Deciding which are more important is difficult; I displayed the shortest time segments on top so that the best detail would be emphasized.

A single line segment is drawn from each reported position to the next one. Positions are reported usually every 1-2 minutes, so if a bus is at a traffic light for a minute, that minute is 0 mph, while the next one might be cruising at 20 mph. A better way would be to have the speed averaged over several consecutive reports, if you were looking at specific lines, rather than chokepoints (to find chokepoints, you want the fine-grained data).

This map only shows buses, since LA doesn’t (yet) have real-time positions available for trains.

Also, there seems to be a lot of green and blue around downtown LA, which seems improbable and is likely due to GPS interference. In fact, there are tinges of green on many local streets, which suggests that there are some flaws in the data.

Google Maps Tile Scales

October 29, 2011

I found this buried deep in an appendix of the Mapnik XML Schema Reference, and I thought it so useful I am reposting it here:

Zoom level Scale denominator
0 559,082,264
1 279,541,132
2 139,770,566
3 69,885,283
4 34,942,642
5 17,471,321
6 8,735,660
7 4,367,830
8 2,183,915
9 1,091,958
10 545,979
11 272,989
12 136,495
13 68,247
14 34,124
15 17,062
16 8,531
17 4,265
18 2,133
19 1,066
20 533

These are used not only by Google Maps, but also by Bing Maps, OSM, CloudMade and many others, and in fact just about any Google Mercator tile source.

Mapping Real-Time Delays: Review

September 8, 2011

Some readers may have noticed that I’ve updated my last post several times in the last few days. After thinking about the algorithms I used, I realized there were some significant issues with them. I’ve explained them a certain amount in my updates to my previous post, but I’d like to expand on the issues a bit here.

  1. fig. 1

    Using an Inverse Distance Weighting algorithm exaggerates delays where stops are sparse by allowing them to spread over larger areas; the graphic should make this clear; if the red dots are stops with delays, one in the city center and one in a suburb, it is clear that the delays will be magnified where stops are sparse (figure 1), because there are less stops around it.

  2. Using an IDW layer also causes areas where there is no transit service to show data based on the nearest stops.
  3. The data from TriMet (and, it seems, perhaps other GTFS-realtime producers as well) contains data for only one or two stops on a given trip, so delays only show near where the vehicle currently is. For instance, if a delayed bus is downtown right now, chances are it will remain delayed all the way to the end of its route. This causes the red ‘delayed’ spots to follow delayed buses, rather than showing all the areas where there are delays, or showing the origins of delays. This is especially true in the outer suburbs, where the average delay for a stop is often based on just one or two transit vehicles.