Part 3 of n: Preparing Geo-spatial queries

In the last post we created an aggregated collection of coordinates, each with a count of how many times that location was frequented.

As stated earlier, my goal is to show a heat map visualization of the top pickup and dropoff locations in NYC. For now I have divided the city into its five boroughs (Manhattan, Brooklyn, Staten Island, Queens and Bronx), each showing its most frequented locations.

The technique I applied is as follows:
Treat each borough’s geographical area as a polygon and use the $geoWithin operator with those polygon coordinates to get the records for that borough.
To get the coordinates, we can draw a rough outline of each borough and note the coordinates of each corner point of the resulting polygon. I used Google Maps for that.
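To make the idea of "a point within a polygon" concrete, here is a minimal sketch of a ray-casting test in plain JavaScript. It is only an illustration of the general technique, not MongoDB's actual implementation; the function name pointInPolygon and the two test points are my own assumptions. The polygon is the Manhattan one used in the query later in this post, and like $polygon it treats coordinates as flat (planar) [longitude, latitude] pairs.

```javascript
// Ray-casting test: cast a horizontal ray from the point and count how many
// polygon edges it crosses. An odd number of crossings means "inside".
// Points and vertices are [longitude, latitude] pairs.
function pointInPolygon(point, polygon) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // The edge (j -> i) counts if it straddles the point's latitude and
    // the crossing lies to the east of the point.
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Manhattan polygon from this post
const manhattan = [
  [-74.034240, 40.686697], [-74.019992, 40.680709],
  [-73.995495, 40.704948], [-73.971463, 40.709893],
  [-73.961764, 40.743814], [-73.911724, 40.794679],
  [-73.927174, 40.802346], [-73.933354, 40.835214],
  [-73.907433, 40.873646], [-73.933699, 40.882083],
  [-74.013984, 40.756951], [-74.034240, 40.686697]
];

// Approximate, hand-picked test points (assumptions, not from the dataset)
console.log(pointInPolygon([-73.9855, 40.758], manhattan)); // true (around Times Square)
console.log(pointInPolygon([-73.95, 40.65], manhattan));    // false (south of the polygon)
```

Since the polygons are rough outlines, points close to a border may land in the wrong borough. For a heat map of the top locations that seemed acceptable to me.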

Following are the polygons with coordinates that I created for each borough.

[Images: the polygon drawn on Google Maps for each borough: Manhattan, Brooklyn, Staten Island, Queens, Bronx]

Now that we have our coordinates, we can write a query to fetch all records that fall within them.
Query for Manhattan:

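// Top 2000 most frequented locations inside the Manhattan polygon
db.aggLocations.aggregate(
	[{
		// Keep only locations whose [longitude, latitude] pair lies inside the polygon
		$match: {
			"_id.lglt": {
				$geoWithin: {
					$polygon: [
						[-74.034240, 40.686697],
						[-74.019992, 40.680709],
						[-73.995495, 40.704948],
						[-73.971463, 40.709893],
						[-73.961764, 40.743814],
						[-73.911724, 40.794679],
						[-73.927174, 40.802346],
						[-73.933354, 40.835214],
						[-73.907433, 40.873646],
						[-73.933699, 40.882083],
						[-74.013984, 40.756951],
						[-74.034240, 40.686697]
					]
				}
			}
		}
	}, {
		// Most frequented locations first
		$sort: {
			"value.cnt": -1
		}
	}, {
		$limit: 2000
	}], {
		// Let the sort spill to disk if it does not fit in memory
		allowDiskUse: true
	})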

The query took around 400 ms to execute.

Similarly, you can create queries for the other boroughs. I have prepared the queries for the remaining boroughs in the Node.js server. Check it out: https://github.com/tarun11ks/NYCTaxi/blob/master/js/external/server.js

Cool! Now we can head to the next post, where we will set up our Node.js server.
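Since only the polygon changes from borough to borough, the pipeline can be built by a small helper. The following is a minimal sketch, not the code from server.js; the function name buildBoroughPipeline and its limit parameter are my own assumptions. It reuses the Manhattan polygon from above, so you would pass in your own coordinates for the other boroughs.

```javascript
// Hypothetical helper: builds the same three-stage pipeline as the
// Manhattan query for any polygon of [longitude, latitude] pairs.
function buildBoroughPipeline(polygon, limit = 2000) {
  return [
    // Keep only locations inside the borough polygon
    { $match: { "_id.lglt": { $geoWithin: { $polygon: polygon } } } },
    // Most frequented locations first
    { $sort: { "value.cnt": -1 } },
    { $limit: limit }
  ];
}

// Manhattan polygon from this post
const manhattan = [
  [-74.034240, 40.686697], [-74.019992, 40.680709],
  [-73.995495, 40.704948], [-73.971463, 40.709893],
  [-73.961764, 40.743814], [-73.911724, 40.794679],
  [-73.927174, 40.802346], [-73.933354, 40.835214],
  [-73.907433, 40.873646], [-73.933699, 40.882083],
  [-74.013984, 40.756951], [-74.034240, 40.686697]
];

const pipeline = buildBoroughPipeline(manhattan);
console.log(JSON.stringify(pipeline[0], null, 2));

// In the mongo shell the pipeline would then be run as:
// db.aggLocations.aggregate(pipeline, { allowDiskUse: true })
```

Keeping the polygons as plain arrays also makes it easy to adjust a borough outline later without touching the query itself.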

Analyzing NYC 2013 taxi data

It all started after I saw this post on Hacker News: https://news.ycombinator.com/item?id=7910173
Thanks to Chris Whong for FOILing the data (obtaining it through a Freedom of Information Law request).
Off-topic: I just checked his site and found that he has FOILed another dataset. Awesome!

Back to the original topic: it’s a HUGE dataset. 173 million records!

Inspired by Chris’s work, I decided to give it a try and created a single page web application. This application shows a heat map visualization of the top pickup and dropoff locations in NYC. For now I have divided the city into its five boroughs (Manhattan, Brooklyn, Staten Island, Queens and Bronx), each showing its most frequented locations.

I will share my work here, divided into the following parts:
1) Preparing the dataset using MongoDB
2) Creating Map-Reduce
3) Preparing Geo-spatial queries
4) Using Node.js to provide a REST interface
5) Using Backbone.js to create the single page application

The GitHub page is here: https://github.com/tarun11ks/NYCTaxi
You can have a look at the technology stack here: http://stackshare.io/tarun11ks/nyctaxi

Note: These are not beginner articles. They assume some knowledge of Backbone.js and MongoDB.