Part 3 of n: Preparing Geo-spatial queries

In the last post we created an aggregated collection of coordinates, each with a count of how many times it was frequented.

As stated, my goal is to show a heat map visualization of the top pickup and dropoff locations in NYC. For now I have divided the city into its five boroughs (Manhattan, Brooklyn, Staten Island, Queens and the Bronx), each showing its most frequented locations.

The technique I applied is this:
View each borough's geographical area as a polygon and use the $geoWithin operator with that polygon's coordinates to get the records for that borough.
We can trace a rough outline of each borough and note the coordinates of each corner point; together those points make up the polygon. I used Google Maps for that.
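
To make the idea of "within a polygon" concrete, here is a minimal sketch of a ray-casting point-in-polygon test in plain JavaScript. This is only an illustration of the concept and is not how MongoDB implements $geoWithin; the function name pointInPolygon and the sample square are made up for this example. Points are [longitude, latitude] pairs, the same order used in the query later in this post.

```javascript
// Hypothetical helper for illustration only (not MongoDB's implementation).
// Ray casting: count how many polygon edges a horizontal ray from the point
// crosses; an odd number of crossings means the point is inside.
function pointInPolygon(point, polygon) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // The edge straddles the ray's y value and the crossing lies to the right of the point.
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// A made-up square, just to show the call.
const square = [[0, 0], [4, 0], [4, 4], [0, 4]];
console.log(pointInPolygon([2, 2], square)); // true
console.log(pointInPolygon([5, 2], square)); // false
```

In practice the database does this filtering for us, so a helper like this would only be useful for spot-checking a hand-drawn polygon before running the query.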

Following are the polygons with coordinates that I created for each borough.

Manhattan: [image: Manhattan polygon]

Brooklyn: [image: Brooklyn polygon]

Staten Island: [image: Staten Island polygon]

Queens: [image: Queens polygon]

Bronx: [image: Bronx polygon]

Now that we have our coordinates, we can write a query to fetch all the records that fall within them.
Query for Manhattan:

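db.aggLocations.aggregate(
	[{
		// Keep only the locations that fall inside the Manhattan polygon
		$match: {
			"_id.lglt": {
				$geoWithin: {
					// Each point is [longitude, latitude]; the last point repeats the first to close the shape
					$polygon: [
						[-74.034240, 40.686697],
						[-74.019992, 40.680709],
						[-73.995495, 40.704948],
						[-73.971463, 40.709893],
						[-73.961764, 40.743814],
						[-73.911724, 40.794679],
						[-73.927174, 40.802346],
						[-73.933354, 40.835214],
						[-73.907433, 40.873646],
						[-73.933699, 40.882083],
						[-74.013984, 40.756951],
						[-74.034240, 40.686697]
					]
				}
			}
		}
	}, {
		// Most frequented locations first
		$sort: {
			"value.cnt": -1
		}
	}, {
		// Keep the top 2000 locations
		$limit: 2000
	}], {
		// Let the sort spill to disk if it exceeds the in-memory limit
		allowDiskUse: true
	})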

This query took around 400 ms to execute.

Similarly, you can create queries for the other boroughs. I have prepared the queries for the rest in the Node.js server. Check it out: https://github.com/tarun11ks/NYCTaxi/blob/master/js/external/server.js
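
Since only the polygon changes from borough to borough, one way to avoid repeating the query is a small helper that builds the pipeline from a polygon. The sketch below is an assumption on my part and is not taken from the linked server.js; the name buildBoroughPipeline and its limit parameter are hypothetical. It reuses the Manhattan coordinates from the query above.

```javascript
// Hypothetical helper (not from the linked server.js): builds the same
// $match / $sort / $limit pipeline for any borough polygon.
function buildBoroughPipeline(polygon, limit = 2000) {
  return [
    // Keep only the locations inside the given polygon
    { $match: { "_id.lglt": { $geoWithin: { $polygon: polygon } } } },
    // Most frequented locations first
    { $sort: { "value.cnt": -1 } },
    // Keep the top `limit` locations
    { $limit: limit },
  ];
}

// Manhattan polygon from this post, as [longitude, latitude] pairs.
const manhattan = [
  [-74.034240, 40.686697], [-74.019992, 40.680709],
  [-73.995495, 40.704948], [-73.971463, 40.709893],
  [-73.961764, 40.743814], [-73.911724, 40.794679],
  [-73.927174, 40.802346], [-73.933354, 40.835214],
  [-73.907433, 40.873646], [-73.933699, 40.882083],
  [-74.013984, 40.756951], [-74.034240, 40.686697],
];

const pipeline = buildBoroughPipeline(manhattan);
// In the mongo shell this would be run as:
// db.aggLocations.aggregate(pipeline, { allowDiskUse: true })
```

The other boroughs would then only need their own coordinate arrays passed to the same function.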

Cool! Now we can head to the next post, where we will set up our Node.js server.