Analyzing NYC 2013 taxi data

It all started when I saw this post on Hacker News: https://news.ycombinator.com/item?id=7910173
Thanks to Chris Wong for FOILing the data (obtaining it through a Freedom of Information Law request).
Off-topic: I just checked his site and found that he has FOILed another dataset. Awesome!

Back to the original topic: it’s a HUGE dataset. 173 million records!

Inspired by Chris’s work, I decided to give it a try and built a single-page web application. It shows a heat map of the top pickup and dropoff locations in NYC. For now I have divided the city into its five boroughs (Manhattan, Brooklyn, Staten Island, Queens and the Bronx), with each one showing its most frequented locations.
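To give a rough idea of what "top locations per borough" means in practice, here is a minimal sketch in plain JavaScript. It is not the code from the project: the record shape (`borough`, `pickup.lat`, `pickup.lng`), the sample coordinates and the helper names are all assumptions made for illustration. Nearby points are grouped by rounding their coordinates, counted, and the busiest cells are returned.

```javascript
// Hypothetical sample records; the field names are assumptions,
// not the schema of the actual taxi dataset.
const trips = [
  { borough: "Manhattan", pickup: { lat: 40.75801, lng: -73.98552 } },
  { borough: "Manhattan", pickup: { lat: 40.75804, lng: -73.98561 } },
  { borough: "Manhattan", pickup: { lat: 40.71278, lng: -74.00601 } },
  { borough: "Brooklyn", pickup: { lat: 40.67821, lng: -73.94419 } },
];

// "Map" step: build a key from the borough and the coordinates rounded
// to `decimals` places, so nearby pickups fall into the same cell.
function cellKey(trip, decimals) {
  const lat = trip.pickup.lat.toFixed(decimals);
  const lng = trip.pickup.lng.toFixed(decimals);
  return `${trip.borough}|${lat}|${lng}`;
}

// "Reduce" step: count how many trips share each key.
function countByCell(records, decimals = 3) {
  const counts = new Map();
  for (const trip of records) {
    const key = cellKey(trip, decimals);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Return the busiest cells of one borough, highest count first.
function topLocations(records, borough, limit = 10, decimals = 3) {
  const rows = [];
  for (const [key, count] of countByCell(records, decimals)) {
    const [name, lat, lng] = key.split("|");
    if (name === borough) {
      rows.push({ lat: Number(lat), lng: Number(lng), count });
    }
  }
  return rows.sort((a, b) => b.count - a.count).slice(0, limit);
}

console.log(topLocations(trips, "Manhattan", 5));
```

The `count` of each cell is the kind of weight a heat map layer could use. Rounding to three decimals is an arbitrary choice here; a coarser or finer grid just changes how many pickups share a cell.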

I will share my work here by dividing it into different parts:
1) Preparing the dataset using MongoDB
2) Creating Map-Reduce
3) Preparing Geo-spatial queries
4) Using Node.js to provide a REST interface
5) Using Backbone.js to create the single-page application
6) Using D3.js for some fancy charts/graphs/visualizations [Pending]

The project's GitHub page: https://github.com/tarun11ks/NYCTaxi
You can have a look at the technology stack here: http://stackshare.io/tarun11ks/nyctaxi

Note: These are not beginner articles. They expect some knowledge of Backbone.js and MongoDB.
