Dataset
Our demand model relied on four primary sources of data: rideshare data from the Chicago Data Portal, census data from the US Census Bureau (USCB), weather data from Weather Underground, and place data from the Google Maps Places API.
Figure 1: Plot of API Data of Chicago
The Chicago Data Portal contains 100M+ records with 80+ columns about Uber and Lyft rides in the city of Chicago. The data was downsampled from over a year of records to the month of September 2019, which holds roughly 4.3M records. This time range was selected so that the data is recent and relevant. Even after this reduction, a GPU was needed for data cleaning and merging. The trip pickup and drop-off latitudes and longitudes and the pickup and drop-off were retained for all September 2019 ride records.
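The month-level downsampling described above can be sketched as a streaming filter. This is only an illustration using the Python standard library on the CPU, not the GPU-based pipeline the project used; the column names (`pickup_time`, `pickup_lat`, and so on) and the sample rows are hypothetical and may not match the portal's export.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the real export may label these differently.
KEEP = ["pickup_time", "pickup_lat", "pickup_lon", "dropoff_lat", "dropoff_lon"]


def filter_month(rows, year=2019, month=9, time_col="pickup_time"):
    """Yield rows whose pickup timestamp falls in the given month, trimmed to KEEP."""
    for row in rows:
        ts = datetime.fromisoformat(row[time_col])
        if ts.year == year and ts.month == month:
            yield {col: row[col] for col in KEEP}


# Tiny made-up sample standing in for the full trip file.
sample = io.StringIO(
    "pickup_time,pickup_lat,pickup_lon,dropoff_lat,dropoff_lon,fare\n"
    "2019-08-31T23:45:00,41.88,-87.63,41.90,-87.65,12.5\n"
    "2019-09-01T00:15:00,41.89,-87.62,41.95,-87.66,9.0\n"
    "2019-10-01T00:00:00,41.80,-87.60,41.85,-87.61,7.5\n"
)

september = list(filter_month(csv.DictReader(sample)))
```

Because the filter is a generator, it reads one record at a time and never holds the full file in memory, which is one way to cope with a source of this size when a GPU is not available.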
The latest census data (2010) was used to collect information on the population of Chicago, such as car ownership, age statistics, population density, use of public transit, and average income. This data was pulled at the tract level. Tracts, defined by the USCB, are contiguous geographic areas (small in cities) that generally contain between 1,200 and 8,000 people.
The Google Maps Places API was used to pull points of interest (POIs). POIs are geolocation tags that include the name of a place and what type of place it is. Within Chicago there were 104 unique location types and 34k unique places of interest. These POIs are plotted across Chicago tracts in Figure 1.
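One plausible way to turn POIs into tract-level inputs is to count places of each type per tract. The sketch below assumes that each POI has already been assigned a tract ID (a step not described here) and uses made-up records; it is not necessarily how the project encoded its POI data.

```python
from collections import Counter, defaultdict

# Hypothetical POI records: (place name, place type, tract ID)
pois = [
    ("Cafe One", "cafe", "T1"),
    ("Cafe Two", "cafe", "T1"),
    ("City Gym", "gym", "T1"),
    ("Corner Bar", "bar", "T2"),
]

# Count how many places of each type fall in each tract.
type_counts = defaultdict(Counter)
for _name, place_type, tract in pois:
    type_counts[tract][place_type] += 1

# Fix a column order so every tract gets a vector of the same length.
all_types = sorted({place_type for _, place_type, _ in pois})
features = {
    tract: [counts[t] for t in all_types]
    for tract, counts in type_counts.items()
}
```

With 104 location types, each tract would get a 104-element count vector under this scheme, with zeros for types that do not occur in the tract.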
Weather plays an important role in predicting the number of rides to expect between tracts. The weather data was sourced from World Weather Online. Temperature, precipitation, wind, wind chill, and humidity were retained at hourly resolution for the city of Chicago as a whole (assuming conditions are uniform geographically but vary over time).
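Since weather is treated as uniform across the city and varying only by hour, it can be attached to rides with a join on the hour alone. The following is a minimal sketch under that assumption; the tract labels, field names, units, and values are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def hour_key(timestamp):
    """Truncate an ISO timestamp to the start of its hour."""
    return datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)


# Hypothetical rides: (pickup timestamp, pickup tract, drop-off tract)
rides = [
    ("2019-09-01T08:05:00", "A", "B"),
    ("2019-09-01T08:40:00", "A", "B"),
    ("2019-09-01T09:10:00", "A", "B"),
    ("2019-09-01T08:20:00", "B", "C"),
]

# Hypothetical hourly weather: one record per hour for the whole city.
weather = {
    datetime(2019, 9, 1, 8): {"temp_c": 18.0, "precip_mm": 0.0},
    datetime(2019, 9, 1, 9): {"temp_c": 19.5, "precip_mm": 1.2},
}

# Count rides per (hour, pickup tract, drop-off tract).
counts = Counter((hour_key(ts), pu, do) for ts, pu, do in rides)

# Attach the city-wide weather for that hour to each tract-pair count.
demand = [
    {"hour": hour, "pickup_tract": pu, "dropoff_tract": do, "rides": n, **weather[hour]}
    for (hour, pu, do), n in sorted(counts.items())
]
```

Every tract pair in a given hour receives the same weather values, which is exactly what the geographic-uniformity assumption implies.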