After observing heavy skew in our data, with the vast majority of rideshare demand values equal to 1 trip for an OD pair at a given time, we considered running two models in series: a classification model to separate few-trip from many-trip records, followed by an MLP regressor to predict the trip count for the subset classified as many-trip. This was an attempt to further reduce skew in the predictions. Unfortunately, this approach yielded only a slight improvement over the baseline, with a CPC of 0.41.
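The two-stage setup can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not our exact pipeline: the classifier type (`LogisticRegression`), the few/many cut-off (`threshold=1`), the MLP size, and the class name `TwoStageDemandModel` are all hypothetical. Records classified as few-trip receive one constant, the mean count of the few-trip training records.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPRegressor


class TwoStageDemandModel:
    """Hypothetical two-stage model: classify few vs. many trips, then regress counts for 'many'."""

    def __init__(self, threshold=1, random_state=0):
        # Assumed cut-off: counts strictly above `threshold` are treated as "many trips".
        self.threshold = threshold
        # Assumed classifier; the original text does not name one.
        self.clf = LogisticRegression(max_iter=1000)
        self.reg = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=random_state)

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        is_many = y > self.threshold
        # Stage 1: learn the Boolean few/many label for every record.
        self.clf.fit(X, is_many)
        # Every record later classified as "few" receives this single constant.
        self.low_value_ = y[~is_many].mean()
        # Stage 2: regress trip counts on the many-trip subset only.
        self.reg.fit(X[is_many], y[is_many])
        return self

    def predict(self, X):
        pred = np.full(len(X), self.low_value_)
        many = self.clf.predict(X).astype(bool)
        if many.any():
            pred[many] = self.reg.predict(X[many])
        return pred
```

Both classes must be present in the training labels for the classifier to fit; the constant assigned to the few-trip class is the design choice that the next paragraph examines.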
  Running an MLP regressor directly, without the classification step, improved our results markedly, yielding a CPC of 0.702. While we initially found this result surprising, reviewing the predictions produced by the two-stage ensemble explained why the direct approach was more accurate. The initial classification assigned a Boolean high/low label to every record. Because the data are heavily right-skewed (most counts sit at the low end of the range) and every record classified as low-ride received the same label, the mean ride count of the low-ride group, the error from this constant assignment, multiplied across the large number of records that received it, added substantially to the final error.
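A small worked example makes this mechanism concrete. The numbers are hypothetical and not drawn from our data: suppose 1,000 records are classified as low-ride, 900 of them truly have one trip and 100 are misclassified records with four trips each, and every one of them receives the group mean as its prediction. The helper name `constant_assignment_error` is illustrative.

```python
def constant_assignment_error(true_counts, constant):
    """Total absolute error when every record in a group receives the same prediction."""
    return sum(abs(y - constant) for y in true_counts)


# Hypothetical low-ride group: 900 true single-trip records plus 100
# many-trip records (4 trips each) that the classifier labeled as low.
low_group = [1] * 900 + [4] * 100
# The mean ride count assigned to the whole group (1.3 here).
label = sum(low_group) / len(low_group)

total_error = constant_assignment_error(low_group, label)
print(f"label={label:.2f}, total abs error={total_error:.1f}, "
      f"per record={total_error / len(low_group):.2f}")

# Twice as many records receiving the same constant doubles its error contribution.
print(constant_assignment_error(low_group * 2, label))
```

The constant is off for every record in the group, not only the misclassified ones (each true single-trip record is over-predicted by 0.3), so its share of the total error grows with the number of records that receive it.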
  To further improve results, we switched to a gradient boosted decision tree regressor, a model class that has proven useful for a variety of demand modeling problems. After tuning, this model achieved a CPC of 0.752. The figure below shows these improvements in CPC by model.