Frank Fineis

May 1, 2021

The inner workings of the lambdarank objective in LightGBM

The lambdarank loss function is a more complex pairwise classification loss, and implementing it efficiently is challenging. We'll lay out the math underlying NDCG optimization and show how LightGBM's LambdarankNDCG objective function achieves this goal.

Apr 30, 2020

CSV to model with a one-liner: an end-to-end Sklearn pipeline

We'll explore just how far we can go with Scikit-learn Pipelines by building an end-to-end feature processing + modeling framework on high-dimensional fraud detection data from Kaggle.

Mar 26, 2020

Joining Type II change tables and logistics industry data

We'll use logistics data to illustrate the benefits, and the added complexities, of using Type II changelog data to track complete data version history over time.

Feb 24, 2018

How bad is it? Exploring racial disparities in NCAA quarterbacking with an ETL pipeline

When it comes to racial gaps in the rates of college football quarterbacking, what are the numbers? Let's explore an end-to-end ETL process: building a MySQL database, scraping a college football statistics website into that database with Python, automating a Google Image scrape, pushing data back into that database, and then using R to publish some metrics from our efforts.

Sep 13, 2017

The Case for Big Brother in Big Data

Data scientists working to deploy models to entire industries or fleets should demand structure from their data. Big government, "big brother," what have you, can and should help us get to the next plateau when it comes to scaling machine learning applications beyond just one-off use cases and models.