Frank Fineis
Data Science blogging. Statistics. Data Engineering. Consulting.

The lambdarank loss function is a more complex pairwise classification loss, and implementing it efficiently is challenging. We'll lay out the math underlying NDCG optimization how LightGBM's LambdarankNDCG objective function achieves this goal.

We'll explore just how far we can go with Scikit-learn Pipelines by building an end-to-end feature processing + modeling framework on high-dimensional fraud detection data from Kaggle.

We'll use logistics data to illustrate the benefits, but added complexities, of using Type II changelog data for tracking complete data version history over time.

When it comes to racial gaps in the rates of college football quarterbacking, what are the numbers? Let's explore an end-to-end ETL process - building a MySQL database, scraping a college football statistics website into that database with Python, automating a Google Image scrape, pushing data back into that database, and then use R to publish some metrics from our efforts.

When a Data Scientists working to deploy models to entire industries or fleets should demand structure from their data. Big government, "big brother," what have you, can and should help us get to the next plateau when it comes to scaling machine learning applications beyond just one-off use cases and models.