Education

2017-(current)
PhD candidate in Statistics, Northwestern University (Evanston, IL)
  • Deep learning projects
    • NUCLSTM, predicting nucleosome locations with bidirectional LSTMs
    • Defeating adversarial examples with nonparametric ML
  • Publications
    • DegNorm: normalization of generalized transcript degradation improves accuracy in RNA-seq analysis (Genome Biology, 2019)
    • degnorm Python CLI, open-source distributed computing library (active maintainer)
  • Teaching
    • Department of Statistics Data Science course series
    • Introduction to Statistics
    • Introduction to Econometrics
  • Research focus
    • Ecological inference + regression
    • Randomized mixture of experts modeling
2014-2015
M.Sc. in Applied Mathematics, University of Washington (Seattle, WA)
  • Research focus
    • Singular value decomposition and soft nuclear norm thresholding for sparse matrix imputation
  • Certifications
    • Computational Finance and Risk Management certificate
2013
B.A. in Mathematics, Wesleyan University (Middletown, CT)
  • Phi Beta Kappa

Experience

2019-20 (1 yr)
Senior Consultant, Data Science at Aptitive, LLC. (Chicago, IL)
  • Led development, deployment, and monitoring of all production cloud ETL processes (AWS DMS, CloudWatch, Lambda, SNS, SQS), as well as Snowflake Computing Data Vault 2.0 database deployment, for a large logistics brokerage client.
  • Managed team development of SQL and ETL solution architects.
  • Developed Snowflake SQL Python CLI for Aptitive consultants’ data analysis and ETL needs.
  • Developed and maintained Aptitive’s Airflow deployment scripts and Airflow DAG codebase.
2018 (4 mos)
DevOps Consultant at Northwestern University IT Research Computing Services (Evanston, IL)
  • Built containerized workflows for specialized Linux computing needs, e.g. GPU-enabled containers.
  • Developed load testing scripts and user documentation for Quest, the NU Linux HPC cluster.
  • Developed cost-per-use and queueing strategy for SLURM deployment to Linux cluster system-wide.
2015-17 (2 yrs)
Data Science Lead at Uptake (Chicago, IL)
  • Managed a team of Data Scientists as Data Science and Data Engineering Team Lead.
  • Built PostgreSQL database to serve all predictive models’ output, failure events on partner assets, as well as the web application and data pipelining infrastructure (Airflow) enabling Uptake executives to track all models’ statistical performance over time.
  • Built an NLP recommendation system from scikit-learn Pipelines for the Data Science team that infers true supervised ML training labels from noisy, inaccurate textual descriptions of labels.
  • Developed time-to-event models in R to predict failures on diesel engines and natural gas compressors; built a model piloting system to update clients on daily model performance.
  • Led data science development of an unsupervised multivariate anomaly detection product supporting IoT-enabled machinery. I hold a patent for this algorithm.
2015 (3 mos)
Data Science for the Social Good Fellow at the University of Washington eScience Institute (Seattle, WA)
  • Engineered a Python web application for bus dispatchers to optimally reroute King County Metro paratransit riders in the event of a bus breakdown, from a cost and feasibility perspective.
  • Identified weekly and monthly trends in years’ worth of paratransit ridership data; modeled demand, route disruptions, and costs in R; presented findings to executive leadership.

Computing proficiency

  • Languages: Python, R, MATLAB, C/C++, JS, bash shell scripting
  • Data engineering: SQL (Snowflake, PostgreSQL, MySQL, MS SQL Server), Redis, Apache Airflow + Hive + Pig, PySpark
  • Dev tools: git, Travis CI, Marathon, Docker, Singularity
  • AWS cloud: EC2, S3, CloudWatch, DMS, RDS, Lambda + Step Functions (check out my Alexa skill), message passing (SNS, SQS)
  • Data science tools: PyTorch, Keras, TensorFlow, Pandas + NumPy + SciPy + scikit-learn, Tableau, Flask, R Shiny