
While I am not really a sports fan, I am a fan of stats. So LeBron breaking an almost four-decade-old record for all-time regular-season scoring is something I can appreciate. It reminded me of the research of a faculty member at my university that combined data science and collegiate basketball. I remember seeing a scatterplot in which the X and Y coordinates were the court locations of game events.
I was inspired to map out all of LeBron’s career points. First, I needed to learn how to get the data. Luckily, I am not the only one interested in this: I followed this blog post and accessed the NBA’s stats API to scrape all of LeBron’s shots. I don’t think the result includes all of his free throws, but that is good enough for me.
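Before the shots can be mapped as points, each shot needs a point value. A minimal pandas sketch, assuming the scraped DataFrame uses the shot chart column names `SHOT_TYPE` (with values `"2PT Field Goal"` and `"3PT Field Goal"`) and `SHOT_MADE_FLAG`, plus a hypothetical `SEASON` column added while scraping season by season. The helper name `add_points` and the sample rows are illustrative only. Free throws are not covered, since they are not part of the shot data.

```python
import pandas as pd

# Assumed shot-type labels and their point values; adjust to match
# whatever the scraped DataFrame actually contains.
POINTS_BY_SHOT_TYPE = {"2PT Field Goal": 2, "3PT Field Goal": 3}


def add_points(shots: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `shots` with a POINTS column
    (0 for a miss, 2 or 3 for a make, depending on SHOT_TYPE)."""
    shots = shots.copy()
    value = shots["SHOT_TYPE"].map(POINTS_BY_SHOT_TYPE)
    # Keep the shot's value only where the shot was made; misses score 0.
    # An unrecognized SHOT_TYPE on a made shot maps to NaN and makes astype fail,
    # which surfaces unexpected labels instead of silently dropping points.
    shots["POINTS"] = value.where(shots["SHOT_MADE_FLAG"] == 1, 0).astype(int)
    return shots


# Tiny made-up sample, only to show the shape of the data.
sample = pd.DataFrame({
    "SEASON": ["2003-04", "2003-04", "2004-05"],
    "LOC_X": [0, 230, -50],
    "LOC_Y": [10, 50, 120],
    "SHOT_TYPE": ["2PT Field Goal", "3PT Field Goal", "2PT Field Goal"],
    "SHOT_MADE_FLAG": [1, 0, 1],
})
sample = add_points(sample)
print(sample["POINTS"].tolist())  # [2, 0, 2]
```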
The Matplotlib hexbin plot is nice, but I wanted more interactivity, so I moved on to Plotly Express. I followed this article, which provided the code for drawing a basketball court in Plotly. I wanted to plot the cumulative points for each season. Unfortunately, cumulative plotting is not directly supported, so I had to wrangle the data frame a little. After a few more tweaks, I was able to create the visualization below:
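One way to do that wrangling (a sketch, not necessarily the approach used for the visualization) is to stack one copy of the shots per season, where each copy holds every shot up to and including that season, and label each copy in a column that Plotly Express can use as its `animation_frame`. The helper `cumulative_frames`, the `SEASON` and `FRAME` column names, and the sample rows are hypothetical.

```python
import pandas as pd


def cumulative_frames(shots: pd.DataFrame, season_col: str = "SEASON") -> pd.DataFrame:
    """Hypothetical helper: stack one copy of `shots` per season, each copy
    containing every shot up to and including that season, labelled in FRAME."""
    # Season strings such as "2003-04" start with the year, so plain string
    # sorting and <= comparisons follow chronological order.
    seasons = sorted(shots[season_col].unique())
    frames = []
    for season in seasons:
        upto = shots[shots[season_col] <= season].copy()
        upto["FRAME"] = season
        frames.append(upto)
    return pd.concat(frames, ignore_index=True)


# Tiny made-up sample with a POINTS column already computed.
sample = pd.DataFrame({
    "SEASON": ["2003-04", "2003-04", "2004-05"],
    "LOC_X": [0, 230, -50],
    "LOC_Y": [10, 50, 120],
    "POINTS": [2, 0, 2],
})
frames = cumulative_frames(sample)
# Total points shown in each animation frame.
print(frames.groupby("FRAME")["POINTS"].sum().to_dict())
```

The stacked frame can then be passed to something like `px.scatter(frames, x="LOC_X", y="LOC_Y", animation_frame="FRAME")`, with the court shapes drawn on top. The trade-off is size: each shot is repeated once for every later season, which is manageable for a single player's career but grows quickly for larger datasets.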