Analyzing Powerlifting Data
***Work done originally in July 2020***
Powerlifting is a sport in which people compete to see who can lift the most in the bench press, squat, and deadlift. Many people enter these competitions to display their strength, and they train hard to get their bodies in shape for these events.
For this post, I will be writing about 2 topics:
- Looking at how much lifters lift in the bench press, squat, and deadlift, by gender.
- Seeing which countries host the meets with the most disqualified lifters, via a geo-scatter plot.
I found a dataset at https://openpowerlifting.gitlab.io/opl-csv/bulk-csv.html that holds data about lifters and includes attributes like name, age, weight, date, amount lifted, score, and location of meet.
Let’s code away and create some plots that show how much people lift between genders.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px
import zipfile

# load dataset
# get the zip file of the data and then extract it using zf.open()
zf = zipfile.ZipFile('openpowerlifting-latest.zip')
powerlifting = pd.read_csv(zf.open("openpowerlifting-2020-06-20/openpowerlifting-2020-06-20.csv"), low_memory=False)

powerlifting.columns
Index(['Name', 'Sex', 'Event', 'Equipment', 'Age', 'AgeClass',
'BirthYearClass', 'Division', 'BodyweightKg', 'WeightClassKg',
'Squat1Kg', 'Squat2Kg', 'Squat3Kg', 'Squat4Kg', 'Best3SquatKg',
'Bench1Kg', 'Bench2Kg', 'Bench3Kg', 'Bench4Kg', 'Best3BenchKg',
'Deadlift1Kg', 'Deadlift2Kg', 'Deadlift3Kg', 'Deadlift4Kg',
'Best3DeadliftKg', 'TotalKg', 'Place', 'Dots', 'Wilks', 'Glossbrenner',
'Goodlift', 'Tested', 'Country', 'Federation', 'ParentFederation',
'Date', 'MeetCountry', 'MeetState', 'MeetTown', 'MeetName'],
dtype='object')

powerlifting.shape # got close to 2 million powerlifting records

# How many Men, Women and Mixed genders participate
gender_plot = sb.countplot(x="Sex", data=powerlifting)
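The loading step above hardcodes the dated CSV path inside the archive, which changes with every data release. As a minimal sketch (not the code used for this post), the first CSV in the zip can be looked up by name instead. The helper `read_first_csv` and the tiny in-memory archive are made up for illustration.

```python
import io
import zipfile

import pandas as pd


def read_first_csv(zip_source, **read_csv_kwargs):
    """Read the first .csv member of a zip archive into a DataFrame."""
    with zipfile.ZipFile(zip_source) as zf:
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        with zf.open(csv_names[0]) as fh:
            return pd.read_csv(fh, **read_csv_kwargs)


# Tiny in-memory archive standing in for openpowerlifting-latest.zip
# (the member name and rows are invented for this demo).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("openpowerlifting-demo/data.csv", "Name,Sex\nA,M\nB,F\n")
buffer.seek(0)

demo = read_first_csv(buffer, low_memory=False)
print(demo.shape)
```

With the real download, the call would presumably be `read_first_csv('openpowerlifting-latest.zip', low_memory=False)`.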
Visualizing Lift Attempts for Male and Female Lifters
Even though the dataset records 3 different genders, the mixed-gender category (Mx) is barely visible in the bar plot. If we look closely at the data and select only the powerlifters who are gender-neutral, there are only 13 of them.
# From what we see here, many more men than women compete in powerlifting meets.
# Mx is for mixed gender and there seem to be far fewer; 16 lifters to be exact.
mixed_gender = powerlifting[powerlifting["Sex"] == "Mx"]
As there are far more male and female competitors than gender-neutral ones, we won’t include the Mx group in the plots (otherwise its panels would look empty, as if there weren’t any such lifters, which can be misleading).
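When sizing up a small group like this, it matters whether you count rows or distinct lifters, because the dataset holds one row per meet entry and the same person can appear many times. Here is a small sketch on made-up records; the names and values are invented, and only the `Name` and `Sex` column names come from the dataset.

```python
import pandas as pd

# Toy records (invented): one row per lifter per meet entry.
toy = pd.DataFrame(
    {
        "Name": ["Ann", "Ann", "Bob", "Cy", "Cy", "Dee"],
        "Sex": ["F", "F", "M", "Mx", "Mx", "M"],
    }
)

# Number of rows (meet entries) per category.
rows_per_sex = toy["Sex"].value_counts()

# Number of distinct lifter names per category.
lifters_per_sex = toy.groupby("Sex")["Name"].nunique()

# Share of all rows per category, handy for spotting tiny groups.
share_per_sex = toy["Sex"].value_counts(normalize=True)

print(rows_per_sex)
print(lifters_per_sex)
```

Note that names are only a rough proxy for individuals, since two lifters can share a name.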
Plots of the 3 attempts at bench press, squat, and deadlift for male and female lifters are below:
# filter out the Mixed genders and create some histogram plots of how much lifters squat, bench and deadlift
# and ranking scores
not_mixed_gender = powerlifting[powerlifting["Sex"] != "Mx"]
# histogram of gender vs each of the 3 Bench press attempts
g1 = sb.FacetGrid(not_mixed_gender, col="Sex")
g1.map(plt.hist, "Bench1Kg")
g2 = sb.FacetGrid(not_mixed_gender, col="Sex")
g2.map(plt.hist, "Bench2Kg")
g3 = sb.FacetGrid(not_mixed_gender, col="Sex")
g3.map(plt.hist, "Bench3Kg")
# 3 attempts of squats for each gender
g4 = sb.FacetGrid(not_mixed_gender, col="Sex")
g4.map(plt.hist, "Squat1Kg")
g5 = sb.FacetGrid(not_mixed_gender, col="Sex")
g5.map(plt.hist, "Squat2Kg")
g6 = sb.FacetGrid(not_mixed_gender, col="Sex")
g6.map(plt.hist, "Squat3Kg")
# 3 attempts of deadlifts for each gender
g7 = sb.FacetGrid(not_mixed_gender, col="Sex")
g7.map(plt.hist, "Deadlift1Kg")
g8 = sb.FacetGrid(not_mixed_gender, col="Sex")
g8.map(plt.hist, "Deadlift2Kg")
g9 = sb.FacetGrid(not_mixed_gender, col="Sex")
g9.map(plt.hist, "Deadlift3Kg")
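The nine near-identical grids above could also be drawn from a single reshaped table. The sketch below is one alternative, not the code used for this post: it melts the attempt columns into a long format so that one `FacetGrid` can lay out attempts as rows and sexes as columns. The helper names `to_long` and `plot_attempts` and the toy rows are assumptions for illustration.

```python
import pandas as pd

ATTEMPT_COLUMNS = ["Bench1Kg", "Bench2Kg", "Bench3Kg"]


def to_long(frame, value_columns):
    """Stack the attempt columns into (Sex, Attempt, Kg) rows, dropping missing attempts."""
    long_frame = frame.melt(
        id_vars=["Sex"],
        value_vars=value_columns,
        var_name="Attempt",
        value_name="Kg",
    )
    return long_frame.dropna(subset=["Kg"])


def plot_attempts(long_frame):
    """Draw one histogram per (Attempt, Sex) pair; plotting libraries are imported lazily."""
    import matplotlib.pyplot as plt
    import seaborn as sb

    grid = sb.FacetGrid(long_frame, row="Attempt", col="Sex")
    grid.map(plt.hist, "Kg")
    return grid


# Toy rows (invented) standing in for not_mixed_gender.
toy = pd.DataFrame(
    {
        "Sex": ["M", "F", "M"],
        "Bench1Kg": [100.0, 50.0, None],
        "Bench2Kg": [105.0, -55.0, 90.0],
        "Bench3Kg": [-110.0, 55.0, 95.0],
    }
)
long_toy = to_long(toy, ATTEMPT_COLUMNS)
print(long_toy)
```

Calling `plot_attempts(to_long(not_mixed_gender, ATTEMPT_COLUMNS))` would then produce the bench press panels in one figure; the same pattern applies to the squat and deadlift columns.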
Interestingly, there are many failed attempts for both genders, which made me wonder where these lifters are getting disqualified.
Where in the World Are Lifters Being Disqualified?
While looking at the plots of attempts for the different lifts, I noticed that there were negative values, which mean that the lifter either was disqualified or failed the attempt.
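To put a number on those negative values, here is a minimal sketch that counts them per attempt column. It assumes, as the OpenPowerlifting data documentation describes, that a negative weight records a failed attempt at that weight and that a missing value means no attempt was recorded. The toy numbers are invented.

```python
import pandas as pd

# Toy attempts (invented); None means no attempt was recorded.
toy = pd.DataFrame(
    {
        "Squat1Kg": [200.0, -180.0, 150.0, None],
        "Squat2Kg": [-210.0, 180.0, -160.0, None],
    }
)

# Attempts actually recorded in each column.
attempted = toy.notna().sum()

# Negative entries, treated here as failed attempts (missing values compare as False).
failed = (toy < 0).sum()

# Fraction of recorded attempts that failed.
failure_rate = failed / attempted

print(failure_rate)
```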
That made me wonder which meet countries see the most disqualifications, so I decided to create a scatter geo plot showing these locations.
The approach was:
- Filter the data to only have lifters who were disqualified.
- Calculate the number of lifters disqualified per meet country.
- Add an ISO alpha-3 country code (the three-letter codes defined in ISO 3166-1) to the data for the purpose of creating the scatter geo plot.
- Use Python’s plotly package to create the plot.
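The steps above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the exact code behind the plot: it treats a `Place` value of `"DQ"` as the marker for a disqualified lifter (the original filter may have differed), and `ISO_ALPHA3` is a deliberately tiny, hand-written lookup that a real run would replace with a complete country table.

```python
import pandas as pd

# Hypothetical, incomplete lookup from meet country name to ISO alpha-3 code.
ISO_ALPHA3 = {"USA": "USA", "Canada": "CAN", "Australia": "AUS"}


def disqualified_by_country(frame):
    """Count disqualified lifters per meet country and attach an ISO alpha-3 code."""
    # Assumption: disqualified lifters are the rows whose Place is "DQ".
    dq = frame[frame["Place"] == "DQ"]
    counts = dq.groupby("MeetCountry").size().reset_index(name="Disqualified")
    counts["IsoAlpha3"] = counts["MeetCountry"].map(ISO_ALPHA3)
    # Countries missing from the lookup cannot be placed on the map, so drop them.
    return counts.dropna(subset=["IsoAlpha3"])


def plot_disqualified(counts):
    """Bubble map sized by the number of disqualified lifters (plotly imported lazily)."""
    import plotly.express as px

    return px.scatter_geo(
        counts,
        locations="IsoAlpha3",
        size="Disqualified",
        hover_name="MeetCountry",
    )


# Toy rows (invented) standing in for the powerlifting frame.
toy = pd.DataFrame(
    {
        "Place": ["1", "DQ", "DQ", "2", "DQ", "DQ"],
        "MeetCountry": ["USA", "USA", "Canada", "Canada", "USA", "Atlantis"],
    }
)
dq_counts = disqualified_by_country(toy)
print(dq_counts)
```

On the full dataset, `plot_disqualified(disqualified_by_country(powerlifting)).show()` would open the interactive map. Keep in mind that raw counts favor countries that host many meets, so a rate per entry may be a fairer comparison.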
Below is a scatter geo plot showing the meet locations where lifters are getting disqualified. You can zoom in and out and pan across the globe to see the countries with disqualified lifters. The bigger the ‘bubble’, the more disqualified lifters there are.
Click on the link below, have fun, experiment, and you might be surprised by the results!
Thank you for checking out my blog post! Hopefully I’ve piqued your interest in powerlifting and data analytics. Feel free to comment on the blog if you like. Any feedback is greatly appreciated.
Link to my github repository of all the code used for this blog:
thatdatascienceguyblog/powerlifting at master · thatdatascienceguy/thatdatascienceguyblog
Repository of blog posts of my 'That Data Science Guy' blog https://thatdatascienceguy.com …
This page uses data from the OpenPowerlifting project, https://www.openpowerlifting.org.
You may download a copy of the data at https://gitlab.com/openpowerlifting/opl-data