Extracting LaunchBox’s Video Game Metadata: Getting Data of Video Game Consoles

Looking at the Data

The zip file that was given to me by Launchbox has several XML files. One of them is a file containing all the platforms that Launchbox has in their records.

  • Read the XML file of video game platforms
  • For each platform, extract certain attributes and store them in a DataFrame.
  • Take the DataFrame and write it to a CSV file.

Getting the Data

I used python to simply read in the XML file as follows:

import xml.etree.ElementTree as ET
import pandas as pd
import re
# read in the data
platforms_xml = ET.parse('Platforms.xml')
root_xml = platforms_xml.getroot()
  • Name
  • Emulated
  • Release Date
  • Developer
  • Manufacturer
  • CPU
  • Memory
  • Graphics
  • Sound
  • Display
  • Media
  • Maximum Controllers
  • Notes
  • Category
# Create Empty list to store attributes
platform_attrs = ["Name", "Emulated", "ReleaseDate", "Developer",
"Manufacturer", "Cpu", "Memory", "Graphics", "Sound",
"Display", "Media", "MaxControllers", "Notes", "Category"]
# list to store all the data to be converted to CSV
rows = []
for platform in root_xml:
data = []
# Check if any of them are null and if so, set the value to 'None'
for field in platform_attrs:
if platform is not None and platform.find(field) is not None:
data.append(platform.find(field).text) # add the data
else: # it is a empty value set to None
data.append(None)
# append the data/observation to the rows list
rows.append({platform_attrs[i]: data[i]
for i in range(0,len(platform_attrs))})

Taking the Data Extracted from the XML File and Putting in a DataFrame

The code below shows how to convert the extracted data to a data frame:

# Create a dataframe out of the extracted data from the xml file platforms = pd.DataFrame(rows, columns = platform_attrs)
# looking around the csv file, there are time/timezones in the date as well.
# let's remove the time and keep the date
dates = [re.sub("T.*", "", date)
if date is not None else None
for date in platforms["ReleaseDate"]]
platforms["ReleaseDate"] = dates

Querying the Data

After all that work, I was curious on a few questions like

  1. How many Nintendo consoles have been made and when?
  2. What are the specs of each Sony console?
  3. Which consoles were using floppy disks and when did they start fading out?

Summary

I have shown you how I took XML data courtesy of LaunchBox and using programming and logic, convert it to a DataFrame and reveal insights about the data. I love to let the data speak for itself. :)

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Jonathan Hernandez

Jonathan Hernandez

Data science grad who loves blogging about data science topics. Open to work.