One of the best ways to understand a dataset is through descriptive statistics: the minimum and maximum values, the median, and the quartiles. With a small dataset these are easy to work out, but with a larger one you need to parse a lot of data to get them. Luckily, you can use SQL to compute descriptive statistics directly in the database.
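As a rough illustration of the idea, here is a minimal sketch that asks the database for the minimum, maximum, median, and quartiles. It makes several assumptions that are not from the post: it uses Python's built-in `sqlite3` module as a stand-in for a real database server, the `measurements` table and its `value` column are hypothetical, the sample numbers are invented, and the quartiles use the simple nearest-rank definition (other definitions interpolate between values and can give slightly different answers).

```python
import math
import sqlite3

# Hypothetical sample data; sqlite3 stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(v,) for v in [3, 7, 8, 5, 12, 14, 21, 13, 18]],
)


def percentile(conn, p):
    """Nearest-rank percentile: the value at rank ceil(p * n) in sorted order.

    Assumes the table is not empty.
    """
    (n,) = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()
    offset = max(math.ceil(p * n) - 1, 0)
    # Let the database sort and skip rows; only one value comes back.
    (value,) = conn.execute(
        "SELECT value FROM measurements ORDER BY value LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()
    return value


# MIN and MAX are plain aggregate functions.
minimum, maximum = conn.execute(
    "SELECT MIN(value), MAX(value) FROM measurements"
).fetchone()

stats = {
    "min": minimum,
    "q1": percentile(conn, 0.25),
    "median": percentile(conn, 0.50),
    "q3": percentile(conn, 0.75),
    "max": maximum,
}
print(stats)
```

The point of the sketch is that the sorting and counting happen inside the database, so only five numbers travel back to the client regardless of how many rows the table holds.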
I’m working on a dashboard to track COVID-19 cases per capita in Calgary. The government’s open data API provides daily case counts within the city, but it doesn’t have any history available. The easy solution is to download the data every day and archive it myself, but I want to automate the download and the load into the database so I don’t have to think about it. Luckily, MySQL and a bit of shell script go a long way.
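To make the archiving idea concrete, here is a minimal sketch of the loading step only. It is not the MySQL and shell script approach described above: it swaps in Python and the built-in `sqlite3` module so it runs without a server. The `case_history` table, its columns, the `archive_snapshot` helper, the region labels, and the counts are all hypothetical, and the download itself is left out.

```python
import sqlite3
from datetime import date

# sqlite3 stands in for MySQL; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE case_history (
        snapshot_date TEXT NOT NULL,
        region TEXT NOT NULL,
        case_count INTEGER NOT NULL,
        PRIMARY KEY (snapshot_date, region)
    )
    """
)


def archive_snapshot(conn, snapshot_date, rows):
    """Store one day's counts; re-running for the same day overwrites them."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO case_history "
            "(snapshot_date, region, case_count) VALUES (?, ?, ?)",
            [(snapshot_date.isoformat(), region, count) for region, count in rows],
        )


# Made-up numbers standing in for each day's downloaded file.
archive_snapshot(conn, date(2020, 6, 1), [("zone_a", 10), ("zone_b", 4)])
archive_snapshot(conn, date(2020, 6, 2), [("zone_a", 12), ("zone_b", 5)])
# Loading the same day again replaces the earlier rows instead of duplicating them.
archive_snapshot(conn, date(2020, 6, 2), [("zone_a", 13), ("zone_b", 5)])

history = conn.execute(
    "SELECT snapshot_date, case_count FROM case_history "
    "WHERE region = ? ORDER BY snapshot_date",
    ("zone_a",),
).fetchall()
print(history)
```

Keying each row on the snapshot date is what turns a feed with no history into one: every scheduled run adds a new day, and a repeated run for the same day is harmless. In MySQL the same effect is available through `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.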
If you need a database for a project, MySQL is one of the most popular choices. It’s free, open source, and a core part of the popular LAMP (Linux, Apache, MySQL, PHP) web application stack. If you want to get started with MySQL, here’s a guide to installing it on a fresh installation of Ubuntu 20.04.
One of the first steps in any data science project is to acquire and analyze the raw data. Since this data will commonly be stored in databases, understanding Structured Query Language (SQL) will enable you to get the data you need and start working quickly. This post summarizes the basics of SQL’s SELECT statement, which is how you retrieve information from the database.
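For a sense of what a SELECT statement looks like in practice, here is a small sketch run through Python's built-in `sqlite3` module so it needs no database server. The `employees` table, its columns, and the rows are invented for illustration and are not from the post.

```python
import sqlite3

# Hypothetical table and rows, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Data", 95000),
        ("Ben", "Data", 82000),
        ("Cho", "Sales", 70000),
        ("Dev", "Sales", 64000),
    ],
)

# SELECT picks columns, WHERE filters rows, ORDER BY sorts, LIMIT caps the count.
top_data = conn.execute(
    """
    SELECT name, salary
    FROM employees
    WHERE department = ?
    ORDER BY salary DESC
    LIMIT 1
    """,
    ("Data",),
).fetchall()

# GROUP BY collapses rows into one summary row per department.
by_department = conn.execute(
    """
    SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY department
    """
).fetchall()

print(top_data)
print(by_department)
```

The `?` placeholders pass values separately from the query text, which avoids quoting mistakes and SQL injection when the filter value comes from user input.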
A successful data scientist needs to draw on skills from many disciplines, and one of the core skill sets is knowledge of relational databases and how to query them using Structured Query Language (SQL). Relational databases are the most common way to store structured data, so a firm understanding of databases is key to obtaining data and performing simple analysis and reporting quickly.