Calculating medians and quartiles across groups in SQL

One of the best ways to understand data is through descriptive statistics: the minimum and maximum values, the median, and the quartiles. With smaller datasets this is easy, but with larger datasets you need to parse a lot of data to get these metrics. Luckily, you can use SQL to compute descriptive statistics for your data directly in the database.
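
As a rough illustration of the idea, here is a minimal sketch that computes the minimum, maximum, and median per group. It assumes an in-memory SQLite database (version 3.25 or later, for window functions) and a hypothetical `measurements` table with `grp` and `value` columns; none of these names come from the post itself, and other databases may offer built-in percentile functions that make this simpler. Quartiles are left out to keep the sketch short.

```python
import sqlite3

# Hypothetical sample data: (group label, value). Not from the original post.
SAMPLE_ROWS = [
    ("a", 1), ("a", 3), ("a", 5), ("a", 7), ("a", 9),
    ("b", 2), ("b", 4), ("b", 10), ("b", 20),
]

# Rank the values inside each group, then average the middle one or two
# values of each group to get the median.
STATS_QUERY = """
WITH ranked AS (
    SELECT grp,
           value,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY value) AS rn,
           COUNT(*)     OVER (PARTITION BY grp)                AS n
    FROM measurements
)
SELECT grp,
       MIN(value) AS min_value,
       MAX(value) AS max_value,
       -- Integer division picks row (n+1)/2 for odd n, and rows n/2 and
       -- n/2 + 1 for even n; AVG ignores the NULLs from the other rows.
       AVG(CASE WHEN rn IN ((n + 1) / 2, (n + 2) / 2) THEN value END) AS median
FROM ranked
GROUP BY grp
ORDER BY grp
"""


def group_stats(conn):
    """Return (group, min, max, median) tuples, one per group."""
    return conn.execute(STATS_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (grp TEXT, value INTEGER)")
conn.executemany("INSERT INTO measurements VALUES (?, ?)", SAMPLE_ROWS)

rows = group_stats(conn)
for row in rows:
    print(row)
```

The work happens inside the database, so only one summary row per group is returned to the client.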

A brief overview of SQL’s SELECT statement

One of the first steps in any data science project is to acquire and analyze the raw data. Since this data will commonly be stored in databases, understanding Structured Query Language (SQL) will enable you to get the data you need and start working quickly. This post summarizes the basics of SQL’s SELECT statement, which is how you retrieve information from the database.
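
For a sense of what a basic SELECT looks like, the sketch below runs one against an in-memory SQLite database. The `customers` table, its columns, and its rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York"), (3, "Alan", "London")],
)

# SELECT names the columns to return, FROM names the table,
# WHERE filters the rows, and ORDER BY sorts the result.
query = """
SELECT name, city
FROM customers
WHERE city = ?
ORDER BY name
"""

london_customers = conn.execute(query, ("London",)).fetchall()
for name, city in london_customers:
    print(name, city)
```

Passing the city as a `?` parameter, rather than pasting it into the query string, lets the driver handle quoting.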

Supporting skills for data science: relational databases & SQL

A successful data scientist needs to draw on skills from many disciplines, and one of the core skill sets is knowledge of relational databases and querying with Structured Query Language (SQL). Relational databases are the most common way to store structured data, so a firm understanding of databases is key to performing simple analysis and reporting quickly.