One of the first steps in any data science project is to acquire and analyze the raw data. Since this data will commonly be stored in databases, understanding Structured Query Language (SQL) will enable you to get the data you need and start working quickly. This post summarizes the basics of SQL’s SELECT statement, which is how you retrieve information from the database.

In order to make this information easily accessible, I’ve created a quick reference card (or “cheat sheet”) which matches this post and can be printed and kept by your desk for easy review. Download the quick reference card below and / or read on to learn about the basics of SQL.

SQL Cheat Sheet Quick Reference Guide

SQL is a standard language for storing, manipulating, and retrieving information from relational databases. The SELECT statement allows you to retrieve the information that is stored in the database. You need to specify which columns you want to retrieve and the tables that the information is stored in, but you can also use more advanced functionality to sort, filter, create calculations, group data, and create averages and subtotals from the data.

The information below assumes that you have a grounding in how relational databases work, which helps to understand how tables can be connected together.

General Format

SELECT {columns, aggregate functions, or subqueries}

FROM {tables or subqueries}

WHERE {filters based on columns or subqueries}

GROUP BY {columns to aggregate by}

HAVING {filters based on aggregates}

ORDER BY {specific columns or aggregate functions}

SELECT

Choose the columns, aggregates, and calculations that you would like shown in the results of your queries. Separate the columns using commas.

SelectUsingExample
All columns*SELECT *
Specific columnscolumn or table.columnSELECT firstname, students.age
Rename columnscolumn AS newnameSELECT lastname AS surname

The following aggregate functions are available, but require the use of the GROUP BY function to define the non-aggregate columns:

MIN(), MAX(), SUM(), AVG(), COUNT()
SELECT firstname, COUNT(firstname) FROM students GROUP BY firstname

Subqueries that return a single column and single row can be used by enclosing the entire subquery in brackets:

SELECT
  firstname,
  COUNT(firstname) / (SELECT COUNT(firstname) FROM students)
FROM
  students
GROUP BY
  firstname

Calculations can be created by using basic mathematical operators to add, subtract, multiply and / or divide any numerical columns.

FROM

Choose the tables that your columns are found in. Just like columns, use AS to alias (rename) your tables. If you are using multiple tables, you need to join them together using one or more columns.

Join TypeIllustrationExample
Left join FROM table_a AS a LEFT JOIN table_b AS b ON a.key = b.key
Right joinFROM table_a AS a RIGHT JOIN table_b AS b ON a.key = b.key
Inner joinFROM table_a AS a INNER JOIN table_b AS b ON a.key = b.key
Full outer joinFROM table_a AS a FULL OUTER JOIN table_b AS b ON a.key = b.key
Cartesian joinFROM table_a CROSS JOIN table_b

Join using multiple columns by using AND, and use comparison and logical operators to add filtering conditions directly to the join.

WHERE

Filter the results of your query based on the initial columns by using a mix of comparison and logical operators. Use brackets to change the default precedence of logical operators.

Logical Operators

(Listed in order of precedence)

OperatorExampleResult
NOTNOT aTRUE if A is FALSE
ORa OR bTRUE if A or B are TRUE
ANDa AND bTRUE if A and B are TRUE

Comparison Operators

ComparisonOperatorExample
Equal to=WHERE age = 20
Not equal to<>WHERE age <> 20
Greater than>WHERE age > 20
Greater than or equal to>=WHERE age >= 20
Less than<WHERE age < 20
Less than or equal to<=WHERE age <= 20
Between (inclusive)BETWEEN 1 AND 5WHERE age BETWEEN 18 AND 22
In list of values or subqueryIN (1, 3, 5)WHERE age IN (18, 19, 20, 21, 22)
Partial string matchingLIKE '%X%Y%Z%'WHERE firstname LIKE 'Tom%'

GROUP BY

Choose the columns that aggregation will be performed across. (ie, all the non-aggregate columns) Separate the columns with commas:

SELECT
  city,
  province,
  count(studentnumber)
FROM
  students
GROUP BY
  city,
  province

HAVING

Filter the query using the results of aggregate functions by using comparison and logical operators.

HAVING AVG(grade) <= 3.5

ORDER BY

Reorder the results of your query by specifying columns or aggregates and a sort direction (ASC for ascending order or DESC for descending order), separated by commas.

ORDER BY age ASC, AVG(grade) DESC

Although it might seem simple on the surface, SQL, is a powerful way to retrieve data and these simple functions can be combined in elegant ways to solve complex problems. If you have digested this information and want to learn more, I recommend reading up specifically about the uses of sub-queries and string functions.

I offer SQL training for corporate and professional groups if this has whet your appetite and you want a more thorough grounding. Contact me to learn more.

Leave a Reply

Your email address will not be published. Required fields are marked *