One of the best ways to understand data is through descriptive statistics: the minimum and maximum values, the median, and the quartiles. With a small dataset these are easy to work out, but with a large dataset you need to process many rows to get them. Luckily, you can use SQL to compute descriptive statistics for your data directly in the database.
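As a minimal sketch of the idea, the snippet below uses SQLite through Python's built-in `sqlite3` module (the post does not say which database it targets). The `describe` helper and the `measurements(value)` table are hypothetical. Standard SQLite builds have no built-in median or percentile aggregate, so the sketch gets MIN, MAX and COUNT from one aggregate query and reads the median and quartiles with `ORDER BY ... LIMIT 1 OFFSET k`. Quartiles use the nearest-rank definition, which is only one of several conventions.

```python
import math
import sqlite3


def describe(conn, table, column):
    """Compute min, quartiles, median, and max of table.column with SQL queries.

    Hypothetical helper. Assumes a numeric column with at least one non-NULL
    value. Quartiles use the nearest-rank method (one of several definitions).
    Table and column names are interpolated into the SQL, so pass only
    trusted identifiers.
    """
    # One aggregate query for the extremes and the non-NULL row count.
    lo, hi, n = conn.execute(
        f"SELECT MIN({column}), MAX({column}), COUNT({column}) FROM {table}"
    ).fetchone()

    def nth(k):
        # Value at 0-based position k after sorting; the database does the sort.
        return conn.execute(
            f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
            f"ORDER BY {column} LIMIT 1 OFFSET ?",
            (k,),
        ).fetchone()[0]

    def nearest_rank(p):
        # Smallest value with at least a fraction p of the data at or below it.
        return nth(max(1, math.ceil(p * n)) - 1)

    # Median: middle value for odd n, mean of the two middle values for even n.
    if n % 2 == 1:
        median = nth(n // 2)
    else:
        median = (nth(n // 2 - 1) + nth(n // 2)) / 2

    return {
        "min": lo,
        "q1": nearest_rank(0.25),
        "median": median,
        "q3": nearest_rank(0.75),
        "max": hi,
    }


# Usage with a hypothetical in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(v,) for v in [7, 1, 9, 3, 5, 2, 8, 4, 6]],
)
stats = describe(conn, "measurements", "value")
print(stats)
```

Each `OFFSET` lookup asks the database for a sorted position, so on large tables an index on the column may help. Databases with ordered-set aggregates, such as PostgreSQL's `percentile_cont`, can compute the median and quartiles in a single query instead.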
One of the first steps in any data science project is to acquire and analyze the raw data. Since this data is commonly stored in databases, understanding Structured Query Language (SQL) lets you get the data you need and start working quickly. This post summarizes the basics of SQL’s SELECT statement, which is how you retrieve information from a database.
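For a concrete picture of a SELECT statement, here is a minimal sketch run against SQLite through Python's `sqlite3` module. The `customers` table and its rows are hypothetical. The query shows a few common clauses: a column list, `FROM`, a `WHERE` filter with a bound parameter, `ORDER BY`, and `LIMIT`.

```python
import sqlite3

# Hypothetical example table; the post does not name a schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, signup_year INTEGER)")
conn.executemany(
    "INSERT INTO customers (name, city, signup_year) VALUES (?, ?, ?)",
    [
        ("Ana", "Lisbon", 2019),
        ("Ben", "Boston", 2021),
        ("Cho", "Boston", 2018),
        ("Dev", "Austin", 2022),
    ],
)

# SELECT picks the columns, FROM names the table, WHERE filters rows,
# ORDER BY sorts the result, and LIMIT caps how many rows come back.
query = """
    SELECT name, signup_year
    FROM customers
    WHERE city = ?
    ORDER BY signup_year DESC
    LIMIT 10
"""
rows = conn.execute(query, ("Boston",)).fetchall()
for name, year in rows:
    print(name, year)
```

Passing the filter value as a `?` parameter rather than formatting it into the string keeps the query safe from SQL injection. `LIMIT` works in SQLite, PostgreSQL, and MySQL; some other databases use different syntax for capping rows.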
A successful data scientist needs to draw on skills from many disciplines, and one of the core skill sets is knowledge of relational databases and of querying them with Structured Query Language (SQL). Relational databases are the most common way to store structured data, so a firm understanding of databases is key to obtaining data and performing simple analysis and reporting quickly.