SQL3: Filtering Data
Filtering Data, Conditional Operators, WHERE Clause, Exercises & Interview Questions
As stated in the previous SQL post. If you don’t know SQL, no company will hire you for any data related role.
Here’s another Machine Learning Engineer position that opened up.
How about that…. This thing matches exactly what I said:
![Twitter avatar for @BowTied_Raptor](https://substackcdn.com/image/twitter_name/w_96/BowTied_Raptor.jpg)
Table of Contents
Filtering Data
Conditional Operators
Where Clause
Putting Everything Together
Exercises/Interview Questions
1 - Filtering Data
When you are working with SQL Tables, sometimes we don’t want to see every single row. Sometimes, we want to filter the table, so that we only see the rows that meet our given criteria. This is the basics of filtering data in SQL.
2 - Conditional Operators
2.1 Basics
This section here is the same as conditions in Python/R, with some minor symbolic changes. To make life easier, I’ve put all the basic conditional expressions here for you in a simple table.
If you type in something like A > 45, this means you are checking to see all of the values in column A, which are greater than 45.
2.2 BETWEEN, IN, LIKE
BETWEEN: The BETWEEN operator is used to specify minimum, AND maximum ranges. For example, if I want to see only the entries in column A between 40 and 45. I’ll say: A BETWEEN 40 AND 45.
IN: The IN operator lets you pull rows which in your list of allowed values. Our list of allowed values are placed inside the brackets following the IN operator. For example, if I wanted to pull only values which were 40,42,49,69, and 420 in column A. I’ll say: A IN (40,42,49,69,420)
LIKE: The LIKE operator lets you tell the database to look for any records that match a certain pattern. For example, if I wanted to pull all the records in column A that have ‘app’ (and only app) then I would use the LIKE operator. I’ll say: A LIKE ‘app’
Here is a table to help you keep track of these:
The LIKE Operator can be further expanded by the use of something called Wildcards. More explained below.
2.3 LIKE Expanded (Wildcards)
We typically use the LIKE operator when we are interested in looking at text data that somewhat matches a string we are looking for. Using the LIKE operator to search for an exact string is pointless (example done above). So, to tell SQL that there is allowed to be some generic text to the left/right of my specified string, we use Wildcards.
A table is created below.
When you see the _ symbol, you are telling SQL, 1 character can be substituted for something else. When you use the % symbol, you are telling SQL, any number of (includes 0) can be substituted for something else. Here are a few scenarios to illustrate this.
Let’s pretend I wanted to search for ‘BowTied Raptor’
LIKE ‘_owTied Raptor’
LIKE ‘Bowtied Rapto_’
LIKE ‘_owTied Rapto_’
LIKE ‘%Raptor’
LIKE ‘Bowtied%’
LIKE ‘%Tied Rap%’
The last example is the one that you’ll use 99.99% of the time. You can think of this as a simple CTRL + F for finding something.
2.4 Combining Multiple Expressions
This section is super easy. This literally the exact same way as how you can combine conditional expressions in R/Python.
We have 3 options.
To make life simple, I just attached the truth tables for these 3 below:
AND:
OR:
NOT:
3 - WHERE Clause
Now that we know how to create simple expressions, now let’s talk about how we can combine this with what was learned in the previous post.
When we retrieve data from a SQL table, or we combine multiple tables together, we can use the WHERE clause in order to specify a condition. The conditions we can use are the ones above.
4 - Putting Everything Together
I am using this tool to run SQL queries.
I am using the Titanic.csv dataset. You can get it here. Every single one of the statements presented here just wrangle the data. This means the original table is un-affected by any of these.
4.1 Basic Operator
From my Titanic dataset, if I only wanted to look at those who survived, I am interested in looking at the ‘Survived’ column, where the values in that column are equal to 1.
4.2 LIKE Wildcard
Let’s say I was interested in looking at the names of the passengers. More specifically I only wanted to look at entries where the name of the passenger had the phrase ‘Eliz’. To do this, we’ll use the LIKE operator, with the %% wildcard.
Note: Another name for this is called pattern matching.
4.3 Combining Multiple Expressions
Now, Let’s combine some expressions together. Let’s say I was only interested in looking at passengers who met all of the criteria below:
Age is between 20 to 30
Name has the phrase ‘Eliz’
They Survived
This is where we would use AND multiple times.
5 - Exercises/Interview Questions
5.1 Exercises
Use the Titanic Dataset, and SQL to answer these questions to get some practice.
Give me the name of a male passenger.
Give me the name of a male passenger that DID survive the titanic.
Give me the name of a male passenger that DID survive the titanic, and is bigger than the age of 50.
Bonus Questions: Use what you learned on aggregating data from the last SQL post
How many male passengers were in the titanic?
What is the average age of the male passengers on the titanic who survived?
How many men were on the titanic who had the name ‘Thomas’, survived, and was in the 2nd class?
5.2 Interview Questions
These are the most likely interview questions you’ll be asked on this specific topic.
I have this where clause WHERE A = ‘a’ or A = ‘xyz’ or A = ‘lol’, how can I clean this?
Ans: Just use IN to get WHERE A IN (‘a’, ‘xyz’, ‘lol’). This is popular because this is the most common way to filter SQL using R/Python. Look up sqlalchemy for more detail.
What is pattern matching?
Ans: Pattern matching lets you search for pattern in your data. This is extremely useful, if you are searching for a particular word/phrase in your dataset, but you only have a rough idea of what that word is. We typically use wildcards to help us with this.
If I use name NOT LIKE ‘Thom%’ what happens?
Ans: We search for every single name, that doesn’t start with the string ‘Thom’
If I use name LIKE ‘___’ what happens?
Ans: We get every single name, which has 3 letters.
Click Here to continue to SQL 4: Grouping Data & Aggregating it. Focus will be on the GROUP BY clause.