

Discover more from Data Science & Machine Learning 101
SQL 12 - Current Meta For Pulling SQL Data
What is Django, Setting Up Django, A Basic Django Query, SQL To Django Examples, Open/Close A Connection
This is the final post on SQL. This post will give you a summary on what the current meta (State of the Art) is, for pulling data from a SQL database.
Most Machine Learning Engineers, who work at high tech companies use Django (Python Library) to pull data. Django handles a lot of the annoying headache work that we had to do in SQL 11 (loading data from a SQL server). As of right now, no counterpart for Django exists in R… yet. But, given how copycat both R & Python are of each other, it’s only a matter of time.
Most likely, this method for pulling data from a SQL server will start becoming the norm soon. When the MLEs in high tech leave, to go work somewhere elsewhere. They will bring their tools & methods with them.
Table of Contents:
What is Django
Setting Up Django
A Basic Django Query
Complex SQL To Django Examples
Open/Close A Connection
1 - What is Django?
Django is a web framework for building web applications that is written in Python. It follows the Model-View-Template (MVT) architectural pattern.
Here is a video on MTV below:
Some key features of Django include:
Object-Relational Mapping (ORM): It allows devs to interact with databases using Python objects and queries instead of writing raw SQL.
Authentication and Authorization: It provides a built-in authentication and authorization system.
Admin Interface: Django includes a ready-to-use admin interface that can be customized to manage the data models of a web application.
And a bunch of web dev stuff, that no Machine Learning Engineer cares about…
If you’ll be building some sort of tool/web page/dashboard, you’ll find the Admin Interface very handy.
If you are going to be building an API, you’ll find the Authentication very handy.
However, this is a SQL post, what we care about the most is the ORM. More specifically, we care about the models that Django will provide us.
2 - Setting Up Django
You should annoy the Data Engineer (BowTiedCelt) to setup the infrastructure, as this is their responsibility.
But, here’s the premise:
2.1 Config Database Settings
The Data Engineer will create a new Django project. Then, they will configure the database settings in the settings.py file. Django supports various database management systems, such as PostgreSQL, MySQL, SQLite, and Oracle.
The Data Engineer will provide the file which has the connection information (Remember the last post):
Host
User
Password
Port
2.2 Django Models
In your Django app, you will create what are called Django models (Python class). These models (class) will represent the tables in your SQL database. The models are a mapping between your Python object, and the SQL database. This allows you to interact with the data using Python code instead of raw SQL queries. Django's ORM will handle the database interaction.
from django.db import models
class MyTable(models.Model): #Makes a Table called MyTable
col1 = models.CharField(max_length=100) #col1 set as a CHAR
col2 = models.IntegerField() #col2 set as an INT
2.3 Migration Files
After defining the models, your Data Engineer will create and run migrations files. These will sync the database schema with the defined models. Anytime you make an underlying change to the structure of the tables, you can make them here. Then you will re-run the migration files.
Django will keep track of all the migration files which have been run so far, and will keep your models Synced with the SQL tables.
In your cmd/terminal you can type something like this:
python3 manage.py makemigrations
python3 manage.py migrate
3 - A Basic Django Query
Here are a few examples of raw SQL queries, and their Django Python counterpart.
3.1 SELECT *
SQL:
SELECT *
FROM my_table;
Django:
We will use the .all() to basically run a SELECT *.
all_data = my_table.objects.all()
3.2 SELECT
SQL:
SELECT my_col1, my_col2
FROM my_table;
Django:
We will use the .values_list() to basically run a SELECT.
all_data = my_table.objects.values_list(['my_col1', 'my_col2'] flat=True)
3.3 WHERE
SQL:
SELECT *
FROM my_table
WHERE my_col = 3;
Django:
We will use the .filter() to run a WHERE clause.
all_data = my_table.objects.filter(my_col=3)
WHERE Greater than example:
WHERE my_col > 3;
Django:
We will use the __gt_ to use the greater than filter.
all_data = my_table.objects.filter(my_col__gt=3)
3.4 INSERT
SQL:
INSERT INTO my_table (col1, col2)
VALUES (val1, val2);
Django:
We will use the .bulk_create() to insert a list of rows.
list_of_rows = [my_table(col1=val1, col2=val2), my_table(col1=val3, col2=val4)]
my_table.objects.bulk_create(list_of_rows)
The first line here makes a list of all of the rows we will want to insert. The second line does the insertion for us.
Click here to see more of Django’s ORM documentation. Every single SQL query can be written using Django syntax.
4 - Complex SQL To Django Examples.
I highly recommend clicking on this link so you have the Django QuerySet documentation while you are looking at these.
4.1 - Join, Filter, and order
SQL:
SELECT a.*, b.*
FROM table1 a
JOIN table2 b
ON a.table2_id = b.id
WHERE a.col1 = 'some_value'
ORDER BY b.col2;
Django:
all_data=table1.objects.select_related('table2').filter(col1='some_value').order_by('table2__col2')
The .select_related is letting us do the merge. It automatically knows the merge will happy on the id column. We are using the .filter to tell it to filter on ‘some_value’ on the column called col1. The .order_by is basically the SQL’s ORDER BY.
4.2 - Group by, count & average
SQL:
SELECT col1, COUNT(col1), AVG(col2)
FROM my_table
GROUP BY col1;
Django:
all_table = my_table.objects.values('col1').annotate(count=Count('col1'), average=Avg('col2'))
Using the .values(‘col1’), we are able to tell Django to get me all of the contents of col1. Then, using .annotate(), I’m able to give it a list of functions it should use with the group by clause. You’ll notice that the Django code doesn’t have a GROUP BY reference in there.
This is because when we say .values(‘col1’), and then use annotate later on, Django automatically assumes it should use the GROUP BY clause on ‘col1’. Hence it creates the SQL query dynamically for you. I highly encourage you to re-read the contents a few times over and over, and watch the video playlist below to get some extra practice on this one. You won’t understanding everything in day 1.
If you want to do some more practice, you can watch the video below:
5 - Open/Close A Connection
If you’ve read the last SQL post, one thing we spoke about was opening/closing a connection. Django manages database connections automatically for you. The parameters will be on the settings in the project's settings.py file (made by the Data Engineer). settings.py file.
When you use Django pull data, it will open a database connection if needed. It will then continue to re-use this connection. Once everything is done, it will automatically close the connection, or re-use it to pull more data (if the connection is still available).
Basically, when you use Django to pull data, you can just close your eyes, it’ll handle everything for you. If there are some scenarios where things are going haywire… Then it’s your Data Engineer’s problem to fix this issue, not yours.
SQL 12 - Current Meta For Pulling SQL Data
Surprised to see Django presented as state of the art... Especially for ML applications, I'd go with Streamlit, Gradio, or Dash. Also, Django has issues when used behind reverse proxies, which is often a requirement for production grade apps.