Data Science & Machine Learning 101

Deep Learning 10 - Computer Vision


Your first computer vision model

BowTied_Raptor
Jun 12, 2025


Computer vision is the earliest success story of deep learning: it is the problem domain that led to the initial rise of deep learning between 2011 and 2015. Around that time, a type of deep learning model called a convolutional neural network (convnet) started getting remarkably good results in image classification competitions.

[Image source: “Everything You Ever Wanted To Know About Computer Vision” by Ilija Mihajlovic, TDS Archive, Medium]

1 - Intro to convnets

In a previous post, we built a fully connected dense neural network (a multilayer perceptron) to classify the MNIST digits. Here we are going to discuss how a convnet works; then, by the end of the post, we’ll build a simple convnet that tackles the exact same problem as that dense network, so you can compare the performance directly.

The fundamental difference between a densely connected layer and a convolutional layer is this:

[Image source: “How to Develop a CNN for MNIST Handwritten Digit Classification”, MachineLearningMastery.com]

  • Dense layers learn global patterns in their input feature space (for an MNIST digit, patterns involving all of the pixels).

  • Convolutional layers learn local patterns: in the case of images, patterns found in small 2D windows of the inputs (see example below).
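
One rough way to see the global-versus-local difference is to count weights. The sketch below assumes a 28×28 single-channel MNIST input; the 32 output units/filters and the 3×3 window are illustrative choices, not values taken from this post, and the helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights + biases of a dense layer: every unit sees every input."""
    return n_inputs * n_units + n_units


def conv2d_params(window_h, window_w, in_depth, n_filters):
    """Weights + biases of a conv layer: every filter sees one small window."""
    return window_h * window_w * in_depth * n_filters + n_filters


# MNIST digits are 28x28 grayscale images (depth 1).
height, width, depth = 28, 28, 1

# Assumed layer sizes, chosen only for illustration.
dense_count = dense_params(height * width * depth, n_units=32)
conv_count = conv2d_params(3, 3, depth, n_filters=32)

print("dense layer parameters:", dense_count)
print("conv layer parameters: ", conv_count)
```

Under these assumptions the dense layer needs 25,120 parameters because each unit is wired to all 784 pixels, while the convolutional layer needs 320 because each filter only covers a 3×3 window that is reused across the whole image.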

This key characteristic gives convnets two interesting properties:

  1. The patterns they learn are translation-invariant. After learning a certain pattern in the lower right corner of a picture, a convnet can recognize it anywhere, for example in the upper left corner. This makes convnets data-efficient when processing images: they don’t need as many training samples to learn representations that generalize.

  2. They can learn spatial hierarchies of patterns. A first convolutional layer will learn small local patterns such as edges, a second convolutional layer will learn larger patterns made of the features of the first layer, and so on.
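
The first property can be illustrated with a small sketch in plain Python. It assumes a hand-written 2×2 filter standing in for a learned one, a 6×6 toy image, and no padding or stride; all function names are hypothetical. The same filter is slid over two images that contain the same pattern in different corners.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def place_pattern(size, pattern, top, left):
    """Return a size x size image of zeros with the pattern pasted in."""
    image = [[0] * size for _ in range(size)]
    for di, row in enumerate(pattern):
        for dj, value in enumerate(row):
            image[top + di][left + dj] = value
    return image


def strongest_response(feature_map):
    """Return (value, (row, col)) of the largest filter response."""
    return max(
        (value, (i, j))
        for i, row in enumerate(feature_map)
        for j, value in enumerate(row)
    )


# A tiny diagonal pattern, and a filter that matches it (assumed, not learned).
pattern = [[1, 0],
           [0, 1]]
kernel = pattern

lower_right = place_pattern(6, pattern, top=4, left=4)
upper_left = place_pattern(6, pattern, top=0, left=0)

peak_lower_right = strongest_response(correlate2d(lower_right, kernel))
peak_upper_left = strongest_response(correlate2d(upper_left, kernel))

print(peak_lower_right)
print(peak_upper_left)
```

The peak response has the same strength in both images; only its location changes. The filter did not need to be relearned for the new position, which is the behavior the first property describes.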

2 - Feature maps

Convolutions operate over rank-3 tensors called feature maps, with two spatial axes (height and width) and a depth axis. The depth axis is also called the “channels” axis.

For an RGB image, the depth axis has size 3 because the image has 3 color channels:

  • Red

  • Green

  • Blue

For black-and-white pictures (like the MNIST digits above), the depth is 1 (levels of grey). The convolution operation extracts patches from its input feature map and applies the same transformation to all of these patches, producing an output feature map.
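
The patch-extraction step can be sketched in plain Python. This is a minimal illustration, assuming a 3×3 window, no padding, stride 1, and two hand-written filters in place of learned ones; the feature map is stored as nested lists indexed `[height][width][depth]`, and the function name `conv2d` is hypothetical.

```python
def conv2d(feature_map, filters):
    """Apply every filter to every patch of the input feature map.

    feature_map: nested lists indexed [row][col][channel]
    filters:     list of filters, each indexed [d_row][d_col][channel]
    returns:     output feature map indexed [row][col][filter]
    """
    in_h, in_w = len(feature_map), len(feature_map[0])
    in_depth = len(feature_map[0][0])
    win_h, win_w = len(filters[0]), len(filters[0][0])

    output = []
    for i in range(in_h - win_h + 1):
        out_row = []
        for j in range(in_w - win_w + 1):
            # The same set of filters is applied to every patch.
            responses = [
                sum(
                    feature_map[i + di][j + dj][c] * f[di][dj][c]
                    for di in range(win_h)
                    for dj in range(win_w)
                    for c in range(in_depth)
                )
                for f in filters
            ]
            out_row.append(responses)
        output.append(out_row)
    return output


# 5x5 grayscale input (depth 1): dark on the left, bright in the last 2 columns.
image = [[[1 if col >= 3 else 0] for col in range(5)] for _ in range(5)]

# Two assumed 3x3x1 filters: one sums the patch, one responds to vertical edges.
sum_filter = [[[1], [1], [1]] for _ in range(3)]
edge_filter = [[[-1], [0], [1]] for _ in range(3)]

output = conv2d(image, [sum_filter, edge_filter])

print("output height:", len(output))
print("output width: ", len(output[0]))
print("output depth: ", len(output[0][0]))
print("first output row:", output[0])
```

With these assumptions the 5×5×1 input becomes a 3×3×2 output: the spatial size shrinks because no padding is used, and the output depth equals the number of filters rather than the number of input color channels.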

[Image source: “Feature map of convolutional-neural-network and total number of parameters”, Stack Overflow]

This output feature map is still a rank

© 2025 BowTied_Raptor