Alex Bodner

12 Tweets · Jul 12, 2024

How do machines see? 👀🤖
Join me in this thread to learn how AIs “see” and classify images. Let's see what Convolutional Neural Networks (CNNs) are 🧵🧵

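The core idea is to feed the image to a neural network. But if we give the raw image to a classic NN, every pixel becomes a separate input, which makes the computation very expensive. A single HD frame (1920×1080) already has 1920 × 1080 = 2,073,600 pixels, and three times that (6,220,800 values) with RGB color channels. That is a huge, noisy input for the model to process, and every neuron in the first layer needs one weight per input.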
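
To put numbers on this, here is a small Python sketch that counts the inputs of a flattened 1080p image and the parameters of one fully connected layer on top of it. The function names and the 1,000-unit hidden layer are hypothetical, chosen only for illustration.

```python
def dense_input_size(width: int, height: int, channels: int = 3) -> int:
    """Number of input values when an image is flattened for a classic (fully connected) network."""
    return width * height * channels


def dense_layer_params(n_inputs: int, n_units: int) -> int:
    """One weight per input for every unit, plus one bias per unit."""
    return n_inputs * n_units + n_units


gray = dense_input_size(1920, 1080, channels=1)
rgb = dense_input_size(1920, 1080, channels=3)
print(gray)  # 2073600
print(rgb)   # 6220800

# A single hidden layer with a hypothetical 1,000 units on the RGB input:
print(dense_layer_params(rgb, 1000))  # 6220801000
```

That is over six billion parameters for a single hidden layer, before the network has learned anything useful.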

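Also, the operation a classic network performs doesn't make much sense on a raw image. What does it mean to add up every pixel of the image, each multiplied by its own factor? Since every position has its own weight, the same object shifted by a few pixels looks like a completely different input. On top of that, learning these weights is very hard because of the enormous number of possible combinations. So what do we do?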
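
A minimal sketch of what a single fully connected neuron does with an image, assuming a tiny 3×3 grayscale input and hand-picked (not learned) weights: it flattens the pixels and takes one weighted sum. Moving the same bright pixel one position to the right gives a completely different output, because each position has its own weight.

```python
def flatten(image):
    """Turn a 2-D list of pixels into a flat list, row by row."""
    return [p for row in image for p in row]


def dense_neuron(image, weights, bias=0.0):
    """One fully connected neuron: every pixel times its own weight, summed, plus a bias."""
    pixels = flatten(image)
    if len(pixels) != len(weights):
        raise ValueError("need exactly one weight per pixel")
    return sum(w * p for w, p in zip(weights, pixels)) + bias


# Hypothetical 3x3 images with the same bright pixel at two neighboring positions.
a = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]
b = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 0]]

# Illustrative weights, one per pixel position (not learned).
weights = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 0.25, -2.0]
print(dense_neuron(a, weights))  # 0.5
print(dense_neuron(b, weights))  # -1.0
```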

This is where convolutions come in. A convolution is a mathematical operation that, applied to an image, acts as a filter that responds to one particular pattern. We apply many of these filters in successive layers, and in practice the first layers end up detecting simple features like edges, while deeper layers combine them into increasingly complex structures.

How is the operation defined? A filter (a small matrix, also called a kernel) is slid across the image. At each position, every value of the filter is multiplied by the image pixel it overlaps, and the sum of those products is written to the corresponding pixel of a new matrix. Repeating this at every position produces a filtered image (often called a feature map).
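
Here is a minimal pure-Python sketch of that operation, assuming no padding and a stride of 1 (the filter moves one pixel at a time). Strictly speaking, sliding the filter without flipping it is cross-correlation, which is what most deep-learning libraries implement under the name "convolution". The 3×3 kernel is a hand-picked, Prewitt-style vertical-edge detector, not a learned filter: it outputs 0 on flat regions and a strong response where dark meets bright.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1). At each position,
    multiply the overlapping values element by element and sum them into one output pixel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            total = 0
            for u in range(kh):
                for v in range(kw):
                    total += kernel[u][v] * image[i + u][j + v]
            row.append(total)
        out.append(row)
    return out


# A 4x6 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 0, 10, 10, 10] for _ in range(4)]

# Vertical-edge kernel: compares the column on the right with the column on the left.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

print(conv2d(image, kernel))  # [[0, 30, 30, 0], [0, 30, 30, 0]]
```

Without padding, the output is smaller than the input: an H×W image and a kh×kw filter give an (H−kh+1)×(W−kw+1) map, here 4×6 → 2×4.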

After a convolution, we usually apply what is called pooling, which makes the image smaller by keeping a single value per neighborhood (for example, per 2×2 block). That value can be the maximum, the average, or the minimum of the neighborhood; max pooling is the most common choice.
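
A sketch of non-overlapping pooling in Python, assuming 2×2 windows with stride 2 and a feature map whose sides are multiples of the window size. `pool2d` and its `mode` argument are hypothetical names; the three modes are the three criteria above.

```python
def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling: keep one value per size x size neighborhood.
    Assumes the map's height and width are multiples of `size`."""
    pickers = {
        "max": max,
        "min": min,
        "avg": lambda vals: sum(vals) / len(vals),
    }
    pick = pickers[mode]
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h, size):
        row = []
        for j in range(0, w, size):
            window = [fmap[i + u][j + v] for u in range(size) for v in range(size)]
            row.append(pick(window))
        out.append(row)
    return out


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 3, 7, 2],
]
print(pool2d(fmap, mode="max"))  # [[4, 2], [3, 7]]
print(pool2d(fmap, mode="avg"))  # [[2.5, 1.0], [1.0, 5.0]]
print(pool2d(fmap, mode="min"))  # [[1, 0], [0, 2]]
```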

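Why pooling? Not only does it greatly reduce dimensionality (2×2 pooling keeps only a quarter of the values), making computation much cheaper, but it also makes the network more robust to small translations: a strong activation that moves by a pixel but stays inside the same pooling window produces the same max-pooled output.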
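
To see the translation effect, this sketch (hypothetical helper names, 2×2 max pooling with stride 2) builds a 4×4 feature map with a single activation and shifts it. A one-pixel shift that stays inside the same window leaves the pooled output unchanged, while a shift into the next window does not, so the robustness is local rather than complete. It also shows the size reduction: 16 values become 4.

```python
def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def feature_at(row, col, size=4):
    """A hypothetical size x size feature map with a single activation at (row, col)."""
    fmap = [[0] * size for _ in range(size)]
    fmap[row][col] = 1
    return fmap


original = feature_at(0, 0)
shifted = feature_at(0, 1)  # one pixel to the right, still in the same 2x2 window
crossed = feature_at(0, 2)  # moved into the next 2x2 window

print(max_pool2x2(original))  # [[1, 0], [0, 0]]
print(max_pool2x2(shifted))   # [[1, 0], [0, 0]]  -> identical after pooling
print(max_pool2x2(crossed))   # [[0, 1], [0, 0]]  -> this shift is still visible
```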

Wrapping up, what does a CNN look like?
First, we stack several convolution + pooling layers (usually with a nonlinearity such as ReLU after each convolution) to extract the most important patterns in the image. Then their output is flattened and passed to a fully connected network, which gives us the predicted class.
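
Putting the pieces together, here is a toy forward pass in plain Python: convolution, ReLU, 2×2 max pooling, flattening, and one fully connected layer whose highest score is the predicted class. All values are assumptions for illustration: the two filters are hand-picked edge detectors and the FC weights are set by hand to separate a "vertical line" (class 0) from a "horizontal line" (class 1). A real CNN learns these values during training and usually has many more filters and layers.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding-window filter."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(kernel[u][v] * image[i + u][j + v] for u in range(kh) for v in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]


def relu(fmap):
    """Nonlinearity: negative responses become 0."""
    return [[max(0, x) for x in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (drops a trailing odd row/column)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def flatten(fmaps):
    """Concatenate all pooled feature maps into one vector."""
    return [x for fmap in fmaps for row in fmap for x in row]


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum of all inputs per output class."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b for w_row, b in zip(weights, biases)]


def cnn_predict(image, kernels, fc_weights, fc_biases):
    # 1) Feature extraction: convolution + ReLU + pooling, once per filter.
    fmaps = [max_pool2x2(relu(conv2d(image, k))) for k in kernels]
    # 2) Classification: flatten the feature maps and pass them to the FC layer.
    scores = dense(flatten(fmaps), fc_weights, fc_biases)
    # 3) The predicted class is the index of the highest score.
    return max(range(len(scores)), key=lambda c: scores[c])


# Hypothetical 6x6 inputs: a bright column and a bright row.
vertical_line = [[1 if c == 2 else 0 for c in range(6)] for _ in range(6)]
horizontal_line = [[1 if r == 2 else 0 for _ in range(6)] for r in range(6)]

kernels = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],  # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],  # responds to horizontal edges
]
fc_weights = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # class 0: reads the first filter's pooled map
    [0, 0, 0, 0, 1, 1, 1, 1],  # class 1: reads the second filter's pooled map
]
fc_biases = [0, 0]

print(cnn_predict(vertical_line, kernels, fc_weights, fc_biases))    # 0
print(cnn_predict(horizontal_line, kernels, fc_weights, fc_biases))  # 1
```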

For those who don't know, an FC (fully connected) network is a classic neural network (an MLP), which returns a class for the image, or whatever output the problem calls for.
The parameters the network learns during training are the values of the convolution filters and the weights of the FC layers (plus their biases).
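
To make the contrast with a classic network concrete, here is a sketch that counts the learnable parameters of a convolutional layer and of an FC layer, assuming one bias per filter and one per output unit (the usual default). The layer sizes are hypothetical. The convolutional count does not depend on the image size: 32 filters of 3×3 over an RGB input have the same 896 parameters whether the image is 32×32 or 1920×1080.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


def fc_params(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return (n_inputs + 1) * n_outputs


# Hypothetical layer sizes, chosen only for illustration.
print(conv_params(3, 3, 3, 32))  # 896
print(fc_params(128, 10))        # 1290
```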

And that's it! If you found this interesting, don't forget to like and RT so it reaches more people, and follow me for more content like this!
#CNNs #CNN #AI #IA #ConvolutionalNeuralNetworks

@predict_addict @ai_for_success @skalskip92 @minchoi

Also, I just published a post diving into how fully connected layers work. Check it out!
x.com
