Everyone says that to get into AI you just need to learn computer science, maths, and stats. But which parts exactly? And how? My goals for this guide are to help you better understand what you need to learn:
- Show you that this is a topic worth learning deeply.
- Give you a guide and the tools for one way to learn how AI works.
Why this, why now, why care?
I believe that we are in a unique time in human history.
The Age of AI has begun. Artificial intelligence is as revolutionary as mobile phones and the Internet.
Mo Gawdat, a former exec at Google, said that AI is "bigger than climate change" and that people should hold off on having kids right now because of it.
On a more positive note, Marc Andreessen recently proclaimed:
I am here to bring the good news: AI will not destroy the world, and in fact may save it.
A consortium of politicians, AI pioneers, and captains of industry banged the doom gong this year, calling for a pause on AI experiments.
We are in a similar moment to when they were deciding if nuclear bombs could kill us all and ignite the atmosphere, or end the war. No matter where you are on the spectrum (doomer / accelerationist) — now is the time to learn.
Some have said we are in a Cambrian explosion of technological progress. I'd say we are in the Pokémon Gold/Silver era of AI (when the Pokémon still made sense).
Are you going to miss this moment?
Step 1 in learning this stuff is developing the motivation.
For me, it was reading Bill Gates's article on why the new generative AI advancements are a genuine breakthrough. Feeling inspired, I then opened up ChatGPT and had my first real conversation with the technology. That conversation made me understand the world a little bit more and, strangely, I found myself a bit less lonely in it.
More practically, I work in tech and I believe our jobs will be the first to change dramatically. To repeat the cliché: Engineers/office dogs/PMs won't be replaced by ChatGPT, they will be replaced by Engineers/office dogs/PMs that use ChatGPT.
So put your lizard brain to work and start thinking about existential risks! This tech will likely bring a lot of change to all of our lives. Now is the time to catch the wave, not to be rolled by it.
A note on the format before we begin
This is a story about how I started the climb up the AI learning curve and how and why you should too.
- It is not an article on "how AI works", it is an article on one way to deeply learn how AI works.
- It is not a high-level listicle giving an overview of all AI technology, it is a fundamentals-first approach to learning how neural networks and deep learning work.
The intention is that after understanding the fundamentals you can pursue different frameworks and directions (like Generative AI).
- Learning how AI "brains" work
- Learning how AI can know things
- Learning how AI learns
- Learning how AI can improve
The learning curve is steep, but you have already done the hard part: developing the motivation.
To get up that curve there are a few things to learn, which we will cover.
Come on, Sisyphus! Let's start rolling that boulder up the curve.
Learning how AI "brains" work
The world's best tutor
This might be a complex topic, but luckily you have access to the world's best tutor, ChatGPT! You should seriously consider paying for GPT-4, it's worlds apart from the free version.
I think people's resistance to using ChatGPT comes either from an ignorance of its potential, or an arrogance that no piece of software could know more than them. I honestly could not have gotten through this topic without ChatGPT helping explain concepts differently, grading my exercises and suggesting improvements, and providing constant encouragement with endless patience.
Raising money for foundational models in AI is so hot right now — of similar impact is the effort to create foundational memes in AI. One such meme is the Subway Prompt Artist.
If you don't get the joke then here is ChatGPT explaining it for you.
When I was searching for a foothold into this topic, a lot of the guides I came across raced up the learning curve as if you're not a noob. They present a "draw an owl in two steps" process for learning AI.
- Go to the Neural Networks and Deep Learning textbook.
- Understand the whole book and become an AI Researcher.
The guide from here provides more steps in-between and introduces a few helpful resources for working through Michael Nielsen's incredible, free textbook Neural Networks and Deep Learning.
You could dive into the textbook straight away, or you could watch 3Blue1Brown's YouTube series, which provides a visual explanation of the first chapters. I found myself alternating between getting stuck in the book and rewatching 3Blue1Brown's videos.
Linear algebra, the mathematics of data
Pretty quickly you will come across a lot of algebraic syntax. This may seem daunting if you have not been exposed to it before. But remember, you have ChatGPT to explain the equations in any depth you need, as many times as you need. I had not seen maths like this since high school but the ChatGPT method worked for me.
In this learning journey, Grant Sanderson (from 3Blue1Brown) is like an NPC quest-giver who will show up many times, right when you need him. His series on linear algebra was the perfect introduction to the syntax involved and for understanding the vector transformations used in deep learning. (This textbook chapter was also quite approachable and helpful.)
Neural network architecture
After you get comfortable with the maths involved, you will learn how neural networks are constructed, and about their beautiful inspiration: how our own meat-based neural networks work.
Our first network in the textbook has three layers (an input layer, a hidden layer, and an output layer), with weights and biases between the neurons, for a total of 11,935 parameters (ChatGPT explaining the calculation). The human eye and visual system is kind of like a network of neurons. An eye can detect digits, so with the basic architecture above, how can our network detect digits as well?
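As a sanity check, the 11,935 figure can be reproduced in a few lines of Python, assuming the textbook's 784-15-10 layout (784 input pixels, 15 hidden neurons, 10 output digits):

```python
# Parameter count for an assumed 784-15-10 network:
# every connection has a weight, every non-input neuron has a bias.
layers = [784, 15, 10]

weights = sum(a * b for a, b in zip(layers[:-1], layers[1:]))  # 784*15 + 15*10
biases = sum(layers[1:])                                       # 15 + 10
total = weights + biases

print(weights, biases, total)  # 11910 25 11935
```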
The design of the network (the hyper-parameters) and the settings of the network (the parameters) form a computer: a function with many variables (11,935 in the first example).
This large amount of flexibility allows AI to learn how to do amazing, creative things like detect digits or have car accidents. They can go from a random configuration of these parameters to the settings that best achieve their goal.
This involves learning with a process called gradient descent, which we will get to soon. What's important now is being able to think in higher-dimensional space so we can intuitively understand where these models operate.
Learning how AI knows things
Thinking in higher-dimensions
Before I got my head around how things work beyond three dimensions, whenever I encountered them I felt like Otto from The Simpsons: "woah dude, higher orders of mathematical dimensions, man."
It's not out-of-this-world, it's just different. This MathOverflow thread unlocked an understanding for me that I'll now attempt to share.
To begin with:
- Think about a 1-dimensional number line and everywhere you could place a point.
- Think about a 2-dimensional grid and all of the lines you could draw with two variables.
- Think about a 3-dimensional space and all of the shapes you could create with three variables.
- Think about a 4-dimensional timeline and all of the three-dimensional shapes you could create over time.
- Think about a 5-dimensional... or maybe time is a flat circle, with the passing of time being an illusion caused by the changing frames in a deterministic universe...
The point I am making is that it is hard to go beyond our familiarity with 3-dimensional space, so how do you even make the leap to many-dimensional space? In the case of our simple network, to 11,935-dimensional space! And more complex models have many billions of parameters!
One crutch/mental model that worked for me is thinking about image resolutions. A one-megapixel image is a space with one million dimensions (or degrees of freedom) that can be moved through by changing the pixels in the image.
In our simple model there are 11,935 dimensions, which can be roughly thought of as a 110x110 pixel image (not to be confused with the input image). Pretty much every image you could possibly think of fits in there.
The point (in case the image above didn't make it 100% clear) is that this is a very large realm of possibility.
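If you want to make the crutch concrete, here is a rough NumPy sketch: pad the 11,935 parameters out to 12,100 and view them as a 110x110 "image". This is purely an analogy, not anything the network actually does:

```python
import numpy as np

params = np.random.randn(11935)      # stand-in for the network's parameters
padded = np.pad(params, (0, 110 * 110 - params.size))
as_image = padded.reshape(110, 110)  # roughly one grey pixel per parameter

print(as_image.shape)  # (110, 110)
```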
Neural Networks as high-dimensional computers
Our neural networks are not static images; they are computers (which operate in a very large realm of possibility). Through learning they have found a configuration (a multi-dimensional image) in which they can compute as designed.
Neural networks can literally compute any function. Our network 'knows' a 3 is a 3 because it can take the input (the density of pixels in a digit), then process that input through the network's function (i.e. each of the 11,935 parameters, set to detect digits), to give an output (a classification of the digit).
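Here is a minimal NumPy sketch of that computation, assuming the 784-15-10 layout and sigmoid activations from the textbook. The weights here are random, so the "answer" is meaningless until the network is trained:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Random (untrained) weights and biases for an assumed 784-15-10 layout,
# scaled down so the exponentials stay numerically tame.
w1, b1 = rng.standard_normal((15, 784)) / np.sqrt(784), rng.standard_normal((15, 1))
w2, b2 = rng.standard_normal((10, 15)) / np.sqrt(15), rng.standard_normal((10, 1))

x = rng.standard_normal((784, 1))   # stand-in for a flattened 28x28 digit
hidden = sigmoid(w1 @ x + b1)       # 15 hidden activations
output = sigmoid(w2 @ hidden + b2)  # 10 activations, one per digit class

print(output.shape)                 # (10, 1)
print(int(output.argmax()))         # the network's (currently random) guess
```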
The knowing part is not necessarily the intelligent part. How it learned to best navigate its "shape" in higher-dimensional space is where the interesting maths happens.
Knowing enough Python
Now the exciting part. Running the network on your own computer! For this next part of the learning curve you'll need to pick up some Python.
You don't need to become a data scientist to run the network (but don't let me stop you!). You just need to know enough about the syntax to read the code in the textbook. It's also worth reading the 'absolute basics for beginners' docs on NumPy.
There are so many incredible free resources to learn Python. If you already have coding skills in another language, then these two short courses on Kaggle are a great option.
Kaggle gives you an editor which can grade your responses. Of course, you have your pal ChatGPT to give you feedback on your answers, motivation, and coach you on how to improve. ChatGPT is like a pair programmer that doesn't sit next to you breathing through their mouth.
With ChatGPT's help, some Python, and following along in the textbook, you can run the neural network on your computer.
Then all of a sudden... the sand is thinking.
I felt a real sense of wonder when I ran this for the first time. I was training a neural network. A handcrafted, old-fashioned AI. I think at this point you're allowed to order a badge for yourself that says "neural network trainer".
Learning how AI can learn
Getting your network up and running to detect digits didn't require you to understand how AI learns. Now's the time to learn!
From the code you will see that your network is initialised in a random state and, via tens of thousands of training examples, learns how to detect digits.
For you to learn how neural networks learn, you need to develop:
- An intuitive understanding of calculus, the mathematics of change.
- An understanding of how calculus can be used to find the gradient of a shape in multi-dimensional space (where the shape represents the "cost" of the network's errors).
- A grasp of how the parameters need to adjust to move "down" this slope to a more accurate position.
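The last two bullets can be sketched in a few lines: pick a simple one-variable "cost", compute its gradient, and repeatedly step downhill. A toy stand-in for the 11,935-parameter version:

```python
# Gradient descent on a bowl-shaped cost: C(w) = (w - 3)**2.
# The gradient dC/dw = 2*(w - 3) tells us which way is "down".
w = 10.0    # random-ish starting position
eta = 0.1   # learning rate (step size)

for _ in range(100):
    gradient = 2 * (w - 3)
    w -= eta * gradient  # step downhill

print(round(w, 4))  # 3.0 -- the bottom of the bowl
```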
Speaking of moving around curves, congratulations on your own scholastic gradient ascent. (You will get the pun at the end of this section).
Time to return to our NPC quest-giver 3Blue1Brown and their videos on how neural networks learn. (I rewatched these videos a couple of times at this stage).
Calculus, the mathematics of change
Unless you're not a noob, you probably now need Grant's helpful series on calculus. How far you go into that series and further into his series on multivariate calculus is up to you. A basic understanding will suffice.
Rolling down the learning canyon
When it comes to how your network learns, you will come across the analogy of a ball rolling down a canyon-shaped "cost function" as the network improves (and the "cost" decreases).
But how does a ball roll down a multi-dimensional canyon?
Firstly, invoking your mental model of multi-dimensional space can help in grasping the mechanisms by which the network learns. There is a position in the large realm of possibility to move to (a local minimum in the canyon) that creates a computer that achieves our goal (detecting digits).
Learning is all about moving through this multi-dimensional space, along the shape of the "cost function", to a better location for our goal. To do this you need to factor in how training examples, and the distance between the network's answer and the correct answer, can nudge all the parameters in a better direction (towards a better combination of parameters). This technique of stepping downhill based on small, randomly chosen batches of training examples is called "stochastic gradient descent", and it reveals another foundational meme: the stochastic parrot.
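Here is a toy sketch of the stochastic part, assuming a one-parameter model fitting y = 2x: shuffle the training data each epoch and estimate the gradient from small random mini-batches rather than the whole dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: y = 2*x, so the single parameter w should learn ~2.0.
x = rng.standard_normal(1000)
y = 2.0 * x

w, eta, batch_size = 0.0, 0.1, 32
for epoch in range(5):
    idx = rng.permutation(len(x))               # reshuffle each epoch
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = x[batch], y[batch]
        grad = np.mean(2 * (w * xb - yb) * xb)  # gradient of mean squared error
        w -= eta * grad                         # one noisy step downhill

print(round(w, 2))  # 2.0
```

Each step is noisy because it only sees 32 examples, but on average the steps point downhill, which is the whole trick.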
But which way is "down" in this space? This is where the backpropagation algorithm, the "workhorse of learning in neural networks", comes in. Each parameter has a gradient that slopes down towards a better location.
The algorithm churns through the calculus, over and over, with each round of training, helping nudge the model in hopefully the right direction. The model in the textbook takes a few minutes to train on your laptop, while the most advanced models can take months to train.
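To see the chain rule at work, here is a toy two-layer version of backpropagation (sigmoid activations and a quadratic cost, as in the textbook), checked against a numerical slope estimate. The shapes here (2 inputs, 3 hidden, 1 output) are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w1, w2 = rng.standard_normal((3, 2)), rng.standard_normal((1, 3))
x, target = rng.standard_normal((2, 1)), np.array([[1.0]])

def cost(w1, w2):
    out = sigmoid(w2 @ sigmoid(w1 @ x))
    return float(0.5 * np.sum((out - target) ** 2))

# Forward pass, keeping the activations we need for the backward pass.
a1 = sigmoid(w1 @ x)
a2 = sigmoid(w2 @ a1)

# Backward pass: the chain rule applied layer by layer, output to input.
delta2 = (a2 - target) * a2 * (1 - a2)     # error at the output layer
delta1 = (w2.T @ delta2) * a1 * (1 - a1)   # error propagated back one layer
grad_w2 = delta2 @ a1.T
grad_w1 = delta1 @ x.T

# Sanity check: nudge one weight and compare against the numerical slope.
eps = 1e-6
w1_nudged = w1.copy()
w1_nudged[0, 0] += eps
numerical = (cost(w1_nudged, w2) - cost(w1, w2)) / eps
print(abs(numerical - grad_w1[0, 0]) < 1e-4)  # True
```

The numerical check at the end is a standard way to convince yourself the calculus is right before trusting it at 11,935-parameter scale.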
How neural networks learn is complex. But it can be understood.
With this understanding you no longer see AI as "magic" or a black-box. You can lift up the hood and see how it is built, how to train it, and how it learns.
I pursued this journey for these moments of wonder and enlightenment, which I would have missed out on if I had just imported big libraries like PyTorch to run the networks for me.
To check your understanding, try keeping up with Andrej Karpathy's video on backpropagation. It's like watching those viral videos where some dude builds a house out of mud in the jungle. Karpathy builds at twice the speed of human thought.
Learning how AI can be improved
After all of this hard work, you and your model emerge in 1998. However, with the fundamentals learned, it becomes a fun exercise to improve the model with more modern techniques.
At this stage you come to appreciate that there is an art and science to building and improving AI. The learning algorithms take care of the parameters within the neural network. But you control the hyper-parameters, all the parts of the network's design:
- how many layers deep the network is,
- how many neurons are in the layers,
- how fast it should learn,
- and techniques to prevent overfitting.
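To get a feel for why hyper-parameter choice matters, here is a toy experiment varying just one of them, the learning rate, on a simple bowl-shaped cost:

```python
# The same gradient descent with three learning rates.
def descend(eta, steps=50):
    w = 10.0
    for _ in range(steps):
        w -= eta * 2 * (w - 3)  # gradient of C(w) = (w - 3)**2
    return w

print(round(descend(0.1), 3))  # 3.0 -- converges nicely
print(descend(0.01))           # still far from 3.0 -- too slow
print(abs(descend(1.5)) > 1e6) # True -- too fast, it diverges
```

The same trade-off shows up when tuning a real network's learning rate, just with no tidy formula telling you where the sweet spot is.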
Why is it an art? I think it is because there isn't a definitive understanding of how these systems need to be structured. You learn heuristics and crutches, but not rules.
It's interesting that it is not a deterministic science. It's not a problem that you can just throw time, money, compute, and Dyson spheres at. What is your contribution going to be?
Where to next
That's up to you. My hope was that this article inspired you to learn this topic deeply and that by sharing one way to get into the topic it sped up your own learning and made it a shared journey.
If you think "ackchyually you made a noob mistake" then that's likely, and you can publicly point out my mistake on Twitter.
Developing existential motivation to learn AI
Learning how neural networks and deep learning works
- 3Blue1Brown - Neural Network series for the introduction.
- Michael Nielsen - Neural Networks and Deep Learning for the long(ish) road to (starting to) deeply understand the topic.
- Andrej Karpathy on Backpropagation to check your understanding.
- Christopher Olah on Neural Networks for another description of neural networks that is visual and approachable.
How amazing is it that the leaders in the topic have spent so much time explaining the concepts for noobs. Thank you.
And no thanks to the AI doomers who made me want to drill through my GPU mid-training cycle.