• €249 or 2 monthly payments of €124.50

MASTER OPTICAL FLOW: Hardcore Deep Learning Skills for Object Tracking and Video Analysis

Unleash your Computer Vision potential and take your skills from image to video analysis.

📥 This course is now closed and will reopen in early 2025. Join the waitlist to be notified and receive goodies while you wait.

Understand the Secret Algorithm used by Self-Driving Car Companies to Process Videos into 4D Perception Software

Dear Computer Vision Engineer, if you want to develop a deep understanding of the field of "Video" in Computer Vision, then this page will show you how...

Here's the story:

A few months ago, I decided to take on the ambitious task of studying the architectures of the most advanced self-driving car companies, and draw schematics of how they work.

My goal was to be closer to reality, and understand what companies were using in their architectures to drive autonomously.

So I looked at companies like Tesla, Aurora, Nvidia, Cruise, and the entire thing was all very exciting, until I came across an intriguing paper from Waymo's Research Team.

Its name?

"ViDAR"

ViDAR? What is ViDAR?

I started to look at it, and discovered an incredibly elegant architecture designed for a task they called "Deep 4D Perception".

The idea was simple: to fuse 3D Depth Maps (calculated via Stereo Vision) with Previous Time Frames to calculate not a 3D, but a 4D Output.

To do this, they were using a technique called "Optical Flow".

If you are familiar with traditional Computer Vision, you may already know what "Optical Flow" means. But if you come from the Deep Learning space, you probably don't.

Optical Flow is the apparent motion of each pixel from one frame to the next.

It can be computed via traditional approaches or with Deep Learning techniques. Once we have that motion, we can create FLOW MAPS that describe the movement of each pixel over time.
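To make the "traditional approach" concrete, here is a minimal sketch in Python of a Lucas-Kanade-style estimate, one classic way to compute optical flow (not necessarily the exact method taught in the course). It assumes brightness constancy, Ix·u + Iy·v + It = 0, and solves that equation by least squares over a single window. The synthetic frames, the function names `make_frame` and `lucas_kanade_window`, and the (0.4, -0.3) pixel shift are all made up for illustration.

```python
import math


def make_frame(size, dx=0.0, dy=0.0):
    """Smooth synthetic image whose pattern is displaced by (dx, dy) pixels."""
    return [
        [math.sin(0.3 * (x - dx)) + math.cos(0.25 * (y - dy)) for x in range(size)]
        for y in range(size)
    ]


def lucas_kanade_window(frame1, frame2):
    """Estimate a single (u, v) flow vector over the interior of two frames."""
    height, width = len(frame1), len(frame1[0])
    sum_xx = sum_xy = sum_yy = 0.0  # entries of the 2x2 structure matrix
    rhs_x = rhs_y = 0.0             # right-hand side of the normal equations
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # spatial gradient in y
            it = frame2[y][x] - frame1[y][x]                  # temporal difference
            sum_xx += ix * ix
            sum_xy += ix * iy
            sum_yy += iy * iy
            rhs_x -= ix * it
            rhs_y -= iy * it
    det = sum_xx * sum_yy - sum_xy * sum_xy
    if abs(det) < 1e-12:
        raise ValueError("Window has too little texture to estimate motion")
    # Closed-form solution of the 2x2 linear system.
    u = (sum_yy * rhs_x - sum_xy * rhs_y) / det
    v = (sum_xx * rhs_y - sum_xy * rhs_x) / det
    return u, v


frame_t = make_frame(20)
frame_t1 = make_frame(20, dx=0.4, dy=-0.3)  # same pattern, slightly shifted
u, v = lucas_kanade_window(frame_t, frame_t1)
print(f"estimated flow: u={u:.2f}, v={v:.2f}")
```

With this small, smooth test pattern the estimate should land close to the true shift of (0.4, -0.3). Real images have larger motions and need extras such as image pyramids and one window per pixel.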

An example of a flow map, where each color represents a "direction":

This meant I had just learned the 3rd and final pillar in Computer Vision: after segmentation maps, which classify each pixel of an image, and depth maps, which take two stereo images and find the depth of each pixel, we have... flow maps, which take two consecutive images as input and output a motion vector for each pixel!
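Since a flow map stores one motion vector per pixel, a common way to display it is to turn each vector into a color. Here is a small sketch of that idea in Python, assuming one widespread convention: hue encodes the direction and brightness encodes the speed. The exact color wheel varies between tools, and `flow_to_rgb` and the tiny 2x2 flow field are hypothetical names and data used only for illustration.

```python
import colorsys
import math


def flow_to_rgb(u, v, max_magnitude):
    """Map one flow vector to an RGB color: hue = direction, brightness = speed."""
    if max_magnitude <= 0:
        raise ValueError("max_magnitude must be positive")
    angle = math.atan2(v, u) % (2 * math.pi)  # direction in [0, 2*pi)
    hue = angle / (2 * math.pi)               # rescaled to [0, 1)
    value = min(math.hypot(u, v) / max_magnitude, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return round(r * 255), round(g * 255), round(b * 255)


# A tiny 2x2 "flow map": each entry is the (u, v) motion of one pixel.
flow_map = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(-1.0, 0.0), (0.0, 0.0)],
]
color_map = [[flow_to_rgb(u, v, max_magnitude=1.0) for u, v in row] for row in flow_map]
for row in color_map:
    print(row)
```

In this sketch a static pixel comes out black, and pixels moving in opposite directions get opposite hues, which is why moving objects stand out so clearly in a flow map.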

Thinking about it:

This meant that using Optical Flow, we could process series of images (videos). And I was curious...

"Who else was using it?"

Was Optical Flow a common approach? Was it one approach among many others? Was processing videos a new trend? Or something companies had been doing for a while?

I explored more.

And more.

And realized something...

Up until 2018, 2019, maybe even 2020, self-driving car companies were still processing images one by one. You get an image, you classify the objects, and you do this in a loop.

But from 2020, they all started to process videos!

Bigger than this one technique, I realized that Computer Vision was no longer the science of processing images...

It became the science of processing videos!

And this meant that an entire sub-field had been created. This sub-field was unknown, and among the tens of thousands of engineers trained on Computer Vision... only a handful of them were even aware that it existed!

Once you start digging into this field, you discover fantastic applications of Computer Vision. For instance, once you have extracted the movement in a video, you can then classify this movement.

This is known as "Action Classification", and it's an entire sub-field of Video Processing. And many of these other subfields use Optical Flow, like:

  • Object and Pixel Tracking

  • 3D Flow Estimation

  • Motion Prediction

  • Velocity Estimation

  • Action Segmentation

  • Trajectory Analysis

  • Video Frame Interpolation

  • Action Classification

  • And many more...

Some are purely robotics-based, but others, like Action Classification, can get you to work on Shoplifting Detection in retail, or football analysis in specialized startups...

When you start processing videos, you can do a lot more than just images!

The time of processing images one by one is gone. We are now processing videos, and if you want to be at the cutting-edge of Computer Vision, you too are going to have to learn how to process videos.

Processing videos is becoming increasingly common in Computer Vision: from autonomous vehicles, to specialized fields like Retail (shoplifting detection), to video analysis (football games, ...), and even SLAM and 3D Reconstruction.

But you can't just input images into a CNN.

The field is different, and will require you to upskill on many aspects of Computer Vision you may not have been introduced to so far...

Things like sending several images as input to a CNN, or using 3D CNNs for Spatio-Temporal Fusion, or combining CNNs and RNNs/LSTMs together, or calculating Flow Maps, and interpreting these Flow Maps for 4D Perception...

All these techniques are very advanced, and not widely taught in the Computer Vision world...

...until now!

Introducing…

MASTER OPTICAL FLOW: Hardcore Deep Learning Skills for Object Tracking and Video Analysis

In this online course, you’re going to learn how optical flow works, and you’ll dive into the advanced Deep Learning techniques used by researchers to process videos. 


Let’s take a look at the program.

MODULE 1

INTRODUCTION TO VIDEO ANALYSIS

In this first module, we'll begin by exploring the world of Pixel Tracking. We'll get familiar with the idea of tracking pixels and features one by one, and with creating "Flow Maps".

What you'll learn:

  • The 3 core techniques professionals use to track anything in a video and how to decide which one to implement in any scenario.

  • How Visual Tracking works in 3 steps (use it correctly, and you'll be able to do Visual SLAM, Sensor Fusion, and Target Tracking)

  • Movement Detection - A simple technique to detect objects from their motions without Deep Learning and without object detection algorithms.

  • Template Matching: The Algorithms used to track any bounding box using Convolutions, Histograms, or Siamese Architectures.

  • Multi-Object Tracking: My kickass algorithm to track multiple objects at the same time.

  • The Optical Flow Formula: An in-depth look at optical flow, from use-cases to advanced maths

  • A special look at the most popular optical flow algorithms (one of them was invented in the 1980s and is still the most used today)

  • My complete guide to understand and create Flow Maps (the only place where you'll find it)

  • 💻 Tracking & Optical Flow Workshop: Use your Computer Vision skills to create 3 different optical flow algorithms
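To give a flavor of the "Movement Detection" idea listed above, here is a minimal sketch in Python of frame differencing, one simple way to detect motion without Deep Learning or an object detector. It is an assumption that this resembles what the module covers. The helper names (`detect_motion`, `with_square`), the threshold of 25, and the 6x6 toy frames are hypothetical.

```python
def detect_motion(prev_frame, curr_frame, threshold=25):
    """Return a binary change mask and the bounding box of the changed pixels."""
    mask = [
        [1 if abs(curr - prev) > threshold else 0 for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]
    changed = [(x, y) for y, row in enumerate(mask) for x, flag in enumerate(row) if flag]
    if not changed:
        return mask, None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return mask, (min(xs), min(ys), max(xs), max(ys))  # (left, top, right, bottom)


def with_square(size, left, top, background=10, foreground=200):
    """Toy grayscale frame: a bright 2x2 square on a dark background."""
    frame = [[background] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = foreground
    return frame


prev_frame = with_square(6, left=1, top=1)
curr_frame = with_square(6, left=3, top=2)  # the square has moved
mask, box = detect_motion(prev_frame, curr_frame)
for row in mask:
    print(row)
print("moving region:", box)
```

Note that plain differencing flags both where the object was and where it is now, so the box covers the whole swept area. It tells you that something moved, not how each pixel moved, which is the gap optical flow fills.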

MODULE 2

OPTICAL FLOW WITH DEEP LEARNING

In the second module, we'll learn to create Neural Networks that read 2 images simultaneously and create a Flow Vector and Flow Map for every pixel.

What you'll learn:

  • FlowNet Research Review – Deep Dive into the Pioneer of Deep Optical Flow

  • The Anatomy of a Deep Flow Algorithm

  • 11 Deep Learning operations you can do to build a Flow Vector and how to decide which one to use (and not to use)

  • Why an Optical Flow output isn't a "Segmentation Map" and the common mistake made by Engineers

  • Deep Fusion - A detailed look at "correlations" and Fused Convolutions. 

  • The Research Paper Triangle – The strategy I use to read and understand research papers

  • PyTorch Playbook - Everything you need to know about PyTorch to build a FlowNet from scratch without tutorials (included: 2 advanced techniques used in the research field)


  • 💻 The FlowNet Workshop – Build and Train your own Optical Flow Algorithm with PyTorch and Self-Driving Car Data
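The "correlations" mentioned above can be sketched in a few lines. The idea, used in correlation-based flow networks such as the FlowNet variant with a correlation layer, is to compare each feature vector of the first image with nearby feature vectors of the second image using dot products. This Python sketch is a simplified, single-pixel-patch version written for illustration, not the course's or the paper's implementation. `correlation_volume`, `empty_features` and the toy feature maps are hypothetical.

```python
def correlation_volume(feat1, feat2, max_disp):
    """For every pixel of feat1, score each displacement (dy, dx) into feat2."""
    height, width = len(feat1), len(feat1[0])
    disps = [
        (dy, dx)
        for dy in range(-max_disp, max_disp + 1)
        for dx in range(-max_disp, max_disp + 1)
    ]
    volume = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = []
            for dy, dx in disps:
                y2, x2 = y + dy, x + dx
                if 0 <= y2 < height and 0 <= x2 < width:
                    # Dot product between the two feature vectors.
                    scores.append(sum(a * b for a, b in zip(feat1[y][x], feat2[y2][x2])))
                else:
                    scores.append(0.0)  # out-of-bounds positions act like zero padding
            row.append(scores)
        volume.append(row)
    return volume, disps


def empty_features(height, width, channels):
    return [[[0.0] * channels for _ in range(width)] for _ in range(height)]


# One distinctive feature vector that moves one pixel to the right.
feat1 = empty_features(4, 4, 2)
feat1[1][1] = [1.0, 2.0]
feat2 = empty_features(4, 4, 2)
feat2[1][2] = [1.0, 2.0]

volume, disps = correlation_volume(feat1, feat2, max_disp=1)
scores = volume[1][1]
best = disps[scores.index(max(scores))]
print("best (dy, dx) for pixel (1, 1):", best)
```

The displacement with the highest score is a coarse guess of where the pixel went. In a real network the later convolution layers refine that evidence into a flow vector instead of taking a hard maximum.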

MODULE 3

CUTTING-EDGE OPTICAL FLOW APPLICATIONS

In the final module, we'll dive into the Cutting-Edge Optical Flow Algorithms and see some applications of Optical Flow.

What you'll learn:

  • An In-Depth Look at the RAFT Algorithm: Read and Understand the most advanced research paper on Deep Optical Flow

  • The Little-Known Encoder-Iterator Architectures used nowadays

  • Advanced Fusion - How 4D Correlations work to fuse multiple streams of convolutions

  • RNNs, LSTMs, GRUs: Going back to the fundamentals of Recurrent Neural Networks – Because Computer Vision people hate them, we'll make it simple!

  • My technique to read a research paper from top to bottom

  • The overview of the action recognition and object tracking fields

  • The word-for-word transcript of my discussion with an Optical Flow Expert, and all the questions I asked him when preparing this course (and a little-known communication "hack" you can use to get the help of an expert on any topic)

  • A Complete MindMap of the Optical Flow techniques and object tracking field

  • Focus of Expansion: An unknown technique to estimate a trajectory and do lateral control from a video alone

  • Structure From Motion: How to create 3D Scenes from a video (Bonus: a draft of my Visual SLAM workshop that uses Feature Tracking to build 3D Maps)

  • Action Recognition: The 5 Architectures used in the field, and how to use them too!


  • 💻 Motion Estimation Workshop: Use a cutting-edge Optical Flow Algorithm, and fuse this with object detection to build a vehicle motion estimator.

SPECIAL BONUS

Video Transformers Live Event Recordings

You don't have to stick to CNNs and prediction tasks. Today's most advanced architectures for Action Classification are covered inside these 90-minute workshop recordings, along with a mini-project where you'll implement your first video transformer!

A few of the things inside:

  • A complete introduction to Transformer Networks for the Computer Vision "brain" (most examples are built on text processing and are nearly impossible to follow for Computer Vision Engineers, so we built Computer Vision examples)

  • 6 Architectures to process videos, and a deep dive into the state-of-the-art architecture for action classification.

  • A thorough analysis of the code of a Video Transformer Network, and how to code Multi-Head Self-Attention mechanisms.

  • and many many more...
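As a taste of the self-attention mechanism mentioned above, here is a minimal sketch in Python of scaled dot-product self-attention, the computation at the core of a single attention head. To keep it short it assumes identity query/key/value projections (a real head learns three weight matrices), and it treats each token as a made-up embedding of one video frame. `self_attention` and `frame_tokens` are hypothetical names.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Similarity of this token with every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens
        ]
        weights = softmax(scores)
        all_weights.append(weights)
        # Output is the attention-weighted average of all tokens (the "values").
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return outputs, all_weights


# Three hypothetical frame embeddings; frames 0 and 2 look alike.
frame_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
outputs, weights = self_attention(frame_tokens)
for i, row in enumerate(weights):
    print(f"frame {i} attends to:", [round(w, 2) for w in row])
```

Each row of weights sums to 1 and says how much one frame "looks at" the others. Multi-head attention runs several of these in parallel with different learned projections and concatenates the results.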

FAQs

What is the difference between this course and MASTER OBSTACLE TRACKING?

Although these two courses both deal with the idea of tracking, they are completely different.

  • In MASTER OBSTACLE TRACKING, you'll learn how to track bounding boxes through time; you'll therefore work at the "object level".

  • In Optical Flow, you'll build flow maps in which you'll be able to track not objects, but "pixels".

The difference is similar to the difference between object detection and image segmentation: one is higher level than the other.

Now, while it's great to process bounding boxes, I think it's even better to process pixels, especially when you learn to use Deep Learning to do this, because you're no longer processing frames one by one; you are processing them together in a Deep Learning model.

How long is the course?

Depending on your level, you'll need between 8 and 15 hours to complete the projects. I also included bonuses and go-further projects that you can explore once you have the foundations.

What are the prerequisites, and where does it lead?

This course is the last and most advanced of the Computer Vision Journey. I would therefore recommend not enrolling if:

❌ You have never trained a Convolutional Neural Network before or don't know how Deep Learning works
❌ You can't code in Python and have never done a tutorial with a notebook before
❌ You have never heard the term "Taylor Series" before (we won't do that, but it indicates your Maths level)
❌ You are not familiar with OpenCV and couldn't look for tutorials.

This course is for you if:
✅ You already know the prerequisites, but still find it hard to reach a higher level.
✅ You're ready to transition from image to video, and are ready to apply the techniques taught
✅ Nice to Have – You have already worked with segmentation architectures and autoencoders.
✅ You're looking for a high-level understanding of the field, and not just how-to tutorials
✅ You're emotionally stable and patient enough to go through the ups and downs of training a cutting-edge model

How does this fit in Computer Vision?

You can see Optical Flow as the 3rd Pillar of Computer Vision. The first pillar is image segmentation with Deep Learning, the second pillar is Depth Estimation with 3D, and the third is Optical Flow and videos.

Looking at it this way, you realize that this course isn't just about "Video"; it's not really about working on a sub-niche of Computer Vision. In this course, we'll dive into methodologies to understand research papers, we'll see techniques to combine CNNs and RNNs, and we'll understand how to implement advanced Computer Vision architectures.

This course isn't just about video; it's about mastering an advanced pillar of Computer Vision that very few engineers get.

"My general thought is that this course is great, it has no other competitor on internet. It's very good that it's small and doesn't need weeks to finish, and I think I wish to have even more details on this now.

Finally, your action classification lesson, oh it's really very good, thank you very much. I urge you to do a similar lesson in all computer vision and LiDAR courses."

Abdelrahman Ahmad, Computer Vision Researcher Engineer

Optical Flow Edgeneer

"I thought this was a great overview of Deep Optical Flow. I would highly recommend this course to anyone who has a grasp of classic optical flow estimation and deep learning."

Isaas Berrios, Electrical Engineer at Boeing

Optical Flow Edgeneer

Aman Vyas, Master's Student in robotics and autonomous system in university of Turki

"The knowledge I acquired was instrumental in creating a deep VO model based on the FlowNet architecture for my thesis."

"As a result of buying the "MASTER OPTICAL FLOW" course, I gained a deep understanding of optical flow algorithms and their applications in video analysis.

The specific feature I liked most was its comprehensive coverage of optical flow algorithms and their practical applications. The knowledge I acquired was instrumental in creating a deep VO model based on the FlowNet architecture for my thesis.

3 other benefits of the course were the hands-on projects that reinforced learning, the clear explanations of complex concepts, and the access to a supportive community for discussion and troubleshooting. These aspects greatly enhanced my understanding and application of deep learning techniques in visual odometry.

I would recommend this course because it provides comprehensive, hands-on learning that significantly enhances understanding and practical application of deep learning techniques in Optical flow."

Are your skills losing value?

One thing to realize is that hard skills are vital to Computer Vision and Robotics Engineers. Without getting technically better, we're just left with skills that lose value.

Here's a short personal story that demonstrates this:


A few years ago, as a Computer Vision Engineer (and a decent one), I had an interview with a startup doing shoplifting detection. As soon as I got on the phone with them, I was really impressed. It was all pure action recognition and video analysis. 3D CNNs, Optical Flow algorithms, and more...

An example of what they do:

When you work on Videos, you can suddenly do extremely advanced things like Shoplifting Detection.

Image Credits: Veesion

Incredible opportunity, incredible company, incredible salary...

But also, incredible requirements. And I had no idea how to meet them. I mean, how do you classify a scene? How do you solve this particular business problem? I had no clue.

At the time, there was no material for this, and of course, I failed my interview. My robotics skills weren't helpful for this task. Although I was perfecting my existing skills, I never thought about "adding" more skills in Computer Vision.

You can build an exciting career in Computer Vision and autonomous tech. You can master the algorithms Tesla and Waymo use. You can work on advanced and specialized startups like the ones doing Shoplifting detection and football analysis. And the best part is, if you choose to go there, you won't have a lot of competition.

This skill is rare, unknown, and advanced.

When you look at Computer Vision Engineers, most of them will gladly start at Level 1 and classify images. They will learn some PyTorch, and go to Level 2 with Deep Learning, CNNs, object detection, image segmentation, etc...

But how many of them will go to Level 3? or 4? How many will dare to build the skills that will take them to specialized companies?

It wasn't really possible back then; I believe it is now.

Select your Optical Flow Adventure

How far do you want to master Video Computer Vision?

📥 This course is now closed and will reopen in early 2025. Join the waitlist to be notified and receive goodies while you wait.

REGULAR

Optical Flow

✅ MASTER OPTICAL FLOW (249€)

249€

PREMIUM

Video Transformers Edition

✅ MASTER OPTICAL FLOW (249€)

✅ VIDEO TRANSFORMERS WORKSHOP RECORDINGS (valued at 89€)

257€