Unlocking the Power of AI Vision: Exploring the YOLO Algorithm’s Revolutionary Capabilities

Naveed Hyder
6 min read · Apr 21, 2023

The YOLO (You Only Look Once) algorithm was developed by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, who at the time were affiliated with the University of Washington, the Allen Institute for Artificial Intelligence, and Facebook AI Research. The first version of YOLO was released in 2015 in a paper titled “You Only Look Once: Unified, Real-Time Object Detection”, which was presented at CVPR 2016.

It is a popular object detection algorithm that is widely used in computer vision applications. The basic idea of YOLO is to divide an input image into a grid of cells and, for each cell, predict a set of bounding boxes and the class probabilities of the objects that are present in the cell.
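The grid idea can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the values S = 7, B = 2, and C = 20 are the ones reported for the first YOLO version, so other versions will differ.

```python
def output_shape(S, B, C):
    """Shape of the prediction tensor: each of the S x S cells holds
    B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, img_w, img_h, S):
    """Return (row, col) of the grid cell that contains the box centre."""
    col = min(int(cx * S / img_w), S - 1)  # clamp centres on the right edge
    row = min(int(cy * S / img_h), S - 1)  # clamp centres on the bottom edge
    return row, col


print(output_shape(7, 2, 20))                   # (7, 7, 30)
print(responsible_cell(300, 100, 448, 448, 7))  # (1, 4)
```

In this sketch, an object whose centre falls at pixel (300, 100) of a 448 x 448 image is assigned to the cell in row 1, column 4 of a 7 x 7 grid.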

The YOLO algorithm works by using a deep neural network to predict the bounding boxes and class probabilities for each cell in the input image. The network is trained on a large dataset of labelled images, where each image is annotated with the bounding boxes and class labels of the objects that are present in the image.

The YOLO algorithm was designed to address some of the limitations of traditional object detection methods, such as the need for multiple passes over an image and the reliance on region proposals or sliding windows. YOLO uses a single CNN to predict the bounding boxes and class probabilities for all objects in an image in a single pass, which makes it much faster than those methods. The first version paid for this speed with somewhat less precise localization than the best region-proposal detectors of the time, a gap that later versions narrowed.

To train the YOLO algorithm, the researchers used a large dataset of images with annotated bounding boxes and class labels. Their network, whose architecture was inspired by GoogLeNet, was first pretrained on image classification and then trained to detect objects, optimising a loss function that penalises errors in both object localization and classification.

The YOLO algorithm has since gone through several versions, including YOLOv2, YOLOv3, and YOLOv4, which incorporate advancements in deep learning techniques and architectures, such as skip connections, residual blocks, and feature pyramid networks. These changes have further improved the accuracy and speed of the YOLO algorithm, making it one of the most popular object detection algorithms in computer vision research and applications.

During training, the network learns to predict the bounding boxes and class probabilities by minimising a loss function that measures the difference between the predicted and ground-truth values. Here, class probabilities are the likelihoods that a detected object belongs to each class or category.

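The loss function used in YOLO training is typically described as a combination of two main components: classification loss and regression loss. Implementations also include a term for the confidence (objectness) score discussed later in this article.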

The classification loss measures the difference between the predicted and true class probabilities for each object. This loss penalises the network for predicting incorrect class probabilities and encourages it to predict high probabilities for the correct class.

The regression loss measures the difference between each object's predicted and ground-truth bounding box coordinates. This loss penalises the network for predicting inaccurate bounding boxes and encourages it to predict bounding boxes that closely match the ground-truth bounding boxes.
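The two loss components described above can be sketched for a single object as follows. This is a simplified illustration, not the full training loss: it assumes the squared-error style of the first YOLO paper (including the square roots on width and height and a coordinate weight of 5), it leaves out the confidence term, and the function name is hypothetical. Later versions use different loss formulations.

```python
import math


def detection_loss(pred_box, true_box, pred_probs, true_probs, lambda_coord=5.0):
    """Toy loss for ONE object: weighted box regression plus classification.
    Boxes are (x, y, w, h) with values normalised to [0, 1]."""
    px, py, pw, ph = pred_box
    tx, ty, tw, th = true_box
    # Regression term: squared error on the centre, and on the square roots
    # of width and height so that the same absolute error costs more on a
    # small box than on a large one.
    regression = (
        (px - tx) ** 2
        + (py - ty) ** 2
        + (math.sqrt(pw) - math.sqrt(tw)) ** 2
        + (math.sqrt(ph) - math.sqrt(th)) ** 2
    )
    # Classification term: squared error between the probability vectors.
    classification = sum((p - t) ** 2 for p, t in zip(pred_probs, true_probs))
    return lambda_coord * regression + classification


truth_box, truth_probs = (0.5, 0.5, 0.2, 0.3), [1.0, 0.0, 0.0]
good = detection_loss((0.52, 0.49, 0.21, 0.3), truth_box, [0.9, 0.05, 0.05], truth_probs)
bad = detection_loss((0.7, 0.3, 0.5, 0.1), truth_box, [0.2, 0.7, 0.1], truth_probs)
print(good < bad)  # True
```

A prediction that matches the ground truth exactly gives a loss of zero, and the loss grows as either the box or the class probabilities drift away from the annotation.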

In the inference phase, the network processes an input image to predict the bounding boxes and class probabilities for each cell. The predicted bounding boxes are then post-processed to eliminate overlapping boxes and select the boxes with the highest confidence scores.

Here is a snapshot of a project that I made, which counts the number of cars moving in a lane using the YOLO algorithm.

Why is the YOLO algorithm relevant?

The YOLO algorithm is relevant because it provides an efficient and accurate way to detect objects in images and videos, which is a fundamental task in computer vision. Object detection has many practical applications, such as autonomous driving, surveillance, robotics, and image retrieval.

Traditional object detection methods rely on sliding windows or region proposals, which can be computationally expensive and slow. YOLO, on the other hand, performs all the predictions for an image in a single pass through the neural network, making it much faster than traditional methods. This real-time performance makes it ideal for applications that require fast and accurate object detection, such as self-driving cars or real-time video analysis.

Moreover, YOLO can detect objects of different sizes and shapes and can handle complex scenes with multiple objects and occlusions. This makes it a robust and versatile algorithm that can be used in a wide range of scenarios.

The basis on which the probability is predicted

The probability predicted by the YOLO algorithm is based on a combination of two factors: objectness score and class probabilities.

Objectness score

When the YOLO algorithm is trying to find objects in a picture, it looks at different boxes in the picture to see if they contain an object. The algorithm gives each box a score to show how likely it is that there is an object inside the box.

The score is called the “objectness score”, and it is calculated by checking two things:

First, it checks how likely it is that there is an object in the box.

Second, it checks how well the box fits the actual shape of the object in the picture.

If the box is a good fit and has a high probability of having an object inside, it gets a high objectness score. This score helps the algorithm figure out which boxes are more likely to contain objects so it can focus on those boxes and find the objects faster.
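The "how well the box fits" part is usually measured with intersection over union (IoU). The sketch below assumes boxes given as (x1, y1, x2, y2) corner coordinates and uses a hypothetical helper, objectness_target, to show the training target described above: the IoU with the ground-truth box when an object is present, and zero otherwise.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def objectness_target(contains_object, pred_box, true_box):
    """Training target for the objectness score (illustrative only)."""
    return iou(pred_box, true_box) if contains_object else 0.0


# Two 10 x 10 boxes that overlap by half: intersection 50, union 150.
print(round(objectness_target(True, (0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
print(objectness_target(False, (0, 0, 10, 10), (5, 0, 15, 10)))           # 0.0
```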

Class probabilities

The class probabilities represent the probability that the object in the bounding box belongs to a specific class. For example, if the YOLO algorithm is trained to detect cars, people, and bicycles, the class probabilities will represent the likelihood that the object in the bounding box is a car, a person, or a bicycle.

During inference, the YOLO algorithm predicts the objectness score and class probabilities for each bounding box and discards the boxes whose scores are low. When several of the remaining boxes overlap on the same object, non-maximum suppression (NMS), a post-processing step that reduces the number of redundant bounding box predictions, is applied. It keeps the box with the highest combined objectness score and class probability and eliminates the redundant boxes around it.
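A minimal sketch of greedy non-maximum suppression is shown below. It assumes detections are (score, box) pairs with boxes as (x1, y1, x2, y2) and an IoU threshold of 0.5, which is a common but arbitrary choice. Real implementations usually run this separately for each class and use optimised library routines.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop every
    remaining box that overlaps it by at least iou_threshold."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[1], d[1]) < iou_threshold]
    return kept


detections = [
    (0.9, (10, 10, 50, 50)),
    (0.8, (12, 12, 52, 52)),      # near-duplicate of the first box
    (0.7, (100, 100, 140, 140)),  # a separate object
]
print([score for score, _ in non_max_suppression(detections)])  # [0.9, 0.7]
```

The second box is removed because it overlaps the higher-scoring first box almost completely, while the third box survives because it does not overlap either of them.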

The confidence that the YOLO algorithm reports for a detected object depends on several factors:

Object Localization: The YOLO algorithm predicts the coordinates of the bounding box that encloses the detected object. How closely this box fits the object is a key factor in the confidence of the detection.

Object Classification: The YOLO algorithm predicts the class label of the detected object based on the features extracted from the image. How reliable this prediction is also affects the confidence of the detection.

Non-maximum Suppression: The YOLO algorithm applies non-maximum suppression to eliminate overlapping bounding boxes and select the boxes with the highest confidence scores. This step helps to ensure that only the most accurate bounding boxes are retained and reduces the likelihood of false positives.

Training Data: The YOLO algorithm is trained using a large dataset of annotated images that include bounding boxes and class labels. The quality of these annotations limits how reliable the predicted confidence can be.

Together, these factors determine the confidence score that the YOLO algorithm assigns to each detected object. During training, the target for the confidence is based on the intersection over union (IoU) between the predicted bounding box and the ground-truth bounding box. At inference, where no ground truth is available, the network outputs its learned estimate of that value, which is combined with the predicted class probability. The resulting confidence score indicates how sure the algorithm is that the box contains an object of the predicted class. If the confidence score is below a chosen threshold, the YOLO algorithm discards the detection as a likely false positive.

For example, if the YOLO algorithm detects a person and shows a value of 0.75 in the rectangle around the person, it means that the algorithm is 75% confident that the object within the bounding box is accurately classified as a person, based on the features it has extracted and the patterns it has learned from the training data.
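One common way such a displayed value arises is as the product of the objectness score and the class probability, followed by a threshold check. The sketch below assumes that scheme. The function names, the example numbers, and the threshold of 0.5 are illustrative, and individual implementations may compute or report the score differently.

```python
def class_confidences(objectness, class_probs):
    """Combine the objectness score with each class probability."""
    return {name: objectness * p for name, p in class_probs.items()}


def best_detection(objectness, class_probs, threshold=0.5):
    """Return (label, score) for the most likely class, or None if the
    score falls below the threshold and the detection is discarded."""
    scores = class_confidences(objectness, class_probs)
    label = max(scores, key=scores.get)
    score = scores[label]
    return (label, score) if score >= threshold else None


probs = {"person": 0.8, "car": 0.15, "bicycle": 0.05}

label, score = best_detection(0.9, probs)
print(label, round(score, 2))      # person 0.72

print(best_detection(0.3, probs))  # None
```

In the second call the box is unlikely to contain any object, so even a confident class prediction is not enough to pass the threshold.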
