Think about how much time you have spent looking for your keys in a messy room. It happens to everyone and is among the most frustrating experiences. Today, computer algorithms can help with this kind of problem, and that is the real strength of object detection. Finding lost keys is not what these deep learning algorithms were originally designed for, though. They are more commonly employed for round-the-clock surveillance and real-time vehicle detection in smart cities.
With recent advances in deep learning models for computer vision, object detection applications are much easier to develop than ever before. TensorFlow’s Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models. Besides significant performance improvements, these techniques leverage models pre-trained on massive image datasets, which reduces the need for large task-specific datasets. Moreover, current approaches focus on end-to-end pipelines, which has significantly improved performance and enabled real-time use cases.
Object detection is a computer technology related to image processing and computer vision. It deals with detecting instances of semantic objects of particular classes, such as buildings, human beings, and cars, in digital images and videos. Well-researched domains of object detection include pedestrian detection and face detection. Object detection has numerous applications in areas like image retrieval and video surveillance.
Some of the major applications of object detection include face recognition and video object co-segmentation. It is also used for tracking objects, for example tracking a person in a video or following the movement of a cricket bat.
People often confuse image classification with object detection. When the main aim is to classify the image into a certain category, image classification is used. On the other hand, to identify the location of the objects in an image or count the number of instances of an object, object detection is to be used. Labelled data is needed in order to train a custom model. The labelled data in the context of object detection are images that have corresponding labels and bounding box coordinates.
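As a rough illustration of that difference, the sketch below contrasts a classification label with a detection annotation. The class names `BoundingBox` and `DetectionExample`, the field names, and the sample values are all made up for this example; real datasets and the TensorFlow Object Detection API use their own formats.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoundingBox:
    """One labelled object: a class name plus the pixel coordinates of its box."""
    label: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


@dataclass
class DetectionExample:
    """An image together with every labelled object it contains (hypothetical format)."""
    filename: str
    boxes: List[BoundingBox] = field(default_factory=list)

    def count(self, label: str) -> int:
        """Count how many instances of `label` are annotated in the image."""
        return sum(1 for box in self.boxes if box.label == label)


# Image classification: a single label for the whole image.
classification_label = {"filename": "street.jpg", "label": "car"}

# Object detection: a label and a bounding box for each object in the image.
detection_label = DetectionExample(
    filename="street.jpg",
    boxes=[
        BoundingBox("car", 34, 120, 210, 260),
        BoundingBox("car", 300, 110, 470, 250),
        BoundingBox("person", 500, 90, 560, 270),
    ],
)

print(detection_label.count("car"))  # 2
```

The detection annotation can answer questions such as "how many cars are there and where are they", which the single classification label cannot.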
In a typical image classification network, an image is fed to the network and passed through many convolution and pooling layers. The output is the class of the object, so each input image has one corresponding class as output. A simple object detection algorithm builds on this: after taking the image as input, it divides the image into various regions.
Each of these regions is treated as a separate image. The regions are passed to a Convolutional Neural Network (CNN), which classifies them into various classes. Once every region has been assigned a class, the regions are combined to get the original image with the detected objects.
However, such a simple algorithm has problems, because objects in the image can have different aspect ratios and spatial locations. Covering them requires a very large number of regions, which greatly increases the computation time.
Object detection has a lot of real-life applications and can be used in many different scenarios. New algorithms and models keep outperforming previous ones, and object detection is one of the most rapidly maturing areas of computer vision. Some of its applications are described below.
For instance, a group of researchers at Facebook developed DeepFace, a facial recognition system based on deep learning. Google also has its own facial recognition system, which can automatically group photos based on the person in the images.
Object detection is a computer technology connected to image processing and computer vision. It detects instances of objects such as buildings, human faces, cars, and trees. The primary job of face detection is to determine whether there is any face in the image. Face detection is the first and most essential step of face recognition, and it locates the faces in images. It is used in areas like security, law enforcement, biometrics, personal safety, and entertainment.
Faces can be detected in real-time and it helps to track persons or objects. The face detection methods can be appearance-based, feature-based, knowledge-based, or template matching.
Another important use of object detection is people counting. It can be used for analyzing store performance or recording crowd statistics during festivals or other activities. However, it can be difficult at times as people move out of the frames very quickly.
Off-the-shelf people counters are not very expensive but the data generated by them is tied to proprietary systems that limit the options for data extraction and KPI optimization. An embedded DIP using your own camera and SBC would save time and money and offer the freedom to tailor the application to the KPIs you need. Insights can be extracted from the cloud that would not be possible in other cases.
The cloud can enhance the overall functionality of your DIP IoT application. Visualization, alerting, and reporting offer increased capabilities, as does cross-referencing with outside data sources.
Object Detection is often used in industrial processes to identify products. Using visual inspection to find a specific object is a basic task and it is involved in various industrial processes. This includes inventory management, sorting, quality management, machining, and packaging. Inventory management is sometimes quite tricky as it could be hard to track items in real-time. Localization and automatic object counting allow improving inventory accuracy.
Several challenges need to be taken into account when performing object detection. The objects come in different sizes, shapes, colors, and orientations. Additional noise comes from variations in illumination, viewpoint, shadows, and occlusions. It is important to reach the desired accuracy without requiring too many training examples.
Self-driving cars are clearly a part of the future. However, making them work is tricky, as many different techniques are required to perceive the surroundings, such as laser light, GPS, radar, computer vision, and odometry. Advanced control systems interpret the sensory information to identify appropriate navigation paths and obstructions. When a living being is detected in the path, the car stops automatically. This detection has to be very fast, and it is a huge step towards self-driving cars.
Self-driving cars are being designed with the intention of saving lives. A lot of people are involved in road accidents every year. Autonomous vehicles promise more accurate and safer transportation and a lower toll of needless deaths. Object detection is performed in two steps: image classification and image localization. Image classification determines what the objects in the image are, and image localization provides their specific locations.
Object detection plays a very important role in security. Police personnel use it to scan security feeds and match them against existing databases. It helps to detect criminals or their vehicles, and it can even be used to locate stolen products. The applications are nearly limitless. For some of these repetitive search tasks, a machine's ability to look out for objects can surpass that of human beings.
Using technology to perform surveillance is a lot more efficient. As surveillance is a repetitive and mundane task, human performance tends to dip over time. Letting technology do the task lets human beings focus on the actions to be taken if something goes wrong. A lot of personnel might be needed to survey a large strip of land. Mobile surveillance bots, along with stationary cameras, can mitigate this problem.
The computer vision tasks are categorized into a few simple procedures.
The output labels are changed so that the model draws a bounding box around an object. This helps the model learn both the class of the object and its position in the image. Four parameters are added to the output layer: the two coordinates of the centroid, and the height and width of the bounding box as proportions of the image. For landmark detection, a set of output units is added instead to give the Cartesian coordinates of the different positions to be recognized. These positions, or landmarks, are consistent for a particular kind of object.
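A minimal sketch of such a localization label is shown below. The function name `localization_target`, the ordering of the values, and the sample numbers are assumptions chosen for illustration; actual models may order or encode the outputs differently.

```python
from typing import List


def localization_target(xmin: float, ymin: float, xmax: float, ymax: float,
                        image_w: float, image_h: float,
                        class_index: int, num_classes: int) -> List[float]:
    """Build [one-hot class..., cx, cy, w, h] with the box values scaled to 0..1."""
    one_hot = [0.0] * num_classes
    one_hot[class_index] = 1.0
    cx = (xmin + xmax) / 2.0 / image_w  # centroid x as a fraction of image width
    cy = (ymin + ymax) / 2.0 / image_h  # centroid y as a fraction of image height
    w = (xmax - xmin) / image_w         # box width as a fraction of image width
    h = (ymax - ymin) / image_h         # box height as a fraction of image height
    return one_hot + [cx, cy, w, h]


# A 200x200 pixel box of class 1 (out of 3 classes) in a 400x400 image.
target = localization_target(100, 50, 300, 250,
                             image_w=400, image_h=400,
                             class_index=1, num_classes=3)
print(target)  # [0.0, 1.0, 0.0, 0.5, 0.375, 0.5, 0.5]
```

The first three values identify the class, and the last four are the extra output units that describe where the object is.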
If we are trying to detect multiple objects in the image, we can use the same technique that was being used in object localization. The difference is that we would want the algorithm to be able to classify and localize all the different objects in the image and not just one. The simple idea is to crop the image into multiple images and run the same algorithm for all these cropped images.
The following algorithm should be followed:
Cropping multiple images and passing each one through a CNN is computationally very expensive. The cost can be reduced with a convolutional implementation of the sliding window method. The fully connected layers are replaced with convolutional layers, so that for a given window size the input image needs to be passed through the network only once. In an actual implementation, the cropped images are not passed one at a time; the entire image is passed at once.
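To get a feel for how many crops the naive approach has to classify, the sketch below simply enumerates the window positions. The function name `sliding_windows` and the tiny image size are assumptions for illustration, and no CNN is involved here.

```python
from typing import Iterator, Tuple


def sliding_windows(image_w: int, image_h: int,
                    window: int, stride: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (xmin, ymin, xmax, ymax) for each square window that fits in the image."""
    for y in range(0, image_h - window + 1, stride):
        for x in range(0, image_w - window + 1, stride):
            yield (x, y, x + window, y + window)


windows = list(sliding_windows(image_w=8, image_h=6, window=4, stride=2))
print(len(windows))  # 6
print(windows[0])    # (0, 0, 4, 4)
print(windows[-1])   # (4, 2, 8, 6)
```

A naive detector would run the classifier once per window, and the count grows quickly for larger images, smaller strides, and additional window sizes. The convolutional approach produces the predictions for all of these positions from a single pass over the whole image.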
The sliding window model has other drawbacks as well. Square windows are slid all over the image, but the object may be rectangular, or none of the squares may match the actual object well. The algorithm may be able to find and localize multiple objects in an image, but the accuracy of its bounding boxes is quite poor.
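One common way to put a number on how well a window matches an object is intersection over union (IoU). The text above does not name this measure, so it is brought in here only as an illustrative assumption, and the box coordinates are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


wide_object = (0, 0, 8, 4)    # rectangular object, 8 wide and 4 tall
square_window = (0, 0, 4, 4)  # square window covering its left half
print(iou(square_window, wide_object))  # 0.5
```

Even a well-placed square window overlaps this wide object only partially, which is the kind of mismatch that makes sliding window bounding boxes imprecise.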
YOLO (You Only Look Once) is a solution that is much more accurate and faster than the sliding window algorithm. It needs only a minor tweak to the algorithms described above. The image is divided into a grid of cells. The labels of the data are changed so that the classification and localization algorithm can be applied to each grid cell. The algorithm proceeds as follows:
But there are still some problems. Multiple objects in the same grid cell cannot be detected. The issue can be reduced by choosing a finer grid, but the algorithm can still fail in certain cases, for instance with a flock of birds. Anchor boxes address this: instead of having C+5 labels for each grid cell, where C is the number of classes, each grid cell has (C+5)*A labels, where A is the number of anchor boxes. If one object is assigned to one anchor box in a grid cell, another object can be assigned to another anchor box of the same cell.
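The grid and anchor box bookkeeping can be sketched as follows. The helper names `grid_cell` and `label_shape` are hypothetical, and the sketch assumes that the "5" in C+5 stands for one objectness value plus four bounding box values, which the text does not spell out.

```python
def grid_cell(cx: float, cy: float, grid_size: int):
    """Return the (row, col) of the grid cell that contains a normalized box centre."""
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


def label_shape(grid_size: int, num_classes: int, num_anchors: int = 1):
    """Shape of the label: (C + 5) values per anchor box in every grid cell."""
    return (grid_size, grid_size, num_anchors * (num_classes + 5))


# An object centred at (0.5, 0.375) falls into the middle cell of a 3x3 grid.
print(grid_cell(0.5, 0.375, grid_size=3))                      # (1, 1)
print(label_shape(grid_size=3, num_classes=3))                 # (3, 3, 8)
print(label_shape(grid_size=3, num_classes=3, num_anchors=2))  # (3, 3, 16)
```

With two anchor boxes, each cell has room for two objects, at the cost of a label that is twice as deep.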
One of the most popular deep learning libraries today is TensorFlow. It was developed by Google and is open source. Machine learning is used across Google products to improve search, translation, image captioning, and recommendations. With artificial intelligence, Google users get faster and more refined search results. Google uses machine learning to take advantage of its massive datasets and give users the best experience. Researchers, programmers, and data scientists all use machine learning.
TensorFlow was built as a framework to help developers and researchers work together on an AI model. Once a model has been developed and scaled, lots of people can use it.
Creating an object detection algorithm is the best way to understand how everything works. The necessary algorithms are provided with TensorFlow. You can create an entire object detection algorithm as follows. However, you need to take care of two things before you start:
A few prerequisites would be required to get the job done. A few things need to be installed on the system.
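TensorFlow can be installed using the pip or conda commands. Note that the code in this tutorial uses the TensorFlow 1.x API (tf.Session, tf.GraphDef, tf.gfile), so it needs a 1.x release or has to be adapted for newer versions: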
# For CPU
pip install tensorflow
# For GPU
pip install tensorflow-gpu
The other libraries can also be installed using the pip or conda commands. The following commands would work.
pip install --user Cython
pip install --user contextlib2
pip install --user pillow
pip install --user lxml
pip install --user jupyter
pip install --user matplotlib
Protocol Buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data, similar to XML but smaller and simpler. Version 3.4 or above needs to be downloaded. The TensorFlow models repository also needs to be cloned or downloaded from GitHub. Both the models and protobuf should be placed in the same folder. After that, run protoc from the research folder to compile the .proto files.
"path_of_protobuf's bin"./bin/protoc object_detection/protos/*.proto --python_out=.
1. You need to start by importing all the libraries.
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
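# Make the parent folder importable (assumes the script runs from the object_detection folder).
sys.path.append("..")
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util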
2. Specify the model to download and the path to the frozen inference graph that comes with it, then load the graph and the label map.
# Name of the pre-trained model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to the frozen detection graph, the actual model used for object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# Label map that gives the class name for each class id.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90
# Download the model archive and extract the frozen inference graph.
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())
# Load the frozen TensorFlow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
# Map class ids to category names.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 8) ]
def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
                        tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
            # Run inference
            output_dict = sess.run(tensor_dict,
                                   feed_dict={image_tensor: np.expand_dims(image, 0)})
            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[
                'detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict
8. In the final part, all the functions would be called and the inference is run on all the input images.
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)
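If you want the detections as plain values rather than drawn on the image, the output dictionary can be post-processed along the following lines. This is a sketch under assumptions: the helper name `confident_detections` and the 0.5 threshold are invented for this example, and the dictionary used to try it out is a hand-made stand-in rather than real model output. It relies on the boxes being normalized as [ymin, xmin, ymax, xmax].

```python
def confident_detections(output_dict, image_w, image_h, min_score=0.5):
    """Return (class_id, score, pixel_box) for every detection scoring at least min_score.

    pixel_box is (xmin, ymin, xmax, ymax) in pixels.
    """
    results = []
    count = output_dict['num_detections']
    for box, class_id, score in zip(output_dict['detection_boxes'][:count],
                                    output_dict['detection_classes'][:count],
                                    output_dict['detection_scores'][:count]):
        if score < min_score:
            continue
        # Boxes are normalized to 0..1 and ordered [ymin, xmin, ymax, xmax].
        ymin, xmin, ymax, xmax = box
        pixel_box = (int(xmin * image_w), int(ymin * image_h),
                     int(xmax * image_w), int(ymax * image_h))
        results.append((int(class_id), float(score), pixel_box))
    return results


# Hand-made stand-in for the dictionary returned by run_inference_for_single_image.
fake_output = {
    'num_detections': 2,
    'detection_boxes': [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.5, 0.5]],
    'detection_classes': [1, 18],
    'detection_scores': [0.9, 0.3],
}
print(confident_detections(fake_output, image_w=800, image_h=600))
# [(1, 0.9, (400, 150, 800, 450))]
```

The class id can then be looked up in `category_index` to get a readable name.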
To perform real-time object detection with TensorFlow, the same code can be used with a few tweaks. OpenCV is used here, and its camera module reads the live feed from the webcam. The code can be summarised as follows:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
import cv2
# Open the default webcam.
cap = cv2.VideoCapture(0)
sys.path.append("..")
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            ret, image_np = cap.read()
            # Stop if no frame could be read from the camera.
            if not ret:
                break
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            cv2.imshow('object detection', cv2.resize(image_np, (800,600)))
            # Press 'q' to quit.
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break
# Release the camera and close the preview window.
cap.release()
cv2.destroyAllWindows()
Object detection is becoming common today. Its significance in face detection and face recognition is well understood, and it is gaining wide acceptance for surveillance and security. TensorFlow is one of the libraries that make it easy to achieve great results in object detection.
The algorithms are constantly being updated, as is the norm in machine learning. Old algorithms are being outperformed, and object detection is increasingly finding its way into self-driving cars and other sophisticated areas.
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.