In machine learning and deep learning, image annotation is the process of labelling or classifying images, using text, annotation tools or both, so that an artificial intelligence system (a robot, machine or other automated system) can learn to recognise features in the data on its own. Annotating an image adds metadata to a data set.

The different types of image annotation

Image annotation is a type of data labelling. It is also sometimes referred to as tagging, transcription or processing. The same approach applies to video, which can be annotated continuously, as a stream, or frame by frame.

Image annotation teaches your machine learning system to recognise the items you have marked. The annotated images are used for supervised learning of your system. Once your system has been trained, it will be able to make a decision or take an action based on its ability to identify the features it learned from the annotated images.

Image annotation is most often used to recognise objects and boundaries, and to segment images so that a model can understand their content, their meaning or the image as a whole. Each of these uses requires a significant amount of data to train, validate and test a machine learning model. Annotation can be simple, when it is only a matter of classifying the image according to its description (for example, an image of a cat in a living room can be tagged with the label "house cat"). It can also be complex, when different elements or areas of the image must be distinguished (for example, to teach a machine to recognise the difference between a Siamese cat and a Persian cat).
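As a minimal sketch of the train, validate and test step mentioned above, the snippet below shuffles a list of annotated images and cuts it into three parts. The file names, the 70/15/15 split and the `split_dataset` helper are assumptions made for illustration, not a recommended ratio.

```python
import random


def split_dataset(items, train_pct=70, val_pct=15, seed=0):
    """Shuffle annotated items and split them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )


# Hypothetical annotated images: (file name, label) pairs.
annotated = [(f"img_{i:03d}.jpg", "house cat") for i in range(20)]
train_set, val_set, test_set = split_dataset(annotated)
print(len(train_set), len(val_set), len(test_set))
```

Keeping the three sets disjoint matters more than the exact percentages: an image that appears in both training and test data makes the test result look better than it is.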

There are 4 main types of image annotation:

  • classification,
  • object detection,
  • semantic segmentation,
  • and deep learning image segmentation.

Classification

Classification is a type of image annotation that consists of identifying the presence of similar objects in images throughout a data set. This type of annotation is used to train a machine to recognise an object in an unlabelled image because it resembles an object in labelled images the machine has already been trained on.

For example, an annotator can tag indoor images according to their type: kitchen, living room, etc. They can also tag outdoor images according to whether they show "day" or "night".
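The tagging example above can be sketched as one image-level tag per file. The file names and the `invalid_tags` helper are hypothetical; the tag values come from the example.

```python
from collections import Counter

# Tag vocabularies taken from the example above.
ROOM_TYPES = {"kitchen", "living room"}
TIMES_OF_DAY = {"day", "night"}

# Hypothetical file names, each mapped to a single image-level tag.
annotations = {
    "indoor_001.jpg": "kitchen",
    "indoor_002.jpg": "living room",
    "outdoor_001.jpg": "day",
    "outdoor_002.jpg": "night",
    "outdoor_003.jpg": "day",
}


def invalid_tags(annotations, allowed):
    """Return the entries whose tag is not in the allowed vocabulary."""
    return {name: tag for name, tag in annotations.items() if tag not in allowed}


label_counts = Counter(annotations.values())
bad = invalid_tags(annotations, ROOM_TYPES | TIMES_OF_DAY)
print(label_counts)
print(bad)
```

Checking tags against a fixed vocabulary catches typos such as "nite" before they reach training, and the counts show whether some classes are under-represented.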

Object detection

Object detection is a form of annotation that involves identifying the presence, position and number of one or more objects in an image and annotating them as accurately as possible. It is possible to annotate objects in an image with techniques such as polygon annotation or bounding box annotation. For example, we may have several images of street scenes in which we want to identify trucks, pedestrians, bicycles or cars. It is possible to annotate them separately on the same image thanks to object detection.
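As a rough illustration of the street scene above, each annotated object can be stored with a label and either a bounding box or a polygon. The record layout, the coordinates and the helper names below are assumptions for this sketch, not the format of any particular annotation tool.

```python
from collections import Counter


def polygon_to_bbox(points):
    """Return the tightest (x_min, y_min, x_max, y_max) box around a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical annotations for one street-scene image, in pixel coordinates.
street_scene = [
    {"label": "car", "bbox": (40, 120, 200, 220)},
    {"label": "pedestrian", "bbox": (230, 100, 260, 210)},
    {"label": "car", "polygon": [(300, 130), (420, 125), (430, 215), (305, 220)]},
]


def normalise(obj):
    """Give every object a bounding box, deriving it from the polygon if needed."""
    bbox = obj["bbox"] if "bbox" in obj else polygon_to_bbox(obj["polygon"])
    return {"label": obj["label"], "bbox": bbox}


objects = [normalise(o) for o in street_scene]
counts = Counter(o["label"] for o in objects)
print(objects)
print(counts)
```

A polygon follows the outline of the object more closely, while a bounding box is quicker to draw. Deriving the box from the polygon keeps both views of the same object consistent.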

Semantic segmentation

Semantic segmentation is a more advanced application of image annotation. It is used in a number of ways to analyse the visual content of images and determine how objects in an image are different or the same. This method is used when we want to understand the presence, location and sometimes the size and/or shape of objects in images. 

For example, given several images showing a crowd in a stadium, the crowd can be annotated so that it is segmented from the stadium seats.
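One common way to store a semantic segmentation annotation is a mask with one class ID per pixel. The tiny mask, the class IDs and the `class_pixel_counts` helper below are invented for illustration, using the stadium example.

```python
from collections import Counter

# Hypothetical class IDs for the stadium example.
CLASSES = {0: "background", 1: "crowd", 2: "seats"}

# A tiny 4 x 6 mask: each value is the class ID of one pixel.
mask = [
    [0, 0, 0, 0, 0, 0],
    [2, 2, 1, 1, 2, 2],
    [2, 1, 1, 1, 1, 2],
    [2, 2, 1, 1, 2, 2],
]


def class_pixel_counts(mask):
    """Count how many pixels belong to each class name."""
    counts = Counter(pixel for row in mask for pixel in row)
    return {CLASSES[class_id]: n for class_id, n in counts.items()}


pixel_counts = class_pixel_counts(mask)
print(pixel_counts)
```

The pixel counts give the size of each region, and the positions of the pixels give its location and shape. A mask of this kind does not separate one person in the crowd from the next, since every crowd pixel carries the same ID.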

Deep learning image segmentation

Deep learning image segmentation makes it possible to track and count the presence, position, number, size and shape of objects in an image. Returning to the stadium example, per-pixel segmentation of this kind makes it possible both to annotate the individuals in the stadium and to determine the number of people in the crowd.
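To show how a per-pixel annotation can yield a count, the sketch below counts separate groups of "person" pixels in a binary mask using a simple flood fill. This is an illustrative simplification: the mask is invented, `count_instances` is a hypothetical helper, and people who touch each other would be merged into one group, which is why annotation tools usually store a distinct ID per individual instead.

```python
def count_instances(mask):
    """Count 4-connected groups of 1-pixels in a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Found a pixel of a group not visited yet: flood-fill the whole group.
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


# Hypothetical mask: 1 marks a pixel annotated as "person".
person_mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
people = count_instances(person_mask)
print(people)
```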

Boundary recognition

Image annotation can also be used to train machines to recognise the boundaries or lines of objects in an image, which is known as boundary recognition. These boundaries can include the edges of an individual object or areas of topography shown in the image. This type of annotation is used in the training of autonomous cars, for example to recognise the boundaries of pavements or traffic lanes.
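Boundaries are often annotated directly as lines or polygons, but they can also be derived from an annotated region. The sketch below, with an invented mask and a hypothetical `boundary_pixels` helper, keeps the region pixels that touch a pixel outside the region or the edge of the image.

```python
def boundary_pixels(mask):
    """Return (row, col) of region pixels that touch a non-region pixel or the image edge."""
    height, width = len(mask), len(mask[0])
    edge = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            for ny, nx in neighbours:
                outside_image = not (0 <= ny < height and 0 <= nx < width)
                if outside_image or mask[ny][nx] != 1:
                    edge.append((r, c))
                    break  # one outside neighbour is enough
    return edge


# Hypothetical mask: 1 marks pixels annotated as part of a traffic lane.
lane_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edge = boundary_pixels(lane_mask)
print(edge)
```

In this example every lane pixel except the centre one lies on the boundary, because the centre pixel is the only one surrounded by lane pixels on all four sides.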

It can also be used in training machine learning models for drones where it is important to follow a particular path or to recognise potential obstacles such as power lines, for example. 

Medicine and retail also use this type of annotation to improve their machine learning systems. In medical images it is used to recognise the different cells. In a shop it is used to mark out the different aisles, so that a system can focus on the most crowded aisles and exclude those that are not of interest.

How to annotate images?

To annotate your images, several data annotation tools exist. Some tools are available for sale, while others are available as open source or freeware. Are you looking for a partner to help you annotate your images? Isahit has its own innovative annotation tool within its socially responsible digital platform and a community of skilled contributors, spread over 4 continents, ready to help you.
