Video Annotation

An annotation is a descriptive note added to a text, diagram, document, image, or video. It attaches metadata to a file so that the file is easier to use when it is accessed. Video annotation is the extraction of information about a video and the addition of that information to the video, which assists in browsing, search, analysis, retrieval, comparison, and categorisation. Adding informative or descriptive messages to videos creates an interactive user experience, can hold viewers’ attention so that they watch for longer, and can increase the number of viewers. Annotating video content for education, sports, health, and so on enables relevant results for text-based searches. Annotations are especially useful on social media, where they make content more descriptive and popular. They are also used in training videos and for analytics.


Types of Metadata Used in Annotation

Content-independent Metadata: Text annotation is best suited to this type of metadata, which can be, for example, the name of the speaker, the author, the geolocation, and so on. Here, the metadata is not directly linked to the content: it can be overlaid directly on the video, but it may or may not be part of the actual video content. Informative or descriptive messages added to the video also fall under this type.

Content-dependent Metadata: Annotation based on the low-level video frames or audio frames of the content falls under this type of metadata. Usually, this type of annotation is performed through video analytics.
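
The distinction between the two metadata types can be illustrated with a small data structure. This is only a sketch: the `VideoMetadata` class, its field names, and the sample values are hypothetical and not part of any particular annotation tool. Content-independent metadata is stored as simple key/value pairs for the whole video, while content-dependent metadata is keyed by frame index, on the assumption that it comes from some frame-level analysis step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VideoMetadata:
    """Hypothetical container separating the two kinds of metadata."""

    # Content-independent: facts about the video that need no frame analysis
    # (for example speaker, author, geolocation).
    content_independent: Dict[str, str] = field(default_factory=dict)
    # Content-dependent: labels derived from the frames, keyed by frame index.
    content_dependent: Dict[int, List[str]] = field(default_factory=dict)

    def add_frame_label(self, frame_index: int, label: str) -> None:
        """Attach a label produced by analysing one frame."""
        self.content_dependent.setdefault(frame_index, []).append(label)


# Placeholder values, for illustration only.
meta = VideoMetadata(
    content_independent={"speaker": "example speaker", "geolocation": "example city"}
)
meta.add_frame_label(120, "whiteboard")
meta.add_frame_label(120, "person")
```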

Different Types of Video Annotation

Text Annotation: This is the method of attaching text to video content. It is a basic way to add information or include a call-to-action, and it can take any of the following forms:

  • Title: Title annotations give viewers an idea of a video’s content and are a useful branding tool.
  • Speech bubble: Speech bubbles convey unspoken information.
  • Spotlight: Spotlight annotations display a custom message when the viewer hovers over a defined area.
  • Labels: Labels display custom text when the viewer hovers over a defined area. Unlike spotlight annotations, label annotations appear below the defined frame and have slightly different configurations.
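
As a rough illustration of how the text annotation forms listed above might be represented, the sketch below models each annotation as a record with a kind, a message, and a display interval. The `TextAnnotation` class, the `active_at` helper, the time-range fields, and the sample annotations are all assumptions made for this example; the forms above do not prescribe any storage format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextAnnotation:
    """Hypothetical record for one text annotation on a video."""

    kind: str     # "title", "speech_bubble", "spotlight" or "label"
    text: str     # message shown to the viewer
    start: float  # time in seconds at which the annotation appears
    end: float    # time in seconds at which it disappears (exclusive)


def active_at(annotations: List[TextAnnotation], t: float) -> List[TextAnnotation]:
    """Return the annotations that should be visible at playback time t."""
    return [a for a in annotations if a.start <= t < a.end]


# Sample data, for illustration only.
annotations = [
    TextAnnotation("title", "Intro to Video Annotation", 0.0, 5.0),
    TextAnnotation("label", "Subscribe", 3.0, 10.0),
]
visible = active_at(annotations, 4.0)  # both annotations overlap t = 4.0
```

A player could call `active_at` once per rendered frame to decide which overlays to draw. Hover behaviour for spotlights and labels would additionally need a screen region, which is left out here for brevity.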

Content-based Annotation (Text/Object in Video): Textual information that exists in video frames is called collateral text. It is used to extract keyword phrases and potentially richer representations from text fragments. Examples include the text of news, documentary programs, movies, and newspaper film reviews. Textual data in video content is a rich source of information and, where available, lets users filter and search video data in a more intuitive and natural manner.
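
To show how collateral text can support filtering and searching, here is a minimal sketch of a keyword index. It assumes the text has already been extracted from the frames (for instance by an OCR step, which is not shown) and is available as a mapping from timestamp to text. The `build_index` and `search` functions and the sample strings are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def _words(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(frame_text: Dict[float, str]) -> Dict[str, Set[float]]:
    """Map each keyword to the timestamps whose frame text contains it."""
    index: Dict[str, Set[float]] = defaultdict(set)
    for timestamp, text in frame_text.items():
        for word in _words(text):
            index[word].add(timestamp)
    return index


def search(index: Dict[str, Set[float]], query: str) -> List[float]:
    """Return sorted timestamps whose text contains every word of the query."""
    words = _words(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Sample collateral text keyed by timestamp in seconds (illustrative only).
frame_text = {
    12.0: "Breaking News: Election Results",
    47.5: "Weather Forecast",
    90.0: "Election results by region",
}
index = build_index(frame_text)
matches = search(index, "election results")
```

The index requires every query word to match, which keeps the example simple. A real system might add stemming, ranking, or phrase matching.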

Audio Annotation: Users can modify the audio content and annotate the audio output.