Search and Retrieval of Multimodal Data using Large Multimodal Models (LMMs)

The Problem Statement
An FMCG giant’s research team needed an AI-enabled tool to extract insights from clinical data and user-feedback data available in multiple modalities (images, videos and metadata). The solution had to extract information from a diverse set of data (images, videos, speech and text) and bring it all onto one common platform for efficient search, retrieval, analysis and visualization. The tool also had to be accompanied by a responsive web interface so that analysts could assess the impact of a product.
The Solution
We developed a solution that uses Large Multimodal Models (LMMs) to analyse and extract information from various data types, such as videos, speech, images and text. The goal was to bring information from diverse sources onto one common platform for comparison and visualization. A multimodal search engine was implemented so that users can navigate the information without worrying about which modality it came from. The extracted information is indexed in a vector database, so a query is answered by similarity search over that index, giving quick, reliable retrieval of multimodal data to support feedback analysis and decision-making.
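
The idea of searching across modalities can be illustrated with a minimal sketch. It assumes every item, whatever its modality, has already been mapped to a vector in one shared space; the three-dimensional vectors, the item IDs and the `Item` and `search` names below are hypothetical, and a plain in-memory list stands in for the vector database. It is not the deployed implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    item_id: str            # hypothetical identifier
    modality: str           # "video", "image", "speech" or "text"
    embedding: List[float]  # vector in the shared latent space (toy values)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index: List[Item], query: List[float], top_k: int = 3) -> List[Tuple[float, Item]]:
    """Rank all items by similarity to the query, regardless of modality."""
    scored = [(cosine(query, item.embedding), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: in a real system the embeddings would come from a multimodal model.
INDEX = [
    Item("clip_017", "video", [0.9, 0.1, 0.0]),
    Item("photo_203", "image", [0.8, 0.2, 0.1]),
    Item("call_044", "speech", [0.1, 0.9, 0.2]),
    Item("review_512", "text", [0.0, 0.2, 0.9]),
]

results = search(INDEX, [1.0, 0.0, 0.0], top_k=2)
for score, item in results:
    print(f"{item.item_id} ({item.modality}): {score:.3f}")
```

Because ranking depends only on the vectors, a video clip and a photo can both be returned for the same query, which is what lets the user ignore the modality of the underlying data.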

Features of the Solution:
- AI models for key-scene detection and caption generation from videos to capture important moments.
- Keyword and tag generation, and video summarization, using Large Multimodal Models.
- AI models for transcription and emotion detection from speech and text.
- Unification of the various data modalities in the common latent space of multimodal models.
- Vector database for indexing and retrieval of information.
- Web-based frontend for navigating the diverse set of information.
- Cloud-based deployment of the complete system on Azure.
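
To make the key-scene detection feature in the list above more concrete, here is a minimal sketch of one simple, widely used approach: flag a frame as a key frame when its colour histogram differs sharply from the previous frame's. This is an assumption for illustration only; the source does not state which detection method the solution uses. The function names, the threshold of 0.4 and the three-bin toy histograms are all hypothetical.

```python
from typing import List


def histogram_distance(h1: List[float], h2: List[float]) -> float:
    """Half the L1 distance between two normalized histograms (range 0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_key_frames(histograms: List[List[float]], threshold: float = 0.4) -> List[int]:
    """Return indices of frames that start a new scene.

    The first frame is always kept; a later frame is kept when its
    histogram differs from the previous frame's by more than `threshold`.
    """
    if not histograms:
        return []
    key_frames = [0]
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            key_frames.append(i)
    return key_frames


# Toy per-frame histograms (each sums to 1); real ones would be computed from pixels.
FRAMES = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],  # nearly identical to the previous frame
    [0.10, 0.20, 0.70],  # abrupt change: new scene
    [0.12, 0.20, 0.68],
    [0.30, 0.60, 0.10],  # abrupt change: new scene
]

key_frames = detect_key_frames(FRAMES)
print(key_frames)
```

The selected frames would then be the ones passed on for caption generation, so that only the moments where the content changes are described.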

Benefits Delivered
- Swift search and retrieval of information from multimodal data.
- Extraction of latent information, such as emotion and key scenes, from the data.
- Quick consumption of user feedback by product analysts/researchers.
- Increased customer satisfaction through improved analytics and product insight.