KritiKal Solutions Inc. is a premier technology services firm with a global footprint and over 22 years of experience. It specializes in product engineering, R&D, and cutting-edge innovation, and has delivered over 500 projects for its clients, drawing on deep expertise in AI-driven vision systems, embedded technologies, and cloud and mobile software solutions.

Search and Retrieval of Multi-Modal Data using Large Multimodal Models (LMMs)

Category: Gen AI / FMCG Beauty Care

The Problem Statement

An FMCG giant’s research team needed an AI-enabled tool to extract insights from clinical data and user-feedback data held in multiple modalities (images, videos and metadata). The solution had to extract information from a diverse set of data (images, videos, speech and text) and bring it all into one common platform for efficient search, retrieval, analysis and visualization. The tool also had to come with a responsive web interface so that analysts could assess the impact of a product.

The Solution

We developed a solution that uses Large Multimodal Models (LMMs) to analyse and extract information from various data types, such as videos, speech, images and text. The goal was to bring information from diverse sources into one common platform for comparison and visualization. A multimodal search engine was implemented so that users can navigate the information seamlessly without worrying about the data modality. Search queries run against vector databases, which allows quick, reliable retrieval of multi-modal data to support feedback analysis and decision-making.
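The modality-agnostic search described above can be illustrated with a minimal sketch. It assumes that every item, whatever its modality, has already been embedded into the same latent space, and it ranks items by cosine similarity to a query vector. The `Item` class, the `search` function, the item identifiers and the three-dimensional vectors are all hypothetical and chosen for illustration; the deployed system uses a vector database rather than this in-memory list, and its models and search algorithms are not described in this case study.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    item_id: str          # hypothetical identifier
    modality: str         # "image", "video", "speech" or "text"
    vector: List[float]   # embedding in the shared latent space (made-up values)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index: List[Item], query: List[float], top_k: int = 3) -> List[Tuple[float, Item]]:
    """Brute-force nearest-neighbour search; a vector database would replace this scan."""
    scored = [(cosine(item.vector, query), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index mixing modalities; the vectors are illustrative, not model output.
INDEX = [
    Item("clip-017", "video", [0.9, 0.1, 0.0]),
    Item("photo-204", "image", [0.8, 0.2, 0.1]),
    Item("review-552", "text", [0.0, 0.1, 0.9]),
    Item("call-031", "speech", [0.1, 0.9, 0.2]),
]

results = search(INDEX, [1.0, 0.0, 0.0], top_k=2)
for score, item in results:
    print(f"{item.item_id} ({item.modality}): {score:.3f}")
```

Because the ranking depends only on the vectors, a video clip and a photo can appear side by side in the same result list, which is the property that lets users search without thinking about modality.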


Features of the Solution:

  1. AI models for key-scene detection and caption generation from videos to capture important moments. 
  2. Keyword and tag generation and video summarization through Large Multimodal Models. 
  3. AI models for transcription and emotion detection from speech and text. 
  4. Amalgamation of various data modalities through the common latent space of multimodal models. 
  5. Vector database for indexing and retrieval of information. 
  6. Web-based frontend for navigating the diverse set of information. 
  7. Cloud-based deployment of the complete system on Azure. 
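One way to picture how the features above fit together is a single record type that holds whatever the per-modality models produce (captions, transcripts, tags, emotion labels, key-scene timestamps), so that analysts can filter across modalities in one place. The sketch below is an assumption about how such a record might look, not the schema used in the project; `FeedbackRecord`, `filter_records`, the field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeedbackRecord:
    source_id: str                      # hypothetical identifier of the source file
    modality: str                       # "image", "video", "speech" or "text"
    text: str                           # caption, summary, transcript or raw text
    tags: List[str] = field(default_factory=list)
    emotion: Optional[str] = None       # label from an emotion model, if one applies
    key_scene_seconds: List[float] = field(default_factory=list)  # videos only


def filter_records(
    records: List[FeedbackRecord],
    modality: Optional[str] = None,
    emotion: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[FeedbackRecord]:
    """Return records matching every filter that is set; unset filters are ignored."""
    matches = []
    for record in records:
        if modality is not None and record.modality != modality:
            continue
        if emotion is not None and record.emotion != emotion:
            continue
        if tag is not None and tag not in record.tags:
            continue
        matches.append(record)
    return matches


# Illustrative sample data, not taken from the project.
RECORDS = [
    FeedbackRecord("clip-017", "video", "User applies the cream and smiles.",
                   tags=["texture", "application"], emotion="positive",
                   key_scene_seconds=[4.0, 12.5]),
    FeedbackRecord("call-031", "speech", "It felt sticky after an hour.",
                   tags=["texture"], emotion="negative"),
    FeedbackRecord("review-552", "text", "Lovely scent, would buy again.",
                   tags=["fragrance"], emotion="positive"),
]

positive = filter_records(RECORDS, emotion="positive")
texture = filter_records(RECORDS, tag="texture")
print([r.source_id for r in positive])
print([r.source_id for r in texture])
```

In a design like this, the structured fields would sit alongside each item's embedding, so a search can combine similarity ranking with filters such as emotion or tag.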

Benefits Delivered

  • Swift search and retrieval of information from multimodal data. 
  • Extraction of latent information from the data, such as emotion and key scenes. 
  • Quick consumption of user feedback by product analysts/researchers. 
  • Increased customer satisfaction with improved analytics and product insight.