Semantic Enrichment

A low-code platform to quickly build semantic enrichment pipelines and metadata, enabling faster and deeper informatics research

Conducting informatics research, and implementing and updating a solution for varying needs, is complex and time-consuming

Search and analyze relevant concepts and entities in greater depth from a variety of public and internal information sources, such as research papers and press releases, across the stages of the drug discovery and development process

Scientists in biopharma companies need to search and analyze relevant concepts and entities in greater depth from a variety of public and internal information sources. They often spend hours or days finding related contextual information because data formats differ, vocabularies vary, and information is scattered across sources.

A semantic enrichment solution addresses this challenge with a framework that identifies and associates data, definitions, and contexts across large volumes of unstructured, heterogeneous content from different sources.

Key challenges in implementing a semantic enrichment solution:

  • Developing the many solution components, from data wrangling through entity extraction to knowledge presentation, is complex
  • The solution must be continually updated to support different data sources and formats, upgraded vocabularies, and changing user interfaces
  • Existing biomedical vocabulary databases have insufficient coverage (for example, UMLS falls short for preclinical records)

Our approach uses a low-code platform driven by configuration and declarations for every stage, from extracting data to presenting knowledge, instead of a one-off custom solution built for a specific use case:
  • Sourcing and wrangling data, converting formats, and extracting specific fields based on document conventions (e.g., press releases, blog posts)
  • Semantic enrichment, including extracting and resolving entities, associating them with ontologies, and leveraging vocabulary databases
  • Advanced NLP- and AI-driven classification, ranking, and summarization
  • Storing results in search-friendly databases: full-text, graph, relational, document, etc.
  • Developing a user interface for searching and exploring entities and knowledge

Our Solution

Informatics research enabler with a low-code semantic enrichment platform

Semantic Enrichment Pipeline and Components: Data Extraction, Entity Extraction & Resolution, Enriched Data Storage, and User Interface

Key components & strengths

Low-code Platform and Declarative Pipeline

A framework for creating adaptable, scalable low-code solutions that enables code reuse across data formats, data types, and use cases

Library of reusable data processing components

Basic data wrangling (e.g., HTML parsing), file format conversion (e.g., PDF to text), and advanced components (e.g., document classification)
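
To make the idea of a declarative pipeline built from reusable components concrete, here is a minimal Python sketch. It assumes a simple design in which components are registered by name and a pipeline is nothing more than a list of configuration entries. The component names (html_to_text, extract_field), the config keys, and the run_pipeline runner are hypothetical illustrations, not the platform's actual API.

```python
# Minimal sketch of a declarative, configuration-driven pipeline.
# Component names, config keys, and run_pipeline are hypothetical.
import re
from html.parser import HTMLParser
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Params = Dict[str, Any]
Component = Callable[[Record, Params], Record]

# Registry of reusable components, looked up by name from a pipeline config.
REGISTRY: Dict[str, Component] = {}


def component(name: str) -> Callable[[Component], Component]:
    """Register a processing function under a name that configs can reference."""
    def decorator(fn: Component) -> Component:
        REGISTRY[name] = fn
        return fn
    return decorator


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


@component("html_to_text")
def html_to_text(record: Record, params: Params) -> Record:
    """Basic wrangling: strip HTML tags and keep the visible text."""
    collector = _TextCollector()
    collector.feed(record[params.get("source_field", "html")])
    record["text"] = " ".join(collector.chunks)
    return record


@component("extract_field")
def extract_field(record: Record, params: Params) -> Record:
    """Convention-based extraction: store regex capture group 1 in a new field."""
    match = re.search(params["pattern"], record["text"])
    record[params["target"]] = match.group(1) if match else None
    return record


def run_pipeline(config: List[Dict[str, Any]], record: Record) -> Record:
    """Apply each configured step in order; the config, not new code, defines the flow."""
    for step in config:
        record = REGISTRY[step["component"]](record, step.get("params", {}))
    return record


# Example: a hypothetical press-release pipeline, declared as data.
PRESS_RELEASE_PIPELINE = [
    {"component": "html_to_text"},
    {"component": "extract_field",
     "params": {"pattern": r"Dated:\s*(\d{4}-\d{2}-\d{2})", "target": "date"}},
]

doc = {"html": "<html><body><h1>Acme announces trial</h1><p>Dated: 2023-05-01</p></body></html>"}
result = run_pipeline(PRESS_RELEASE_PIPELINE, doc)
print(result["text"])  # Acme announces trial Dated: 2023-05-01
print(result["date"])  # 2023-05-01
```

Under this design, supporting a new source such as blog posts would mostly mean declaring a new list of steps, reusing the registered components rather than writing a new pipeline.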

Semantic Enrichment Solution Components

Entity extraction, entity resolution, ontology association, classification, and ranking, in addition to leveraging MetaMap and Aganitha’s MDM
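
Entity extraction and resolution can be illustrated with a dictionary (gazetteer) lookup: find mentions of known synonyms in text, then map each mention to a canonical identifier and entity type. This is a deliberately simple, swapped-in technique. It does not call MetaMap or Aganitha’s MDM, and the vocabulary entries and IDs such as DRUG:0001 are hypothetical.

```python
# Minimal gazetteer-style entity extraction and resolution.
# The vocabulary and IDs are hypothetical; this does not call MetaMap or an MDM.
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Surface form (lowercase) -> (canonical ID, entity type).
VOCABULARY: Dict[str, Tuple[str, str]] = {
    "aspirin": ("DRUG:0001", "Drug"),
    "acetylsalicylic acid": ("DRUG:0001", "Drug"),
    "cox-1": ("GENE:0001", "Gene"),
    "ptgs1": ("GENE:0001", "Gene"),
}


@dataclass(frozen=True)
class Mention:
    text: str
    start: int
    end: int
    canonical_id: str
    entity_type: str


def build_matcher(vocab: Dict[str, Tuple[str, str]]) -> re.Pattern:
    """Compile one case-insensitive regex; longer synonyms are tried first."""
    terms = sorted(vocab, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # Lookarounds instead of \b so terms ending in punctuation still match cleanly.
    return re.compile(rf"(?<!\w)({alternation})(?!\w)", re.IGNORECASE)


def extract_and_resolve(
    text: str, vocab: Dict[str, Tuple[str, str]] = VOCABULARY
) -> List[Mention]:
    """Extract mentions and resolve each one to its canonical ID and type."""
    mentions = []
    for m in build_matcher(vocab).finditer(text):
        canonical_id, entity_type = vocab[m.group(1).lower()]
        mentions.append(Mention(m.group(1), m.start(1), m.end(1), canonical_id, entity_type))
    return mentions


def unique_entities(mentions: List[Mention]) -> List[Tuple[str, str]]:
    """Collapse synonymous mentions into distinct canonical entities."""
    return sorted({(m.canonical_id, m.entity_type) for m in mentions})


text = "Aspirin (acetylsalicylic acid) irreversibly inhibits COX-1, encoded by PTGS1."
mentions = extract_and_resolve(text)
print([m.text for m in mentions])  # ['Aspirin', 'acetylsalicylic acid', 'COX-1', 'PTGS1']
print(unique_entities(mentions))   # [('DRUG:0001', 'Drug'), ('GENE:0001', 'Gene')]
```

Resolution is the step that lets "Aspirin" and "acetylsalicylic acid" count as one entity; a production system would also need context-based disambiguation, which this sketch omits.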

Enriched Data Storage in different databases

Integrations for writing results to full-text search engines, content management systems (CMS), and relational, document, and graph databases, enabling multi-faceted search
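
One way to write the same enriched record to several stores is a fan-out over sink objects that share a write method. The sketch below assumes in-memory stand-ins: SQLite for a relational database, a dictionary of JSON strings for a document database, and a toy inverted index for full-text search. The class names and record shape are hypothetical, and a graph database sink is left out for brevity.

```python
# Minimal fan-out of one enriched record to several stores.
# In-memory stand-ins and class names are hypothetical; no graph sink shown.
import json
import sqlite3
from collections import defaultdict
from typing import Any, Dict, List, Protocol, Set

Record = Dict[str, Any]


class Sink(Protocol):
    def write(self, record: Record) -> None: ...


class RelationalSink:
    """One row per (record, entity) pair in an in-memory SQLite table."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE mentions (record_id TEXT, entity_id TEXT, entity_type TEXT)"
        )

    def write(self, record: Record) -> None:
        rows = [(record["id"], e["id"], e["type"]) for e in record["entities"]]
        self.conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", rows)
        self.conn.commit()


class DocumentSink:
    """Keeps each enriched record as a JSON document keyed by its ID."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def write(self, record: Record) -> None:
        self.docs[record["id"]] = json.dumps(record)


class FullTextSink:
    """Toy inverted index: lowercase token -> IDs of records containing it."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def write(self, record: Record) -> None:
        for token in record["text"].lower().split():
            self.index[token.strip(".,;:()")].add(record["id"])

    def search(self, term: str) -> Set[str]:
        return set(self.index.get(term.lower(), set()))


def store(record: Record, sinks: List[Sink]) -> None:
    """Write the same record to every configured sink."""
    for sink in sinks:
        sink.write(record)


record = {
    "id": "r1",
    "text": "Aspirin reduces stroke risk.",
    "entities": [{"id": "DRUG:0001", "type": "Drug"}],
}
relational, documents, full_text = RelationalSink(), DocumentSink(), FullTextSink()
store(record, [relational, documents, full_text])
print(relational.conn.execute("SELECT entity_id FROM mentions").fetchall())  # [('DRUG:0001',)]
print(full_text.search("risk"))  # {'r1'}
```

Because each store only needs a write method, adding another backend in this design means adding one class and listing it among the sinks.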

Intuitive, Semantic, and Configurable UI

A dynamic view of the processed records that lets users search and analyze information across sources by slicing and dicing the facets produced by the pipeline
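
Slicing and dicing by facets typically comes down to two operations: filtering records by the selected facet values, then counting the values that remain so the UI can show them. The sketch below assumes a common e-commerce convention (values within one facet are combined with OR, different facets with AND); the record fields and function names are hypothetical.

```python
# Minimal faceted filtering and counting over enriched records.
# Record fields and function names are hypothetical.
from collections import Counter
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

RECORDS: List[Record] = [
    {"id": "r1", "source": "press_release", "drug": ["aspirin"], "disease": ["stroke"]},
    {"id": "r2", "source": "paper", "drug": ["aspirin", "clopidogrel"], "disease": ["stroke"]},
    {"id": "r3", "source": "paper", "drug": ["metformin"], "disease": ["diabetes"]},
]


def _values(record: Record, facet: str) -> List[str]:
    """Treat single-valued and multi-valued facets uniformly."""
    value = record.get(facet, [])
    return value if isinstance(value, list) else [value]


def filter_records(records: List[Record], selections: Dict[str, Iterable[str]]) -> List[Record]:
    """OR within a facet, AND across facets; facets with no selection are ignored."""
    wanted = {facet: set(vals) for facet, vals in selections.items()}
    wanted = {facet: vals for facet, vals in wanted.items() if vals}
    return [
        r for r in records
        if all(set(_values(r, facet)) & vals for facet, vals in wanted.items())
    ]


def facet_counts(records: List[Record], facets: List[str]) -> Dict[str, Dict[str, int]]:
    """Count how many records carry each value of each facet."""
    return {
        facet: dict(Counter(v for r in records for v in set(_values(r, facet))))
        for facet in facets
    }


selected = filter_records(RECORDS, {"drug": ["aspirin"]})
print([r["id"] for r in selected])  # ['r1', 'r2']
print(facet_counts(selected, ["source", "disease"]))
# {'source': {'press_release': 1, 'paper': 1}, 'disease': {'stroke': 2}}
```

Recomputing counts on the filtered set is what lets a faceted UI show, for example, which diseases co-occur with a selected drug and in which sources.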

Variety of entities across different data sources

Extract entities such as drugs, proteins, genes, diseases, enzymes, and organizations from several public sources as well as internal repositories

Demo Application Example Screens

Faster informatics research across use cases and easier solution implementation and updates

Enhanced discovery

Of entities and knowledge, through proper association of data, contexts, and definitions from different sources

Quicker time to insights

Across multiple processes: competitive intelligence, drug discovery and development, drug safety, etc.

Productive user experience

Through eCommerce-like search and exploration of entities and information by slicing and dicing relevant facets

Faster on-boarding and updates

By leveraging plug-and-play components across the pipeline and a configuration-driven, low-code platform

Wider Vocabulary

With Aganitha’s continually expanding MDM (Master Data Management) database complementing MetaMap (UMLS)

Acts as an add-on accelerator

For AI/ML-driven knowledge graphs, computational biology, and computational chemistry solutions
