Research
Advancing Medical AI Science
Our research spans large language model training, biomedical NLP, genomic AI, and safe clinical reasoning. We publish openly and collaborate with leading institutions.
Focus Areas
Where we push boundaries
Biomedical NLP
Named entity recognition, relation extraction, and semantic reasoning across clinical and biomedical text corpora.
LLM Alignment
Medical-domain RLHF and DPO pipelines that ensure safe, grounded, and factually accurate model outputs.
Genomic AI
Deep learning on genomic sequences, structural variant calling, and gene-phenotype association modeling.
Drug Discovery AI
Generative and predictive AI for molecular property optimization, ADMET modeling, and target identification.
Clinical Reasoning
Chain-of-thought clinical decision support, differential diagnosis generation, and evidence-based reasoning.
Medical Vision
Multimodal models for pathology slide analysis, radiology interpretation, and visual clinical grounding.
Methodology
How we build trustworthy models
Every DeepCog model follows a rigorous four-stage pipeline, from data curation to clinical validation.
01 //
Data Curation
Multi-source biomedical corpus assembly with quality filtering, deduplication, and expert annotation.
02 //
Pre-training
Domain-specific continued pre-training on curated corpora with medical tokenizer optimization.
03 //
DPO Alignment
Direct Preference Optimization using expert clinician preference data for safe, accurate outputs.
04 //
Clinical Eval
Benchmark evaluation on MedQA, USMLE, PubMedQA, and internal clinical validation sets.
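The Direct Preference Optimization step in stage 03 trains the model to prefer clinician-approved responses over rejected ones. A minimal sketch of the per-pair DPO loss, in pure Python for illustration: the function name and arguments are assumptions, and the inputs are the summed log-probabilities of each response under the policy and a frozen reference model (in practice these come from the language models themselves).

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy (pi_*) or the frozen
    reference model (ref_*). beta scales the implicit reward.
    """
    # Implicit reward: log-ratio of policy to reference, chosen minus rejected
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid: small when the policy favors the chosen response
    return math.log(1.0 + math.exp(-logits))
```

When the policy and reference agree exactly, the loss is log 2; it shrinks as the policy assigns relatively more probability to the clinician-preferred response.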
Publications
Recent research papers
OpenBioLLM: Advancing Open-Source Biomedical Large Language Models with Expert-Curated Preference Data
We introduce OpenBioLLM-70B, a state-of-the-art open-source biomedical LLM achieving 91.2% on MedQA. Our novel DPO pipeline leverages expert medical preference annotations across 120K instruction pairs, surpassing GPT-4 on 7 of 9 medical benchmarks.
MedQA 91.2%
USMLE 89.4%
Open Source
GenomicLLM: A Domain-Specific Language Model for Variant Interpretation and Gene-Disease Association
We present GenomicLLM-7B, trained on 180M genomic sequences from NCBI and Ensembl. The model achieves 87% accuracy on clinical variant classification tasks, enabling automated interpretation of VCF files and generation of clinical genomics reports.
Genomics
Variant Calling
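Automated interpretation of VCF files, as described for GenomicLLM, presumes a structured view of each variant record. A minimal sketch of parsing one VCF 4.x data line into its core columns; this helper is hypothetical and not the paper's code:

```python
def parse_vcf_line(line):
    """Parse one tab-separated VCF data line into a dict (sketch).

    The first eight columns follow the VCF 4.x spec:
    CHROM POS ID REF ALT QUAL FILTER INFO.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": alt.split(","),  # ALT may list multiple alternate alleles
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags
        "info": dict(
            kv.split("=", 1) if "=" in kv else (kv, True)
            for kv in info.split(";")
        ),
    }
```

A real pipeline would additionally handle header lines, per-sample genotype columns, and multi-allelic normalization before any model sees the record.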
MedDPO: Scaling Direct Preference Optimization for Clinical Safety in Medical Language Models
We introduce MedDPO, a clinical-safety-focused DPO framework that reduces medical hallucination by 68% while maintaining benchmark performance. Our 120K preference dataset is annotated by board-certified physicians across 24 specialties.
Safety
Alignment
Hallucination ↓68%
MolLLM: Unified Molecular Language Modeling for ADMET Prediction and Lead Optimization
MolLLM bridges natural language and molecular representations using a unified tokenizer for SMILES, InChI, and IUPAC names. It achieves top performance on 14 ADMET benchmarks from TDC, with a 3x speedup over existing graph neural network approaches.
Drug Discovery
ADMET
Molecules
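A unified molecular tokenizer must, at minimum, split SMILES strings into atom- and bond-level tokens rather than raw characters. A deliberately minimal regex sketch of that idea (illustrative only, and far simpler than MolLLM's actual tokenizer, which is not public in this form):

```python
import re

# Minimal SMILES token pattern (sketch): bracket atoms, two-letter
# organic-subset atoms, one-letter and aromatic atoms, then bonds,
# branches, ring-closure digits, and related punctuation.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms, e.g. [NH3+]
    r"|Br|Cl"              # two-letter organic-subset atoms
    r"|[BCNOPSFI]"         # one-letter organic-subset atoms
    r"|[bcnops]"           # aromatic atoms
    r"|[=#\-+\\/()%.\d@]"  # bonds, branches, ring closures, charges
)

def tokenize_smiles(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Lossless check: tokens must reassemble the input exactly
    assert "".join(tokens) == smiles, "unrecognized character in SMILES"
    return tokens
```

For example, tokenizing aspirin, `CC(=O)Oc1ccccc1C(=O)O`, keeps `Cl`-style two-letter atoms intact elsewhere while treating ring-closure digits and branch parentheses as their own tokens.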