What makes us who we are? A psychologist would say it’s our personality, shaped by our thoughts and experiences. A philosopher might point toward our consciousness and choices, while a sociologist would argue it’s our environment and social interactions that define us. An anthropologist, on the other hand, might suggest it’s influenced by cultural norms and traditions. But if you ask a geneticist, they would say: it’s all in the genes.
For years, scientists have pondered this question, seeking to unravel the enigma of life through the genetic makeup of individuals. A major step in this direction was the Human Genome Project (HGP), which started in 1990 and was completed in 2003. While the HGP gives us a comprehensive map of the human genome, it does not capture the full extent of global human genetic diversity. Small changes at individual nucleotide positions within the genome can have major implications for how an individual responds to the environment, and it is these differences that define the ‘genetic makeup’ of an individual or an ethnic group as distinct from others. To decode this diversity, many countries have started compiling the genomic sequences of multiple individuals. India launched its own version, the Genome India Project (GIP), a monumental effort to sequence and map the genetic variations of its diverse population.
GIP aims to create a comprehensive genetic map of India’s diverse populations, which span over 4,600 ethnic groups and 532 tribes. Due to centuries of endogamy, linguistic shifts, and genetic admixture, Indian populations exhibit unique genetic variations, many of which influence disease susceptibility and drug responses. However, most genomic studies and precision medicine research have been based on data from overseas populations, leading to a significant underrepresentation of Indian genomes in global datasets. This lack of representation limits how far Indians benefit from medical research and personalised healthcare, as genetic variants influencing disease risk and drug metabolism can vary across populations.
THE HUMAN GENOME PROJECT AND INDIA’S JOURNEY
The Human Genome Project (HGP), launched in 1990, was a groundbreaking effort to sequence the entire human genome, some 3 billion base pairs, and to identify all 20,000-25,000 genes in human DNA. Coordinated by the US Department of Energy and the National Institutes of Health, this 13-year project involved countries including the USA, UK, France, Germany, Japan, and China; China contributed 1% of the genome sequencing. India, constrained by resource limitations, was not an initial participant but later played a crucial role in analysing publicly available genome data. The project revolutionised genetics, enabling advances in medicine, biotechnology, and personalised healthcare. Despite initial sequencing costs reaching $1 billion, technological advancements have drastically reduced the price, making genome sequencing more accessible.
In response to global advancements, India launched its own initiatives to study genetic diversity. The Indian Genome Variation Project (IGV), led by the Council of Scientific and Industrial Research (CSIR) with participation from multiple premier research institutes, successfully mapped the genetic diversity of India’s populations. This consortium aimed to provide data on validated single nucleotide polymorphisms (SNPs) and repeats, both novel and reported, along with gene duplications, in over a thousand genes, in 15,000 individuals drawn from Indian subpopulations. This was followed by India’s first complete genome sequencing in 2009, demonstrating the nation’s growing capacity in genomic research. Another CSIR initiative, the Genomics for Public Health in India (IndiGen) programme, successfully sequenced the genomes of 1,008 Indians within six months, with the aim of enabling faster diagnosis of rare genetic diseases.


GENOME INDIA PROJECT (GIP)
Launched in 2020, GIP is a flagship national initiative aimed at mapping the genetic landscape of India. It is a collaborative effort involving 20 academic and research institutions across the country, working under the coordination of the Department of Biotechnology (DBT), Government of India. The primary goal of the project is to create a comprehensive catalogue of genetic variations unique to the Indian population, which will serve as a foundation for genomics-driven healthcare and research. Given India’s vast genetic diversity, influenced by thousands of years of migration, endogamy, and admixture, this project is expected to revolutionise precision medicine, disease prediction, and drug response studies tailored to India’s population.
The Genome India data was released at the Genome India Data Conclave on 9 January 2025 with a recorded message from the Prime Minister in the presence of Union Minister of State for Science & Technology (independent charge), Dr Jitendra Singh. DBT Secretary Dr Rajesh S Gokhale and lead scientists involved in the project discussed its significance and the way forward on related issues arising from its release.
KEY ACHIEVEMENTS
1. Sample Collection
One of the project’s critical milestones is the collection of 20,000 genetic samples from 83 distinct populations across India. These include samples from ethnic minorities, tribal communities, and different linguistic groups, ensuring a representative genetic dataset that captures India’s rich ancestral diversity.
2. Whole Genome Sequencing
Not all the collected samples reached final sequencing, owing to various technical and logistical issues. The project ultimately sequenced 10,074 genomes, generating a high-resolution database of genetic variations specific to India. This data will be instrumental in understanding genetic risk factors for diseases, designing population-specific treatments, and improving drug efficacy studies.
3. Data Archiving and Accessibility
To ensure global accessibility and data security, all sequencing data has been archived at the Indian Biological Data Centre (IBDC) in Faridabad. The data is shared under the Framework for Exchange of Data (FeED) protocols, which allow researchers worldwide to access and analyse Indian genomic data while maintaining privacy and ethical safeguards.
4. Establishment of a Biobank
A biobank has been set up to store collected genetic samples for future research. This repository will enable long-term studies, facilitate comparative genomic research, and support the development of genome-based medical interventions.
FUTURE GOALS
1. Scaling Up: Sequencing 10 Million Genomes
In the coming years, GIP aims to expand its dataset exponentially, with a vision to sequence 10 million genomes. This ambitious goal will enhance India’s genomic repository, making it one of the largest in the world and solidifying India’s position in global genomic research.
2. Advancing Personalised Medicine
The project will translate genomic insights into clinical applications, focusing on developing affordable, genomics-based diagnostic tools. By identifying population-specific genetic markers, it will aid in early disease detection, targeted therapies, and precision medicine tailored for Indian patients.
3. Strengthening Collaborations
To enrich the dataset and expand applications, the Genome India Project plans to collaborate with leading medical and research institutions, some of which are listed on the newly launched GenomeIndia website (https://genomeindia.in/). These collaborations will bridge the gap between research and clinical practice.
GENOMIC INSIGHTS INTO INDIA’S HEALTH LANDSCAPE
Rare Diseases Unique to India
India’s diverse genetic landscape has given rise to a range of rare diseases unique to specific regions and populations. For instance, Madras Motor Neuron Disease (MMND), a genetically inherited neurodegenerative disorder, is predominantly found in the Tamil Nadu region. This disease, associated with a mutation in a gene called HSPB1, leads to progressive muscle weakness and respiratory failure. Similarly, Handigodu Disease, a form of progressive familial heart failure, is seen primarily in a community in coastal Karnataka and is believed to arise from a mutation in the LMNA gene, which encodes a nuclear envelope protein. Another example is Mucopolysaccharidosis type VII (MPS VII), a metabolic disorder affecting the Tamil- and Telugu-speaking populations of Southern India. It is caused by a deficiency in β-glucuronidase, which leads to the accumulation of glycosaminoglycans and results in progressive organ dysfunction. In addition to these rare diseases, Guillain-Barré Syndrome (GBS), though not unique to India, has recently emerged as a growing concern. This autoimmune disorder causes rapid muscle weakness and paralysis, often triggered by infections such as the Zika virus or respiratory illnesses. Studies suggest genetic factors specific to Indian populations may contribute to increased susceptibility, though further research is needed.
Prevalent Health Challenges
Beyond rare genetic disorders, India faces a range of common diseases that significantly impact public health. Genetic conditions like thalassemia and sickle cell anaemia, coupled with infectious diseases such as tuberculosis, dengue, malaria, and filarial infections, present ongoing challenges. Urbanisation and shifting lifestyles have also led to an increase in non-communicable diseases, including diabetes, chronic kidney disease, and non-alcoholic fatty liver disease. These diverse health issues underscore the need for a comprehensive genetic perspective in addressing India’s complex disease landscape.
Cancer and the Need for Population-Specific Research
Cancer is one of the leading causes of mortality in India, with breast, cervical, ovarian, and uterine cancers accounting for over 70% of cancers in women. A study published in The Lancet Oncology highlights how genomic research can improve cancer outcomes by identifying population-specific genetic risk factors and tailoring treatment strategies.
The study emphasises that Indian women develop breast and ovarian cancer nearly a decade earlier than women in high-income countries, suggesting some poorly understood underlying genetic differences. Additionally, the higher prevalence of triple-negative breast cancer in India indicates a distinct genetic and environmental interplay. However, most current cancer treatments and screening protocols are based on Western genomic data, which may not be fully applicable to Indian patients.
Pharmacogenomics and Personalised Therapies
Genetic variations also play a crucial role in how individuals metabolise medications, the subject of the field of pharmacogenomics. Research has identified numerous genetic variants, including those catalogued in the PharmGKB database, that can adversely affect drug response. In India, where certain populations exhibit a high frequency of these variants, the effectiveness of drugs such as anticoagulants, antiretrovirals, and antivirals can be significantly compromised. Mapping these variants across the country’s diverse populations is essential for developing personalised therapies that ensure both safety and efficacy.
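To make the idea of mapping variant frequencies across populations concrete, here is a minimal Python sketch of the underlying arithmetic: for one variant, it counts how often the alternate allele occurs among diploid individuals in each population. The function name, the population labels, and the toy genotype data are all hypothetical and chosen only for illustration; they are not drawn from GIP or PharmGKB, and real analyses involve far more careful quality control.

```python
from collections import defaultdict


def allele_frequencies(genotypes):
    """Return the alternate-allele frequency per population for one variant.

    `genotypes` is an iterable of (population, alt_allele_count) pairs,
    where alt_allele_count is 0, 1, or 2 for a diploid individual.
    """
    alt_alleles = defaultdict(int)    # alternate alleles seen per population
    total_alleles = defaultdict(int)  # all alleles seen per population

    for population, alt_count in genotypes:
        if alt_count not in (0, 1, 2):
            raise ValueError(f"invalid diploid genotype: {alt_count}")
        alt_alleles[population] += alt_count
        total_alleles[population] += 2  # each individual carries two alleles

    return {
        population: alt_alleles[population] / total_alleles[population]
        for population in total_alleles
    }


# Hypothetical toy data: four individuals from each of two populations.
sample = [
    ("pop_A", 0), ("pop_A", 1), ("pop_A", 2), ("pop_A", 1),
    ("pop_B", 0), ("pop_B", 0), ("pop_B", 1), ("pop_B", 0),
]

freqs = allele_frequencies(sample)
for population, frequency in sorted(freqs.items()):
    print(f"{population}: alternate allele frequency = {frequency:.3f}")
```

In this toy data the variant is four times as common in the first population as in the second, which is the kind of difference that could matter when dosing guidelines derived from one population are applied to another.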


THE ROLE OF GIP
The Genome India Project is not a rare-disease project, nor is it looking for disease-specific signals per se. However, its focus on capturing the genetic diversity of India will help in gaining insights into the genetic basis of disease and the predispositions of specific population groups.
By sequencing genomes from diverse populations in India, GIP will help to identify population-specific genetic markers associated with both rare and common diseases, including cancer, and with drug metabolism. This large-scale effort will indirectly refine diagnostic approaches, improve disease screening programmes, and enable the development of personalised treatment strategies. Ultimately, GIP’s findings will help in bridging the gap between genetic research and clinical applications, paving the way for precision medicine tailored to India’s unique genetic landscape.
CHALLENGES AND CONSIDERATIONS
While the project has made significant strides, several challenges remain:
1. Ethical and Privacy Concerns
The collection and utilisation of genomic data raise ethical dilemmas regarding privacy, informed consent, and data security. Establishing robust ethical frameworks and data governance policies is essential to ensure responsible usage. Data Management Groups in DBT have painstakingly developed policies consistent with law and inspired by national and international standards.
2. Data Management and Computational Challenges
Handling large-scale genomic data in a secure environment requires cutting-edge computational infrastructure and high-level expertise in bioinformatics and data management. Efficient storage, processing, and analysis of genomic datasets will be crucial for deriving meaningful insights.
3. Ensuring Inclusive Representation
India’s genetic diversity, spread across more than 4,600 ethnic groups and 532 tribal communities, presents a challenge in ensuring comprehensive representation. Efforts must be made to include underrepresented and indigenous populations to create a truly inclusive and unbiased genomic database.
4. Sustaining Funding and Resources
Long-term sustainability depends on continuous funding, infrastructure upgrades, and skilled manpower. Public-private partnerships and government support will be vital to maintaining research momentum.
FUTURE CHALLENGES AND OPPORTUNITIES
1. Integration with Healthcare Systems
Translating genomic research into clinical applications requires collaboration between researchers, medical practitioners, and policymakers. Establishing genomics-based healthcare guidelines will be key to incorporating precision medicine into mainstream healthcare.
2. Public Awareness and Acceptance
Educating the public about benefits and ethical considerations of genomics is crucial. Awareness programmes can help dispel misconceptions, encourage participation in genetic research, and promote informed decision-making regarding genetic testing.
3. Strengthening Global Collaborations
India’s participation in international genomic projects can enhance research quality, foster knowledge exchange, and facilitate access to advanced technologies. Partnerships with initiatives like the Human Genome Project, the International HapMap Project, and global bioinformatics consortia will be valuable.


GLOBAL PROJECTS
In the race to decode the blueprint of life, genome projects around the world are pushing the boundaries of science and medicine. The UK Biobank, launched in 2006, stands as a cornerstone of genomic research, with 500,000 genomes sequenced to study the interplay of genetics, lifestyle, and environment in diseases like cancer and diabetes. Similarly, the All of Us programme in the USA, which officially began in 2018, aims to sequence 1,000,000 genomes, emphasising diversity to bridge gaps in precision medicine for underrepresented populations. In the past, the 1000 Genomes Project (2008–2015) laid the foundation by mapping 2,504 genomes from 26 global populations, revealing the rich tapestry of human genetic variation. Smaller but equally impactful initiatives, such as Singapore’s Precise SG100K (2021) and Saudi Arabia’s Saudi Genome Program (2014), focus on regional health challenges, while Mexico’s Origen Project (2023) explores the unique genetic mosaic of its population, blending Indigenous, European, and African ancestries. In the table given alongside, various genome sequencing projects across the globe are outlined for reference.
CONCLUSION
The Genome India Project is more than just a scientific initiative—it is an exploration of our genetic identity, with the potential to reshape medicine, redefine ancestry, and revolutionise public health. By embracing genomics-driven healthcare, India aligns itself with a global movement toward more precise, inclusive, and effective healthcare and medical solutions.
Beyond its clinical impact, this project highlights India’s rich genetic heritage, shaped by centuries of migration, adaptation, and evolution. Each sequenced genome brings us closer to answering fundamental questions about our origins, health, and future. However, this journey comes with challenges. Ethical concerns around data privacy, informed consent, and equitable access must remain central, ensuring that genomic advancements benefit all communities, including marginalised and indigenous groups. Public awareness and education will also be key to fostering trust and participation in this transformative effort.
As genome research progresses, India is taking significant steps toward incorporating genomics into mainstream medicine and public health. The challenge now is not just to gather data, but to use it responsibly, ensuring that its benefits are widespread and equitable. With collaboration, ethical research, and innovation, the Genome India Project has the potential to reshape healthcare for generations, not just in India, but across the world. And with every breakthrough, we move closer to unlocking the full potential of genomics in understanding life itself.
*Shandar Ahmad is Professor and Coordinator (DBT Bioinformatics Center), School of Computational and Integrative Sciences (SCIS), Jawaharlal Nehru University (JNU). He was panel moderator on ‘Ethical Issues in Data sharing’ at the Genome India release event. Divyangna Bathla and Richa Mishra are researchers in biological science, working towards their doctoral degree in Bioinformatics at JNU.