If you are a pharma company struggling to identify drug targets in non-coding DNA regions, this project developed software that detects and classifies genomic elements across development, helping pinpoint which regulatory mutations may cause disease. The consortium spanned 15 partners in 8 countries, and its annotation maps of zebrafish development offer a model system for human disease gene discovery.
Genomic Analysis Software for Understanding Birth Defects and Genetic Diseases Using Zebrafish
Imagine your DNA is a massive instruction manual with billions of letters, and we only understand the chapters that code for proteins — roughly 2% of the book. The rest was long dismissed as "junk," but it actually contains hidden switches that turn genes on and off during development. This project used zebrafish — whose early development is surprisingly similar to humans — to map out those hidden switches and build software that can read them. The result is a set of computational tools that help scientists spot which genetic "typos" in the non-coding parts of DNA might cause birth defects or disease.
What needed solving
A massive portion of the human genome — the non-coding regions — remains poorly understood, yet mutations in these areas drive many genetic diseases and birth defects. Pharma and diagnostics companies sequencing whole genomes cannot interpret most variants outside protein-coding regions, leaving potential drug targets and diagnostic markers hidden. There is a critical shortage of computational tools and trained experts who can bridge the gap between raw genomic data and actionable biological insight.
What was built
The project delivered software packages for detecting and classifying genomic elements and their characteristics, including tools that work with single-cell data and tools for integrating and visualizing multi-omics datasets (CAGE-seq, ChIP-seq, RNA-seq). It produced 10 deliverables in total and trained 15 computational biologists with cross-disciplinary skills.
Who needs this
Who can put this to work
If you are a diagnostics company that sequences patient genomes but cannot interpret variants outside protein-coding regions, this project built software for detecting and classifying genomic elements, including in single cells. Many cases that stay undiagnosed after coding-region analysis are thought to involve non-coding variants, and these tools help classify which variants in regulatory regions are functionally relevant.
If you are a bioinformatics company looking to expand your analysis pipeline with developmental genomics capabilities, this project produced software packages for integration and visualization of multi-omics datasets (CAGE-seq, ChIP-seq, RNA-seq). The tools were built by a consortium that included 7 universities and 4 research institutes and were tested in research settings across 15 early-stage researcher (ESR) projects, which makes them candidates for integration into commercial platforms after further engineering and validation.
Quick answers
What would it cost to license or use these genomic analysis tools?
The project was funded under MSCA-ITN (Marie Skłodowska-Curie Actions Innovative Training Networks), a training-focused programme. The software packages were developed in an academic context across 7 universities and 4 research institutes. Based on available project data, there is no indication of commercial licensing terms. The tools are likely available under academic open-source licenses, but commercial use terms would need to be negotiated with the University of Birmingham as coordinator.
Can these tools work at industrial scale with large patient cohorts?
The software was designed for genomic-scale data integration — handling CAGE-seq, ChIP-seq, and RNA-seq datasets simultaneously, including single-cell data. Based on available project data, the tools were validated in research settings across 15 ESR projects and 15 partner institutions, but no evidence of deployment in clinical or industrial-scale production environments is available.
Who owns the intellectual property for the software and genomic annotations?
IP is likely shared among the 15 consortium partners across 8 countries according to the MSCA grant agreement terms. The University of Birmingham as coordinator would be the first point of contact. With 3 industry partners in the consortium, some IP arrangements for commercial exploitation may already exist.
How does zebrafish data translate to human applications?
Zebrafish are an established model for studying vertebrate development and genetic disease: roughly 70% of human protein-coding genes have at least one zebrafish counterpart. The project specifically aimed to extend ENCODE and FANTOM annotation efforts to developmental genomics. The genomic codes and regulatory element classifications it produced are therefore relevant to understanding human congenital anomalies and developmental disorders.
Is this compliant with genomic data regulations like GDPR?
The project used zebrafish genomic data, not human patient data, so GDPR data protection concerns do not directly apply to the core datasets. However, any downstream application to human clinical genomics would need to meet regulatory requirements. Based on available project data, no clinical trials or human data processing were involved.
What is the timeline to apply these tools in a commercial setting?
The project ended in December 2018 and produced 10 deliverables including the software packages. The tools exist as research-grade software. Moving to a commercial-grade product would require additional engineering, validation against clinical standards, and regulatory approval if used in diagnostics — likely requiring further development investment.
Who built it
The ZENCODE-ITN consortium brought together 15 partners from 8 countries (BE, DE, DK, ES, JP, SE, UK, US), led by the University of Birmingham. The mix includes 7 universities, 4 research institutes, and 3 industry partners. Three industry partners out of 15 gives a 20% industry ratio, which is modest but typical for a training network. Notably, this was an MSCA-ITN programme, meaning its primary purpose was to train 15 early-stage researchers rather than to deliver market-ready products. The consortium has no SMEs; all industry partners are larger organizations. The international spread, including Japan and the US, signals strong scientific credibility, but the academic-heavy composition means commercial exploitation was not the primary driver. A business looking to use these tools would need to engage with academic partners, whose timelines and priorities may differ from those of commercial partners.
- THE UNIVERSITY OF BIRMINGHAM · Coordinator · UK
- UNIVERSITE DE LIEGE · Participant · BE
- IMPERIAL COLLEGE OF SCIENCE TECHNOLOGY AND MEDICINE · Participant · UK
- RIKEN THE INSTITUTE OF PHYSICAL AND CHEMICAL RESEARCH · Participant · JP
- GENOME RESEARCH LIMITED LBG · Participant · UK
- FUNDACIO DE RECERCA CLINIC BARCELONA-INSTITUT D INVESTIGACIONS BIOMEDIQUES AUGUST PI I SUNYER · Participant · ES
- KARLSRUHER INSTITUT FUER TECHNOLOGIE · Participant · DE
- KAROLINSKA INSTITUTET · Participant · SE
- MAX-PLANCK-GESELLSCHAFT ZUR FORDERUNG DER WISSENSCHAFTEN EV · Participant · DE
- KING'S COLLEGE LONDON · Participant · UK
The University of Birmingham (UK) coordinated this project. Look for the principal investigator in the developmental genomics or computational biology department.
Talk to the team behind this work.
Want to explore how zebrafish genomic tools could accelerate your drug target discovery or diagnostic pipeline? SciTransfer can connect you with the right research team — contact us for a tailored brief.