If you are a pharmaceutical company spending millions on compound screening, this project developed machine learning algorithms that predict compound bioactivity at massive scale. With 5 industry partners and scalable implementations designed for the largest datasets, ExCAPE's virtual screening approach aims to filter out compounds unlikely to work before they reach the lab, potentially cutting empirical testing costs significantly.
AI Screens Millions of Drug Compounds So Pharma Labs Don't Have To
Imagine you're looking for a needle in a haystack, except the haystack is millions of chemical compounds and the needle is the one that could become a new medicine. Right now, pharma companies physically test huge numbers of compounds in the lab, which is slow and expensive. ExCAPE built machine learning software that predicts which compounds are likely to work before anyone touches a test tube, and designed it to run on the largest supercomputers. Think of it as a super-smart filter that tells scientists "test these 500, skip the other 50,000", saving enormous time and money in drug discovery.
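In practice, a "test these 500, skip the rest" filter comes down to ranking compounds by a model's predicted activity and sending only the top of the list to the lab. The Python sketch below assumes a trained model has already assigned each compound a score; the function `select_for_lab`, the compound IDs and the scores are hypothetical and are not taken from ExCAPE's software.

```python
import heapq
from typing import Dict, List


def select_for_lab(predicted: Dict[str, float], budget: int) -> List[str]:
    """Return the IDs of the `budget` compounds with the highest predicted
    activity score, best first; all other compounds are skipped.

    Hypothetical helper for illustration only.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # nlargest keeps only `budget` items in a heap instead of sorting everything
    top = heapq.nlargest(budget, predicted.items(), key=lambda item: item[1])
    return [compound_id for compound_id, _score in top]


# Hypothetical model scores for a tiny four-compound library
scores = {"CPD-001": 0.91, "CPD-002": 0.12, "CPD-003": 0.67, "CPD-004": 0.05}
print(select_for_lab(scores, budget=2))  # ['CPD-001', 'CPD-003']
```

Using `heapq.nlargest` avoids sorting the whole library when the lab budget is much smaller than the number of compounds scored, which starts to matter once libraries reach millions of entries.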
What needed solving
Pharmaceutical companies test millions of chemical compounds in the lab to find candidates for new drugs, but the vast majority of these tests fail. This brute-force approach is extremely expensive and slow. Existing machine learning tools that could predict which compounds are worth testing cannot handle the full scale and complexity of real industry datasets.
What was built
ExCAPE produced scalable machine learning algorithms and software implementations designed for exascale supercomputers, specifically targeting compound bioactivity prediction at industry scale. The project delivered 21 outputs including algorithm implementations using HPC programming techniques, platform simulation tools for performance tuning, and accelerator-optimized code.
Who can put this to work
If you are a contract research organization handling compound activity testing for multiple clients, this project produced scalable prediction engines that process industry-scale datasets. Built by a consortium of 10 partners across 8 countries with 50% industry participation, ExCAPE's algorithms were designed to handle the data heterogeneity and imbalanced datasets that make real-world drug screening predictions so difficult.
If you are an agrochemical company screening compounds for biological activity against pests or diseases, this project's machine learning methods for predicting compound bioactivity could carry over to your screening pipeline. The algorithms were designed to handle extreme data volumes and diverse input features, which makes them relevant to any industry where chemical-biological interactions need to be predicted at scale.
Quick answers
How much would it cost to implement this kind of AI screening?
The project received EUR 3,910,140 in EU funding across 10 partners over 3 years to develop these algorithms. Implementation costs would depend on your existing HPC infrastructure and data volumes. The algorithms are designed for exascale computing, so you'd need access to high-performance computing resources.
Can this handle our full compound library at industrial scale?
That's exactly what ExCAPE was built for. The project specifically targeted industry-scale input datasets, addressing the problem that existing machine learning approaches could not scale to the full size and heterogeneity of pharmaceutical data. The algorithms were designed and optimized for exascale computing platforms.
What about IP and licensing for these algorithms?
ExCAPE was funded as a Research and Innovation Action (RIA) under Horizon 2020, coordinated by IMEC in Belgium with 5 industry partners. The available project data does not state licensing terms, so these would need to be discussed with the consortium. Some components may have been released as open source, given the involvement of 4 university partners.
How accurate are the predictions compared to lab testing?
The project addressed key challenges including confidence estimation and model quality assessment, meaning the system is meant to tell you how certain it is about each prediction. The available project data does not include specific accuracy benchmarks; these would need to be obtained from the project's 21 outputs, such as its deliverables and publications.
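The project summary does not say how ExCAPE estimated confidence. One widely used technique for attaching a confidence level to each bioactivity prediction is conformal prediction; the Python sketch below shows a label-conditional (Mondrian) variant for an active/inactive classifier, purely as an illustration. It assumes a model that outputs a probability of activity and a held-out calibration set of nonconformity scores; all function names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Share of calibration compounds at least as 'strange' as the test compound.

    Nonconformity scores: higher means the compound fits the label worse.
    """
    n = len(calibration_scores)
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (n + 1)


def prediction_set(
    calibration: Dict[str, Sequence[float]],
    p_active: float,
    significance: float,
) -> List[str]:
    """Labels that cannot be rejected at the given significance level.

    Calibration scores are kept per label (label-conditional, or 'Mondrian',
    conformal prediction), so under the usual exchangeability assumption the
    error rate is controlled separately for the rare 'active' class and the
    common 'inactive' class.
    """
    # Nonconformity of each candidate label: 1 - model probability for that label
    test_scores = {"active": 1.0 - p_active, "inactive": p_active}
    return [
        label
        for label, score in test_scores.items()
        if conformal_p_value(calibration[label], score) > significance
    ]


# Hypothetical calibration scores from compounds with known lab results
calibration = {
    "active": [0.1, 0.3, 0.2, 0.6],           # 1 - p(active) for known actives
    "inactive": [0.05, 0.1, 0.2, 0.15, 0.4],  # p(active) for known inactives
}
print(prediction_set(calibration, p_active=0.75, significance=0.2))  # ['active']
print(prediction_set(calibration, p_active=0.75, significance=0.1))  # ['active', 'inactive']
```

At a 20% significance level the example compound gets the single label "active"; tightening the level to 10% returns both labels, which is the method's way of saying the evidence is not strong enough to commit to one answer.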
Is this ready to plug into our existing drug discovery pipeline?
ExCAPE focused on producing scalable algorithms and implementations for high-performance computing platforms, including work on accelerators and HPC programming techniques. Integration with existing pharma pipelines would require adaptation work, as the project was primarily focused on solving the computational scaling challenge.
What data standards does this support?
The project explicitly tackled data standards as one of its core challenges, alongside handling imbalanced data and feature diversity. This means the algorithms were designed to work with the heterogeneous data formats typical in pharmaceutical compound databases. Specific supported formats should be confirmed with the consortium.
Who built it
The ExCAPE consortium is well balanced for translating research into industry use, with 5 industry partners and 4 universities across 8 European countries (AT, BE, BG, CZ, ES, FI, SE, UK). The 50% industry ratio is strong for a Research and Innovation Action, suggesting real commercial interest in the outcomes. Coordination by IMEC (Belgium), a world-renowned microelectronics research center, brings credibility in high-performance computing. The single SME in the consortium indicates some interest from smaller companies, though the heavy enterprise and research focus suggests this technology targets organizations with significant computing resources and large compound libraries.
- INTERUNIVERSITAIR MICRO-ELECTRONICA CENTRUM · Coordinator · BE
- VSB - TECHNICAL UNIVERSITY OF OSTRAVA · Participant · CZ
- IDEACONSULT LIMITED LIABILITY COMPANY · Participant · BG
- ASTRAZENECA AB · Participant · SE
- UNIVERSITAT LINZ · Participant · AT
- AALTO KORKEAKOULUSAATIO SR · Participant · FI
- JANSSEN CILAG SA · Participant · ES
- INTEL CORPORATION · Participant · BE
- JANSSEN PHARMACEUTICA NV · Third party · BE
- ROYAL HOLLOWAY AND BEDFORD NEW COLLEGE · Participant · UK
IMEC (Interuniversitair Micro-Electronica Centrum), Belgium: contact through its technology licensing office
Talk to the team behind this work.
Want to know if ExCAPE's machine learning algorithms fit your drug screening pipeline? SciTransfer can connect you with the right consortium partner and prepare a tailored technical brief.