REFRACT · Project

Protein Repeat Analysis Tools for Faster Drug Target Discovery and Biotech Design

health · Prototype · TRL 4 · Thin data (2/5)

Many proteins in our bodies are built from repeating building blocks, like beads on a string. Understanding these repeating patterns helps scientists figure out what a protein does and what goes wrong in diseases. REFRACT brought together 16 teams across 12 countries to build better software that identifies, classifies, and models these protein repeats. Think of it as building a search engine and quality-control toolkit specifically for one of the trickiest parts of protein biology.

By the numbers

16

consortium partners across the project

12

countries represented in the consortium

EUR 2,226,400

EU contribution to the project

23

total deliverables produced

9

demonstrated software tools and database deliverables

1

industry partner in the consortium

The business problem

What needed solving

Companies in pharma, biotech, and diagnostics spend significant time and resources characterizing repeat proteins — a class of proteins involved in many diseases and biotechnology applications. Existing tools are fragmented, use different classification systems, and lack benchmarking, making it hard to compare results or build reliable automated pipelines.

The solution

What was built

The project produced 9 specialized software deliverables, including tools for automatic protein repeat annotation, homology modeling, sequence and structure alignment, and homolog search. The consortium also deployed database APIs, curated Pfam models, cross-linked data into InterPro, and created a benchmarking platform for comparing detection methods.
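To make "automatic protein repeat annotation" concrete, here is a deliberately simple sketch of the underlying idea: scanning a sequence for a unit that occurs several times back to back. This is not the project's method. The function name and parameters are hypothetical, and exact string matching is an assumption made for brevity, since real repeat proteins usually carry degenerate repeats that need profile-based or structure-based detection.

```python
from typing import List, Tuple


def find_exact_tandem_repeats(
    seq: str, unit_len: int, min_copies: int = 2
) -> List[Tuple[int, str, int]]:
    """Return (start_index, unit, copies) for runs of an exactly repeated unit.

    Toy illustration only (hypothetical helper): exact matching misses the
    degenerate repeats that dominate real repeat proteins.
    """
    if unit_len < 1:
        raise ValueError("unit_len must be at least 1")

    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next window is identical to the unit.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # continue scanning after the run
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Made-up sequence containing three consecutive copies of "GAP".
    print(find_exact_tandem_repeats("MKGAPGAPGAPLL", unit_len=3))
```

The start index is 0-based. A production annotator would replace the equality check with a similarity score and would also consider structural evidence.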

Audience

Who needs this

Pharmaceutical companies with computational biology / target discovery teams
Biotech firms engineering repeat proteins for therapeutics or industrial enzymes
Bioinformatics platform companies building protein analysis software
Diagnostics companies developing protein-based biomarker assays
Academic drug discovery centers transitioning to commercial pipelines

Business applications

Who can put this to work

Pharmaceutical R&D

enterprise

Target: Mid-to-large pharma companies with computational biology teams

If you are a pharma company spending months identifying viable drug targets, this project offers software for automatic tandem repeat protein annotation and homology modeling, plus database APIs that cross-link into InterPro and UniProt. Your target discovery pipeline could screen repeat protein families faster and under a unified classification, reducing redundant analysis across internal teams.

Biotech / Protein Engineering

mid-size

Target: Biotech firms designing synthetic proteins or biologics

If you are a biotech company engineering repeat proteins for therapeutic or industrial use, this project built alignment tools for both sequence-based and structure-based comparison of tandem repeat proteins, along with benchmarking metrics across detection methods. These tools can help you evaluate engineered repeat protein variants against known families with higher confidence, reducing trial and error in the design cycle.
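As an illustration of what benchmarking a detection method can involve, the sketch below scores predicted repeat regions against a reference annotation at the residue level. The source does not say which metrics the project's benchmarking platform reports. Per-residue precision, recall, and F1 are assumed here because they are a common choice, and all names in the code are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# A region is (start, end), 1-based and inclusive, in residue numbering.
Region = Tuple[int, int]


def covered_residues(regions: Iterable[Region]) -> Set[int]:
    """Expand regions into the set of residue positions they cover."""
    covered: Set[int] = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return covered


def score_prediction(
    predicted: Iterable[Region], reference: Iterable[Region]
) -> Dict[str, float]:
    """Per-residue precision, recall, and F1 of predicted vs. reference regions.

    Assumed metrics for illustration; not taken from the project.
    """
    pred = covered_residues(predicted)
    ref = covered_residues(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up example: reference repeat region at 10-29, prediction at 15-34.
    print(score_prediction(predicted=[(15, 34)], reference=[(10, 29)]))
```

Running the same scoring over several detection methods and a shared reference set is one simple way to compare them on equal terms.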

Bioinformatics / Life Science Software

SME

Target: Companies building protein analysis platforms or diagnostic pipelines

If you are a bioinformatics platform provider looking to strengthen your protein annotation capabilities, this project delivered open database APIs, curated Pfam hidden Markov models for repeat proteins, and a consensus benchmarking platform, all among its 9 specialized software deliverables. Integrating these into your platform adds a specialized module that competitors may lack.

Frequently asked

Quick answers

What would it cost to access these tools and databases?

The project outputs — databases, APIs, and software — were developed under a publicly funded MSCA-RISE grant (EUR 2,226,400) and integrated into open resources like InterPro and Pfam. Most tools are likely freely available for academic use, but commercial licensing terms would need to be discussed directly with the consortium coordinator at the University of Padova.

Can these tools work at industrial scale for large protein libraries?

The project delivered database APIs and software for automatic annotation, homology modeling, and sequence database searching, suggesting batch processing capability. However, the consortium had only 1 industry partner out of 16, so large-scale industrial validation may be limited. Performance benchmarking data is available through their dedicated benchmarking platform.

What is the IP situation — can we license this?

With 11 universities and 4 research organizations among the partners, IP is likely held by academic institutions. The project explicitly aimed for propagation into ELIXIR core data resources and possible industrial exploitation, suggesting openness to licensing. Contact the coordinator at the University of Padova for specific terms.

How mature are these software tools?

The project ran for over 5 years (2019–2024) and delivered 23 total deliverables including 9 demonstrated software tools and databases. These include deployed APIs and cross-linked data in established resources like InterPro. The tools are functional research software, not yet packaged as commercial products.

Can these tools integrate with our existing bioinformatics pipeline?

Yes — the project specifically built database APIs and cross-linked data into widely used resources like InterPro and UniProt. The software covers sequence search, alignment, annotation, and modeling, which are standard pipeline components. Integration effort would depend on your existing infrastructure.
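A minimal sketch of what such an integration step might look like, using InterPro as the entry point because the project's data is cross-linked there. The URL pattern and the response fields (`results`, `metadata`, `accession`, `name`, `type`) reflect the public InterPro REST API as commonly used, but treat them as assumptions and check the current InterPro documentation before relying on them. The project's own database APIs are not specified in this profile and are not shown. Function names and the sample payload are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public InterPro REST API.
INTERPRO_API = "https://www.ebi.ac.uk/interpro/api"


def entries_url(uniprot_accession: str) -> str:
    """Build the (assumed) URL listing InterPro entries for one UniProt protein."""
    return f"{INTERPRO_API}/entry/interpro/protein/uniprot/{quote(uniprot_accession)}"


def extract_entries(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pull accession, name, and type out of an (assumed) InterPro response."""
    return [
        {
            "accession": item["metadata"]["accession"],
            "name": item["metadata"]["name"],
            "type": item["metadata"]["type"],
        }
        for item in payload.get("results", [])
    ]


def fetch_entries(uniprot_accession: str, timeout: float = 30.0) -> List[Dict[str, str]]:
    """Fetch and parse entries over the network; an empty body yields no entries."""
    with urlopen(entries_url(uniprot_accession), timeout=timeout) as response:
        body = response.read()
    return extract_entries(json.loads(body)) if body else []


if __name__ == "__main__":
    # Made-up payload with placeholder values, shaped like the assumed response.
    sample = {
        "results": [
            {"metadata": {"accession": "IPR000000", "name": "Example repeat", "type": "repeat"}}
        ]
    }
    print(entries_url("P12345"))
    print(extract_entries(sample))
```

Keeping URL construction and response parsing in separate functions lets a pipeline test the parsing offline and swap in a different endpoint later.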

Is there regulatory relevance for these protein tools?

Based on available project data, the tools focus on protein characterization and classification rather than direct regulatory compliance. However, better protein annotation supports regulatory submissions for biologics by providing more thorough target characterization and structural understanding.

Consortium

Who built it

REFRACT is a heavily academic consortium: 11 universities and 4 research organizations with just 1 industry partner (6% industry ratio) and zero SMEs. The 16 partners span 12 countries including 7 EU member states and 9 Latin American institutions, making it geographically diverse but research-oriented. For a business looking to adopt these tools, this means the science is solid and peer-reviewed, but you would be among the first commercial users — there is no established industry adoption pathway yet, and commercial support infrastructure would need to be built from scratch with the academic teams.

UNIVERSITA DEGLI STUDI DI PADOVA · Coordinator · IT
JOHANNES GUTENBERG-UNIVERSITAT MAINZ · participant · DE
UNIVERSIDAD NACIONAL DE LA PLATA · partner · AR
EUROPEAN MOLECULAR BIOLOGY LABORATORY · participant · DE
AGENCIA ESTATAL CONSEJO SUPERIOR DE INVESTIGACIONES CIENTIFICAS · participant · ES
PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU · partner · PE
PONTIFICIA UNIVERSIDAD CATOLICA DE CHILE · partner · CL
UNIVERSIDAD NACIONAL DE QUILMES · partner · AR
UNIVERSIDAD PERUANA CAYETANO HEREDIA · partner · PE
CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE CNRS · participant · FR
ZURCHER HOCHSCHULE FUR ANGEWANDTE WISSENSCHAFTEN · participant · CH
UNIVERSIDAD NACIONAL AUTONOMA DE MEXICO (UNAM) · partner · MX
STOCKHOLMS UNIVERSITET · participant · SE

How to reach the team

The coordinator is Universita degli Studi di Padova in Italy. SciTransfer can facilitate an introduction to discuss licensing or collaboration.


Next steps

Talk to the team behind this work.

Want to explore how REFRACT's protein analysis tools could strengthen your drug discovery or protein engineering pipeline? SciTransfer can connect you with the research team and help evaluate fit for your specific use case.
