If you are a research funding agency struggling to track the publication output of your grants, this project offers a text-mining algorithm that automatically identifies funded articles in Europe PubMed Central. The algorithm was incorporated into the live Europe PMC platform, so funders can search and browse articles linked to their grants without manual effort.
Automated Text Mining Links Research Papers to Their Funding Sources
Imagine you funded thousands of research projects but had no easy way to find out which published papers came from your money. That's exactly the problem research funders face — the connection between who paid and what got published often gets lost. This project built a text-mining algorithm that automatically reads through millions of life science articles in Europe PubMed Central and extracts the funding acknowledgments, linking each paper back to its funder. Think of it like a smart search engine that reads the fine print of every paper and builds a map of who funded what.
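To make the idea concrete, here is a minimal sketch of how funding statements might be pulled out of article text and mapped to funders. It is not the project's algorithm, whose internals are not described here. It assumes a simple rule-based approach, and the cue phrases, the alias table, and the function names are all hypothetical.

```python
import re

# Hypothetical cue phrases that often introduce a funding statement.
FUNDING_CUES = re.compile(
    r"\b(funded by|supported by|financial support from|grant from)\b",
    re.IGNORECASE,
)

# Hypothetical alias table mapping lowercase name variants to one funder name.
FUNDER_ALIASES = {
    "european research council": "European Research Council",
    "erc": "European Research Council",
    "wellcome trust": "Wellcome Trust",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_funding(text):
    """Return (sentence, sorted funder names) for each sentence with a cue."""
    results = []
    for sentence in split_sentences(text):
        if not FUNDING_CUES.search(sentence):
            continue
        lowered = sentence.lower()
        funders = sorted({
            canonical
            for alias, canonical in FUNDER_ALIASES.items()
            if re.search(r"\b" + re.escape(alias) + r"\b", lowered)
        })
        results.append((sentence, funders))
    return results


# Invented sample text, for illustration only.
sample = (
    "We thank our colleagues for helpful comments. "
    "This work was supported by the European Research Council (ERC) "
    "and a grant from the Wellcome Trust."
)
statements = extract_funding(sample)
for sentence, funders in statements:
    print(funders, "<-", sentence)
```

A rule-based pass like this is easy to inspect, but it will miss unusual phrasing and name variants that are not in the alias table. That is one reason a production system needs careful quality assessment.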
What needed solving
Research funding agencies spend billions on grants but struggle to track which published papers actually resulted from their investment. Without structured links between funding and publications, impact assessment is incomplete, slow, and expensive — often relying on manual end-of-grant reports that arrive years late or not at all.
What was built
A text-mining algorithm that automatically extracts funding statements from full-text research articles in Europe PubMed Central, plus its production deployment into the Europe PMC website where ERC funding statements are now visible and searchable.
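As an illustration of what "searchable" can look like in practice, the sketch below builds a query URL for the public Europe PMC REST search service. The endpoint, the `GRANT_AGENCY` field name, and the `pageSize` parameter are stated here as assumptions and should be checked against the current Europe PMC web service documentation. The helper `build_funder_query` is a hypothetical name. No network request is made.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC REST search service.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_funder_query(funder_name, page_size=25):
    """Build a search URL for articles attributed to one funder.

    GRANT_AGENCY is assumed to be the search field for funder names.
    """
    params = {
        "query": f'GRANT_AGENCY:"{funder_name}"',
        "format": "json",
        "pageSize": page_size,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_funder_query("European Research Council")
print(url)
```

Fetching the URL with any HTTP client should return a page of matching records in JSON, provided the assumptions above hold.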
Who needs this
Who can put this to work
If you are a scientific publisher looking to enrich your metadata and improve the discoverability of funded research, this project offers methods for extracting structured funding statements from full-text articles. The approach was proven on the Europe PubMed Central corpus and could be adapted to your own article databases to add funding attribution as a searchable field.
If you are a research analytics company that needs to connect grants to publications for reporting dashboards, this project built and deployed an algorithm that mines funding acknowledgments at scale. The method was validated through standard text-mining quality assessment and consultation with ERC staff, providing a replicable approach for your own data pipelines.
Quick answers
What would it cost to implement a similar system?
The entire project was completed with EUR 60,000 in EU funding over 12 months by a single research organization (EMBL). This suggests a relatively lean implementation. Licensing or adaptation costs would depend on negotiations with EMBL and the scope of your dataset.
Can this work at industrial scale beyond Europe PubMed Central?
Within Europe PubMed Central, yes: the algorithm was incorporated into the live Europe PMC production system, which shows it works at the scale of millions of life science articles. The project objective also states plans to explore extending the approach from the ERC to other Europe PMC funders. Applying it to other article databases would require adaptation.
Who owns the intellectual property and how can I license it?
The project was led by the European Molecular Biology Laboratory (EMBL), a public research organization in Germany. IP terms would follow the ERC grant agreement. The Europe PMC platform itself is publicly accessible at europepmc.org.
How accurate is the text-mining algorithm?
The project validated its outcomes using standard text-mining quality-assessment methods and consultation with ERC staff. The project objective does not report specific accuracy metrics, but the algorithm was considered reliable enough for production deployment on the Europe PMC website.
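For readers unfamiliar with text-mining quality assessment, a common approach is to compare the algorithm's output against a hand-labelled sample and compute precision, recall, and F1. The sketch below shows that calculation. The source does not say which metrics the project used, so this is an assumption, and the article identifiers and labels are invented toy data.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted (article, funder) links against hand-labelled ones."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented toy data: links a human annotator marked as correct ...
gold = {
    ("article-1", "ERC"),
    ("article-2", "ERC"),
    ("article-3", "ERC"),
    ("article-4", "ERC"),
}
# ... and links a hypothetical extractor produced.
predicted = {
    ("article-1", "ERC"),
    ("article-2", "ERC"),
    ("article-5", "ERC"),
}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Precision shows how many extracted links are correct. Recall shows how many true links were found. A funder tracking impact typically cares about both.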
Is this only for life sciences or can it be applied to other fields?
The project was specifically designed for life science articles in Europe PubMed Central. However, the text-mining methods for extracting funding statements could in principle be adapted to other publication databases and research domains, since funding acknowledgments tend to follow similar conventions across fields.
How long did it take from development to deployment?
The project ran for exactly 12 months (September 2014 to August 2015). Within that period, the algorithm was developed, tested, and incorporated into the live Europe PMC website — a remarkably fast development-to-deployment cycle.
Who built it
This was a focused, single-partner project led by the European Molecular Biology Laboratory (EMBL) in Germany, one of Europe's flagship research organizations in life sciences. With a compact EUR 60,000 budget and no industrial partners, the project was purely research-driven. The absence of commercial partners means the technology was built for a public infrastructure use case (Europe PMC) rather than for direct commercial licensing. A business looking to adopt this technology would be dealing with a world-class research institution, but would need to negotiate access and adaptation terms directly with EMBL.
- EUROPEAN MOLECULAR BIOLOGY LABORATORY · Coordinator · DE
The coordinator is the European Molecular Biology Laboratory (EMBL) in Germany. SciTransfer can help identify the right contact person for licensing or collaboration discussions.
Talk to the team behind this work.
Want to explore how automated funding-attribution technology could improve your research tracking? SciTransfer can connect you directly with the EMBL team behind this deployment.