One Open Platform to Run Data Processing, HPC, and AI Workloads Together
Right now, a company that wants to crunch huge datasets, run high-performance simulations, and train AI models needs three separate software stacks that don't talk to each other, like having three kitchens to make one meal. DAPHNE built a single open-source system that lets you write one script and have it automatically run across all your hardware (CPUs, GPUs, storage tiers) without manually moving data between disconnected tools. Think of it as a universal translator between Spark, TensorFlow, and traditional supercomputing code. The result: less engineering headache, less wasted compute, and a faster path from raw data to business insight.
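To make "one script" concrete, here is a minimal sketch using DaphneLib, the Python API shipped with the project's reference implementation. The import path and the from_numpy/compute calls follow the public documentation, but treat every name here as an assumption to verify against the version you install.

```python
# Minimal sketch of a DaphneLib pipeline. Assumes DAPHNE is built
# locally and DaphneLib is importable; API names follow the project
# docs and may differ between releases.
import numpy as np
from daphne.context.daphne_context import DaphneContext

dc = DaphneContext()

# Hand an in-memory NumPy matrix to DAPHNE; no files, no format export.
features = dc.from_numpy(np.random.rand(1_000, 16))

# Express preprocessing as lazy DAPHNE operations; nothing runs yet.
scaled = (features - 0.5) * 2.0
total = scaled.sum()

# compute() triggers the DAPHNE compiler and runtime, which decide
# where each operation executes (CPU, GPU, ...).
print(total.compute())
```

Because operations stay lazy until compute(), the compiler sees the whole pipeline at once and can optimize and place it as a unit, which is the core of the "one script, all your hardware" claim.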
What needed solving
Companies running advanced analytics face a costly integration headache: their data preprocessing runs on Spark, simulations run on HPC clusters, and ML training runs on yet another stack. Each system has its own programming model, data formats, and resource management, forcing engineering teams to manually glue pipelines together. This means wasted compute resources from separate statically provisioned clusters, slow development cycles, and expensive data movement between disconnected systems.
What was built
DAPHNE built an open-source system infrastructure with five main components: a domain-specific language with APIs; a compiler that generates optimized code for GPUs and accelerators; a runtime with automatic pipeline scheduling and multi-device data placement; managed storage tiers with near-data processing; and a benchmarking toolkit for performance analysis. All components reached final prototype stage, with 16 software deliverables and a publicly released reference implementation.
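As an illustration of how the pieces fit together, the sketch below drives the delivered compiler and runtime through the daphne command-line tool on a small DaphneDSL script. It assumes the daphne binary from the reference implementation is on your PATH, and the DSL syntax (rand, sum, print) follows the project documentation; both should be checked against the release you use.

```python
# Sketch: invoking the DAPHNE compiler/runtime on a DaphneDSL script.
# Assumes the 'daphne' binary from the reference implementation is on
# PATH; DSL syntax shown follows the documentation but is unverified
# against any particular release.
import subprocess
import tempfile

script = """
// Build a random 100x10 matrix (min, max, sparsity, seed) and reduce
// it; the compiler lowers this to kernels, the runtime places them.
X = rand(100, 10, 0.0, 1.0, 1.0, 42);
print(sum(X));
"""

with tempfile.NamedTemporaryFile("w", suffix=".daphne", delete=False) as f:
    f.write(script)
    path = f.name

subprocess.run(["daphne", path], check=True)
```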
Who can put this to work
If you are a pharma R&D team struggling to connect your HPC molecular simulations with your ML model training pipelines — DAPHNE built a domain-specific language and multi-device scheduling system that automatically places computations on the right hardware. Instead of your data engineers spending weeks gluing Apache Spark jobs to TensorFlow training loops, one integrated pipeline handles it. The project delivered 16 working software prototypes including GPU code generation and smart storage tier management.
If you are an automotive engineering team that uses HPC simulations for crash testing or aerodynamics and wants to add ML-based surrogate models to cut simulation time, DAPHNE developed automatic data placement across hybrid memory and storage configurations, plus a code generation system for heterogeneous accelerators. This means your existing simulation workflows can incorporate ML without rebuilding your entire compute infrastructure. The project consortium included 4 industry partners who shaped real-world requirements.
If you are a bank dealing with the pain of stitching together separate data preprocessing, simulation, and ML systems for real-time fraud detection, DAPHNE developed an open-source runtime and compiler that unify these workloads into a single pipeline. This eliminates manual data movement between tools and reduces underutilization of your GPU clusters. The system was built by a consortium of 14 partners across 7 countries over 4 years, with a final compiler, runtime, and benchmarking toolkit delivered. A sketch of what such an integrated pipeline could look like follows below.
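A hedged sketch of that integration, with DaphneLib names assumed from the project documentation and scikit-learn standing in for whatever training framework you already run:

```python
# Sketch: DAPHNE preprocessing feeding model training in one process.
# DaphneLib names are assumptions from the project docs; scikit-learn
# is a stand-in for your existing training stack.
import numpy as np
from daphne.context.daphne_context import DaphneContext
from sklearn.linear_model import LogisticRegression

dc = DaphneContext()

raw = np.random.rand(500, 8)             # e.g. sensor or transaction data
labels = (raw[:, 0] > 0.5).astype(int)   # toy target, illustration only

# Preprocess inside DAPHNE: lazy ops, compiled and placed by the runtime.
X = dc.from_numpy(raw)
X_scaled = (X - 0.5) * 2.0

# compute() hands back a NumPy array, so the ML stack consumes it
# directly: no export to files, no cluster-to-cluster copy.
X_np = X_scaled.compute()
model = LogisticRegression().fit(X_np, labels)
print(model.score(X_np, labels))
```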
Quick answers
What does the DAPHNE system actually cost to adopt?
DAPHNE is released as open-source software, so there are no licensing fees for the core platform. Your costs would be integration engineering and adapting the domain-specific language to your existing pipelines. The project was funded with EUR 6,609,665 in EU contributions, meaning the R&D investment is already covered.
Can this handle our production-scale data volumes?
DAPHNE was specifically designed for large-scale data management, with multi-device operations, automatic data placement across storage tiers, and pipeline scheduling for heterogeneous hardware clusters. The final prototypes include managed storage tiers with automatic placement and near-data processing. However, moving from prototype to production-grade deployment at your specific scale would require validation.
What is the IP and licensing situation?
The project explicitly delivered an open-source reference implementation as one of its key deliverables. Based on available project data, the system is designed to be open and extensible. Specific license terms should be verified in the project repository, linked from the project website daphne-eu.eu.
How does this integrate with our existing Spark and TensorFlow infrastructure?
DAPHNE's core purpose is interoperability between data management, HPC, and ML software stacks. It provides language abstractions (APIs and a domain-specific language) plus an intermediate representation that bridges these different systems. The compiler and runtime handle translation to your underlying hardware automatically.
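As a hedged sketch of what that bridging can look like in practice: DaphneLib exchanges data with the Python ecosystem through in-memory NumPy arrays and pandas DataFrames, which Spark can emit (toPandas()) and TensorFlow can consume. The from_pandas call below appears in the project documentation, but verify it against your installed version.

```python
# Sketch: bridging stacks through shared in-memory structures.
# Assumes DaphneLib's documented from_pandas/compute; verify locally.
import pandas as pd
from daphne.context.daphne_context import DaphneContext

dc = DaphneContext()

# A pandas DataFrame is a natural interchange point: Spark jobs can
# emit one via toPandas(), and NumPy arrays derived from it feed
# TensorFlow without custom connectors.
df = pd.DataFrame({"amount": [12.5, 7.0, 99.9], "count": [1.0, 4.0, 2.0]})

F = dc.from_pandas(df)   # hand the frame to DAPHNE
out = F.compute()        # round-trip back to the Python side
print(type(out))
```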
Is this ready to deploy in production today?
DAPHNE delivered final prototypes of its compiler, runtime, and benchmarking toolkit by project end in November 2024. The open-source reference implementation is available, but this remains a research prototype — not a commercially supported product. You would need engineering effort to harden it for production use.
Who built this and can they support us?
The coordinator is Know-Center Research GmbH in Austria, backed by a consortium of 14 partners including 6 universities and 4 industry organizations across 7 European countries. Based on available project data, commercial support would need to be arranged directly with consortium members or through a service integrator.
Who built it
The DAPHNE consortium brings together 14 partners from 7 countries (Austria, Switzerland, Germany, Denmark, Greece, Poland, Slovenia), with a balanced mix of 6 universities, 4 research organizations, and 4 industry partners. The 29% industry ratio is moderate for a Research and Innovation Action, suggesting the project was research-heavy but with real-world grounding. Notably, the consortium has zero SMEs — all industry partners are larger organizations, which signals the technology targets enterprise-scale computing environments rather than small-business use cases. The coordinator, Know-Center Research GmbH in Austria, is a research center specializing in data-driven business, giving the project a bridge between academic research and applied industry needs.
- KNOW CENTER RESEARCH GMBH (Coordinator) · AT
- IT-UNIVERSITETET I KOBENHAVN (participant) · DK
- INFINEON TECHNOLOGIES AUSTRIA AG (participant) · AT
- UNIVERSITAT BASEL (participant) · CH
- TECHNISCHE UNIVERSITAT BERLIN (participant) · DE
- DEUTSCHES ZENTRUM FUR LUFT- UND RAUMFAHRT EV (participant) · DE
- KAI KOMPETENZZENTRUM AUTOMOBIL- UND INDUSTRIEELEKTRONIK GMBH (participant) · AT
- EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH (participant) · CH
- UNIVERZA V MARIBORU (participant) · SI
- AVL LIST GMBH (participant) · AT
- INTEL TECHNOLOGY POLAND SPOLKA Z OGRANICZONA ODPOWIEDZIALNOSCIA (participant) · PL
- HASSO-PLATTNER-INSTITUT FUR DIGITAL ENGINEERING GGMBH (participant) · DE
- EREVNITIKO PANEPISTIMIAKO INSTITOUTO SYSTIMATON EPIKOINONION KAI YPOLOGISTON (participant) · EL
- TECHNISCHE UNIVERSITAET DRESDEN (participant) · DE
Know-Center Research GmbH (Austria) — contact via project website daphne-eu.eu
Talk to the team behind this work.
Want to explore whether DAPHNE's unified data pipeline technology fits your infrastructure? SciTransfer can arrange a direct introduction to the development team and help assess integration feasibility for your specific use case.