SciTransfer
HOBBIT · Project

Open Benchmarking Platform That Tests How Well Your Big Data Tools Actually Perform

digital · Tested · TRL 6

Imagine you're buying a car but there's no crash-test rating, no fuel-economy sticker — you just have to trust the dealer. That's what companies face when choosing big data processing software: no independent, standardized way to compare tools. HOBBIT built an open benchmarking platform — like a Consumer Reports for linked-data technology — where vendors and buyers can run real-world tests on 1 petabyte of industry data from 4 different sectors. The results are public, machine-readable, and repeatable, so you actually know what you're paying for.

By the numbers
1PB
Real industry-relevant data used for benchmarking
4
Different industry domains covered
52
Total project deliverables produced
10
Consortium partners across 6 countries
€3,718,250
EU contribution to project development
5
Industry partners in the consortium
The business problem

What needed solving

Companies investing in big data processing tools have no independent, standardized way to compare performance before buying. Vendor benchmarks are self-serving, and running your own tests is expensive and inconsistent. This means procurement decisions worth hundreds of thousands of euros are based on marketing claims rather than verified performance data.

The solution

What was built

The project built an open, cloud-based benchmarking platform for testing big linked data tools across the entire data processing lifecycle. Deliverable D2.2.1 provided the first working version of the HOBBIT Platform, together with a user manual for integrating new benchmarks, tested on 1 PB of real industry data from 4 domains. In total, the project produced 52 deliverables.

Audience

Who needs this

Enterprise data architects evaluating data integration platforms
CTOs at data-intensive companies making procurement decisions on processing tools
IT consultancies advising clients on big data technology selection
Government agencies managing large open data portals needing performance baselines
Data platform vendors wanting independent validation of their product performance
Business applications

Who can put this to work

Financial Services & Insurance
enterprise
Target: Banks, insurers, and fintech companies processing large volumes of linked customer and transaction data

If you are a financial institution dealing with the challenge of selecting the right data integration and processing tools for compliance reporting — this project developed an open benchmarking platform tested on 1PB of real industry data across 4 domains. Instead of relying on vendor claims, you can run standardized tests to compare how well different tools handle your data volumes before committing to expensive licenses.

E-Commerce & Retail
mid-size
Target: Online retailers and marketplace operators managing product catalogs across multiple data sources

If you are an e-commerce company struggling with product data quality across suppliers and catalogs — this project built modular benchmarks that test every step of the big linked data lifecycle, from ingestion to querying. You can evaluate which data-linking software actually performs at scale with your real data patterns, rather than discovering bottlenecks after deployment.

Healthcare & Pharma Data Management
enterprise
Target: Health data platforms and pharmaceutical companies integrating clinical datasets

If you are a health-data platform integrating patient records, clinical trials, and research datasets from multiple sources — this project created cloud-based evaluation infrastructure that benchmarks data processing tools under realistic conditions. With 10 consortium partners from 6 countries validating the platform, you get vendor-neutral performance data to guide procurement decisions.

Frequently asked

Quick answers

How much would it cost to use the HOBBIT benchmarking platform?

The HOBBIT platform was designed as open and publicly available. The project's exit strategy involved creating a membership association sustained by subscriptions from industry and academia. Based on available project data, specific pricing was not published, but the open-source nature means the core platform can be accessed without license fees.

Can this handle data at the scale our company needs?

The platform was built and tested on approximately 1 petabyte of real industry-relevant data from 4 different domains. The architecture relies on cloud infrastructure specifically to ensure scalability. This is industrial-grade volume, not a lab demo.

Who owns the intellectual property and can we license it?

HOBBIT was funded as an EU Research and Innovation Action (RIA) with €3,718,250 in public funding. The platform and benchmarks were designed to be open and publicly available, with code accessible online. Based on available project data, the IP follows standard EU open-access provisions for publicly funded research.

Is this still actively maintained after the project ended in 2018?

The project planned an association as its exit strategy, created after the second project year, sustained by membership subscriptions. The project website (project-hobbit.eu) was established for ongoing access. Based on available project data, long-term maintenance depends on the association's continued activity.

How difficult is it to integrate our own benchmarks into the platform?

The platform was specifically designed to be modular and easily extensible. Deliverable D2.2.1 includes a user manual for the integration of new benchmarks by third parties. The architecture supports adding custom benchmarks for any step of the data processing lifecycle.
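To make the modular idea concrete, here is a minimal sketch of what a third-party benchmark component does conceptually: run tasks against a system under test, time them, and emit a machine-readable summary. Everything in this sketch (function names, the JSON shape) is a hypothetical illustration, not the HOBBIT integration API — the actual integration procedure is the one described in the D2.2.1 user manual.

```python
import json
import time

# Hypothetical sketch of a modular benchmark component. The real HOBBIT
# platform integrates benchmarks differently (see the D2.2.1 user manual);
# this only illustrates the run-measure-report pattern.
def run_benchmark(tasks, system_under_test):
    """Run each task against the system under test and record timings."""
    results = []
    for task in tasks:
        start = time.perf_counter()
        output = system_under_test(task)  # the tool being evaluated
        elapsed = time.perf_counter() - start
        results.append({"task": task, "output": output, "seconds": elapsed})
    return results

def to_report(results):
    """Emit a machine-readable summary, one of the platform's core ideas."""
    total = sum(r["seconds"] for r in results)
    return json.dumps({
        "tasks_completed": len(results),
        "total_seconds": round(total, 6),
    })

# Example: benchmark a trivial "system" that uppercases its input.
report = to_report(run_benchmark(["a", "b"], str.upper))
```

Because the component only needs to consume tasks and produce a structured report, swapping in a different system under test or a different task set leaves the rest of the harness unchanged — which is the extensibility property the platform's architecture is built around.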

Which industries has this been validated in?

The project assembled real industry-relevant data from 4 different domains at launch, with plans to extend through collaborations. The consortium included 5 industry partners and 5 research organizations across 6 countries. Based on available project data, specific domain names were not listed in the objective summary.

Are there compliance or regulatory benefits to using standardized benchmarks?

The platform produces human- and machine-readable public periodic reports, creating auditable performance records. For industries facing data-processing regulations, having independent benchmark results provides documented evidence of tool capabilities. Based on available project data, no specific regulatory certifications were mentioned.
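Machine-readable reports matter because they let a buyer compare tools programmatically and keep an auditable record. The source does not specify the report format, so the snippet below assumes a simplified JSON-like export with hypothetical field names (`tool`, `queries_per_second`) purely for illustration.

```python
# Hypothetical illustration: the platform's actual report vocabulary may
# differ; this assumes a simplified list-of-dicts export for comparison.
reports = [
    {"tool": "tool-a", "queries_per_second": 120.0},
    {"tool": "tool-b", "queries_per_second": 95.5},
]

def rank_tools(reports):
    """Order tools by throughput, as a buyer might for an audit trail."""
    return sorted(reports, key=lambda r: r["queries_per_second"], reverse=True)

best = rank_tools(reports)[0]["tool"]
```

The point is not the ranking logic itself but that independent, structured results make this kind of comparison reproducible, rather than depending on each vendor's own marketing numbers.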

Consortium

Who built it

The HOBBIT consortium brought together 10 partners from 6 countries (Belgium, Switzerland, Germany, Greece, Poland, UK), with a balanced 50/50 split between 5 industry partners and 5 research organizations. The coordinator, INFAI in Germany, is a recognized applied informatics institute. Having 5 industry partners signals that real-world data needs drove the platform design, not just academic curiosity. However, only 1 partner is classified as an SME, meaning the consortium leaned toward established organizations — important context for any SME considering adoption, as the tool was primarily shaped by larger players' data challenges.

How to reach the team

INFAI (Institut für Angewandte Informatik) in Germany coordinated the project. Use SciTransfer's coordinator lookup service to get the right contact.

Next steps

Talk to the team behind this work.

Want to know if HOBBIT benchmarks are relevant to your data stack? SciTransfer can assess fit and arrange a direct introduction to the research team.