SciTransfer
STREAMLINE · Project

One Platform to Analyze Both Live and Stored Big Data Without Expert Programmers

digital · Piloted · TRL 7

Imagine you run an online business and data pours in like a firehose — user clicks, game events, video streams — billions of events every day. Right now, you need separate tools to look at what happened yesterday versus what's happening right now, plus a team of expensive specialists to glue it all together. STREAMLINE built a single open-source platform that handles both real-time and historical data analysis in one place, with a simpler language so your data team can focus on business questions instead of wrestling with code. Think of it as replacing a messy kitchen full of mismatched gadgets with one well-designed appliance that does it all.

By the numbers
100 million+ users served by consortium partners
billions of events produced by partner services
10 TB of data generated daily
1 PB+ of data at rest managed by partners
3 iterations of field trials completed
9 consortium partners across 6 countries
24 total deliverables produced
The business problem

What needed solving

Companies running online services drown in data from multiple sources — user clicks, transactions, network events — but analyzing real-time streams and stored historical data requires separate, complex systems stitched together with custom code. This forces businesses to hire expensive specialists, endure slow development cycles, and accept delays between when something happens and when they can act on it.

The solution

What was built

The project delivered a unified big data analytics platform built on Apache Flink integrated with Hadoop, featuring automated cluster deployment tools, an interactive analytics environment (REPL and Zeppelin-based web interface), and distributed machine learning libraries for real-time processing. All tools were released as open source and validated through three rounds of industrial field trials.
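In the real platform this unification happens through Apache Flink's APIs; as a language-agnostic illustration of the core idea, the sketch below (plain Python, hypothetical names such as `count_clicks_per_user`, not a STREAMLINE API) applies one analysis definition unchanged to both a stored batch and a live event stream:

```python
from typing import Dict, Iterable

# One analysis definition, reused for both batch and streaming input.
# This is a conceptual sketch of "write once, run on both", not the
# project's actual Flink-based interface.
def count_clicks_per_user(events: Iterable[dict]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for event in events:
        if event.get("type") == "click":
            user = event["user"]
            counts[user] = counts.get(user, 0) + 1
    return counts

# Historical data: a stored batch (read from HDFS in the real system).
stored_events = [
    {"type": "click", "user": "alice"},
    {"type": "view",  "user": "bob"},
    {"type": "click", "user": "alice"},
]

# Live data: any iterator works, e.g. a generator draining a message queue.
def live_stream():
    yield {"type": "click", "user": "bob"}
    yield {"type": "click", "user": "alice"}

batch_result = count_clicks_per_user(stored_events)   # {'alice': 2}
stream_result = count_clicks_per_user(live_stream())  # {'bob': 1, 'alice': 1}
```

The point of the unified design is exactly this: the analysis logic is written once, and the platform, not the data team, handles the difference between data at rest and data in motion.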

Audience

Who needs this

Video streaming platforms needing real-time recommendation engines
Telecom operators combining network monitoring with customer analytics
Online gaming companies tracking player behavior at scale
Ad-tech firms optimizing campaigns across live and historical data
Any enterprise processing 1 TB+ daily and maintaining separate batch and streaming pipelines
Business applications

Who can put this to work

Online Media & Streaming
enterprise
Target: Video streaming or digital content platforms processing large volumes of user interaction data

If you are a streaming platform losing customers to churn because you cannot personalize recommendations fast enough — this project developed a unified analytics platform built on Apache Flink and Hadoop that processes both live and historical data in one system. Their industrial partners serve over 100 million users generating billions of events and over 10 TB of data daily, and the platform was field-tested across three iterations to handle exactly these workloads.

Telecommunications
enterprise
Target: Telecom operators or mobile service providers needing real-time customer analytics

If you are a telecom company struggling to combine real-time network data with stored customer records for targeted services — this project built a high-level declarative language and interactive analytics environment that lets data scientists work without deep systems programming expertise. The platform was designed to replace the complexity of running Hadoop, Storm, and databases in parallel, cutting the need for specialized boilerplate code across multiple systems.
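STREAMLINE's actual declarative language is not reproduced in this summary. As a rough sketch of the general approach, the example below (plain Python; the spec format and `run_query` engine are hypothetical) shows how a small declarative query description can replace hand-written glue code: the analyst states what to compute, and a generic engine decides how to run it over any event source.

```python
from collections import defaultdict

# A tiny declarative query: what to compute, not how to wire systems
# together. Illustrative only; not STREAMLINE's language.
query = {
    "filter":    lambda e: e["status"] == "ok",
    "group_by":  "region",
    "aggregate": "count",
}

def run_query(spec, events):
    """Interpret a declarative spec over any iterable of events."""
    groups = defaultdict(int)
    for event in events:
        if spec["filter"](event):
            groups[event[spec["group_by"]]] += 1
    return dict(groups)

events = [
    {"status": "ok",  "region": "eu"},
    {"status": "err", "region": "eu"},
    {"status": "ok",  "region": "us"},
    {"status": "ok",  "region": "eu"},
]

result = run_query(query, events)  # {'eu': 2, 'us': 1}
```

Because only the spec is analyst-facing, the same query could be executed against a live network feed or a stored customer table, which is the separation of concerns the project's declarative layer aims at.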

Online Gaming & AdTech
mid-size
Target: Online gaming companies or advertising technology firms needing fast reactive analytics

If you are a gaming or ad-tech company that needs to react to player behavior or ad performance in real time while also running deep historical analysis — this project delivered distributed machine learning with asynchronous algorithms designed for high input rates. The open-source Flink-based tools were validated through three rounds of field trials and come with automated cluster deployment, reducing setup time and operational burden.
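The asynchronous algorithms themselves are not spelled out in this summary. As a generic illustration of the underlying pattern — lock-free asynchronous updates to a shared model, tolerating races to keep up with high input rates — consider this minimal sketch (plain Python, not the project's code):

```python
import random
import threading

# Lock-free asynchronous SGD on a 1-D least-squares problem: several
# worker threads update a shared weight without coordination. A generic
# illustration of asynchronous learning, not STREAMLINE's algorithm.
TRUE_W = 3.0
state = {"w": 0.0}  # shared model parameter

def worker(n_steps: int, lr: float = 0.05) -> None:
    rng = random.Random()
    for _ in range(n_steps):
        x = rng.uniform(-1.0, 1.0)
        y = TRUE_W * x  # noiseless sample from the target model
        grad = 2.0 * (state["w"] * x - y) * x
        state["w"] -= lr * grad  # racy update, tolerated by design

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# state["w"] converges near TRUE_W despite occasional lost updates.
```

The design choice is that skipping locks loses an occasional update but keeps every worker ingesting events at full speed, which is the trade-off asynchronous algorithms make for high-rate streams.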

Frequently asked

Quick answers

What would it cost to adopt this technology?

STREAMLINE's core tools are open source, built on Apache Flink and the Hadoop ecosystem, so there are no licensing fees for the software itself. Costs would involve infrastructure (cloud or on-premise clusters), integration effort, and staff training on the declarative language. The automated deployment tool (Chef cookbooks on Karamel) is designed to reduce setup costs.

Can this handle our data volumes at industrial scale?

Yes — the platform was specifically designed and tested for industrial scale. The consortium's own partners serve over 100 million users, produce billions of events, handle over 10 TB of data daily, and manage over a petabyte of stored data. Three iterations of field trials validated performance at these volumes.

What is the IP and licensing situation?

The project developed open-source tools, which means the core technology is freely available. Specific components like the Flink deployment software, the interactive analytics environment (based on Zeppelin), and the Flink-on-Hops/Hadoop integration were delivered as open-source packages. Commercial support or custom extensions would need to be arranged with the consortium partners.

How does this integrate with our existing Hadoop infrastructure?

STREAMLINE was explicitly designed for Hadoop compatibility. The Flink-on-Hops/Hadoop deliverable addresses direct integration into the Hadoop ecosystem, and the deployment tool automates Flink installation on existing clusters. This means you can add real-time stream processing without replacing your current data infrastructure.

What is the current status of the project and its tools?

The project ran from December 2015 to November 2018 and is now closed. The tools were validated through three iterations of field trials. As open-source software, the components remain available but would need evaluation against current versions of Flink and Hadoop for production use today.

Do we need specialized data engineers to use this?

Reducing the need for specialized programming expertise was a core goal. The project developed a high-level declarative language and interactive environment (REPL and web-based Zeppelin interface) so data scientists can focus on domain questions rather than system-specific boilerplate code across Hadoop, Storm, Solr, and databases.

Consortium

Who built it

The STREAMLINE consortium brings together 9 partners from 6 European countries (Germany, Finland, France, Hungary, Portugal, Sweden), led by RISE Research Institutes of Sweden. With 5 industrial partners (56% of the consortium) and 4 research organizations, this is a well-balanced team that leans toward practical application rather than pure research. The industrial partners collectively serve over 100 million users across online media, gaming, and telecom — providing real-world testing grounds at genuine scale. The single SME in the group suggests the tools were built with enterprise-grade requirements in mind. The coordinator, RISE, is Sweden's largest research institute with strong ties to industry, which adds credibility for technology transfer.

How to reach the team

RISE Research Institutes of Sweden AB — contact via their data science or ICT department

Next steps

Talk to the team behind this work.

Want to know if STREAMLINE's open-source big data tools fit your infrastructure? We can arrange a technical briefing with the development team and assess compatibility with your data stack.