SciTransfer

LEXICAL COMPUTING CZ SRO

Czech SME providing corpus linguistics software and NLP expertise for multilingual lexicographic and open research infrastructure projects.

Technology SME | digital | CZ | SME | No active H2020 projects
H2020 projects: 2
As coordinator: 0
Total EC funding: €640K
Unique partners: 40
What they do

Their core work

Lexical Computing is a Czech technology SME specializing in corpus linguistics software and computational lexicography — in plain terms, they build tools that analyze massive text collections to understand how words are actually used in language. Their flagship product is Sketch Engine, a corpus management platform widely used by dictionary makers, translators, and language researchers. In H2020 projects they contributed this language technology infrastructure to pan-European efforts: providing corpus tools and NLP expertise to help build interoperable lexicographic resources and multilingual discovery platforms. They sit at the intersection of linguistics, AI-driven text analysis, and open research infrastructure.

Core expertise

What they specialise in

Corpus linguistics and lexicography tools (primary)
2 projects

Both ELEXIS and TRIPLE relied on computational corpus analysis; ELEXIS explicitly focused on lexicographic infrastructure where Lexical Computing's tooling is a core contribution.

Natural language processing and computational linguistics (primary)
2 projects

NLP and computational linguistics are listed as core keywords in ELEXIS (EUR 550,206), their largest project.

Multilingual and lesser-resourced language coverage (primary)
2 projects

ELEXIS specifically targeted lesser-resourced languages, and TRIPLE addressed multilingualism in research discovery — a consistent thread across both projects.

Linked open data and semantic web for language resources (secondary)
1 project

ELEXIS keywords include linked (open) data and semantic web, indicating technical work on interoperable language resource formats.

Open Science infrastructure and EOSC integration (emerging)
1 project

TRIPLE (2019-2023) places their tools within the European Open Science Cloud (EOSC) context and OPERAS publishing infrastructure.

Evolution & trajectory

How they've shifted over time

Early focus
Digital lexicography and NLP tools
Recent focus
Open science discovery infrastructure

In their earlier H2020 work (ELEXIS, starting 2018), Lexical Computing focused squarely on the technical foundations of digital lexicography: corpus tools, AI-driven word analysis, semantic web representations of dictionary data, and coverage of under-resourced European languages. By 2019, with TRIPLE, the framing shifted from building language resources to embedding those resources inside broader open science discovery infrastructure — connecting their tools to EOSC and the OPERAS scholarly communication network. The trajectory is clear: from specialist language technology provider toward an actor in pan-European open research infrastructure, with multilingualism as the bridge between the two phases.

Lexical Computing is moving from language-technology vendor toward open research infrastructure participant, making them an increasingly relevant partner for EOSC-aligned projects that need multilingual text analysis or language-resource interoperability.

Collaboration profile

How they like to work

Role: specialist contributor
Reach: European
22 countries collaborated

Lexical Computing has never coordinated an H2020 project — they join as specialist partners, contributing a specific tool or technical capability that larger consortia need. With 40 unique partners across just 2 projects, they operate inside large, broad consortia (ELEXIS alone spans 20+ institutions). This suggests they are brought in for a well-defined technical contribution rather than as a generalist partner, and working with them likely means engaging a focused team around a specific software or data deliverable.

Despite only two projects, Lexical Computing has built a notably wide network: 40 unique consortium partners spread across 22 countries, reflecting the pan-European character of language infrastructure projects that must cover many national language communities. There is no single geographic concentration — their network is genuinely European.

Why partner with them

What sets them apart

Lexical Computing occupies a rare niche as a commercial SME whose production-grade corpus analysis platform (Sketch Engine) is also embedded in academic research infrastructure; most competitors are either pure academics or pure software vendors. A consortium that needs both proven software and a team that publishes research on its own tools gets the two in one organisation. Their consistent focus on multilingualism and lesser-resourced languages also fills a gap that larger Western-European NLP players tend to ignore.

Notable projects

Highlights from their portfolio

  • ELEXIS
    Their largest project by far at EUR 550,206, this four-year pan-European initiative to build a shared lexicographic infrastructure is the clearest demonstration of Lexical Computing's core commercial and research capabilities working in tandem.
  • TRIPLE
    Signals their expansion into EOSC-linked open science infrastructure, connecting language-technology expertise to the wider European scholarly communication ecosystem via OPERAS.
Cross-sector capabilities
  • Health and clinical NLP (processing medical texts, terminology extraction)
  • Education and language learning technology
  • Cultural heritage and digital humanities
  • Public sector multilingual content and policy document analysis
Analysis note: The dataset contains only two projects, both with Lexical Computing as a participant, which limits the depth of the role and leadership analysis. Profile confidence is boosted by strong thematic consistency across both projects and by the well-known external reputation of Sketch Engine, but claims about specific internal contributions are inferred from keywords and project titles rather than from deliverable-level data.