Data Engineer — Graph Technologies

  • Hybrid
  • Eindhoven, Noord-Brabant, Netherlands
  • Engineering

About Datenna

Datenna is a fast-growing tech scale-up combining cutting-edge open-source intelligence (OSINT) and AI technologies to provide governments worldwide with critical insights into China's techno-economic landscape. Our platform transforms OSINT into actionable insights on China's Defense Industrial Base and broader technological ecosystem through advanced data processing and analysis.

The Role

As a Data Engineer specialising in Graph Technologies, you will join our Data Engineering team and play a central role in building and sustaining the live graph at the heart of Datenna's intelligence platform. This graph continuously ingests, resolves, and connects data from hundreds of heterogeneous OSINT sources, turning raw, fragmented information into a coherent, queryable picture of China's techno-economic landscape. At its core, it is a production graph system: a property graph built to handle continuous updates at scale, not a semantic or ontology-driven knowledge representation layer.
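
To make the "continuous updates" point concrete, here is a minimal sketch of the kind of idempotent upsert such a system leans on, using the official neo4j Python driver. The Company label, its keys, and the connection details are illustrative assumptions, not a description of our actual schema.

    from neo4j import GraphDatabase

    # Hypothetical schema: upsert a company node so that re-running the same
    # ingestion batch never creates duplicate nodes.
    UPSERT_COMPANY = """
    MERGE (c:Company {registration_id: $registration_id})
    SET c.name = $name,
        c.source = $source,
        c.last_seen = datetime()
    """

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

    def upsert_company(record: dict) -> None:
        # MERGE makes the write idempotent: match on the stable key, then
        # refresh the mutable properties from the latest source record.
        with driver.session() as session:
            session.run(UPSERT_COMPANY, **record)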

This role operates at both ends of that challenge. You will design and build new parts of the graph — new entity types, new source integrations, new inference layers — while also keeping the existing graph consistent, accurate, and reliable as it receives continuous updates. Building something new and keeping something live at the same time is the core tension of this position. Beyond your own work, we expect you to raise the graph engineering capability of the team around you. You will be the domain expert others learn from.


Your Mission at Datenna:

  • Design, build, and maintain data pipelines that continuously ingest and update the knowledge graph from heterogeneous OSINT sources, including bringing entirely new sources and entity types online.

  • Build and improve entity resolution systems that correctly link, merge, and track entities across updates, including detecting when an existing entity has changed or a new one needs to be introduced (a simplified sketch follows this list).

  • Help define and evolve the boundary between the graph and the broader data platform: which processing steps belong in traditional pipeline DAGs, which belong in the graph, and how inferred facts and relationships feed back into the system in a way that preserves data lineage — so conclusions can be traced, audited, and if needed, rolled back.

  • Use graph algorithms to assess source reliability, cross-check facts across independent data points, and generate enrichment signals such as labels and risk indicators.

  • Develop graph validation rules and monitoring frameworks that catch inconsistencies introduced by updates before they propagate.

  • Evolve the graph schema and data models while preserving backward compatibility as new data shapes arrive.

  • Collaborate with data scientists and domain experts to improve data quality and extend coverage to new domains.

  • Actively share knowledge across the team through code reviews, documentation, internal sessions, and hands-on pairing. We expect you to raise the graph engineering baseline of the team, not just hold it yourself.
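
As promised above, here is a deliberately simplified sketch of the entity resolution side of this work: blocking followed by similarity scoring. The fields, threshold, and helper names are hypothetical and stand in for a much richer production process.

    from difflib import SequenceMatcher

    MATCH_THRESHOLD = 0.92  # hypothetical cut-off for an automatic merge

    def name_similarity(a: str, b: str) -> float:
        # Cheap string similarity; a production system would combine many
        # signals (identifiers, addresses, transliterations, shared links).
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def resolve(candidate: dict, existing: list[dict]) -> dict | None:
        """Return the existing entity the candidate should merge into, if any."""
        # Blocking: only compare entities that share a coarse key, e.g. province.
        block = [e for e in existing if e["province"] == candidate["province"]]
        best_score, best_entity = 0.0, None
        for entity in block:
            score = name_similarity(entity["name"], candidate["name"])
            if score > best_score:
                best_score, best_entity = score, entity
        # Below the threshold we introduce a new entity instead of merging.
        return best_entity if best_score >= MATCH_THRESHOLD else None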


What makes you a great candidate:

Must have:

  • 4+ years of production data engineering experience, with meaningful time spent both building and operating a live graph database using a property graph model (e.g. Neo4j, Amazon Neptune).

  • Solid understanding of how graphs behave under continuous updates: incremental ingestion, entity lifecycle management, relationship consistency, and schema evolution.

  • Experience with entity resolution and record linkage at scale, including handling conflicting or incomplete source data.

  • Practical experience applying graph algorithms for inference, reliability scoring, enrichment, or related use cases — not just for traversal or querying.

  • Strong Python skills and experience with a pipeline orchestration framework (Dagster or Airflow); a short sketch of this kind of work follows the list.

  • Practical knowledge of data validation, lineage tracking, and monitoring in production pipelines.

  • A track record of knowledge transfer through mentoring, writing, internal talks, or hands-on pairing. You should be comfortable being the graph expert in the room and helping others grow into that space.
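
For orientation, a minimal Dagster sketch of how an ingestion asset and a graph write step might hang together. The asset names, record shape, and print placeholder are illustrative assumptions, not our actual pipeline.

    from dagster import asset

    @asset
    def raw_records() -> list[dict]:
        # Placeholder extraction step; a real asset would pull from an OSINT source.
        return [{"registration_id": "demo-001", "name": "Example Co."}]

    @asset
    def graph_upserts(raw_records: list[dict]) -> int:
        # Downstream asset, wired by parameter name. In a real pipeline this is
        # where resolved records would be written to the graph (e.g. Cypher MERGE).
        for record in raw_records:
            print(f"would upsert {record['registration_id']}")
        return len(raw_records)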

Nice to have:
Any of these would make you a stronger fit, though none are required to apply.

  • Experience with multi-lingual or multi-script data (Chinese, Arabic, etc.) and the data quality challenges that come with it.

  • Familiarity with semantic web technologies, ontology design, or knowledge representation.

  • Experience with NLP or information extraction techniques.

  • Contributions to open-source projects.

  • Advanced degree in Computer Science or a related field.

Please note that applicants may be subject to a screening process.


AI is part of how we build

We use AI tools extensively to move faster and raise quality, and we focus our human attention on the work where judgment, creativity, and responsibility matter most. We expect every candidate to be comfortable using AI responsibly in their day-to-day work and to continuously look for better ways to apply it.

Why Datenna?

  • Work on globally impactful projects in geopolitical intelligence

  • Lead innovation in OSINT and AI technologies

  • Competitive compensation and benefits

  • Dynamic, international team environment

  • Significant growth opportunities in a scale-up

When you apply to this position, we will process your personal data according to our Recruitment Privacy Notice.
