| Department: | Product Development |
| Location: | |
The Mission
Most data engineering roles are about moving data from A to B. This one is about making 20 years of complex, relational ERP data legible to AI agents so they can reason over financial transactions, inventory movements, and supply chain events without hallucinating.
ECI is rebuilding how enterprise software is built and operated using an AI-native model. The data layer is the foundation everything else runs on. Without a world-class context engine, the agents are guessing. You are the person who makes sure they never have to.
This is a greenfield mandate. You will hire the team, choose the stack, define the architecture, and own the outcome. The CTO is your only direct stakeholder.
What You'll Own
You are not supporting the AI initiative. You are building the infrastructure without which it cannot exist.
Context Architecture & Retrieval
Design and own the retrieval systems that allow AI agents to reason over ERP data with zero hallucinations
Build and scale the vector infrastructure (pgvector, Qdrant, or equivalent) with production-grade embedding and reranking pipelines
Own the hybrid search strategy: semantic retrieval layered on top of SQL-scoped financial data
Drive context window optimization: packing the most relevant financial 'truth' into each LLM call efficiently
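The hybrid search described above can be sketched in miniature: a hard SQL scope first selects only the rows an agent is allowed to see, then a semantic rerank orders them by relevance. This is an illustrative assumption of how the two layers compose, not ECI's actual design; the `gl_entries` schema, the three-dimensional toy embeddings, and the pure-Python cosine are all hypothetical stand-ins for a real embedding model and vector index.

```python
# Hedged sketch: SQL scoping first, semantic reranking second.
# Table/column names (gl_entries, fiscal_period) are illustrative only.
import sqlite3, json, math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gl_entries (id INTEGER, memo TEXT, fiscal_period TEXT, embedding TEXT)")
rows = [
    (1, "Vendor payment - office supplies", "2024-Q1", [0.9, 0.1, 0.0]),
    (2, "Inventory write-down",             "2024-Q1", [0.1, 0.9, 0.2]),
    (3, "Vendor payment - raw materials",   "2023-Q4", [0.8, 0.2, 0.1]),
]
db.executemany("INSERT INTO gl_entries VALUES (?, ?, ?, ?)",
               [(i, m, p, json.dumps(e)) for i, m, p, e in rows])

def hybrid_search(query_vec, period, k=2):
    # Step 1: hard SQL scope - only rows in the permitted fiscal period.
    scoped = db.execute(
        "SELECT id, memo, embedding FROM gl_entries WHERE fiscal_period = ?",
        (period,),
    ).fetchall()
    # Step 2: semantic rerank within the scoped candidate set.
    ranked = sorted(scoped, key=lambda r: cosine(query_vec, json.loads(r[2])), reverse=True)
    return [(r[0], r[1]) for r in ranked[:k]]

print(hybrid_search([1.0, 0.0, 0.0], "2024-Q1"))
# → [(1, 'Vendor payment - office supplies'), (2, 'Inventory write-down')]
```

Note that row 3 never reaches the reranker: the SQL scope is a correctness boundary, not a performance hint, which is what keeps semantic retrieval from leaking out-of-scope financial data.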
Knowledge Graph & MDM
Lead the Master Data Management strategy: golden-record survivorship, identity resolution, and entity deduplication across ERP entities
Build the knowledge graph that maps relationships between Vendors, Purchase Orders, Invoices, GL Entries, and Inventory so agents understand meaning, not just rows
Own the semantic layer: translate a 500-table legacy schema into a structured, LLM-readable ontology
Define data quality standards and automated validation pipelines that enforce them continuously
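A minimal sketch of the survivorship and deduplication work named above: duplicate vendor records are clustered on a normalized name key, the most recently updated record survives, and missing fields are backfilled from older duplicates. The record shapes, the name-based match key, and the "most recent wins" rule are simplified assumptions; production MDM typically uses probabilistic matching and per-field survivorship rules.

```python
# Hedged sketch of golden-record survivorship for vendor master data.
from collections import defaultdict

records = [
    {"source": "erp_a", "name": "ACME Corp.", "tax_id": "12-345", "updated": "2024-05-01"},
    {"source": "erp_b", "name": "Acme Corp",  "tax_id": None,     "updated": "2024-06-15"},
    {"source": "erp_a", "name": "Globex LLC", "tax_id": "98-765", "updated": "2023-11-20"},
]

def match_key(rec):
    # Identity resolution (simplified): normalize the name into a blocking key.
    return "".join(ch for ch in rec["name"].lower() if ch.isalnum())

def golden_records(recs):
    clusters = defaultdict(list)
    for r in recs:
        clusters[match_key(r)].append(r)
    golden = {}
    for key, dupes in clusters.items():
        # Survivorship: most recently updated record wins,
        # then older duplicates backfill any missing fields.
        dupes.sort(key=lambda r: r["updated"], reverse=True)
        merged = dict(dupes[0])
        for older in dupes[1:]:
            for field, value in older.items():
                if merged.get(field) is None:
                    merged[field] = value
        golden[key] = merged
    return golden

print(golden_records(records))
# Two golden records: the Acme duplicates collapse, with tax_id backfilled.
```

The same cluster-then-merge shape extends naturally to the knowledge graph: each golden record becomes one node, so a Purchase Order never points at two copies of the same vendor.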
Data Platform & Infrastructure
Build the core data platform from scratch: ingestion, transformation, storage, and serving layers
Own the modern data stack (dbt, Airflow or equivalent, Postgres/SQL Server) with an AI-augmented workflow throughout
Implement data-centric evals: 'Judge Agents' that verify AI output against ground truth SQL
Build synthetic data generation pipelines that produce high-fidelity, relationally consistent ERP data for agent training and testing
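The 'Judge Agent' idea above reduces, at its core, to verifying an agent's claimed figure against ground-truth SQL. A hedged sketch, assuming an invoices table and a numeric tolerance (both illustrative, not the production eval harness):

```python
# Hedged sketch of a data-centric eval: a "Judge" that checks an agent's
# claimed number against ground-truth SQL before the answer is trusted.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (vendor TEXT, amount REAL)")
db.executemany("INSERT INTO invoices VALUES (?, ?)",
               [("acme", 100.0), ("acme", 250.0), ("globex", 75.0)])

def judge(agent_answer, ground_truth_sql, params=(), tolerance=0.01):
    """Return (verdict, truth): does the agent's figure match the database?"""
    (truth,) = db.execute(ground_truth_sql, params).fetchone()
    verdict = abs(agent_answer - truth) <= tolerance
    return verdict, truth

# The agent claims acme's total payable is 350.0; verify against SQL.
ok, truth = judge(350.0, "SELECT SUM(amount) FROM invoices WHERE vendor = ?", ("acme",))
print(ok, truth)  # → True 350.0
```

A hallucinated figure (say 999.0) fails the same check, which is the point: the eval is anchored in the database, not in another model's opinion.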
Builder Data Track
Own the Data Builder squad: hire, develop, and hold the team to Builder-level output standards
Partner with the Dev and QA Builder leads to ensure data systems are the right interface for agentic tool-calling
Run the Data track of the Builder Bootcamp: define the curriculum, set the graduation bar, make the calls
Partner with product and engineering on AI feature data requirements: you are the upstream dependency for almost everything
Governance & Compliance
Define data governance policies for AI-consumed data: lineage, access control, PII handling, audit trails
Own compliance requirements relevant to financial data in an ERP context: SOC 2, data residency, retention policies
Build the observability layer (OpenTelemetry, Weights & Biases, or equivalent) for embedding quality and retrieval performance
Who You Are
Requirements:
You have built and led a data engineering team before: you know how to hire, structure, and technically lead a team that ships production data systems
Knowledge graph or MDM at scale: you have designed entity resolution, survivorship rules, and ontologies for complex relational domains, not just prototyped them
AI/ML platform or LLMOps experience: you have operated embedding pipelines, vector stores, and LLM-integrated data systems in production, and you understand latency, cost, and quality trade-offs
You think in systems: schema design, retrieval architecture, and data contracts are your native language
You are comfortable in ambiguity: greenfield means no existing patterns to follow and no team to hand things off to on day one
Highly Desirable:
Production RAG pipelines over structured or financial data: you have gone beyond demos and operated retrieval systems with real precision/recall requirements
ERP, financial, or supply chain data domain expertise: you understand what makes a General Ledger different from a web analytics event stream
Modern data stack depth (dbt, Airflow, Postgres, SQL Server): you have opinions about transformation-layer design and know when to break the rules
Experience working across time zones with an offshore engineering team (India context is a plus)
The Stack:
Languages
Python, SQL (Postgres / SQL Server), TypeScript
AI / Retrieval
OpenAI / Anthropic APIs, pgvector, Qdrant, LangChain / LangGraph
Data Platform
dbt (AI-augmented), Apache Airflow, Docker
Graph / MDM
Neo4j (primary), with open evaluation of alternatives
Observability
Weights & Biases (embedding evals), OpenTelemetry, custom Judge Agents
Infra
AWS / GCP, Kubernetes, GitHub Actions
The Archetypes We're Looking For:
The Data Alchemist: you believe data is only valuable when an AI can reason over it, and you spend time experimenting with embedding models and retrieval techniques to make that true
The Manual Mapping Hater: if you have to map two schemas twice, you've already built an agent to do it for you
Rigor over Hype: you know the difference between a vector search demo and a production-grade financial data engine; you care about precision and recall
The Founding Mindset: you're energized by building from scratch, not managing existing systems, and you make decisions confidently without a playbook
Why this role:
ERP data is the hardest data problem in enterprise software: 20 years of relational financial history, undocumented schemas, and zero tolerance for hallucination. If you can solve RAG for an ERP, you have solved the hardest version of the problem.
Greenfield with real stakes: you are not inheriting someone else's technical debt or org structure. You build what you believe will win.
Direct line to the CTO: no data governance committee, no analytics manager layer, no 6-month roadmap approval process
Unlimited context budget: access to frontier models and the compute to run serious embedding and indexing experiments
The work matters: every AI feature in the product runs on the infrastructure you build