Senior Data Engineer

Mandolin
Full-time
On-site
San Francisco, California, United States
Big Data

About Mandolin

Nearly all disease will become treatable in our lifetimes, and drug discovery is quickly becoming an engineering discipline. Mandolin is building the “last-mile” delivery infrastructure that gets cutting-edge biologics, cell, and gene therapies to patients faster. Our AI-powered knowledge-worker platform already serves leading infusion clinics, with payers and pharma next in line. 

We’re backed by Greylock, SignalFire, Maverick, and founders of companies such as Yahoo. Mandolin is led by repeat founders with prior exits, and our team comes from some of the most technically impressive companies.

Why this role matters

Every product decision, ML model, and customer KPI depends on clean, trustworthy data. We’ve outgrown app-database queries and stitched-together dashboards. Now we need a governed warehouse that supports real-time analytics, HIPAA-grade security, and feature engineering at scale. You will design that backbone and make data a first-class product across Mandolin.

What you’ll own

  • Warehouse architecture – choose the stack, model the core marts, and enforce lineage and SLA guarantees.

  • Resilient ELT – build Airflow/Dagster pipelines with versioned schemas, incremental loads, and automated quality checks.

  • Self-serve analytics – deliver a documented semantic layer and dashboards that non-technical teams trust for daily decisions.

  • Security & compliance – implement encryption, RBAC, and audit trails that meet HIPAA and SOC 2.

  • Feature delivery – surface clean, performant datasets to ML, product, and forward-deployed teams.

  • Cost & latency tuning – keep storage bills low and queries fast as volume and complexity grow.

Must-have experience

  • 5+ years building and operating data platforms at scale.

  • Deep expertise in SQL modeling, orchestration (Airflow, Dagster, etc.), and columnar warehouses (BigQuery, Snowflake, Redshift).

  • Proven track record shipping business-critical dashboards to non-technical stakeholders.

  • Strong Python or TypeScript for data tooling and automation.

  • Security mindset: encryption, RBAC, and audit logging in regulated environments.

Nice-to-haves

  • Healthcare or revenue-cycle analytics.

  • Change-data-capture from document stores into relational models.

  • Real-time feature stores or streaming data pipelines.