You’ll bring:
5+ years of experience in data engineering, with a focus on large-scale and high-throughput systems
Deep experience working with time series data and purpose-built storage systems (e.g., KDB+, TimeSet, Kronos)
Strong experience building streaming and batch data pipelines using tools such as AWS Glue, Kafka, Flink, or Spark
Proficiency in Python and integrating data pipelines with machine learning workflows and libraries (e.g., pandas, NumPy, scikit-learn, PyTorch)
Experience designing efficient, scalable data models and partitioning strategies for time series data (a minimal sketch of this kind of work follows this list)
Knowledge of distributed systems, columnar databases, and parallel processing
Familiarity with cloud-native data architectures (AWS, GCP, or Azure) and containerized data infrastructure
Strong understanding of data quality, lineage, monitoring, and observability practices and tooling
Excellent communication skills and a proactive, consultative mindset in client-facing environments
Bonus: Experience with multiple time series systems (e.g., KDB+ and Kronos) or contributions to open-source data infrastructure projects
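To give a concrete flavor of the partitioning and Python/ML-integration skills listed above, here is a minimal, illustrative sketch (not our production stack; the libraries, column names, bar frequency, and output path are all assumptions) that downsamples raw ticks into one-minute bars and writes them as date-partitioned Parquet with pandas:

```python
# Illustrative sketch only: downsample raw ticks to 1-minute bars and write
# them as date-partitioned Parquet. Column names, frequency, and the output
# path are assumptions, not a prescribed stack.
import numpy as np
import pandas as pd

# Synthetic raw ticks: one observation per second over two days.
idx = pd.date_range("2024-01-01", periods=2 * 24 * 3600, freq="s", tz="UTC")
ticks = pd.DataFrame({"value": np.random.randn(len(idx)).cumsum()}, index=idx)

# Downsample to 1-minute bars (open/high/low/close plus observation count).
bars = ticks["value"].resample("1min").agg(["first", "max", "min", "last", "count"])
bars.columns = ["open", "high", "low", "close", "n_obs"]

# Partition by calendar date so downstream readers can prune irrelevant days.
bars["date"] = bars.index.date.astype(str)
bars.to_parquet("bars_dataset", partition_cols=["date"])  # requires pyarrow
```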
What you'll do:
Design, build, and optimize high-performance data pipelines for large-scale time series data
Implement scalable data infrastructure using tools such as KDB+, TimeSet (Google’s large time series database), or Kronos
Develop efficient data ingestion and transformation workflows that handle real-time and historical time series data
Connect time series data systems with Python-based model pipelines to support machine learning training and inference (see the sketch at the end of this list)
Collaborate with data scientists and ML engineers to ensure data availability, quality, and accessibility for experimentation and production
Design data models and schemas optimized for time series use cases, including downsampling, aggregation, and indexing strategies
Ensure system reliability, scalability, and performance through monitoring, testing, and tuning
Establish data governance, lineage, and observability best practices in large-scale environments
Mentor junior engineers on large-scale data design, distributed processing, and real-time system architecture
Partner with product, engineering, and infrastructure teams to align data systems with business goals
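As a concrete illustration of the model-pipeline connection and downsampling/aggregation work described above, here is a minimal sketch (the feature definitions, horizons, and model choice are assumptions for illustration only) that turns one-minute bars into rolling-window features and fits a scikit-learn model:

```python
# Illustrative sketch only: derive rolling features from downsampled bars and
# hand them to a scikit-learn model. Features and model are placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Synthetic 1-minute bars (in practice these would be read from the time series store).
idx = pd.date_range("2024-01-01", periods=5_000, freq="1min", tz="UTC")
bars = pd.DataFrame({"close": 100 + np.random.randn(len(idx)).cumsum()}, index=idx)

# Rolling features over trailing windows; the target is the next-minute return.
feats = pd.DataFrame({
    "ret_1m": bars["close"].pct_change(),
    "ret_15m": bars["close"].pct_change(15),
    "vol_30m": bars["close"].pct_change().rolling(30).std(),
})
target = bars["close"].pct_change().shift(-1)

# Drop rows left incomplete by the rolling windows and the shift before training.
data = feats.assign(target=target).dropna()
X, y = data[feats.columns], data["target"]

model = Ridge(alpha=1.0).fit(X, y)
print("in-sample R^2:", round(model.score(X, y), 4))
```

In practice the bars would come back out of the partitioned store and be split chronologically for training and validation, but the shape of the workflow is the same.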