
Data Engineer

Yo Hr Consultancy
Full-time
On-site
Dallas, Texas, United States
Big Data
Experience: 5 - 10 Years
 
Must Have:
- 5+ years of experience in data engineering, with a focus on large-scale and high-throughput systems
- Deep experience working with time series data and purpose-built storage systems (e.g., KDB+, TimeSet, Kronos)
- Strong experience building streaming and batch data pipelines using tools like Glue, Kafka, Flink, or Spark
- Proficiency in Python and integrating data pipelines with machine learning workflows and libraries (e.g., pandas, NumPy, scikit-learn, PyTorch)
 

Requirements

You’ll bring:

  • 5+ years of experience in data engineering, with a focus on large-scale and high-throughput systems

  • Deep experience working with time series data and purpose-built storage systems (e.g., KDB+, TimeSet, Kronos)

  • Strong experience building streaming and batch data pipelines using tools like Glue, Kafka, Flink, or Spark

  • Proficiency in Python and integrating data pipelines with machine learning workflows and libraries (e.g., pandas, NumPy, scikit-learn, PyTorch); a brief illustrative sketch follows this list

  • Experience designing efficient, scalable data models and partitioning strategies for time series data

  • Knowledge of distributed systems, columnar databases, and parallel processing

  • Familiarity with cloud-native data architectures (AWS, GCP, or Azure) and containerized data infrastructure

  • Strong understanding of data quality, lineage, monitoring, and observability tools

  • Excellent communication skills and a proactive, consultative mindset in client-facing environments

  • Bonus: Experience with multiple time series systems (e.g., KDB+ and Kronos) or contributing to open-source data infrastructure projects

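For illustration, here is a minimal sketch of the kind of pipeline-to-model integration described above: time series features built with pandas and NumPy, then handed to a scikit-learn estimator. The data, feature names, and model choice are hypothetical, not part of the role's actual stack.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    # Hypothetical raw tick data: a one-second price series.
    idx = pd.date_range("2024-01-01", periods=10_000, freq="s")
    ticks = pd.DataFrame({"price": 100 + np.cumsum(np.random.randn(len(idx)))}, index=idx)

    # Downsample to 1-minute OHLC bars and derive simple rolling features.
    bars = ticks["price"].resample("1min").ohlc()
    bars["ret_1m"] = bars["close"].pct_change()
    bars["vol_15m"] = bars["ret_1m"].rolling(15).std()
    bars = bars.dropna()

    # Illustrative only: predict the next bar's return from lagged features.
    X = bars[["ret_1m", "vol_15m"]].iloc[:-1].to_numpy()
    y = bars["ret_1m"].shift(-1).iloc[:-1].to_numpy()
    model = GradientBoostingRegressor().fit(X, y)
    print(model.predict(X[-5:]))
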
What you'll do:

  • Design, build, and optimize high-performance data pipelines for large-scale time series data

  • Implement scalable data infrastructure using tools such as KDB+, TimeSet (Google’s large time series database), or Kronos

  • Develop efficient data ingestion and transformation workflows that handle real-time and historical time series data

  • Connect time series data systems with Python-based model pipelines to support machine learning training and inference

  • Collaborate with data scientists and ML engineers to ensure data availability, quality, and accessibility for experimentation and production

  • Design data models and schemas optimized for time series use cases, including downsampling, aggregation, and indexing strategies (a minimal streaming example follows this list)

  • Ensure system reliability, scalability, and performance through monitoring, testing, and tuning

  • Establish data governance, lineage, and observability best practices in large-scale environments

  • Mentor junior engineers on large-scale data design, distributed processing, and real-time system architecture

  • Partner with product, engineering, and infrastructure teams to align data systems with business goals
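
For illustration, here is a minimal PySpark Structured Streaming sketch of the kind of real-time downsampling workflow described above: JSON ticks read from a Kafka topic and aggregated into 1-minute bars per symbol. The broker address, topic name, and message schema are assumptions, and running it requires the spark-sql-kafka connector on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import avg, col, from_json, max as smax, min as smin, window

    spark = SparkSession.builder.appName("tick-downsampler").getOrCreate()

    # Assumed message layout for the hypothetical "ticks" topic.
    schema = "symbol STRING, price DOUBLE, ts TIMESTAMP"

    ticks = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "ticks")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("m"))
        .select("m.*")
    )

    # 1-minute bars per symbol; the watermark bounds state for late-arriving events.
    bars = (
        ticks.withWatermark("ts", "2 minutes")
        .groupBy(window(col("ts"), "1 minute"), col("symbol"))
        .agg(avg("price").alias("avg_price"),
             smin("price").alias("low"),
             smax("price").alias("high"))
    )

    query = bars.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()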