MUST BE A US CITIZEN
Position Overview: We are seeking a talented and experienced Data Engineer with a strong background in Databricks to join our dynamic and innovative team. As a Data Engineer, you will play a pivotal role in designing, building, and maintaining the data architecture and infrastructure necessary for our data-driven initiatives. Your expertise in Databricks will be crucial in optimizing our data pipelines, ensuring data accuracy, and enabling advanced analytics and machine learning efforts.
Key Responsibilities:
- Data Pipeline Development:
  - Design, develop, and maintain efficient and scalable data pipelines using Databricks.
  - Collaborate with cross-functional teams to understand data requirements and implement solutions to ingest, transform, and load data from various sources into our data lake or data warehouse.
- Databricks Expertise:
  - Utilize your in-depth knowledge of Databricks to optimize performance, troubleshoot issues, and fine-tune Spark jobs.
  - Implement best practices in Databricks workspace organization, job scheduling, and resource management.
- Data Modeling and Transformation:
  - Work closely with data analysts, data scientists, and business stakeholders to understand data modeling requirements and translate them into efficient ETL processes.
  - Apply data transformation techniques to ensure data consistency, quality, and accuracy.
- Data Quality and Governance:
  - Implement data quality checks, monitoring, and validation procedures to ensure the integrity of the data flowing through pipelines.
  - Contribute to data governance initiatives by adhering to data standards and best practices.
- Performance Optimization:
  - Identify and address performance bottlenecks in data pipelines and optimize data processing workflows.
  - Continuously monitor and enhance the performance of Spark jobs and Databricks clusters.
- Documentation and Collaboration:
  - Create and maintain documentation for data pipelines, workflows, and Databricks-related processes.
  - Collaborate effectively with team members, providing technical expertise and guidance on Databricks-related matters.
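To give a concrete flavor of the data quality work described above, here is a minimal, framework-agnostic Python sketch of row-level validation in an ingest pipeline. The column names and rules are hypothetical; in a Databricks pipeline the same checks would more typically be expressed as Spark DataFrame filters or Delta Live Tables expectations.

```python
# Hypothetical row-level data quality checks for an ingest step.
# Column names and rules are illustrative only; in Databricks these
# would usually run as Spark DataFrame filters or DLT expectations.

RULES = {
    "order_id": lambda v: isinstance(v, str) and v != "",          # required key
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,    # non-negative
    "currency": lambda v: v in {"USD", "EUR", "GBP"},              # known codes
}

def validate(rows):
    """Split rows into (valid, rejected) according to RULES.

    Rejected rows carry the list of failed columns so they can be
    quarantined and monitored rather than silently dropped.
    """
    valid, rejected = [], []
    for row in rows:
        failures = [col for col, check in RULES.items()
                    if not check(row.get(col))]
        if failures:
            rejected.append((row, failures))
        else:
            valid.append(row)
    return valid, rejected

rows = [
    {"order_id": "A1", "amount": 9.5, "currency": "USD"},
    {"order_id": "", "amount": -2, "currency": "XXX"},
]
good, bad = validate(rows)
# good keeps the first row; bad holds the second with its failed columns
```

Routing failures to a quarantine path, rather than dropping them, is what makes the monitoring and governance responsibilities above possible: rejected rows remain inspectable and countable.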
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree is a plus.
- Proven experience as a Data Engineer, with a focus on designing and developing data pipelines using Databricks.
- Strong proficiency in Apache Spark and Databricks, including Spark SQL and DataFrame API.
- Experience with data integration, ETL processes, and data modeling.
- Proficiency in programming languages such as Python, Scala, or Java.
- Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and data storage solutions (e.g., S3, Blob Storage, Google Cloud Storage).
- Excellent problem-solving skills and the ability to troubleshoot and optimize complex data pipelines.
- Strong communication skills and the ability to collaborate effectively with cross-functional teams.
Preferred Qualifications:
- Databricks certification (e.g., Databricks Certified Data Engineer Professional).
- Experience with streaming data processing using technologies like Apache Kafka or Apache Flink.
- Knowledge of containerization technologies such as Docker and Kubernetes.
- Familiarity with DevOps practices and CI/CD pipelines.
- Previous experience in a data-intensive industry, such as e-commerce, finance, healthcare, or technology.
If you are a motivated and dedicated Data Engineer with a passion for working with Databricks to drive data-driven insights, we encourage you to apply. Join our team and be a key contributor in shaping the future of our data infrastructure and analytics capabilities.