Description
We are seeking a highly skilled and strategic Data Architect with deep expertise in PySpark, Databricks, and AWS to lead the design and implementation of our enterprise data architecture. This role is responsible for building scalable, secure, and high-performance data solutions that enable advanced analytics, machine learning, and business intelligence across the organization. The ideal candidate will have hands-on technical experience combined with a strong architectural mindset and a passion for driving data innovation in the cloud.
Key Responsibilities:
Define and evolve the organization's data architecture strategy with a focus on AWS cloud, Databricks, and scalable big data processing frameworks.
Design and implement data lakehouse architectures using Databricks on AWS (S3, Glue, Redshift, Athena).
Lead the development of enterprise-wide data models, data integration frameworks, and data governance standards.
Architect end-to-end ETL/ELT pipelines using PySpark, ensuring reliability, performance, and data quality (a representative sketch follows this list).
Establish data standards, best practices, and frameworks for building reusable data components across teams.
Collaborate with data engineers, data scientists, and business stakeholders to align technical architecture with business goals.
Oversee the implementation of data security, compliance, and access control policies using AWS IAM, Lake Formation, and Databricks Unity Catalog.
Guide data migration, transformation, and modernization efforts from legacy systems to cloud-native platforms.
Conduct architectural reviews, troubleshoot complex issues, and optimize Spark and Databricks workloads.
Stay up to date with emerging technologies and recommend innovations that enhance the data platform.
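By way of illustration only (not part of the formal requirements), the following minimal PySpark sketch shows the kind of batch pipeline step this role would architect on Databricks: raw data landed in S3, cleansed with basic data-quality guards, and written to a Delta table. The bucket, table, and column names are hypothetical.

```python
# Illustrative sketch only: bucket, table, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw JSON landed in S3 (hypothetical bucket/prefix)
raw = spark.read.json("s3://example-raw-bucket/orders/")

# Transform: deduplicate and apply simple data-quality guards
orders = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("ingest_date", F.current_date())
)

# Load: append to a partitioned Delta table for downstream analytics
(orders.write
       .format("delta")
       .mode("append")
       .partitionBy("ingest_date")
       .saveAsTable("analytics.orders"))
```

A production pipeline would add schema enforcement, expectations, and monitoring; the sketch is intentionally minimal.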
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering, or a related field.
8+ years of experience in data engineering or architecture, with 3+ years in a data architect role.
Extensive hands-on experience with:
PySpark for distributed data processing
Databricks (Delta Lake, Spark tuning, job orchestration)
AWS data services: S3, Glue, Redshift, Lambda, Athena, EMR
Strong knowledge of data lakehouse concepts, data modeling (dimensional and normalized), and data mesh principles.
Proven experience designing and deploying cloud-native data architectures at scale.
Expertise in SQL, data lineage, and metadata management.
Deep understanding of data governance, security, and compliance frameworks (GDPR, HIPAA, SOC 2); a sketch of the kind of access-control policy involved follows this list.
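As a flavor of the governance work referenced above, here is a minimal sketch of least-privilege access control expressed as Unity Catalog grants, run through the Spark SQL interface on Databricks. The catalog, schema, table, and group names are hypothetical.

```python
# Illustrative sketch only: catalog, schema, table, and group names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow a governed group to resolve objects in a sensitive schema
spark.sql("GRANT USE SCHEMA ON SCHEMA main.pii TO `data_governance`")

# Grant read access on a specific table to an analyst group
spark.sql("GRANT SELECT ON TABLE main.pii.customers TO `analysts`")

# Revoke broad access as part of a least-privilege review
spark.sql("REVOKE SELECT ON TABLE main.pii.customers FROM `all_account_users`")
```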
Preferred Qualifications:
Experience with streaming data pipelines (Kafka, Kinesis, or Spark Structured Streaming); a minimal streaming sketch follows this list.
Familiarity with CI/CD and infrastructure as code (Terraform, CloudFormation).
Experience with orchestration and transformation tools such as Apache Airflow, Dagster, or dbt.
Certifications such as:
AWS Certified Data Analytics - Specialty
Databricks Certified Data Engineer Associate/Professional
TOGAF or other architectural frameworks
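To illustrate the streaming qualification mentioned above, the following minimal Structured Streaming sketch consumes a Kafka topic and lands it in a Delta bronze table with checkpointing. The broker address, topic, checkpoint path, and table name are hypothetical.

```python
# Illustrative sketch only: broker, topic, path, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Source: consume a Kafka topic as a streaming DataFrame
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")
               .option("subscribe", "orders")
               .load())

# Sink: land the raw payload in a Delta table; the checkpoint gives
# exactly-once delivery on restart
query = (events.selectExpr("CAST(value AS STRING) AS payload",
                           "timestamp AS event_ts")
               .writeStream
               .format("delta")
               .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders/")
               .toTable("bronze.orders_raw"))

query.awaitTermination()
```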
Soft Skills:
Strategic thinker who combines high-level architectural vision with deep technical execution.
Strong leadership and communication skills to guide technical teams and influence stakeholders.
Problem-solving mindset with a focus on performance, scalability, and reliability.
Ability to manage multiple priorities and drive initiatives in a dynamic environment.