Description
We are seeking a highly skilled and strategic Data Architect with deep expertise in PySpark, Databricks, and AWS to lead the design and implementation of our enterprise data architecture. This role is responsible for building scalable, secure, and high-performance data solutions that enable advanced analytics, machine learning, and business intelligence across the organization. The ideal candidate will have hands-on technical experience combined with a strong architectural mindset and a passion for driving data innovation in the cloud.
Key Responsibilities:
Define and evolve the organization's data architecture strategy with a focus on AWS cloud, Databricks, and scalable big data processing frameworks.
Design and implement data lakehouse architectures using Databricks on AWS (S3, Glue, Redshift, Athena).
Lead the development of enterprise-wide data models, data integration frameworks, and data governance standards.
Architect end-to-end ETL/ELT pipelines using PySpark, ensuring reliability, performance, and data quality (a representative sketch follows this list).
Establish data standards, best practices, and frameworks for building reusable data components across teams.
Collaborate with data engineers, data scientists, and business stakeholders to align technical architecture with business goals.
Oversee the implementation of data security, compliance, and access control policies using AWS IAM, Lake Formation, and Databricks Unity Catalog.
Guide data migration, transformation, and modernization efforts from legacy systems to cloud-native platforms.
Conduct architectural reviews, troubleshoot complex issues, and optimize Spark and Databricks workloads.
Stay up to date with emerging technologies and recommend innovations that enhance the data platform.
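By way of illustration only (not part of the formal requirements), the following minimal PySpark sketch shows the kind of batch pipeline step this role would architect on Databricks: raw data landed in S3, cleansed with basic data-quality guards, and written to a Delta table. The bucket, table, and column names are hypothetical.

```python
# Illustrative sketch only: bucket, table, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw JSON landed in S3 (hypothetical bucket/prefix)
raw = spark.read.json("s3://example-raw-bucket/orders/")

# Transform: deduplicate and apply simple data-quality guards
orders = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("ingest_date", F.current_date())
)

# Load: append to a partitioned Delta table for downstream analytics
(orders.write
       .format("delta")
       .mode("append")
       .partitionBy("ingest_date")
       .saveAsTable("analytics.orders"))
```

A production pipeline would add schema enforcement, expectations, and monitoring; the sketch is intentionally minimal.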
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering, or a related field.
8+ years of experience in data engineering or architecture, with 3+ years in a data architect role.
Extensive hands-on experience with:
PySpark for distributed data processing
Databricks (Delta Lake, Spark tuning, job orchestration)
AWS data services: S3, Glue, Redshift, Lambda, Athena, EMR
Strong knowledge of data lakehouse concepts, data modeling (dimensional and normalized), and data mesh principles.
Proven experience designing and deploying cloud-native data architectures at scale.
Expertise in SQL, data lineage, and metadata management.
Deep understanding of data governance, security, and compliance frameworks (GDPR, HIPAA, SOC 2); a sketch of the kind of access-control policy involved follows this list.
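As a flavor of the governance work referenced above, here is a minimal sketch of least-privilege access control expressed as Unity Catalog grants, run through the Spark SQL interface on Databricks. The catalog, schema, table, and group names are hypothetical.

```python
# Illustrative sketch only: catalog, schema, table, and group names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow a governed group to resolve objects in a sensitive schema
spark.sql("GRANT USE SCHEMA ON SCHEMA main.pii TO `data_governance`")

# Grant read access on a specific table to an analyst group
spark.sql("GRANT SELECT ON TABLE main.pii.customers TO `analysts`")

# Revoke broad access as part of a least-privilege review
spark.sql("REVOKE SELECT ON TABLE main.pii.customers FROM `all_account_users`")
```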
Preferred Qualifications:
Experience with streaming data pipelines (Kafka, Kinesis, or Spark Structured Streaming); a minimal streaming sketch follows this list.
Familiarity with CI/CD and infrastructure as code (Terraform, CloudFormation).
Experience with orchestration and transformation tools such as Apache Airflow, Dagster, or dbt.
Certifications such as:
AWS Certified Data Analytics - Specialty
Databricks Certified Data Engineer Associate/Professional
TOGAF or other architectural frameworks
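To illustrate the streaming qualification mentioned above, the following minimal Structured Streaming sketch consumes a Kafka topic and lands it in a Delta bronze table with checkpointing. The broker address, topic, checkpoint path, and table name are hypothetical.

```python
# Illustrative sketch only: broker, topic, path, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Source: consume a Kafka topic as a streaming DataFrame
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")
               .option("subscribe", "orders")
               .load())

# Sink: land the raw payload in a Delta table; the checkpoint gives
# exactly-once delivery on restart
query = (events.selectExpr("CAST(value AS STRING) AS payload",
                           "timestamp AS event_ts")
               .writeStream
               .format("delta")
               .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders/")
               .toTable("bronze.orders_raw"))

query.awaitTermination()
```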
Soft Skills:
Strategic thinker who combines high-level architectural vision with deep technical execution.
Strong leadership and communication skills to guide technical teams and influence stakeholders.
Problem-solving mindset with a focus on performance, scalability, and reliability.
Ability to manage multiple priorities and drive initiatives in a dynamic environment.