Novel Technology Services
Full-time
On-site
San Francisco, California, United States
Big Data
Description

Before reading the job description, please note:

  • This position is a hybrid role, and candidates must currently be located in the San Francisco Bay Area.
  • The client requires candidates to possess five years of experience as a Data Engineer. While internships are valuable, they will be considered supplementary to the required five years of experience.
  • Candidates should have recent work experience at a startup.

Data Engineer:

Our client's platform delivers the most comprehensive identity insights and equips businesses with fully automated KYB (Know Your Business) solutions for risk and fraud management, setting new standards in business verification.

The solution is designed for financial services (FS) institutions that want a coordinated effort to mitigate B2B fraud, reduce the risk of working with small businesses, and create a centralized, privacy-compliant entity for data sharing between financial institutions.

We are looking for a data engineer to assist with data collection across a variety of fragmented data sources, including government, public, and private databases.

Responsibilities:

  • Designing the database schema
  • Migrating production tables
  • Working with large-scale data in the terabyte range
  • Maintaining operational efficiency of the database
  • Normalizing disparate schemas to a single unified schema
  • Abstracting reusable components
  • Minimizing the amount of new code for new pipelines by creating internal packages that allow a high level of reusability


Requirements

Experience with:

  • ETL (extract, transform, load)
  • Database design
  • Primary keys and foreign keys
  • Indexing
  • Partitioning
  • Access patterns
  • Migrations
  • Data pipelining
  • Core data concepts: ACID transactions, idempotency, orchestration

Technologies:

  • Airflow
  • Google Cloud Platform (GCP)
  • GCP Dataflow (managed service for running Apache Beam pipelines)
  • PostgreSQL
  • Python
  • Pydantic (a Python library)
  • Distributed systems