The Enterprise Corporate Data Team is looking for a Principal Data Engineer, a senior technical leader responsible for architecting the core data infrastructure and platforms that power enterprise-scale AI applications. Reporting to the VP of Engineering, this role will focus on building systems for content tagging, semantic ontology generation, and persona modeling, and on integrating content metadata with behavioral data to support personalization, audience development, and intelligent content discovery.
The Principal Data Engineer will lead the end-to-end design and implementation of scalable pipelines, platforms, and systems that use Gen AI to support semantic analysis and knowledge graph generation across massive volumes of unstructured data. This individual will also coordinate with an offshore team of engineers, ensuring consistent delivery, code quality, and alignment with business and technical goals. The ideal candidate will possess an entrepreneurial ethos, an ability to operate in a dynamic environment, and a working knowledge of the current digital media landscape. This role is based in New York City.
Key Responsibilities:
● Lead the design and implementation of high-performance data pipelines and infrastructure to support automated generation of semantic ontologies and knowledge graphs.
● Architect scalable data platforms that integrate structured and unstructured data—including behavioral signals, content metadata, and user engagement data—for Gen AI use cases.
● Build systems that enable semantic enrichment of content through entity recognition, disambiguation, normalization, and deduplication techniques.
● Drive the creation and maintenance of flexible ontologies and taxonomies to organize media content for personalization, recommendation, and audience segmentation.
● Partner closely with ML engineers and data scientists to deploy and operationalize models for content and audience intelligence.
● Oversee and coordinate with an offshore engineering team, providing technical guidance, code reviews, and project oversight to ensure timely, high-quality deliverables.
● Ensure best practices in data governance, quality, observability, and documentation across all engineering workflows.
● Collaborate with stakeholders across product, marketing, and data science to translate business needs into scalable AI data systems.
● Well versed in architecting, designing, and developing large-scale OLTP and OLAP systems.
● Experience building and operating streaming systems on messaging platforms such as Kafka, Pub/Sub, or SQS.
● Experience building a RAG system with Google, OpenAI, or another Gen AI platform.
● Experience building a knowledge graph using Neo4j, Spanner, Neptune, or another tool is a plus.
Qualifications:
● 10+ years of experience in data engineering, with significant experience building large-scale, distributed data systems to support data analysis, AI/ML, and key business use cases.
● Proven expertise in content classification, tagging, and ontology/taxonomy development, especially using NLP and semantic techniques.
● Strong coding and data architecture skills using TypeScript, Python, SQL, and tools like Apache Spark, Kafka, Airflow, Node.js, and cloud-native platforms (e.g., AWS, GCP, or Azure).
● Hands-on experience integrating ML models into production environments for tasks such as entity extraction, text classification, or semantic search.
● Deep understanding of working with unstructured data (text, images, video), metadata enrichment, and knowledge graph integration.
● Experience managing and mentoring distributed/offshore engineering teams, with a track record of driving execution across time zones.
● Excellent communication and collaboration skills, with the ability to bridge technical execution and business strategy.
Preferred Qualifications:
● Experience in digital media, publishing, ad tech, or content platforms.
● Bachelor’s, Master’s, or Ph.D. in Computer Science, Data Engineering, or a related field.
● Knowledge of LLMs and generative AI in applied settings (e.g., content summarization, auto-tagging, retrieval augmentation).
● Working experience with OLAP and OLTP systems is a plus.
In accordance with applicable law, Hearst is required to include a reasonable estimate of the compensation for this role if hired in New York City. The reasonable estimate, if hired in New York City, is $325,000-$350,000. Please note this information is specific to those hired in New York City. If this role is open to candidates outside of New York City, the salary range would be aligned to that specific location. A final decision on the successful candidate’s starting salary will be based on a number of permissible, non-discriminatory factors, including but not limited to skills and experience, training, certifications, and education. Hearst provides a competitive benefits package, including medical, dental, vision, disability, and life insurance, 401(k), paid holidays and paid time off, employee assistance programs, and more.