EY

Data Engineer (Graph, Vector & Data Platform) - Senior Associate, AI & Data, Technology Consulting

Consulting | SG, 048583 | Onsite | Posted 4 weeks ago

About the role

This Senior Associate role at EY focuses on designing, building, and operating scalable data solutions using Neo4j graph platforms and vector databases. The position also involves managing modern data repositories built on Apache Iceberg and running on Nutanix hybrid infrastructure, in support of advanced AI and analytics use cases.

Key Responsibilities

  • Design and implement graph data models and entity networks using Neo4j
  • Develop and optimize Cypher queries for relationship and network analysis
  • Build and maintain vector databases using Postgres (pgvector), Milvus, or Weaviate
  • Implement embedding ingestion pipelines for similarity and semantic search use cases
  • Design and manage data repositories / lakehouse layers using Apache Iceberg
  • Develop data ingestion and transformation pipelines from multiple source systems
  • Deploy and operate databases and data platforms on Nutanix infrastructure
  • Ensure performance tuning, scalability, availability, and fault tolerance
  • Implement data quality checks, monitoring, and error handling
  • Collaborate with AI/ML, analytics, and application teams

Requirements

  • Bachelor's degree in Computer Science, Information Systems, or a related field
  • At least 4 years of hands-on experience in Data Engineering
  • Strong hands-on experience with Neo4j (Graph DB)
  • Proven experience in entity network analysis and relationship-based modeling
  • Hands-on experience with vector databases: Postgres (pgvector), Milvus, and/or Weaviate
  • Strong SQL skills and experience with complex data transformations
  • Experience designing data lakes / lakehouse architectures
  • Hands-on experience with Apache Iceberg or similar table formats
  • Experience operating data platforms on Nutanix or comparable on-prem / hybrid infrastructure
  • Solid understanding of distributed systems and data storage concepts
  • Ideally, experience with Python or Java for data processing and integration
  • Experience with Spark, Kafka, or Flink
  • Experience with LLM / AI pipelines and embedding generation
  • Exposure to DevOps / CI-CD for data platforms