About Haleos

We're reshaping how startups are built and scaled. Haleos is developing a comprehensive AI-powered platform that guides founders through the startup journey with intelligent, systematic support. We're using cutting-edge artificial intelligence to solve fundamental challenges in the entrepreneurial ecosystem.

The Role

As an AI Research Engineer in Data Infrastructure, you will architect and implement the data engine that powers our AI systems. You'll design robust pipelines that collect, process, and make accessible the diverse data streams flowing through our platform. Your work will enable our AI to deliver increasingly sophisticated insights while maintaining enterprise-grade performance and privacy standards.

You Will

  • Design and implement scalable data pipelines that process millions of discrete data points across our platform

  • Build intelligent systems for real-time data synchronization across multiple third-party integrations (CRM, financial, communication, and development tools)

  • Architect vector database solutions optimized for semantic search and contextual retrieval at scale

  • Develop ETL systems that transform heterogeneous business data into structured, queryable formats

  • Create automated data validation and quality assurance systems to ensure training data integrity

  • Build monitoring and observability tools for data pipeline health and performance optimization

  • Collaborate with ML engineers to ensure training datasets meet model requirements and performance benchmarks

  • Implement privacy-preserving data isolation architectures that maintain strict user namespace separation

Must Have

  • Strong experience building production-grade data pipelines and ETL systems

  • Proficiency with vector databases (Pinecone, Weaviate, or similar) and embedding-based retrieval systems

  • Experience designing data architectures that span multiple integration points and maintain real-time synchronization

  • Demonstrated ability to optimize data systems for sub-20ms query response times

  • Strong understanding of database schema design, indexing strategies, and query optimization

  • Experience with modern data stack tools and frameworks (Airflow, dbt, or similar)

  • Proficiency in Python and SQL; experience with TypeScript is a plus

  • Familiarity with API integration patterns and webhook-based data ingestion

Nice to Have

  • Experience with RAG (Retrieval-Augmented Generation) systems and LLM data architectures

  • Background in building data infrastructure for SaaS platforms with complex multi-tenancy requirements

  • Knowledge of data anonymization and privacy-preserving techniques

  • Experience with cloud infrastructure (AWS, GCP) and containerization (Docker, Kubernetes)

  • Understanding of startup operations and business intelligence systems

Benefits & Compensation

  • Competitive salary and equity package

  • Health, dental, and vision insurance

  • 401(k) with company match

  • Flexible PTO policy

  • Remote-friendly work environment

  • Professional development budget

  • Opportunity to shape foundational technology at an early-stage company

Note: Due to our current stealth development phase, additional product details will be shared during the interview process.

AI Research Engineer, Data Infrastructure

Austin, TX

© 2026 Haleos, Inc.