Haleos Atlas - The modern engine driving apex founders. Frictionless capital and growth acceleration.

About Haleos

We're reshaping how startups are built and scaled. Haleos is developing a comprehensive AI-powered platform that guides founders through the startup journey with intelligent, systematic support. We're solving fundamental challenges in the entrepreneurial ecosystem using cutting-edge artificial intelligence.

The Role

As an AI Research Engineer in Data Infrastructure, you will architect and implement the data engine that powers our AI systems. You'll design robust pipelines that collect and process the diverse data streams flowing through our platform and make them accessible. Your work will enable our AI to deliver increasingly sophisticated insights while maintaining enterprise-grade performance and privacy standards.

You Will

Design and implement scalable data pipelines that process millions of discrete data points across our platform
Build intelligent systems for real-time data synchronization across multiple third-party integrations (CRM, financial, communication, and development tools)
Architect vector database solutions optimized for semantic search and contextual retrieval at scale
Develop ETL systems that transform heterogeneous business data into structured, queryable formats
Create automated data validation and quality assurance systems to ensure training data integrity
Build monitoring and observability tools for data pipeline health and performance optimization
Collaborate with ML engineers to ensure training datasets meet model requirements and performance benchmarks
Implement privacy-preserving data isolation architectures that maintain strict user namespace separation

Must Have

Strong experience building production-grade data pipelines and ETL systems
Proficiency with vector databases (Pinecone, Weaviate, or similar) and embedding-based retrieval systems
Experience designing data architectures that span multiple integration points and maintain real-time synchronization
Demonstrated ability to optimize data systems for sub-20ms query response times
Strong understanding of database schema design, indexing strategies, and query optimization
Experience with modern data stack tools and frameworks (Airflow, dbt, or similar)
Proficiency in Python and SQL; experience with TypeScript is a plus
Familiarity with API integration patterns and webhook-based data ingestion

Nice to Have

Experience with RAG (Retrieval-Augmented Generation) systems and LLM data architectures
Background in building data infrastructure for SaaS platforms with complex multi-tenancy requirements
Knowledge of data anonymization and privacy-preserving techniques
Experience with cloud infrastructure (AWS, GCP) and containerization (Docker, Kubernetes)
Understanding of startup operations and business intelligence systems

Benefits & Compensation

Competitive salary and equity package
Health, dental, and vision insurance
401(k) with company match
Flexible PTO policy
Remote-friendly work environment
Professional development budget
Opportunity to shape foundational technology at an early-stage company

Note: Because we are currently in a stealth development phase, additional product details will be shared during the interview process.

AI Research Engineer, Data Infrastructure

Austin, TX