How Ingest Works
A serverless, event-driven framework that moves data from any API into Snowflake Iceberg tables.
Pipeline Deployment
📩
Step 1
You request
Submit a pipeline request via the dashboard — tell us the API, the data you need, and the schedule.
🛠️
Step 2
We build
Our team (and AI agents) build the connector, configure PII masking, and deploy the pipeline. No work on your end.
📊
Step 3
Data flows
Query your data in Snowflake. Iceberg tables, PII masking, and real-time row counts — all ready to go.
The Ingest Pipeline
REST API → Queryable Table
📋
Stage 1
Request Generation
Declarative models define what API calls to make — endpoints, parameters, incremental date ranges. Scheduled on your cadence.
→ JSON request payloads
event-driven
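The declarative request model above can be sketched in a few lines. The model fields, parameter names, and the date-window expansion below are illustrative assumptions, not the actual Ingest schema:

```python
import json
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RequestModel:
    """Hypothetical declarative model: endpoint, fixed params, incremental cursor field."""
    endpoint: str
    params: dict
    cursor_field: str

def generate_requests(model: RequestModel, start: date, end: date, window_days: int = 1):
    """Expand a declarative model into one JSON request payload per date window."""
    payloads = []
    day = start
    while day < end:
        upper = min(day + timedelta(days=window_days), end)
        payloads.append(json.dumps({
            "endpoint": model.endpoint,
            "params": {**model.params,
                       f"{model.cursor_field}_gte": day.isoformat(),
                       f"{model.cursor_field}_lt": upper.isoformat()},
        }))
        day = upper
    return payloads
```

A scheduler would run this on your cadence and hand each payload to Stage 2.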
🌉
Stage 2
Event Routing
Requests trigger processing automatically. Ordered delivery with deduplication. No polling, no idle compute.
→ Ordered message queue
queued
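Ordered delivery with deduplication works like a FIFO queue with content-based dedup IDs. This is a minimal in-memory sketch of that pattern, not the actual queue implementation:

```python
import hashlib
from collections import deque

class OrderedDedupQueue:
    """Illustrative FIFO queue: duplicates (by payload hash) are dropped,
    and messages are delivered in the order they were published."""
    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def publish(self, payload: str) -> bool:
        dedup_id = hashlib.sha256(payload.encode()).hexdigest()
        if dedup_id in self._seen:
            return False  # duplicate: silently dropped
        self._seen.add(dedup_id)
        self._queue.append(payload)
        return True

    def consume(self):
        while self._queue:
            yield self._queue.popleft()  # delivered in publish order
```

Because consumers are triggered by the queue rather than polling it, compute only runs when a message arrives.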
⚡
Stage 3
API Execution
Generic runtime dynamically assembles auth, pagination, rate limiting, and retry logic per connector. No code deploys per API.
→ Raw API responses
processed
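The generic runtime idea is that pagination and retry behavior come from connector config, not per-API code. A minimal sketch, where `fetch_page` and the config keys are hypothetical stand-ins for the real connector interface:

```python
import time

def execute_connector(fetch_page, config, max_retries=3):
    """Config-driven pagination with exponential-backoff retry.
    `fetch_page(cursor)` returns (records, next_cursor); next_cursor=None ends paging."""
    results, cursor = [], None
    for _ in range(config.get("max_pages", 100)):
        for attempt in range(max_retries):
            try:
                records, cursor = fetch_page(cursor)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # retries exhausted: surface the error
                time.sleep(config.get("backoff_s", 1.0) * 2 ** attempt)
        results.extend(records)
        if cursor is None:
            return results
    return results
```

Adding a new API then means writing config (auth type, page-cursor location, rate limits) rather than deploying new code.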
🧊
Stage 4
Iceberg Tables
Incremental merge into Snowflake Iceberg tables. Open format — Parquet on S3. Full DML, time travel, schema evolution.
→ Queryable tables, no vendor lock-in
secured
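Incremental merge follows the standard upsert pattern: staged rows are matched on a key, updating existing rows and inserting new ones. A sketch that renders that MERGE statement; the table, staging, and column names are illustrative:

```python
def build_merge_sql(table: str, staging: str, key: str, cols: list[str]) -> str:
    """Render an upsert MERGE from a staging table into an Iceberg table, keyed on `key`."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key, *cols])
    val_list = ", ".join(f"s.{c}" for c in [key, *cols])
    return (
        f"MERGE INTO {table} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )
```

Because the target is an Iceberg table, the merged result is just Parquet on S3 plus open metadata, which is what makes time travel and schema evolution possible without lock-in.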
🔒
Stage 5
PII Masking
Role-based column masking at the query layer. Full redaction, SHA-256 hashing, or IP anonymization. Clients never see raw PII.
→ Secure, compliant client access
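The three masking modes above can each be expressed as a pure function over a column value. A minimal sketch; the /24 and /48 anonymization prefixes are a common convention and an assumption here, not a confirmed policy detail:

```python
import hashlib
import ipaddress

def mask_redact(_value: str) -> str:
    """Full redaction: the raw value never leaves the query layer."""
    return "***MASKED***"

def mask_sha256(value: str) -> str:
    """Deterministic hash: joins and counts still work, raw PII stays hidden."""
    return hashlib.sha256(value.encode()).hexdigest()

def mask_ip(value: str) -> str:
    """IP anonymization: zero the host portion (assumed IPv4 /24, IPv6 /48)."""
    ip = ipaddress.ip_address(value)
    prefix = 24 if ip.version == 4 else 48
    net = ipaddress.ip_network(f"{value}/{prefix}", strict=False)
    return str(net.network_address)
```

In practice these run as role-based column masking policies in the warehouse, so which function applies depends on who is querying.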
🎛️ Self-tuning
Continuously monitors memory, duration, and error rates — auto-adjusts batch sizes. No manual tuning.
| Signal | Response |
|---|---|
| Low memory usage | ↑ Larger batches → fewer invocations → lower cost |
| High memory usage | ↓ Smaller batches → prevent failures |
| Timeout or OOM | ↓ Halve batch size immediately |
| Stuck at minimum | ⚠ Alert — requires investigation |
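The signal table translates directly into a small control function. The memory thresholds and scaling factors below are illustrative assumptions; only the shape of the policy comes from the table:

```python
def next_batch_size(current: int, mem_pct: float, timed_out: bool,
                    min_size: int = 10, max_size: int = 10_000) -> int:
    """One adjustment step of the self-tuning loop (thresholds are assumptions)."""
    if timed_out:                 # timeout or OOM: halve immediately
        return max(min_size, current // 2)
    if mem_pct < 0.40:            # low memory usage: grow batches, cut invocations
        return min(max_size, int(current * 1.5))
    if mem_pct > 0.80:            # high memory usage: shrink to prevent failures
        return max(min_size, int(current * 0.75))
    return current                # healthy: leave it alone

def stuck_at_minimum(current: int, min_size: int = 10) -> bool:
    """If shrinking can't go further, the pipeline alerts instead of retrying forever."""
    return current <= min_size
```

The "stuck at minimum" row is the escape hatch: once batches can't shrink any further, the system stops tuning and raises an alert for a human.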
🔒 Security architecture
Built into the architecture, not bolted on.
🔗 Zero passwords — OIDC-based identity federation for all service-to-service auth. Short-lived tokens only.
🔑 Vault-managed credentials — API keys resolved at runtime from encrypted secret store. Never in code or config.
🏢 Tenant isolation — Separate storage, databases, and IAM policies per client. Row-level security on control plane.
🪣 Bring your own bucket — Data stays in your AWS account. Cross-account write via IAM role.
📈 Client dashboard
Full visibility into your pipelines — no black box.
📊
Row counts
Real-time per table
💚
Pipeline health
Status per connector
⏱️
Schedule control
Hourly, daily, or custom
💰
Usage & billing
Per-GB transparency