ScoutGPT
Agentic AI pipeline over 200M+ row OLAP dataset
Parcyl.ai
Problem
Problem
Parcyl.ai needed a way for non-technical users to query and derive insights from a 200M+ row analytical dataset that spanned multiple data domains. Traditional BI tools required SQL expertise and couldn't handle follow-up questions or multi-step reasoning. The challenge was building an agent that could decompose natural language queries into DuckDB-optimized SQL, handle ambiguity, and return explainable results — all with sub-3-second latency on a dataset this size.
Approach
Approach
I built ScoutGPT as a full agentic loop: a LangChain agent with a DuckDB "tool" that generates and executes SQL queries, a Dagster orchestration layer that ingests and transforms data from Snowflake on a schedule, and a Next.js + FastAPI frontend/backend that streams responses. The key insight was using DuckDB's OLAP column store locally rather than querying Snowflake directly — this cut query latency from 8–12s to under 1s for most analytical queries. The agent uses a chain-of-thought approach: it first understands intent, then generates SQL, executes it, interprets results, and formats them for a non-technical audience.
Architecture
Architecture
ScoutGPT — system diagram
Key Technical Decisions
Key Technical Decisions
DuckDB over direct Snowflake querying
Snowflake queries on 200M rows averaged 8–12s even with clustering keys. DuckDB's columnar in-process store brought this to <1s for the analytical patterns the agent generates. The trade-off is a daily Dagster sync to keep the local store fresh — acceptable for the use case.
LangChain ReAct agent over a fine-tuned model
A fine-tuned model would need frequent retraining as the schema evolved. ReAct + tool use with a strong system prompt proved more maintainable and more robust on novel query patterns. The schema is injected into the system prompt on every call.
Streaming responses via Server-Sent Events
Agent reasoning steps take 2–4s. Streaming the chain-of-thought incrementally made the UI feel responsive. Users see 'Thinking: converting to SQL → executing query → interpreting results' as it happens rather than waiting for a single response blob.
Results
Results
- ✓Query latency reduced from 8–12s (Snowflake direct) to <1s (DuckDB local)
- ✓Non-technical users can derive insights without writing SQL
- ✓Agent correctly interprets follow-up questions with conversation history
- ✓Dagster pipeline runs nightly with zero manual intervention
Tech Stack
Tech Stack
Links
Links