AquaGen AI Multi-Agent System

Complete Architecture & Build Plan

Company: FluxGen | Product: AquaGen | Version: 1.0 | Date: March 2026

1. Vision

Build a production-grade, domain-native AI system embedded inside the AquaGen customer dashboard that:

Answers any water-related question a customer asks in natural language
Analyses data across all 12 dashboard modules by calling existing APIs
Generates reports, graphs, and forecasts autonomously
Runs complex background pipelines without blocking the user
Scales from a single smart agent today to a full multi-agent system without rewriting anything

North Star

A customer opens the AquaGen dashboard and gets instant, accurate, conversational answers about their water data — as if a water domain expert is sitting next to them 24/7.

2. The 12 Dashboard Modules

These are the 12 existing AquaGen dashboard pages. Each maps to a specialist AI agent.

#	Module	Page Route	Agent Group
1	Dashboard (Overview)	`page-dashboard`	Core
2	Alerts	`page-alerts`	Core
3	Reports	`page-reports`	Core
4	Water Flow	`page-water-flow`	Water Operations
5	Water Quality	`page-water-quality`	Water Operations
6	Water Balance	`page-water-balance`	Water Operations
7	Water Stock	`page-water-stock`	Water Operations
8	Water Neutrality	`page-water-neutrality`	Sustainability
9	Rainwater	`page-rainwater`	Sustainability
10	Groundwater	`page-groundwater`	Sustainability
11	Energy	`page-energy`	Sustainability
12	UWI (Used Water Intelligence)	`page-uwi`	Intelligence
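
The table above can be captured as a small registry that a routing layer looks up by module. This is a minimal sketch: the page routes and group names come from the table, while `ModuleInfo`, `MODULES`, `modules_in_group`, and the short slugs used as keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleInfo:
    name: str         # display name from the table
    page_route: str   # dashboard page route
    agent_group: str  # one of the four agent groups


# Hypothetical registry; the slug keys are assumptions, the values mirror the table.
MODULES = {
    "dashboard": ModuleInfo("Dashboard (Overview)", "page-dashboard", "Core"),
    "alerts": ModuleInfo("Alerts", "page-alerts", "Core"),
    "reports": ModuleInfo("Reports", "page-reports", "Core"),
    "water-flow": ModuleInfo("Water Flow", "page-water-flow", "Water Operations"),
    "water-quality": ModuleInfo("Water Quality", "page-water-quality", "Water Operations"),
    "water-balance": ModuleInfo("Water Balance", "page-water-balance", "Water Operations"),
    "water-stock": ModuleInfo("Water Stock", "page-water-stock", "Water Operations"),
    "water-neutrality": ModuleInfo("Water Neutrality", "page-water-neutrality", "Sustainability"),
    "rainwater": ModuleInfo("Rainwater", "page-rainwater", "Sustainability"),
    "groundwater": ModuleInfo("Groundwater", "page-groundwater", "Sustainability"),
    "energy": ModuleInfo("Energy", "page-energy", "Sustainability"),
    "uwi": ModuleInfo("UWI (Used Water Intelligence)", "page-uwi", "Intelligence"),
}


def modules_in_group(group: str) -> list[str]:
    """Return the module slugs that belong to one agent group, in table order."""
    return [slug for slug, info in MODULES.items() if info.agent_group == group]
```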

3. Agent Design

The 12 modules are grouped into 4 agent groups for efficient orchestration. Each group has a lead specialist agent that handles all modules within it.

3.1 Agent Groups Overview

3.2 Core Agent Group

Dashboard Agent
Alerts Agent
Reports Agent

Dashboard Agent

Page: page-dashboard

Responsibility: Provide a unified overview of the customer's entire water system. It is the first agent activated in any session and always holds the broadest snapshot.

Answers questions like:

"Give me a summary of my water system today"
"What's the overall health of my site?"
"What needs my attention right now?"

APIs it calls:

GET /dashboard/summary — site-wide KPI snapshot
GET /dashboard/health-score — overall system health
GET /dashboard/alerts-count — active alert counts by severity

Alerts Agent

Page: page-alerts

Responsibility: Manage, explain, and analyse all system alerts — leaks, anomalies, sensor failures, threshold breaches.

Answers questions like:

"What alerts do I have right now?"
"Why did the zone 3 alert trigger yesterday?"
"Which alerts have been unresolved the longest?"

APIs it calls:

GET /alerts/active — current open alerts
GET /alerts/history — past alerts with resolution status
GET /alerts/rules — configured alert thresholds
POST /alerts/acknowledge — mark alert reviewed

Reports Agent

Page: page-reports

Responsibility: Compile multi-module data into structured reports. Always runs as a background job — never blocks the user.

Answers questions like:

"Generate my quarterly water report"
"Create a compliance report for the audit"
"Give me an executive summary of last month"

APIs it calls:

All module summary APIs (fan-out)
POST /reports/generate — trigger async report generation
GET /reports/history — list previously generated reports
GET /reports/download/{id} — retrieve completed report
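
The "trigger now, collect later" flow of the Reports Agent can be sketched as follows. This is an in-process illustration only: a standard-library thread pool stands in for the job queue (the plan itself names Celery + Redis), and `ReportJobs` and `build_report` are hypothetical names.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


def build_report(report_type: str) -> str:
    """Placeholder for the real fan-out over the module summary APIs."""
    return f"{report_type} report body"


class ReportJobs:
    """In-process stand-in for a background job queue (assumption, not Celery)."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs: dict[str, Future] = {}

    def generate(self, report_type: str) -> str:
        """Mirrors POST /reports/generate: returns a job id without waiting."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(build_report, report_type)
        return job_id

    def status(self, job_id: str) -> str:
        """Report whether the background job has finished."""
        return "done" if self._jobs[job_id].done() else "running"

    def download(self, job_id: str, timeout: float = 5.0) -> str:
        """Mirrors GET /reports/download/{id}; here it waits up to `timeout` seconds."""
        return self._jobs[job_id].result(timeout=timeout)
```

The caller gets the job id back immediately, so the chat stream can continue while the report is assembled.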

3.3 Water Operations Agent Group

Water Flow Agent
Water Quality Agent
Water Balance Agent
Water Stock Agent

Water Flow Agent

Page: page-water-flow

Responsibility: Monitor and analyse water flow rates, consumption patterns, zone comparisons, and spike detection.

Answers questions like:

"Why did my flow rate spike on the 14th?"
"Which zone has the highest flow right now?"
"Compare this month's flow to last quarter"

APIs it calls:

GET /water-flow/realtime — live flow readings by zone
GET /water-flow/summary — aggregated flow by period
GET /water-flow/anomalies — detected spikes and drops
GET /water-flow/zones — per-zone breakdown

Water Quality Agent

Page: page-water-quality

Responsibility: Track water quality parameters (pH, TDS, turbidity, DO), trend analysis, threshold compliance, and improvement tracking.

Answers questions like:

"Is our water quality within safe ranges?"
"How did quality improve this quarter?"
"Which parameter is most out of range?"

APIs it calls:

GET /water-quality/current — live quality parameters
GET /water-quality/history — historical metrics by parameter
GET /water-quality/thresholds — configured safe ranges
GET /water-quality/zones — per-zone quality breakdown

Water Balance Agent

Page: page-water-balance

Responsibility: Analyse inflow vs outflow, identify losses, track water accounting, and flag balance discrepancies.

Answers questions like:

"How much water are we losing in the system?"
"What is our water balance for this month?"
"Where is the biggest discrepancy between inflow and outflow?"

APIs it calls:

GET /water-balance/summary — inflow vs outflow vs loss
GET /water-balance/losses — identified loss points
GET /water-balance/trend — balance trend over time
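
The inflow versus outflow accounting behind a balance summary reduces to simple arithmetic that the backend can do before the agent sees anything. A minimal sketch, assuming hypothetical field names and treating loss as inflow minus metered outflow:

```python
def water_balance(inflow_litres: float, outflow_litres: float) -> dict:
    """Compute unaccounted-for water as inflow minus metered outflow."""
    if inflow_litres <= 0:
        raise ValueError("inflow must be positive")
    loss = inflow_litres - outflow_litres
    return {
        "inflow_litres": inflow_litres,
        "outflow_litres": outflow_litres,
        "loss_litres": loss,
        # Loss as a share of inflow, rounded for display in a summary.
        "loss_pct": round(100 * loss / inflow_litres, 1),
    }
```

For example, 120,000 L of inflow against 111,000 L of metered outflow gives a loss of 9,000 L, or 7.5% of inflow.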

Water Stock Agent

Page: page-water-stock

Responsibility: Track water storage levels, tank capacities, depletion rates, and refill projections.

Answers questions like:

"How much water stock do we have left?"
"When will our tank reach critical levels?"
"What is our storage utilisation rate?"

APIs it calls:

GET /water-stock/current — current storage levels
GET /water-stock/capacity — tank capacity and utilisation
GET /water-stock/forecast — projected depletion timeline
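
A question such as "When will our tank reach critical levels?" can be answered with a linear projection over the current depletion rate. The sketch below is an assumption about how such a forecast might be computed, not a description of the existing `/water-stock/forecast` endpoint; the function name and parameters are hypothetical.

```python
from typing import Optional


def hours_until_critical(
    current_litres: float,
    critical_litres: float,
    depletion_litres_per_hour: float,
) -> Optional[float]:
    """Linear projection of the time left before stock falls to the critical level.

    Returns 0.0 if stock is already at or below critical, and None if the
    stock is not falling (so no depletion time can be projected).
    """
    if current_litres <= critical_litres:
        return 0.0
    if depletion_litres_per_hour <= 0:
        return None
    return (current_litres - critical_litres) / depletion_litres_per_hour
```

With 50,000 L in storage, a 10,000 L critical level, and 2,000 L/h depletion, the projection is 20 hours.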

3.4 Sustainability Agent Group

Water Neutrality Agent
Rainwater Agent
Groundwater Agent
Energy Agent

Water Neutrality Agent

Page: page-water-neutrality

Responsibility: Track progress towards water neutrality goals, offsetting, net water consumption, and ESG targets.

Answers questions like:

"How close are we to water neutrality?"
"What is our net water consumption this year?"
"What actions would help us reach neutrality faster?"

APIs it calls:

GET /water-neutrality/status — neutrality score and gap
GET /water-neutrality/offsets — offset activities and credits
GET /water-neutrality/targets — configured ESG goals

Rainwater Agent

Page: page-rainwater

Responsibility: Track rainwater harvesting, collection efficiency, utilisation, and contribution to overall water supply.

Answers questions like:

"How much rainwater did we harvest this month?"
"What is our rainwater utilisation rate?"
"How much did rainwater reduce our mains consumption?"

APIs it calls:

GET /rainwater/collection — harvested volumes by period
GET /rainwater/utilisation — usage of harvested water
GET /rainwater/forecast — projected collection based on weather

Groundwater Agent

Page: page-groundwater

Responsibility: Monitor groundwater levels, extraction rates, recharge status, and regulatory compliance for groundwater use.

Answers questions like:

"What is our current groundwater extraction rate?"
"Is our groundwater usage within permitted limits?"
"How is the water table trending?"

APIs it calls:

GET /groundwater/levels — current and historical water table
GET /groundwater/extraction — extraction volumes
GET /groundwater/permits — regulatory permitted limits

Energy Agent

Page: page-energy

Responsibility: Analyse energy consumption tied to water operations — pumping, treatment, distribution — and identify optimisation opportunities.

Answers questions like:

"How much energy does our water system consume?"
"Which pump is least energy efficient?"
"How can we reduce energy costs in water operations?"

APIs it calls:

GET /energy/consumption — energy use by equipment/zone
GET /energy/efficiency — energy per litre ratios
GET /energy/benchmarks — industry comparison metrics

3.5 Intelligence Agent Group

UWI Agent

Page: page-uwi

Responsibility: Analyse and provide intelligence on used water — wastewater volumes, recycling rates, reuse efficiency, treatment status, and discharge compliance. Cross-references Water Flow, Water Balance, and Compliance modules to give a complete picture of water after use.

Answers questions like:

"How much used water did we generate this month?"
"What percentage of our used water is being recycled?"
"Are our wastewater discharge levels within compliance?"
"How can we improve our used water reuse rate?"
"What is the treatment efficiency of our used water system?"

APIs it calls:

GET /uwi/volume — used water generated by period and zone
GET /uwi/recycling — recycled and reused water volumes
GET /uwi/treatment — treatment plant efficiency and status
GET /uwi/discharge — discharge volumes and compliance status
GET /uwi/reuse-rate — percentage of used water recovered
Cross-references /water-balance/summary and /compliance/status

Cross-Module Agent

The UWI Agent cross-references Water Balance, Compliance, and Water Neutrality data to give a complete closed-loop view of water — from intake through use to discharge or reuse.

4. System Architecture

4.1 High-Level Architecture

4.2 Request Routing Logic

4.3 Pre-fetch Strategy (Dashboard Load)

When a customer opens the dashboard, all module snapshots are fetched simultaneously in the background before the first question is asked.
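
The parallel pre-fetch can be sketched with `asyncio.gather`. The names `prefetch_snapshots` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real HTTP call to a module summary API. Mapping a failed module to an empty snapshot is an assumption, made so that one broken API does not block the others.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[str], Awaitable[dict]]


async def prefetch_snapshots(modules: list[str], fetch: Fetcher) -> dict[str, dict]:
    """Fetch every module snapshot concurrently.

    A module whose fetch raises is mapped to {} instead of failing the batch.
    """
    results = await asyncio.gather(
        *(fetch(module) for module in modules), return_exceptions=True
    )
    return {
        module: ({} if isinstance(result, BaseException) else result)
        for module, result in zip(modules, results)
    }


async def fake_fetch(module: str) -> dict:
    """Hypothetical stand-in for an HTTP call to a module summary API."""
    await asyncio.sleep(0.01)
    if module == "energy":
        raise RuntimeError("upstream timeout")
    return {"module": module, "status": "ok"}
```

Because the fetches run concurrently, the total wait is close to that of the slowest module, not the sum of all of them.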

4.4 Cross-Agent Communication for Complex Questions

5. Data & Context Management

5.1 Shared Context Object

Every agent reads from a shared context object stored in the Redis session. This is built from pre-fetched snapshots and updated incrementally as the conversation progresses.

interface CustomerContext {
  // Identity
  customerId: string;
  sessionId: string;
  siteId: string;

  // Pre-fetched snapshots (loaded on dashboard open)
  snapshots: {
    dashboard: DashboardSnapshot; // ~80 tokens
    alerts: AlertsSnapshot; // ~80 tokens
    waterFlow: WaterFlowSnapshot; // ~70 tokens
    waterQuality: WaterQualitySnapshot; // ~70 tokens
    waterBalance: WaterBalanceSnapshot; // ~60 tokens
    waterStock: WaterStockSnapshot; // ~60 tokens
    waterNeutrality: NeutralitySnapshot; // ~50 tokens
    rainwater: RainwaterSnapshot; // ~50 tokens
    groundwater: GroundwaterSnapshot; // ~50 tokens
    energy: EnergySnapshot; // ~50 tokens
    uwi: UWISnapshot; // ~60 tokens
  };

  // Conversation
  conversationHistory: Message[]; // Last 5 turns only
  fetchedAt: Date;
  totalTokensUsed: number;
}
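
The "last 5 turns only" rule on `conversationHistory` can be enforced with a bounded queue, so old turns fall off without explicit trimming. This is a sketch on the Python gateway side; `new_history` and `add_turn` are hypothetical helpers, and treating one turn as a question plus its answer is an assumption.

```python
from collections import deque

MAX_TURNS = 5  # from the "Last 5 turns only" comment on CustomerContext


def new_history() -> deque:
    """Create a history that silently discards the oldest turn once full."""
    return deque(maxlen=MAX_TURNS)


def add_turn(history: deque, question: str, answer: str) -> None:
    """Append one turn (assumed here to be a question plus its answer)."""
    history.append({"question": question, "answer": answer})
```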

5.2 Token Budget Per LLM Call

Component	Tokens
System prompt + agent persona	~300
Relevant snapshots (2–3 modules)	~200
RAG chunks (if knowledge needed)	~500
Targeted API summary (on demand)	~400
Conversation history (last 3 turns)	~300
User question	~50
Total target	< 2,000 tokens
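
The component figures in the table sum to about 1,750 tokens, which leaves roughly 250 tokens of headroom under the 2,000 target. A small guard like the one below could check a prompt before it is sent. The dictionary keys and function names are hypothetical, and the per-component numbers are the approximate values from the table, so they are best treated as soft limits.

```python
# Approximate per-component allowances from the table (soft targets).
TOKEN_BUDGET = {
    "system_prompt": 300,
    "snapshots": 200,
    "rag_chunks": 500,
    "api_summary": 400,
    "history": 300,
    "question": 50,
}
TOTAL_LIMIT = 2000


def over_budget(actual: dict[str, int]) -> list[str]:
    """Return the components whose token count exceeds their allowance."""
    return [name for name, used in actual.items() if used > TOKEN_BUDGET.get(name, 0)]


def within_total(actual: dict[str, int]) -> bool:
    """Check the whole prompt against the overall target."""
    return sum(actual.values()) < TOTAL_LIMIT
```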

Token Discipline

Never send raw data rows to the LLM. Your backend always pre-aggregates into summaries. The LLM reasons over summaries — your backend does the number crunching.

5.3 Summary API Contract

Every agent-facing API must return analysis-ready summaries.

✅ Correct — Summary API

{
  "period": "last_30_days",
  "total_litres": 118420,
  "daily_avg": 3820,
  "peak": { "date": "2024-01-14", "value": 6200, "zone": "zone_3" },
  "trend": "up 12% vs previous month",
  "anomaly_count": 2,
  "zones": {
    "zone_1": { "avg": 1200, "trend": "stable" },
    "zone_3": { "avg": 980, "trend": "rising +18%" }
  }
}

❌ Wrong — Raw Data API

[
  { "timestamp": "2024-01-01T00:00:00", "flow": 12.3, "zone": 1 },
  { "timestamp": "2024-01-01T00:05:00", "flow": 12.1, "zone": 1 },
  { "timestamp": "2024-01-01T00:10:00", "flow": 11.9, "zone": 1 }
  // ... 50,000 more rows — never send this to LLM
]
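
The step from raw rows to a summary can be sketched as a plain aggregation in the backend. This is an illustration under stated assumptions: each row is assumed to carry a `litres` volume for its interval (the raw example above shows a `flow` rate whose unit the document does not state), and `summarise` is a hypothetical name. Only part of the summary shape is reproduced.

```python
from collections import defaultdict
from datetime import datetime


def summarise(rows: list[dict]) -> dict:
    """Collapse raw readings into totals an agent can reason over."""
    if not rows:
        raise ValueError("no readings to summarise")
    per_day = defaultdict(float)
    per_zone = defaultdict(float)
    for row in rows:
        # Group each reading by calendar day and by zone.
        day = datetime.fromisoformat(row["timestamp"]).date().isoformat()
        per_day[day] += row["litres"]
        per_zone[row["zone"]] += row["litres"]
    peak_day = max(per_day, key=per_day.get)
    total = sum(per_day.values())
    return {
        "total_litres": total,
        "daily_avg": round(total / len(per_day), 1),
        "peak": {"date": peak_day, "value": per_day[peak_day]},
        "zones": dict(per_zone),
    }
```

However many rows go in, the result stays a few dozen tokens, which is the point of the contract.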

5.4 Agent Output Contract

Every agent returns the same standard structure so the Orchestrator can merge cleanly.

interface AgentResponse {
  agentId: string;
  module: string; // e.g. "water-flow"
  summary: string; // Plain English answer
  data: Record<string, unknown>; // Structured data used
  confidence: "high" | "medium" | "low";
  dataFreshness: "realtime" | "snapshot" | "cached";
  followupSuggestions: string[]; // 2-3 suggested next questions
  needsEscalation: boolean; // Flag for human support
  tokensUsed: number;
}
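
Because every agent returns the same shape, the Orchestrator's merge step can stay small. The sketch below mirrors a subset of `AgentResponse` as a Python dataclass. The merge policy (weakest confidence wins, any escalation flag propagates, token counts are summed) is an assumption for illustration, not a rule stated in this plan.

```python
from dataclasses import dataclass

CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class AgentResponse:
    agent_id: str
    module: str
    summary: str
    confidence: str = "high"
    needs_escalation: bool = False
    tokens_used: int = 0


def merge(responses: list[AgentResponse]) -> dict:
    """Combine per-agent responses into one orchestrator-level result."""
    if not responses:
        raise ValueError("nothing to merge")
    return {
        "summary": " ".join(r.summary for r in responses),
        "modules": [r.module for r in responses],
        # Assumed policy: the merged answer is only as confident as its weakest part.
        "confidence": min(
            (r.confidence for r in responses), key=CONFIDENCE_RANK.__getitem__
        ),
        "needsEscalation": any(r.needs_escalation for r in responses),
        "tokensUsed": sum(r.tokens_used for r in responses),
    }
```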

5.5 RAG Knowledge Base

Store and embed the following as vector documents for semantic retrieval:

Source	Used By
Water regulations (by region)	Compliance reasoning
AquaGen product manuals	Technical troubleshooting
UWI scoring methodology	UWI Agent explanation
Historical support Q&A	All agents
Water quality standards (WHO, BIS)	Water Quality Agent
Groundwater permit rules	Groundwater Agent
ESG / water neutrality frameworks	Neutrality Agent

6. Tech Stack

Existing stack — no replacement

The AquaGen frontend (React) and backend are already in production. The AI system adds a new Agent Gateway service alongside the existing backend — no rewriting, no migration.

AI Layer (New)
Backend (Existing + Extensions)
Frontend (Existing + Chat Widget)
Infrastructure (Existing)

AI Layer (New)

Component	Technology	Reason
Primary LLM	Claude Sonnet 4.6 (`claude-sonnet-4-6`)	Best tool use + reasoning for water-domain questions
Classifier LLM	Claude Haiku 4.5 (`claude-haiku-4-5-20251001`)	Fast, cheap intent routing — classifies every query
Agent Framework	LangChain + LangGraph	ReAct loop, tool binding, state management
Embeddings	`text-embedding-3-small` (OpenAI)	RAG vector search for regulations and product docs
Vector DB	pgvector (existing Postgres) or Pinecone	Compliance / knowledge RAG
Tracing	LangSmith	Agent trace, token usage, latency monitoring

Backend (Existing + Extensions)

Component	Technology	Status
Existing Backend	Python (Flask/FastAPI)	Existing — no changes needed
AquaGen API	`https://prod-aquagen.azurewebsites.net`	Existing — all data comes from here
Agent Gateway	New FastAPI service (Python)	New — add alongside existing backend
Job Queue	Celery + Redis	New — for background report generation
Snapshot Cache	Redis	New — 2-min TTL for pre-fetched module snapshots
Session Store	Existing session management	Existing — JWT tokens reused as-is
Streaming	SSE (Server-Sent Events)	New endpoint — `/agent/chat` on gateway
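
The 2-minute snapshot TTL can be illustrated with a small in-memory cache. This is a stand-in for Redis, where the same behaviour comes from setting an expiry on each key. `SnapshotCache` is a hypothetical name, and the injectable clock is there only to make expiry testable.

```python
import time


class SnapshotCache:
    """In-memory stand-in for the Redis snapshot cache (assumption)."""

    def __init__(self, ttl_seconds: float = 120.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def set(self, key: str, value: dict) -> None:
        """Store a snapshot together with its expiry time."""
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str):
        """Return the snapshot, or None if it is missing or has expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value
```

A `None` result signals the caller to re-fetch that module's summary and store it again.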

Frontend (Existing + Chat Widget)

Component	Technology	Status
Dashboard	React (existing)	Existing — no changes
Chat Widget	New React component	New — single component added to dashboard
Streaming UI	`EventSource` API (built into browser)	New — consumes SSE stream from agent gateway
Notifications	Existing notification system	Existing — reuse for background job alerts

The chat widget is a single React component dropped into the existing dashboard layout. It calls /agent/chat and renders the streaming response. No changes to existing pages or routes.
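
On the gateway side, streaming over SSE comes down to writing text frames in the event-stream format. The sketch below shows only the framing. The event names `token` and `done` and the `{"text": ...}` payload are assumptions, not part of an existing AquaGen contract.

```python
import json
from typing import Iterable, Iterator


def sse_event(data: dict, event: str = "") -> str:
    """Serialise one Server-Sent Event: optional 'event:' line, 'data:' line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines by default, so one data line is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one frame per text chunk, then a final frame marking completion."""
    for chunk in chunks:
        yield sse_event({"text": chunk}, event="token")
    yield sse_event({}, event="done")
```

In the browser, an `EventSource` receives named events such as these through `addEventListener`, while unnamed events arrive via `onmessage`.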

Infrastructure (Existing)

Component	Technology	Status
Hosting	Azure App Service (existing)	Existing — agent gateway deployed as new app service
Auth	JWT from AquaGen login (`/api/user/user/login`)	Existing — no new auth layer
Database	Azure Cosmos DB (existing)	Existing — conversation history stored here
Container Registry	`fluxgen.azurecr.io` (existing)	Existing — agent gateway image pushed here
Secrets	Azure Key Vault / App Service env vars	Existing pattern — add LLM API keys

7. Phased Build Plan

Phase 1 — Foundation (Weeks 1–4)

Goal: Working agent in dashboard answering real customer questions

Build Summary API adapter layer for all 12 modules
Build pre-fetch service — parallel snapshot loading on dashboard open
Build shared context object + Redis session store
Build Dashboard Agent + Alerts Agent (most common questions)
Build chat UI with SSE streaming
Intent classifier — route to correct module
Deploy to staging, test with real customer data

Phase 1 Outcome

Customers can ask dashboard overview and alert questions and get instant, accurate answers streamed in real time.

Phase 2 — Full Agent Pool (Weeks 5–10)

Goal: All 12 modules live, cross-module questions handled

Water Flow Agent + Water Quality Agent + Water Balance Agent + Water Stock Agent
Water Neutrality Agent + Rainwater Agent + Groundwater Agent + Energy Agent
UWI Agent (cross-references all modules)
Orchestrator for multi-module question routing
Parallel execution — independent agents run simultaneously
Conversation memory — last 5 turns retained
Follow-up question suggestions after each response

Phase 2 Outcome

Full Q&A across all 12 modules. Cross-module questions answered in under 1.5 seconds.

Phase 3 — Reports & Background Jobs (Weeks 11–14)

Goal: Autonomous report generation, proactive AI alerts

Reports Agent + Celery job queue
Full background pipeline: all agents → Reports Agent → PDF/Excel output
Push notifications when reports are ready
Proactive suggestions based on data ("2 active leaks detected — want an analysis?")
Graph data generation from Trend + UWI agents

Phase 3 Outcome

Full autonomous report generation. Customers receive proactive AI-driven insights without asking.

Phase 4 — Fine-Tuning (Months 4–9)

Goal: AquaGen-native model — faster, cheaper, proprietary

Collect conversation data from Phases 1–3 (with customer consent)
Build training dataset: water-domain Q&A pairs + module-specific examples
Fine-tune Llama 3.1 8B or Mistral 7B on AquaGen domain data
A/B test fine-tuned model vs Claude Sonnet on accuracy benchmarks
Gradually shift routine queries to fine-tuned model ("AquaGen Mini")
Keep Claude Sonnet for complex cross-module reasoning

Phase 4 Outcome

"AquaGen Mini" — FluxGen's own domain-tuned model. Lower cost per query, faster response, proprietary advantage.

Phase 5 — Intelligence Platform (Months 9–18)

Goal: Predictive, proactive, fully autonomous water intelligence

Proactive anomaly detection — AI alerts before customer notices
Predictive maintenance scheduling
Automated compliance monitoring with early warning
Cross-customer benchmarking (anonymised)
Multi-tenant internal agent — FluxGen team monitors all customer sites via AI
FluxGen Water Intelligence API — expose as a product to water industry partners

8. Latency Targets

Scenario	Target	Mechanism
Simple question — answered from snapshot	< 500ms	No extra API call, stream immediately
Single module — needs one API call	600–900ms	One summary API + stream
Multi-module — parallel agents	900ms–1.5s	Parallel execution + stream
Complex deep analysis	1.5–2.5s	Deep API call + extended reasoning
Report generation	2–4 minutes	Background job, user not blocked
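
The tiers in the table suggest a simple decision the gateway could make before any agent runs. The sketch below is one possible reading: the tier names, the function signature, and the order in which the checks are applied are all assumptions. The upper bounds are taken from the table.

```python
# Upper latency bound per tier, in milliseconds, taken from the table.
TARGET_MS = {
    "snapshot": 500,
    "single": 900,
    "multi": 1500,
    "deep": 2500,
    "background": 4 * 60 * 1000,
}


def pick_tier(
    modules: list[str],
    snapshot_is_enough: bool,
    deep_analysis: bool = False,
    report: bool = False,
) -> str:
    """Choose a latency tier for a classified question (assumed precedence)."""
    if report:
        return "background"  # handed to the job queue, user not blocked
    if deep_analysis:
        return "deep"
    if len(modules) > 1:
        return "multi"  # independent agents run in parallel
    if snapshot_is_enough:
        return "snapshot"  # no extra API call
    return "single"  # one summary API call
```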

9. Observability

Every agent call is fully traced:

Trace: question_id = "q_8f3a2"
├── Question: "Why did our used water reuse rate drop last month?"
├── Intent classified: uwi + water-balance + water-flow (312ms)
├── Snapshot cache hits: ✅ uwi, ✅ water-balance / ❌ water-flow (fetched)
├── Agents run: 3 (parallel)
│   ├── WaterFlowAgent: 620ms, 1,840 tokens
│   ├── WaterBalanceAgent: 410ms (from snapshot), 980 tokens
│   └── UWIAgent: 680ms, 2,100 tokens
├── Orchestrator merge: 180ms
├── Total latency: 1,172ms
├── Total tokens: 4,920 (input) / 640 (output)
└── User feedback: 👍

Tools: Langfuse (open source) for tracing. Alert on latency > 3s, tokens > 6,000, confidence = low, needsEscalation = true.
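
The four alert conditions above translate directly into a check that can run on every completed trace. A minimal sketch, assuming a hypothetical flat trace dictionary with the keys shown; a real trace object from the tracing tool would look different.

```python
def trace_alerts(trace: dict) -> list[str]:
    """Return the names of the alert conditions that a finished trace violates."""
    alerts = []
    if trace["latency_ms"] > 3000:
        alerts.append("latency")
    if trace["total_tokens"] > 6000:
        alerts.append("tokens")
    if trace["confidence"] == "low":
        alerts.append("confidence")
    if trace["needs_escalation"]:
        alerts.append("escalation")
    return alerts
```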

10. Security & Privacy

Concern	How It's Handled
Data isolation	All API calls authenticated with customer JWT — agents only access that customer's data
No raw data in LLM	Backend always aggregates before sending to LLM
Conversation storage	Stored server-side in Redis, never in browser
RAG knowledge base	Contains only non-PII regulatory and product docs
API key security	All LLM calls go through your backend — keys never exposed to frontend
Cross-customer isolation	Agent sessions are strictly scoped to `customerId` + `sessionId`
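
Scoping sessions to `customerId` + `sessionId` can be made concrete in how the session store keys are built and checked. The key layout `agent:ctx:{customerId}:{sessionId}` and both function names below are assumptions for illustration.

```python
def session_key(customer_id: str, session_id: str) -> str:
    """Build a store key that embeds both the customer and the session."""
    for part in (customer_id, session_id):
        # Reject ids that would make the key ambiguous when split on ':'.
        if not part or ":" in part:
            raise ValueError("ids must be non-empty and must not contain ':'")
    return f"agent:ctx:{customer_id}:{session_id}"


def assert_owner(key: str, customer_id: str) -> None:
    """Refuse to read a context that belongs to a different customer."""
    if key.split(":")[2] != customer_id:
        raise PermissionError("session does not belong to this customer")
```

The customer id used in the check would come from the verified JWT, never from the request body.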

11. Future Vision — FluxGen Water Intelligence Platform

Once the agent system is mature, it becomes a platform, not just a feature.

12. What to Build First

Most Important First Step

Build the Summary API layer before anything else. Every agent, every optimisation, every future feature depends on your backend returning clean aggregated summaries — not raw data rows.

Week 1 Action Items:

Define and build summary endpoints for all 12 modules (/{module}/summary, for example /water-flow/summary)
Build the pre-fetch service that calls them in parallel on dashboard load
Wire up a basic chat UI with SSE streaming
Build the Dashboard Agent and Alerts Agent with a focused system prompt
Deploy to staging with one real customer's data

That delivers a working, impressive agent in two weeks — and every phase after is purely additive with zero rework.

Document maintained by FluxGen Engineering. Update this document as modules evolve and new agents are added.
