Production-Grade Async Multi-Agent Order Tracking with Google ADK, LiteLLM, and Monocle

Five specialist AI agents answer a single order question at the same time using Python asyncio and Google ADK. Total wall-clock time ≈ the slowest agent, not the sum of all five. Every agent, tool call, and LLM request is fully traced with Monocle and shipped to Okahu for observability.

Key Takeaways

- asyncio.gather cuts wall-clock time to the slowest agent, not the sum of all agents
- Google ADK's Agent + Runner provides a clean structure for domain-specialist agents
- LiteLLM as a model gateway lets you swap OpenAI / Gemini / Anthropic without code changes
- Monocle auto-instruments every agent, tool call, and LLM inference, with zero manual span code
- A distinct session_id per agent prevents state collisions in concurrent ADK runs
- return_exceptions=True on asyncio.gather means one failing agent never aborts the batch

Why This Project Exists

Most agent tutorials call one LLM at a time, sequentially. Real order-tracking needs answers from several independent systems (OMS, carrier API, warehouse, payment gateway) and there is no reason to wait for them one after another.
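The difference can be illustrated with plain asyncio, with no ADK or LLM involved. This is a minimal sketch: the backend names and latencies are made-up stand-ins, and query_backend is a hypothetical helper, not a function from the repo.

```python
import asyncio
import time

# Hypothetical per-backend latencies in seconds; real systems will differ.
LATENCIES = {"oms": 0.05, "carrier": 0.10, "warehouse": 0.03, "payment": 0.08}


async def query_backend(name: str, delay: float) -> str:
    """Stand-in for one agent waiting on its backend and its LLM call."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def run_sequential() -> tuple[list[str], float]:
    """Await each backend in turn; total time is roughly the sum of delays."""
    start = time.perf_counter()
    answers = [await query_backend(n, d) for n, d in LATENCIES.items()]
    return answers, time.perf_counter() - start


async def run_concurrent() -> tuple[list[str], float]:
    """Launch all backends at once; total time is roughly the largest delay."""
    start = time.perf_counter()
    answers = await asyncio.gather(
        *(query_backend(n, d) for n, d in LATENCIES.items())
    )
    return list(answers), time.perf_counter() - start


async def compare() -> dict:
    seq_answers, seq_seconds = await run_sequential()
    con_answers, con_seconds = await run_concurrent()
    return {
        "sequential_answers": seq_answers,
        "concurrent_answers": con_answers,
        "sequential_seconds": seq_seconds,
        "concurrent_seconds": con_seconds,
    }


if __name__ == "__main__":
    result = asyncio.run(compare())
    print(f"sequential: {result['sequential_seconds']:.2f}s")
    print(f"concurrent: {result['concurrent_seconds']:.2f}s")
```

Both variants return the same answers in the same order; only the elapsed time changes, which is the whole argument for the concurrent design.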
This repo implements the "many at once" pattern:

- Each system is modeled as an independent ADK Agent with its own tools
- All agents are launched concurrently with asyncio.gather
- A coordinator/support agent can optionally fan out, then summarize
- Every step is traced end-to-end for debugging and evaluation

Tech Stack

| Layer | Technology | Role |
|----|----|----|
| Orchestration | Python asyncio | Run agents concurrently (gather, as_completed) |
| Agent framework | Google ADK (google-adk, google-adk[a2a]) | Agent, Runner, InMemorySessionService |
| Model gateway | LiteLLM (google.adk.models.lite_llm.LiteLlm) | One interface to OpenAI / Gemini / Anthropic / etc. |
| Content types | google-genai | types.Content / types.Part message construction |
| Tooling protocol | MCP (mcp) | Tool/transport plumbing used by ADK |
| Observability | Monocle (monocle_apptrace) + Okahu | Auto-instrumented traces of agents, tools, LLM calls |
| Packaging / runner | uv + hatchling | Env management, track-order console script |

Repository Layout

    order_tracking_asyncio/
    ├── pyproject.toml          # Deps (ADK, LiteLLM, genai, mcp, monocle), build, scripts
    ├── uv.lock                 # Locked dependency graph
    ├── README.md
    ├── .gitignore
    ├── .github/
    │   └── instructions/
    │       └── okahu.instructions.md
    ├── .monocle/               # Monocle trace output (JSON spans) at runtime
    └── order_tracking/
        ├── __init__.py         # Public exports (root_agent, agents, ORDER_AGENTS)
        ├── agent.py            # 5 specialist agents + root coordinator (LiteLLM)
        ├── tools.py            # Mock OMS/WMS/carrier/payment tools (deterministic)
        ├── async_runner.py     # asyncio orchestration + Monocle setup + CLI entrypoint
        └── .env.example        # Model + provider credentials template

System Architecture

                     User / CLI (order id)
                              |
            async_runner.py (asyncio orchestrator)
                              |
                       asyncio.gather
                              |
       -----------------------------------------------
       |          |           |           |          |
     order_   shipping_   inventory_  payment_   support_
     status_   agent       agent       agent      agent
     agent
       |          |           |           |          |
       -----------------------------------------------
                  |                       |
               tools.py          LiteLLM -> LLM provider
                              |
                  Monocle (monocle_apptrace)
                      |               |
              .monocle/*.json       Okahu

Key idea: the orchestrator never blocks on a single agent. ADK runs each agent's reasoning loop; the agent decides to call a tool; the tool returns deterministic mock data; the agent uses the LLM via LiteLLM to phrase the answer. Monocle wraps all of it transparently.

Agent Topology

There are five specialist agents plus a root coordinator (used for interactive testing with adk web .).

| Agent | Responsibility | Tools |
|----|----|----|
| order_status_agent | Fulfilment status, placed date, items | get_order_status |
| shipping_agent | Carrier, tracking number, ETA | get_shipping_details |
| inventory_agent | On-hand stock + reorder flag per SKU | check_inventory |
| payment_agent | Auth/capture/failed, amount, currency | get_payment_status |
| support_agent | Friendly customer-facing summary | get_order_status, get_shipping_details, get_payment_status |

All agents are collected in ORDER_AGENTS for the concurrent runner. Models are configured through LiteLLM:

    model = LiteLlm(model=os.getenv("SHIPPING_MODEL", "gpt-4o"))

This lets you point every agent at any provider supported by LiteLLM without touching agent code.

Concurrency Model

The concurrency design is the heart of this project. Each agent task gets its own coroutine and a distinct session_id, so concurrent agents never collide on ADK session state.

    import asyncio
    from uuid import uuid4

    # AgentResult, run_agent_task, and ORDER_AGENTS are defined elsewhere in the package.
    async def run_all_agents(order_id: str) -> list[AgentResult]:
        prompt = f"What is the status of order {order_id}?"
        # One coroutine per agent, each with its own session_id
        tasks = [
            run_agent_task(agent, prompt, session_id=f"{agent.name}-{uuid4()}")
            for agent in ORDER_AGENTS
        ]
        # return_exceptions=True: a failure is returned in its slot instead of raised
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results

Why return_exceptions=True matters: one failing agent (say the payment provider is down) does not abort the shipping, inventory, and status responses. Each agent's error is captured into its own AgentResult, and the caller decides how to handle it.
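The capture step described above can be sketched with the standard library alone. This is an illustration under assumptions, not the repo's code: the AgentResult fields, fake_agent, and run_batch are hypothetical names, and the real AgentResult may look different.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentResult:
    # Hypothetical shape; the repo's actual AgentResult may carry other fields.
    agent_name: str
    answer: Optional[str] = None
    error: Optional[str] = None


async def fake_agent(name: str, fail: bool = False) -> AgentResult:
    """Stand-in for one ADK agent run; raises when its backend is 'down'."""
    await asyncio.sleep(0.01)
    if fail:
        raise ConnectionError(f"{name} backend unavailable")
    return AgentResult(agent_name=name, answer=f"{name} answered")


async def run_batch(specs: dict[str, bool]) -> list[AgentResult]:
    """Run all agents concurrently and fold exceptions into AgentResult."""
    names = list(specs)
    raw = await asyncio.gather(
        *(fake_agent(n, fail=specs[n]) for n in names),
        return_exceptions=True,  # exceptions come back in their slot
    )
    results = []
    # gather preserves input order, so names and outcomes line up.
    for name, outcome in zip(names, raw):
        if isinstance(outcome, BaseException):
            results.append(AgentResult(agent_name=name, error=repr(outcome)))
        else:
            results.append(outcome)
    return results


if __name__ == "__main__":
    batch = asyncio.run(
        run_batch({"shipping_agent": False, "payment_agent": True})
    )
    for item in batch:
        print(item)
```

Folding exceptions into the same result type means the caller iterates over one homogeneous list and checks the error field, rather than mixing isinstance checks into its reporting code.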
    CLI
     |
    asyncio.gather
     |-- order_status_agent --> AgentResult
     |-- shipping_agent     --> AgentResult   (all run concurrently)
     |-- inventory_agent    --> AgentResult
     |-- payment_agent      --> AgentResult
     |-- support_agent      --> AgentResult
     |
    list[AgentResult]   (wall-clock time = slowest agent only)

Observability with Monocle + Okahu

Monocle is initialized at import time in async_runner.py, before any agents run:

    from monocle_apptrace import setup_monocle_telemetry

    setup_monocle_telemetry(
        workflow_name="order_tracking_asyncio",
        monocle_exporters_list="file,okahu",
    )

What you get automatically, with zero manual span code:

- Workflow span for the entire run (order_tracking_asyncio)
- Agent spans for each ADK agent invocation
- Tool spans for get_order_status, get_shipping_details, etc.
- Inference spans for each LiteLLM/LLM call (model, token counts, latency)

Trace Tree Structure

    workflow: order_tracking_asyncio
      agent: order_status_agent
        tool: get_order_status
        inference: LiteLLM -> OpenAI gpt-4o
      agent: shipping_agent
        tool: get_shipping_details
        inference: LiteLLM -> OpenAI gpt-4o
      agent: inventory_agent
        tool: check_inventory
        inference: LiteLLM -> OpenAI gpt-4o
      agent: payment_agent
        tool: get_payment_status
        inference: LiteLLM -> OpenAI gpt-4o
      agent: support_agent
        tool: get_order_status
        tool: get_shipping_details
        tool: get_payment_status
        inference: LiteLLM -> OpenAI gpt-4o

Exporters

- file exporter: writes spans to .monocle/monocle_trace_*.json for local debugging and diffing runs
- okahu exporter: ships traces to Okahu for search, workflow grouping, and evaluation

End-to-End Request Flow

    User
      |  order id (e.g. ORD-1001)
    async_runner.run_all
      |  start workflow span (Monocle)
    ADK Runner
      |  create session + run_async(prompt)
    specialist agent (reasoning loop)
      |  decide next step
    LiteLLM
      |  chat completion -> "call get_shipping_details(ORD-1001)"
    tools.py
      |  get_shipping_details("ORD-1001") -> {carrier, tracking, eta}
    LiteLLM (phrase final answer)
      |  chat completion -> natural-language response
    ADK Runner
      |  final response event -> AgentResult
    async_runner
      |  close spans -> file + okahu
    User (printed results)

Data Model (Mock Backends)

tools.py ships deterministic mock data so runs and evaluations are reproducible.

Orders

| Order | Status | Carrier | Tracking |
|----|----|----|----|
| ORD-1001 | shipped | UPS | tracking number present |
| ORD-1002 | processing | - | not yet shipped |
| ORD-1003 | delivered | FedEx | tracking number present |

Inventory

| SKU | Stock state |
|----|----|
| SKU-APL-01 | in stock |
| SKU-KBD-07 | low (reorder flag) |
| SKU-MON-22 | out of stock |

Each tool returns a {"status": "success" | "error", ...} dict so agents can gracefully report missing orders or SKUs.

Setup

Prerequisites: Python >= 3.10 and uv

    # 1. Install dependencies into a local .venv
    uv sync

    # 2. Create your env file from the template
    cp order_tracking/.env.example order_tracking/.env

    # 3. Edit .env and add your provider key

Configuration

Models are resolved through LiteLLM.
The default model comes from the SHIPPING_MODEL environment variable (falls back to gpt-4o):

    model_name = os.getenv("SHIPPING_MODEL", "gpt-4o")
    # every agent uses: model=LiteLlm(model=model_name)

order_tracking/.env.example covers the common providers:

    # LiteLLM model selection
    SHIPPING_MODEL=gpt-4o

    # OpenAI (default)
    OPENAI_API_KEY=your-openai-key

    # Google ADK / Gemini (AI Studio)
    GOOGLE_GENAI_USE_VERTEXAI=False
    GOOGLE_API_KEY=your-ai-studio-api-key

    # Google Vertex AI (production)
    # GOOGLE_GENAI_USE_VERTEXAI=True
    # GOOGLE_CLOUD_PROJECT=your-gcp-project
    # GOOGLE_CLOUD_LOCATION=global

    # Okahu observability
    OKAHU_API_KEY=your-okahu-api-key

Pick credentials that match your SHIPPING_MODEL. If you set SHIPPING_MODEL=gemini/gemini-2.0-flash, set GOOGLE_API_KEY. For gpt-4o, set OPENAI_API_KEY.

Running the Service

Three Orchestration Patterns

Pattern 1: CLI batch (all 5 agents, concurrent):

    uv run python -m order_tracking.async_runner ORD-1001

Pattern 2: Console script shorthand:

    uv run track-order ORD-1001

Pattern 3: Interactive ADK web UI:

    adk web .
    # opens browser chat -> talk to root_agent -> it routes to specialists

Expected Output

    === Order Tracking Results for ORD-1001 ===

    [order_status_agent]
    Status: shipped | Items: 2 | Placed: 2024-01-15

    [shipping_agent]
    Carrier: UPS | Tracking: 1Z999AA1012345678 | ETA: 2024-01-18

    [inventory_agent]
    SKU-APL-01: 150 units in stock
    SKU-KBD-07: 3 units — reorder recommended

    [payment_agent]
    Status: captured | Amount: $299.99 USD

    [support_agent]
    Your order ORD-1001 has shipped via UPS (tracking: 1Z999AA1012345678) and is expected to arrive by January 18th. Payment of $299.99 confirmed.

    Wall-clock time: 1.34s (sequential estimate: ~6.7s)
    Traces written to: .monocle/monocle_trace_20240116_143022.json

Production-Grade Considerations

Isolation: no shared state between agents

    session_id = f"{agent.name}-{uuid4()}"
    session_service = InMemorySessionService()
    runner = Runner(agent=agent, session_service=session_service, ...)
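What a unique session ID buys can be seen in a toy sketch. ToySessionStore, toy_agent, and run_with_ids are hypothetical names invented for this illustration; the store is a plain dict and is not ADK's InMemorySessionService. It only shows that histories keyed by a unique ID stay separate when tasks interleave, while a shared ID mixes them.

```python
import asyncio
from collections import defaultdict
from uuid import uuid4


class ToySessionStore:
    """Hypothetical stand-in for a session service: history keyed by session_id."""

    def __init__(self) -> None:
        self.history: dict[str, list[str]] = defaultdict(list)

    def append(self, session_id: str, message: str) -> None:
        self.history[session_id].append(message)


async def toy_agent(name: str, store: ToySessionStore, session_id: str) -> None:
    """Write a prompt, yield to the event loop, then write an answer."""
    store.append(session_id, f"{name}: prompt")
    await asyncio.sleep(0)  # yield so the other tasks interleave
    store.append(session_id, f"{name}: answer")


async def run_with_ids(names: list[str], unique: bool) -> ToySessionStore:
    """Run all toy agents concurrently with unique or shared session ids."""
    store = ToySessionStore()
    ids = [f"{n}-{uuid4()}" if unique else "shared" for n in names]
    await asyncio.gather(
        *(toy_agent(n, store, sid) for n, sid in zip(names, ids))
    )
    return store


if __name__ == "__main__":
    agents = ["shipping_agent", "payment_agent"]
    isolated = asyncio.run(run_with_ids(agents, unique=True))
    shared = asyncio.run(run_with_ids(agents, unique=False))
    print("isolated:", dict(isolated.history))
    print("shared:  ", dict(shared.history))
```

With unique IDs each history holds one agent's prompt followed by its own answer; with the shared ID the two agents' messages end up interleaved in a single history.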
Each agent gets its own Runner and InMemorySessionService. Concurrent agents writing to the same session would interleave their events and corrupt the conversation history; a separate service and session_id per agent rules that out.

Fault isolation: one agent failure never breaks others

    results = await asyncio.gather(*tasks, return_exceptions=True)
    for result in results:
        if isinstance(result, Exception):
            log.warning("Agent failed: %s", result)

Model-agnostic via LiteLLM

Swap models without touching agent code:

    SHIPPING_MODEL=gemini/gemini-2.0-flash uv run track-order ORD-1001
    SHIPPING_MODEL=anthropic/claude-3-5-sonnet-20241022 uv run track-order ORD-1001
    SHIPPING_MODEL=gpt-4o uv run track-order ORD-1001

Observability-first via Monocle

The one-line setup captures traces across the entire async execution tree. Agent spans nest correctly under the workflow span even though agents run concurrently:

    setup_monocle_telemetry(
        workflow_name="order_tracking_asyncio",
        monocle_exporters_list="file,okahu",
    )

Extending the Project

| Extension | How |
|----|----|
| Add a new agent | Create an Agent in agent.py, add tools to tools.py, append to ORDER_AGENTS |
| Real backends | Replace mock dicts in tools.py with actual HTTP calls (httpx, aiohttp) |
| Different LLM per agent | Set per-agent env vars and pass different LiteLlm(model=…) instances |
| FastAPI service | Wrap run_all_agents in a POST endpoint; it's already async |
| Streaming results | Switch from asyncio.gather to asyncio.as_completed to print results as they arrive |
| Evaluation | Monocle traces feed directly into Okahu eval pipelines, with no extra instrumentation needed |

Troubleshooting

ModuleNotFoundError: google.adk

    uv sync  # re-run to pick up all extras including google-adk[a2a]

AuthenticationError from LiteLLM

    cat order_tracking/.env  # verify the right key for your SHIPPING_MODEL

Agents return empty responses

    python -c "import litellm; print(litellm.model_list)"

Monocle traces not appearing in .monocle/

    grep -n "setup_monocle_telemetry" order_tracking/async_runner.py
    # confirm it's called before the first agent run

Summary

| What we built | Key design decision |
|----|----|
| 5 concurrent specialist agents | asyncio.gather over sequential calls |
| Domain isolation | Independent Runner + session_id per agent |
| Model-agnostic | LiteLLM as unified gateway |
| Full observability | Monocle auto-instrumentation, zero manual spans |
| Fault tolerance | return_exceptions=True isolates agent failures |
| Reproducible testing | Deterministic mock backends in tools.py |

The pattern here (concurrent domain agents, a model-agnostic LLM gateway, automatic distributed tracing) maps directly onto production AI systems handling real order management, customer support, and operations workloads.

Source Code

👉 github.com/anjijava16/order_tracking_asyncio

About the Author

Anjaiah Methuku (@anjaiahspr) is a Sr Software Engineer (Data & AI) at JPMorgan Chase, working on AI agent infrastructure, observability tooling, and open-source contributions to the Monocle project under the Linux Foundation AI & Data umbrella. Follow on HackerNoon for more deep dives on production AI systems.

Originally published on HackerNoon. Republication permitted with attribution.
