
Reference Architecture: Enterprise AI Agents

Technical blueprint for implementing production-grade AI agents using Google Cloud: Pub/Sub, BigQuery, Looker, and Vertex AI Gemini

Technical Whitepaper 15 min read November 2025

Architecture Overview

A production-ready AI agent for Business Intelligence requires four interconnected layers: event ingestion, data processing, semantic governance, and AI interaction. Each layer serves a specific purpose while working together to deliver trusted, actionable intelligence.

Enterprise AI Agent Architecture Stack (top to bottom):

  • Interaction Layer: Vertex AI Gemini, Natural Language API, Slack / Teams, Custom Apps
  • Semantic Layer: Looker (LookML), Metric Definitions, Access Controls, Caching (PDTs)
  • Data Layer: BigQuery, Real-time Tables, Historical Data, ML Models
  • Ingestion Layer: Cloud Pub/Sub, Dataflow, Cloud Functions, Scheduled Jobs

Component Deep Dive

Cloud Pub/Sub

Event Ingestion

Pub/Sub serves as the nervous system of your AI agent, capturing events from across your enterprise in real time.

  • Transaction events from POS systems
  • IoT sensor data streams
  • Application logs and metrics
  • External API webhooks
  • User activity events
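As a sketch, the event sources above all feed the agent the same way: serialize an event envelope and publish it to a topic. The topic and event-type names here are hypothetical, and the publish call assumes the google-cloud-pubsub client library is installed and credentials are configured.

```python
import json


def build_event(event_type: str, payload: dict) -> bytes:
    # Serialize an event envelope; the "type" field lets downstream
    # Dataflow / Cloud Functions consumers route by event kind
    return json.dumps({"type": event_type, "payload": payload}).encode("utf-8")


def publish_event(project_id: str, topic_id: str, event: bytes) -> str:
    # Requires the google-cloud-pubsub library and GCP credentials
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, event)
    return future.result()  # server-assigned message ID
```

The same envelope shape works for POS transactions, IoT readings, and webhook payloads, so downstream consumers only need one deserialization path.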

BigQuery

Data Processing & ML

BigQuery processes both real-time and historical data, combining storage efficiency with ML capabilities.

  • Streaming inserts for real-time data
  • Partitioned tables for cost efficiency
  • BigQuery ML for in-database predictions
  • Remote functions for custom logic
  • Materialized views for performance
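A hedged sketch of the partition-pruning pattern behind the cost-efficiency bullet: filtering on the partition column restricts the scan to a single day's partition. The table and column names (`analytics.events`, `event_timestamp`) are illustrative, and the execution helper assumes the google-cloud-bigquery client library.

```python
def daily_event_summary_sql(table: str, day: str) -> str:
    # Filtering on the partitioning column prunes the scan to one
    # partition, which is what makes date-partitioned tables cheap
    return (
        "SELECT event_type, COUNT(*) AS event_count\n"
        f"FROM `{table}`\n"
        f"WHERE DATE(event_timestamp) = DATE('{day}')\n"
        "GROUP BY event_type\n"
        "ORDER BY event_count DESC"
    )


def run_summary(table: str, day: str) -> list:
    # Requires the google-cloud-bigquery library and GCP credentials
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(daily_event_summary_sql(table, day)).result()
    return [dict(row) for row in rows]
```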

Looker (LookML)

Semantic Governance

LookML defines the semantic layer that grounds AI agents in business truth, preventing hallucination.

  • Centralized metric definitions
  • Dimension and measure governance
  • Row-level security controls
  • PDTs for pre-aggregation
  • Data freshness with Datagroups

Vertex AI Gemini

AI Interaction

Gemini powers natural language understanding and generation, grounded by the semantic layer.

  • Natural language to LookML translation
  • Context-aware conversations
  • Explanation generation
  • Recommendation synthesis
  • Multi-modal data understanding

Integration Pattern: Grounded Agent Flow

The key innovation in this architecture is the grounding loop: every AI response passes through the semantic layer to ensure accuracy and governance compliance.

Python - Grounded Agent Implementation Pattern
import looker_sdk
import vertexai
from google.cloud import bigquery
from vertexai.generative_models import GenerativeModel

class GroundedBIAgent:
    """
    AI Agent grounded in LookML semantic layer
    """
    def __init__(self):
        vertexai.init()  # picks up project/location from the environment
        self.model = GenerativeModel("gemini-1.5-pro")
        self.looker = looker_sdk.init40()  # reads looker.ini or env vars
        self.bq_client = bigquery.Client()

    def process_query(self, user_question: str) -> dict:
        """
        Process natural language query with semantic grounding
        """
        # Step 1: Get available semantic context from Looker
        models = self.looker.all_lookml_models()
        context = self._build_semantic_context(models)

        # Step 2: Use Gemini to interpret query with semantic grounding
        grounded_prompt = f"""
        You are a BI analyst assistant. Answer ONLY using the
        semantic definitions provided. If asked about metrics not
        defined here, say you cannot answer.

        AVAILABLE SEMANTIC DEFINITIONS:
        {context}

        USER QUESTION: {user_question}

        Generate a Looker API query specification (NOT raw SQL).
        Include: model, view, fields, filters, sorts.
        """

        response = self.model.generate_content(grounded_prompt)
        query_spec = self._parse_query_spec(response.text)

        # Step 3: Execute via Looker (inherits all governance)
        result = self.looker.run_inline_query(
            result_format="json",
            body=looker_sdk.models40.WriteQuery(**query_spec)
        )

        # Step 4: Generate natural language explanation
        explanation = self._generate_explanation(
            question=user_question,
            data=result,
            query_spec=query_spec
        )

        return {
            "answer": explanation,
            "data": result,
            "query_used": query_spec,
            "governance": {
                "grounded": True,
                "semantic_model": query_spec.get("model"),
                "access_validated": True
            }
        }

    def _build_semantic_context(self, models):
        """
        Build semantic context from LookML definitions
        """
        context = []
        for model in models:
            for listing in (model.explores or []):
                # Field metadata lives on the explore, not on the model
                # listing returned by all_lookml_models()
                explore = self.looker.lookml_model_explore(
                    lookml_model_name=model.name,
                    explore_name=listing.name
                )
                dimensions = [d.name for d in (explore.fields.dimensions or [])]
                measures = [m.name for m in (explore.fields.measures or [])]
                context.append(f"""
                Explore: {explore.name}
                Description: {explore.description}
                Available Dimensions: {dimensions}
                Available Measures: {measures}
                """)
        return "\n".join(context)
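The pattern above elides `_parse_query_spec`. One plausible implementation, assuming the prompt instructs Gemini to reply with a JSON object (possibly wrapped in prose or a markdown fence), might look like this; the required-key list mirrors the prompt's "model, view, fields" instruction.

```python
import json
import re


def parse_query_spec(model_text: str) -> dict:
    # Pull the first JSON object out of the response, tolerating
    # surrounding prose or a markdown fence in the model output
    match = re.search(r"\{.*\}", model_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    spec = json.loads(match.group(0))
    # Reject specs the Looker API could not execute
    for required in ("model", "view", "fields"):
        if required not in spec:
            raise ValueError(f"query spec missing required key '{required}'")
    return spec
```

Rejecting malformed specs before execution is what lets the agent fall back to a safe "I cannot answer" response instead of running an ungoverned query.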

Semantic Layer Configuration

The LookML semantic layer must be designed with AI agents in mind. This means rich descriptions, clear naming conventions, and comprehensive metadata.

LookML - Agent-Optimized Semantic Definitions
# Model file with agent-friendly documentation
explore: orders {
  label: "Customer Orders"
  description: "All customer orders including status, revenue, and fulfillment.
                Use this explore to answer questions about sales, revenue,
                order volume, and customer purchasing patterns."

  # Define valid joins for the agent
  join: customers {
    type: left_outer
    sql_on: ${orders.customer_id} = ${customers.id} ;;
    relationship: many_to_one
  }

  join: order_items {
    type: left_outer
    sql_on: ${orders.order_id} = ${order_items.order_id} ;;
    relationship: one_to_many
  }

  join: products {
    type: left_outer
    sql_on: ${order_items.product_id} = ${products.id} ;;
    relationship: many_to_one
  }
}

view: orders {
  sql_table_name: analytics.fct_orders ;;

  # Agent needs clear descriptions for every field
  dimension: order_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.order_id ;;
    description: "Unique identifier for each order"
  }

  dimension_group: order {
    type: time
    timeframes: [raw, date, week, month, quarter, year]
    sql: ${TABLE}.order_timestamp ;;
    description: "When the order was placed. Use for time-based analysis."
  }

  dimension: order_status {
    type: string
    sql: ${TABLE}.status ;;
    description: "Current status: pending, processing, shipped, delivered, cancelled"
    suggestions: ["pending", "processing", "shipped", "delivered", "cancelled"]
  }

  # Measures with business context
  measure: total_revenue {
    type: sum
    sql: ${TABLE}.order_total ;;
    value_format_name: usd
    description: "Total order value in USD. Excludes cancelled and returned orders."
    filters: [order_status: "-cancelled, -returned"]

    # Tags help agents understand when to use this metric
    tags: ["revenue", "sales", "financial", "primary_metric"]
  }

  measure: order_count {
    type: count_distinct
    sql: ${TABLE}.order_id ;;
    description: "Number of unique orders placed"
    filters: [order_status: "-cancelled"]
    tags: ["volume", "count", "orders"]
  }

  measure: average_order_value {
    type: number
    sql: ${total_revenue} / NULLIF(${order_count}, 0) ;;
    value_format_name: usd
    description: "Average revenue per order (AOV). Key e-commerce metric."
    tags: ["aov", "average", "efficiency"]
  }
}

Implementation Challenges & Solutions

Critical Implementation Considerations

Implementing enterprise AI agents requires careful attention to governance, latency, and cost management. Rushing to production without proper guardrails leads to hallucination incidents, security vulnerabilities, and runaway costs.

Challenge 1: Latency vs. Accuracy Tradeoff

Real-time agent responses require fast semantic lookups. Solution: Pre-compute semantic context and cache LookML metadata. Use PDTs for frequently-requested aggregations.
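One way to sketch that pre-computation is a minimal in-process TTL cache in front of the Looker metadata fetch; the 5-minute default here is an assumption, tune it to how often your LookML actually changes.

```python
import time


class TTLCache:
    """Small in-process cache so each agent request can reuse the
    LookML semantic context instead of re-fetching it from Looker."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._entries = {}

    def get_or_compute(self, key, compute):
        cached = self._entries.get(key)
        now = time.monotonic()
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # still fresh: skip the expensive lookup
        value = compute()
        self._entries[key] = (value, now)
        return value
```

The agent would wrap its semantic-context build in `get_or_compute("context", ...)`, so only the first request in each TTL window pays the Looker round trip.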

Challenge 2: Cost Management

LLM API calls and BigQuery queries can escalate quickly. Solution: Implement query cost estimation before execution, use result caching, and set hard limits on concurrent agent sessions.
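A sketch of pre-execution cost estimation using BigQuery's dry-run mode: a dry run reports the bytes a query would scan without running it, which the agent can price before committing. The $6.25/TiB rate is an assumption, check current on-demand pricing for your region.

```python
def estimated_cost_usd(bytes_processed: int, usd_per_tib: float = 6.25) -> float:
    # On-demand BigQuery pricing is billed per TiB scanned; the rate
    # here is an assumption, not a quoted price
    return bytes_processed / (1024 ** 4) * usd_per_tib


def dry_run_bytes(sql: str) -> int:
    # Requires the google-cloud-bigquery library and GCP credentials
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

The agent can then refuse or ask for confirmation when `estimated_cost_usd(dry_run_bytes(sql))` exceeds a per-session budget.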

Challenge 3: Governance at Scale

As agents proliferate, maintaining consistent governance becomes complex. Solution: Centralize all semantic definitions in LookML, implement audit logging for every agent action, and use Looker's access controls.
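A minimal sketch of per-action audit logging as a decorator; in production the `log` callable would write to Cloud Logging rather than stdout, and the record would also carry user identity and the query spec.

```python
import functools
import json
import time


def audited(action: str, log=print):
    """Wrap an agent method so every invocation emits a structured
    audit record (action name, success flag, duration)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            started = time.time()
            record = {"action": action, "ok": True}
            try:
                return fn(*args, **kwargs)
            except Exception:
                record["ok"] = False
                raise
            finally:
                record["duration_s"] = round(time.time() - started, 3)
                log(json.dumps(record))
        return wrapper
    return decorator
```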

Challenge 4: User Trust

Users need to understand and trust agent outputs. Solution: Always show the source - display the LookML query used, the data freshness, and any filters applied.
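A hedged sketch of that source display: build a one-line provenance footer from the query spec the agent returns (the key names follow the `process_query` return shape shown earlier; the freshness string would come from the Looker datagroup).

```python
def provenance_footer(query_spec: dict, freshness: str) -> str:
    # Summarize exactly what was run so users can verify the answer
    fields = ", ".join(query_spec.get("fields", [])) or "none"
    filters = query_spec.get("filters") or {}
    filter_text = "; ".join(f"{k} = {v}" for k, v in filters.items()) or "none"
    return (
        f"Source: {query_spec.get('model')}.{query_spec.get('view')} | "
        f"Fields: {fields} | Filters: {filter_text} | "
        f"Data as of: {freshness}"
    )
```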

Implementation Readiness Checklist

Data Foundation

  • BigQuery data warehouse deployed
  • Data quality monitoring in place
  • Partitioning strategy defined
  • Real-time ingestion configured

Semantic Layer

  • LookML models documented
  • All metrics have descriptions
  • Access controls configured
  • PDTs optimized for common queries

AI Integration

  • Vertex AI project provisioned
  • Gemini API access enabled
  • Grounding prompts tested
  • Fallback responses defined

Governance

  • Audit logging enabled
  • Cost alerts configured
  • Human escalation paths defined
  • Model update procedures documented

Ready to Implement AI Agents?

Our team specializes in designing and deploying enterprise-grade AI agent architectures on Google Cloud. Let's discuss your use case.

Schedule Architecture Review