Vodafone SuperTOBi: How a 300-Million-Customer Telco Rebuilt Its AI Customer Service From the Ground Up

What Was Vodafone’s Customer Service Problem Actually Costing?

Vodafone handles customer service across 300 million subscribers in markets from Germany to Ghana. At the center of that operation for nearly a decade sat a chatbot called TOBi, first built on IBM Watson, later rebuilt on Microsoft Azure’s LUIS (Language Understanding Intelligent Service). TOBi was capable, for its time, of matching keywords to predefined intents and deflecting routine queries away from human agents. That was also, largely, the problem.

A chatbot designed to deflect is not the same thing as one designed to resolve. In Vodafone Portugal, the most closely documented market before the SuperTOBi rebuild, first-contact resolution for appointment booking sat at 15%. That means 85 out of every 100 customers who contacted Vodafone to schedule a technician visit either failed to complete the task, were transferred to a human, or had to get in touch again. Each of those outcomes costs money and damages the customer relationship. The chatbot also timed out sessions after two minutes of inactivity, forcing customers to keep responding just to keep the conversation alive. Those design choices reveal what the system was optimized for: deflection volume, not resolution quality.

The 2024 shift to generative AI was not a technology upgrade in the conventional sense. It was a change of objective.

What Is SuperTOBi and How Does It Actually Work?

SuperTOBi is Vodafone’s generative AI customer service system, built on Microsoft Azure OpenAI Service and deployed across all European markets by late 2025. The name is an evolution of TOBi, but the architecture is fundamentally different.

The old TOBi worked on intent classification: a customer’s message was parsed for keywords, matched to one of a finite set of predefined intents, and a scripted response was returned. If the message did not match an intent, the bot failed. SuperTOBi replaces that slot-filling model with a large language model that understands natural language requests without requiring keyword matching, generates contextually appropriate responses, and executes transactions directly in backend systems through structured action tags embedded in its output.
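The contrast between the two models can be sketched in a few lines of Python. This is an illustration only: the intent table, the `[[action:...]]` tag syntax, and every function name below are hypothetical, since Vodafone has not published SuperTOBi's actual tag format.

```python
import re
from typing import Optional

# Hypothetical keyword-to-intent table, standing in for a classic intent bot.
INTENT_KEYWORDS = {
    "check_bill": ["bill", "invoice"],
    "book_technician": ["technician", "appointment"],
}

def classify_by_keyword(message: str) -> Optional[str]:
    """Return the first intent whose keyword appears in the message, else None."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(word in text for word in keywords):
            return intent
    return None  # out of scope: an intent bot can only fall back to a generic reply

# Assumed tag syntax for illustration; the real format is not public.
ACTION_TAG = re.compile(r"\[\[action:(\w+)((?:\s+\w+=[\w-]+)*)\]\]")

def extract_actions(llm_output: str) -> list:
    """Pull (action_name, params) pairs out of generated text."""
    actions = []
    for name, raw_params in ACTION_TAG.findall(llm_output):
        params = dict(pair.split("=", 1) for pair in raw_params.split())
        actions.append((name, params))
    return actions

def strip_actions(llm_output: str) -> str:
    """Remove action tags so only the customer-facing text remains."""
    return ACTION_TAG.sub("", llm_output).strip()
```

In this sketch the generated text carries both the reply and the machine-readable instruction, so the calling service would show `strip_actions(...)` to the customer and pass `extract_actions(...)` to whatever executes backend calls.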

Three layers make up the full deployment. The customer-facing layer is SuperTOBi itself, handling billing queries, technical troubleshooting, service changes, and appointment scheduling through natural language in the My Vodafone app, website, and telephony channel. Importantly, SuperTOBi does not just inform: it executes. When a customer asks to activate a roaming add-on, the system calls the backend provisioning API and completes the action within the chat session.

The second layer is SuperAgent, the internal-facing tool for human call centre agents. It runs alongside an agent’s screen in real time, replacing the 20-page PDF troubleshooting guides that previously structured every call. An agent types a question in plain language and receives step-by-step guidance drawn from a Neo4j knowledge graph that encodes Vodafone’s procedural logic as traversable nodes (steps, conditions, actions, APIs) rather than flat documents. Calls are transcribed and summarised as they happen, so the agent arrives at the conversation with full customer history before saying a word.
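To make the "traversable nodes" idea concrete, here is a minimal sketch that uses a plain Python dictionary as a stand-in for the Neo4j graph. The procedure, node names, and API names are invented for illustration; a real deployment would store these as graph nodes and relationships and query them rather than walk a dict.

```python
# Hypothetical procedure graph: each node has a kind and outgoing edges.
PROCEDURE = {
    "start": {"kind": "step", "text": "Ask the customer to restart the router.",
              "next": "check_light"},
    "check_light": {"kind": "condition", "fact": "light_is_green",
                    "yes": "run_line_test", "no": "book_visit"},
    "run_line_test": {"kind": "action", "api": "line_test",
                      "text": "Run a remote line test.", "next": None},
    "book_visit": {"kind": "action", "api": "schedule_technician",
                   "text": "Book a technician visit.", "next": None},
}

def walk(graph, start, facts):
    """Follow edges from start, branching on condition nodes.

    Returns the ordered guidance as (kind, text) pairs.
    """
    path = []
    seen = set()
    node_id = start
    while node_id is not None:
        if node_id in seen:
            raise ValueError(f"cycle detected at node {node_id!r}")
        seen.add(node_id)
        node = graph[node_id]
        if node["kind"] == "condition":
            # Conditions produce no guidance text; they only pick a branch.
            node_id = node["yes"] if facts[node["fact"]] else node["no"]
            continue
        path.append((node["kind"], node["text"]))
        node_id = node["next"]
    return path
```

The point of the structure is that the same procedure yields different guidance depending on what is known about the customer's situation, which a flat PDF cannot do without the agent reading every branch.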

The third layer is the “Just Ask Once” service model that wraps both systems. When SuperTOBi escalates to a human, it passes a structured conversation summary. The human agent picks up where the AI left off without the customer repeating themselves. If the issue cannot be resolved immediately, one named agent owns it through to completion, proactively messaging updates. Vodafone reports a 96% delivery rate on this promise.
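A handoff of this kind is easiest to reason about as a small structured record. The sketch below is an assumption about what such a record might contain; the field names and the single-owner rule are illustrative, not Vodafone's published schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class HandoffSummary:
    """Hypothetical payload passed from the AI to a human agent."""
    customer_id: str
    intent: str
    steps_tried: list = field(default_factory=list)
    open_question: str = ""
    owner_agent: Optional[str] = None

def assign_owner(summary: HandoffSummary, agent_name: str) -> HandoffSummary:
    """Record the single named agent who owns the case to completion."""
    if summary.owner_agent is not None:
        raise ValueError("case already has an owner")
    summary.owner_agent = agent_name
    return summary

def to_payload(summary: HandoffSummary) -> str:
    """Serialise the summary for the agent desktop or messaging queue."""
    return json.dumps(asdict(summary), sort_keys=True)
```

Refusing a second owner is one simple way to encode "one named agent owns it"; a production system would more likely model reassignment explicitly.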

What Did the Portugal Pilot Actually Prove?

Portugal was the first full SuperTOBi deployment and the most documented before-and-after comparison Vodafone has published. The appointment booking query type is a useful proxy because it is a bounded, high-volume, transactional request where success or failure is unambiguous.

Before SuperTOBi: 15% first-contact resolution. After: 60%. Net Promoter Score improved 14 points, reaching 64. These figures come from Vodafone’s own disclosures and corroborating Microsoft case study materials. Neither source is independent in the audit sense, which is worth holding in mind, but the specificity and consistency across two self-interested but separate organizations lend them some credibility.

The Portugal results shaped the pan-European rollout. Germany and Turkey followed in July 2024, with the rest of the European markets completing deployment by the time Vodafone reported H1 FY26 results in November 2025. The group-level headline at that point: an end-to-end resolution rate of over 70%, an 8-point NPS improvement across European markets, an 85% reduction in campaign analysis time for the customer value management team, and an improvement of over 60% in call centre agent helpfulness ratings.

Traditional Chatbot vs. Generative AI Customer Service

Dimension	Keyword/Intent Chatbot (TOBi)	Generative AI System (SuperTOBi)
Understanding	Keyword matching against predefined intents	Natural language understanding without keyword dependency
Resolution	Deflection-first, scripted responses	Transaction execution via backend API calls
Failure mode	Out-of-scope queries return generic fallbacks	Clarifying questions and graceful escalation
Agent assist	No real-time support for human agents	SuperAgent surfaces knowledge graph + live transcript summary
Continuity	Customer repeats context on every transfer	Conversation summary passed at every handoff
Session management	2-minute timeout, customer responsible for keeping alive	Asynchronous messaging, no timeout
First-contact resolution (Portugal)	15% for appointment booking	60% for appointment booking

What Are the Honest Limits?

The metrics Vodafone discloses carry a significant caveat: every headline figure, including the 70% resolution rate, the +8 NPS points, and the 96% Ask Once delivery rate, comes from investor presentations and company press releases. No independent analyst firm has audited these figures. That does not make them false, but it does mean they should be read as a directional signal rather than an audited result.

The escalation rate to humans is never directly stated. An end-to-end resolution rate of over 70% implies that up to roughly 30% of interactions still involve human agents, which is consistent with what the old TOBi handled at its best. SuperTOBi’s gains appear to lie primarily in the quality and completeness of resolution for the queries it handles, rather than in a dramatic reduction of human contact volume. Those are different value propositions: cost reduction through deflection versus NPS improvement through better resolution.

The system is also deliberately bounded. Complaints reviews, vulnerable customer interactions, collections matters, and complex technical faults are routed to human specialists. Vodafone invested £2 million in behavioural training for 12,500 human agents specifically for the sensitive query types that the AI does not handle. That is an honest architectural choice, and it is worth noting because the “AI replaces customer service” framing that often surrounds these deployments does not reflect what Vodafone actually built.

Customer reviews on third-party platforms still reflect frustration with chatbot loops and difficulty reaching humans, particularly in markets where the SuperTOBi rollout is more recent. The trust gap from a decade of keyword-matching chatbots that failed is real, and closing it through a better system takes longer than a product launch.

What Does Vodafone’s Full AI Suite Actually Look Like?

The SuperTOBi architecture sits within a broader multi-cloud, multi-vendor AI infrastructure that Vodafone has been building since 2024 under a deliberate strategy of commercial tension between providers.

Azure OpenAI Service provides the primary LLM inference layer for customer-facing SuperTOBi deployments. The orchestration layer in the most detailed publicly documented deployment, Fastweb/Vodafone Italy, uses LangChain and LangGraph with a Supervisor pattern: a central agent handles intent routing and guardrails, and specialized sub-agents handle domain-specific workflows with access to defined API subsets. The knowledge store is Neo4j, encoding procedural steps and troubleshooting logic as a traversable graph. Azure AI Search provides vector retrieval for open-ended queries. LangSmith handles evaluation and observability, running nightly automated scoring of the previous day’s interactions.
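The Supervisor pattern itself is simple enough to sketch without any framework. The version below is plain Python rather than LangGraph, and everything in it is an assumption made for illustration: the keyword router stands in for an LLM routing call, and the agent names, API names, and human-only topics are invented.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SubAgent:
    name: str
    allowed_apis: frozenset          # the only backend calls this agent may make
    handle: Callable[[str], tuple]   # returns (api_name, reply_text)

# Illustrative guardrail list: topics the supervisor never hands to a sub-agent.
HUMAN_ONLY_TOPICS = ("complaint", "collections")

def billing_agent(message: str) -> tuple:
    return ("get_invoice", "Here is your latest invoice.")

def roaming_agent(message: str) -> tuple:
    return ("activate_addon", "Your roaming add-on is now active.")

AGENTS = {
    "billing": SubAgent("billing", frozenset({"get_invoice"}), billing_agent),
    "roaming": SubAgent("roaming", frozenset({"activate_addon"}), roaming_agent),
}

def route(message: str) -> str:
    """Toy router; a real supervisor would ask an LLM to choose the sub-agent."""
    text = message.lower()
    if any(topic in text for topic in HUMAN_ONLY_TOPICS):
        return "human"
    if "roaming" in text:
        return "roaming"
    if "bill" in text or "invoice" in text:
        return "billing"
    return "human"

def supervise(message: str) -> dict:
    """Apply guardrails, delegate to one sub-agent, and enforce its API subset."""
    target = route(message)
    if target == "human":
        return {"handled_by": "human", "api": None,
                "reply": "Connecting you to a specialist."}
    agent = AGENTS[target]
    api, reply = agent.handle(message)
    if api not in agent.allowed_apis:
        raise PermissionError(f"{agent.name} may not call {api}")
    return {"handled_by": agent.name, "api": api, "reply": reply}
```

The design choice worth noticing is that the permission check lives in the supervisor, not in the sub-agent, so a misbehaving sub-agent cannot widen its own access.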

The data foundation is Vodafone Neuron, a multi-petabyte data platform hosted on Google Cloud, consolidated from 600 servers across 11 countries. Google Cloud Vertex AI and BigQuery ML power the AI Booster platform running over 600 traditional ML models for churn prediction, fraud detection, and financial forecasting. Google Gemini runs on some analytics and device-side workloads. Anthropic Claude is confirmed as a growing component by Vodafone’s CTO, though without a separately announced contract. AWS Bedrock handles select workloads.

For contact centre infrastructure, Genesys handles the telephony and messaging platform layer in several European markets. ServiceNow TSOM connects to Vodafone Business enterprise customers for AI-driven network fault prediction and service management. Microsoft 365 Copilot is deployed to 68,000 employees globally. The company runs the same applications with one provider’s LLM while consuming data from another cloud entirely, an operational multi-cloud model that CTO Scott Petty describes as commercially deliberate.

Can You Replicate This With Open-Source Tools?

The Fastweb/Vodafone Italy architecture is largely reproducible with open-source components: orchestration, knowledge graph, retrieval, and evaluation all have open-source equivalents. The differentiating elements are not the technology; they are the data quality, the procedural knowledge encoding, and the backend integration depth.

LangGraph (GitHub: langchain-ai/langgraph, MIT) is the open-source library behind the Supervisor-pattern orchestration layer documented for Fastweb/Vodafone Italy. LangChain (GitHub: langchain-ai/langchain, MIT) provides the agent-building primitives. Neo4j Community Edition (GitHub: neo4j/neo4j) handles the knowledge graph, and neo4j-labs/llm-graph-builder automates converting documents into Neo4j graph structures, which is the most labour-intensive part of the Vodafone implementation. Microsoft GraphRAG (GitHub: microsoft/graphrag, MIT) provides an alternative graph-based retrieval approach that does not require Neo4j specifically. For enterprises that want a complete open-source conversational AI framework rather than assembling from primitives, Rasa (GitHub: RasaHQ/rasa, Apache 2.0) is used by Deutsche Telekom at comparable scale and can be paired with LLM layers through community projects such as RasaGPT. For evaluation, Arize Phoenix (GitHub: Arize-ai/phoenix) provides open-source LLM observability in place of LangSmith.

A realistic three-phase approach works as follows. The first phase is the knowledge audit. Before any LLM is involved, map every procedural document that customer service agents currently use: troubleshooting guides, policy documents, FAQ repositories, API documentation for transactional backend systems. The bottleneck in replicating SuperTOBi’s resolution quality is almost never the model. It is the quality, completeness, and structure of the knowledge base the model retrieves from. The Neo4j graph encoding process, converting procedure documents into step-node-condition-action structures, is where most of the implementation time goes. Budget significantly more for this phase than for model selection or prompt engineering.

The second phase is shadow mode validation. Deploy the LLM-based resolution layer in parallel with your existing system, generating responses that agents or supervisors can see but which do not reach customers. Track the rate at which generated responses match what a trained agent would say, and measure which query types the system handles confidently versus which it hedges on or gets wrong. That accuracy distribution becomes the basis for your escalation logic: high-confidence query types can move to full AI handling, low-confidence types remain human-routed. Set the threshold conservatively.
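The shadow-mode bookkeeping described above reduces to a per-query-type agreement rate and a conservative cut-off. The sketch below assumes each shadow interaction has already been labelled as matching or not matching what a trained agent would say; the threshold and minimum sample size are placeholder values, not recommendations from Vodafone.

```python
from collections import defaultdict

def agreement_by_type(records):
    """Map each query type to (agreement_rate, sample_count).

    records is an iterable of (query_type, matched) pairs, where matched is
    True when the generated answer agreed with the trained agent's answer.
    """
    totals = defaultdict(lambda: [0, 0])  # query_type -> [matches, total]
    for query_type, matched in records:
        totals[query_type][0] += int(matched)
        totals[query_type][1] += 1
    return {qt: (m / n, n) for qt, (m, n) in totals.items()}

def routing_plan(records, threshold=0.9, min_samples=200):
    """Send a query type to the AI only with enough evidence and agreement."""
    plan = {}
    for query_type, (rate, count) in agreement_by_type(records).items():
        confident = count >= min_samples and rate >= threshold
        plan[query_type] = "ai" if confident else "human"
    return plan
```

Requiring a minimum sample size alongside the rate keeps a query type with three lucky matches from being promoted to full AI handling.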

The third phase is transactional integration. The step from informing customers about what to do to executing actions on their behalf is where the real resolution uplift comes from. This requires API integration with every backend system a customer might need to change: billing, provisioning, scheduling, contract management. Each integration is an engineering project with its own data quality and authorization requirements. Enterprises that treat this as an afterthought tend to build systems that give good answers but require the customer to then go do something themselves, which limits the resolution quality ceiling. Vodafone’s CTO explicitly identifies data quality as the hardest problem encountered in the deployment, not model capability.
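One way to picture the integration work is as a registry of backend actions, each with its own authorization requirement. The sketch below is hypothetical throughout: the action names, the `requires_verified` flag, and the handlers are stand-ins for real billing, provisioning, and scheduling APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class BackendAction:
    handler: Callable[[dict], str]
    requires_verified: bool  # whether the customer must be authenticated first

def _activate_addon(params: dict) -> str:
    # Stand-in for a provisioning API call.
    return f"addon {params['addon']} activated"

def _store_hours(params: dict) -> str:
    # Stand-in for a read-only lookup that needs no authentication.
    return "store hours returned"

REGISTRY = {
    "activate_addon": BackendAction(_activate_addon, requires_verified=True),
    "store_hours": BackendAction(_store_hours, requires_verified=False),
}

def execute(action_name: str, params: dict, customer_verified: bool) -> dict:
    """Run a registered action, refusing unknown or unauthorized requests."""
    action = REGISTRY.get(action_name)
    if action is None:
        return {"status": "unknown_action", "detail": action_name}
    if action.requires_verified and not customer_verified:
        return {"status": "needs_verification", "detail": action_name}
    return {"status": "done", "detail": action.handler(params)}
```

Each entry in such a registry corresponds to one of the engineering projects the paragraph describes, which is why the list tends to grow slowly.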

What open-source cannot replicate easily is the breadth of Vodafone’s backend integrations, built up over years across 15 markets with different billing systems and regulatory environments. The technical patterns are standard. The integration surface area is not.