Booking Holdings AI Transformation: How a $27B Travel Giant Cut Costs, Accelerated Developers, and Rebuilt Customer Service with AI

Booking Holdings exceeded its $450M restructuring target by $100M, cut customer service cost per reservation by ~10%, and deployed AI across five travel brands using a deliberate multi-LLM strategy that avoids vendor lock-in.

Your customer service team is growing as fast as your bookings. Your developers are shipping slower than your roadmap demands. And somewhere in your organisation, three different AI pilots are running in silos with no shared infrastructure. Booking Holdings looked at that exact picture and decided to build something different.

What problem was Booking Holdings actually solving?

The company operates five distinct travel brands: Booking.com, Priceline, Agoda, OpenTable, and KAYAK. Each brand had its own customer service model, its own developer workflow, its own vendor relationships. As volume grew, so did the cost structure underneath it.

The core tension was straightforward. Booking Holdings generated $23.7B in revenue in 2024 and $26.9B in 2025, a 13% increase year-over-year. Growing at that pace while keeping cost per transaction flat is hard enough. Growing while actually reducing cost per transaction requires a fundamentally different operating model.

On November 8, 2024, the company filed an 8-K/A with the SEC announcing a formal Transformation Program targeting $400-450M in annual run-rate savings. That filing put the target on the public record and made the initiative publicly accountable, not just a slide in an all-hands deck. The question was whether AI could carry a meaningful share of that target, or whether the savings would come almost entirely from real estate and procurement.

What did they build, and how does it work?

The architecture is deliberately multi-vendor. CTO Matthias Verstraete stated publicly that Booking.com runs four to five different LLMs and built an internal orchestration layer to route between them. That is not an accident. It reflects a strategic decision to avoid lock-in at a moment when every major model provider is repricing, re-tiering, and repositioning.

Booking.com’s Trip Planner was built on the OpenAI ChatGPT API combined with internal machine learning models, and the team delivered it in ten weeks. A grounding layer, built internally, wraps the OpenAI API and handles hallucination detection, PII removal, and off-topic filtering. Priceline went a different direction: Penny, its conversational travel assistant, runs on a stack that includes Google Vertex AI, OpenAI GPT-4, and Anthropic Claude. Penny Voice, the live voice booking feature, uses OpenAI’s Realtime API with GPT-4o for sub-second spoken responses.
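To make the grounding idea concrete, here is a minimal sketch of two of the guardrails named above, PII removal and off-topic filtering, wrapped around a model call. It is not Booking.com's implementation, which has not been published. The regex patterns, the `TRAVEL_TERMS` allowlist, and the `fake_model` stand-in are all assumptions made for illustration, and hallucination detection is left out because it depends on inventory data the source does not describe.

```python
import re
from typing import Callable

# Crude illustrative patterns; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical allowlist of travel vocabulary used as an off-topic gate.
TRAVEL_TERMS = {"hotel", "flight", "trip", "booking", "reservation",
                "travel", "stay", "itinerary"}

REFUSAL = "Sorry, I can only help with travel questions."


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def is_on_topic(text: str) -> bool:
    """Return True if the message contains at least one travel term."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & TRAVEL_TERMS)


def guarded_call(user_message: str, call_model: Callable[[str], str]) -> str:
    """Filter off-topic input, then send only redacted text to the model."""
    if not is_on_topic(user_message):
        return REFUSAL
    return call_model(redact_pii(user_message))


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"MODEL SAW: {prompt}"


print(guarded_call("Change my hotel booking, contact jane.doe@example.com", fake_model))
print(guarded_call("Write me a poem about pizza", fake_model))
```

The design point is that the wrapper owns the policy, not the vendor: because redaction and filtering happen before the prompt leaves your code, the same guardrails apply whichever model sits behind `call_model`.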

OpenTable moved fastest on the customer-facing side. It deployed Salesforce Agentforce in three weeks and connected it to an AI Concierge launched in August 2025. For developer productivity, Booking.com rolled out a structured enablement program tracked through DX (GetDX), a developer experience analytics platform, and used Glean for internal employee search across its knowledge base. The orchestration philosophy is consistent across brands: pick the right model for the task, build abstraction layers so you can swap vendors, and instrument everything so you can measure what is actually changing.

What did the deployment actually produce?

The headline number is the one that matters most to a CFO. Booking Holdings set a target of $400-450M in annual run-rate savings. By the end of 2025, it had achieved $550M in run-rate savings, exceeding the target by roughly $100M. In-year savings for 2025 reached $250M against a $150M commitment. The 2026 guidance calls for $500-550M in in-year savings. These figures come from SEC-reported earnings and CFO Ewout Steenbergen’s remarks on the Q4 2025 earnings call on February 19, 2026.

Two-thirds of the savings came from process modernisation, procurement renegotiation, and real estate consolidation. One-third came from workforce changes. Headcount specifics were not disclosed, partly due to EU works council constraints on public communication during restructuring. This matters when reading the AI story: AI alone did not produce $550M in savings. It was one lever in a broader operational reset.

The customer service numbers are more granular. Steenbergen reported approximately a 10% year-over-year reduction in customer service cost per reservation, even as total booking volume grew roughly 10%. That combination is the actual signal: costs going down while volume goes up. Agoda separately reported a double-digit year-over-year reduction in customer service cost per booking, noted by CEO Glenn Fogel on the Q4 2024 earnings call. Both figures are self-reported from earnings calls, not third-party audited.
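The arithmetic behind that signal is worth making explicit. The sketch below assumes both "approximately 10%" figures are exactly 10% and that total customer service cost is simply cost per reservation multiplied by reservation volume. Neither assumption is confirmed by the company, so treat the result as directional.

```python
# Illustrative only: index the prior year to 1.0 for both quantities.
cost_per_reservation_change = -0.10  # ~10% YoY reduction (earnings call, rounded)
volume_change = 0.10                 # ~10% YoY booking volume growth (rounded)


def total_cost_index(cost_change: float, vol_change: float) -> float:
    """Total cost relative to the prior year, assuming total = unit cost * volume."""
    return (1 + cost_change) * (1 + vol_change)


index = total_cost_index(cost_per_reservation_change, volume_change)
baseline = total_cost_index(0.0, volume_change)  # same volume at last year's unit cost

print(f"Total customer service cost index: {index:.2f}")
print(f"Index if unit cost had stayed flat: {baseline:.2f}")
```

Under these assumptions, total customer service spend is roughly flat (about 1% lower) while volume is 10% higher. With no change in unit cost, the same volume would have pushed spend up by about 10%.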

On developer productivity, the DX case study reports that daily AI users at Booking.com achieved 16% higher change throughput compared to non-users, with a 31% total throughput improvement following structured enablement programs. This is a vendor-published case study, not SEC-verified data, and should be read as directional rather than definitive. OpenTable’s Agentforce deployment claimed 70% autonomous resolution of diner and restaurant inquiries, per a Salesforce customer story. OpenTable’s AI Concierge press release claimed 80% of diner questions answered autonomously. Both are company or vendor-issued claims without independent verification.

Revenue attribution from AI features is the notable gap. On the Q4 2025 call, CEO Glenn Fogel said directly that AI revenue contribution numbers are “still small” and declined to offer either a timeline for scaling or specific revenue attribution for any AI feature. The story at Booking Holdings right now is about cost and efficiency. The revenue upside is still being established.

How does this compare to traditional OTA operations?

| Dimension | Traditional OTA Model | Booking Holdings (AI-Assisted) |
| --- | --- | --- |
| Customer service | Human agent-led, cost scales linearly with booking volume | AI deflection layer handles routine queries; ~10% cost per reservation reduction (self-reported, Q4 2025) |
| Developer productivity | Standard sprint velocity, no systematic AI tooling measurement | Near-100% AI adoption at Booking.com; 31% throughput improvement after enablement (vendor-published, DX) |
| Cost structure | Headcount and overhead grow proportionally with revenue | Run-rate savings of $550M achieved; two-thirds from process/procurement, one-third from workforce |
| Partner operations | Manual or semi-automated restaurant and supplier communications | OpenTable Agentforce: 70% autonomous resolution of partner inquiries in 3-week deployment (vendor-published, Salesforce) |
| AI vendor strategy | Single-vendor or no formal LLM strategy | Intentional multi-LLM stack across 4-5 models; internal routing layer; no single vendor dependency |
| Data strategy | Siloed by brand; limited internal search capability | Glean deployed for internal knowledge search; grounding and PII layers built on top of LLM APIs |

What are the honest limits of what we know?

Start with attribution. The $550M savings figure is credible: it was stated on a public company’s earnings call. But the company does not break out how much of it came from AI and how much from operational restructuring. Assuming AI caused most of the savings would be a mistake.

The customer service metrics are self-reported by the CFO and CEO on earnings calls. They have incentive to frame progress favourably. The 10% figure is directionally credible, but there is no external audit confirming the methodology for how “cost per reservation” is measured across five brands.

The developer productivity data from DX is vendor-published, and DX sells developer experience tooling. The 31% throughput number may well reflect the best-case cohort and the most favourable measurement window. That does not make it wrong, but it should not be treated as a controlled study.

The OpenTable autonomous resolution claims (70% from Salesforce, 80% from the company’s own press release) use different denominators and were published by parties with commercial interests in the deployment looking successful. What counts as “resolved autonomously” is doing a lot of work in that statistic.

The strategic risk is visible in Booking Holdings’ own 10-K filing for 2025, which lists Google Gemini and other LLM platforms as a risk factor for OTA disintermediation. The company is building AI-powered travel planning tools while simultaneously acknowledging that the same AI infrastructure could allow users to book directly through a model rather than through an OTA. That tension is not resolved. It is simply being managed in parallel.

What does the actual AI suite look like, and how could you replicate it?

The five brands share a common philosophy but run different implementations. Booking.com anchors its customer-facing AI on the OpenAI ChatGPT API for Trip Planner, with an internal ML layer underneath and a grounding/moderation wrapper on top. It uses Glean for internal employee search and structured developer tooling through DX/GetDX. The routing strategy across four to five LLMs is managed through an internal orchestration layer, functionally similar to what LiteLLM provides in the open-source world.

Priceline’s Penny runs on a three-model stack: Google Vertex AI, OpenAI GPT-4, and Anthropic Claude. Penny Voice specifically uses the OpenAI Realtime API with GPT-4o for live voice interaction. OpenTable deployed Salesforce Agentforce for B2B partner and diner inquiry resolution and layered its AI Concierge product on top for consumer-facing queries. KAYAK and Agoda have not disclosed their specific model choices publicly, though Agoda’s customer service cost reductions imply meaningful automation deployment.

The connecting thread is the deliberate avoidance of single-vendor dependency. The internal orchestration layer lets the team swap models by task type and reprice against vendors as the market moves.
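The internal orchestration layer itself is not public, but the idea of routing by task type with a fallback can be sketched in a few lines. Everything below is hypothetical: the `ModelRouter` class, the vendor names, and the task types are illustrative, and the provider functions are offline stubs where real SDK calls would go.

```python
from typing import Callable, Dict, List

ModelFn = Callable[[str], str]


class ModelRouter:
    """Route each task type to an ordered list of providers, with fallback."""

    def __init__(self, providers: Dict[str, ModelFn], routes: Dict[str, List[str]]):
        self.providers = providers
        self.routes = routes

    def complete(self, task_type: str, prompt: str) -> str:
        errors = []
        # Try providers in preference order; move on if one fails.
        for name in self.routes[task_type]:
            try:
                return self.providers[name](prompt)
            except Exception as exc:
                errors.append(f"{name}: {exc}")
        raise RuntimeError(f"All providers failed for {task_type!r}: {errors}")


def vendor_a(prompt: str) -> str:
    """Stub provider that simulates an outage."""
    raise TimeoutError("simulated outage")


def vendor_b(prompt: str) -> str:
    """Stub provider that always answers."""
    return f"vendor_b handled: {prompt}"


router = ModelRouter(
    providers={"vendor_a": vendor_a, "vendor_b": vendor_b},
    routes={
        "trip_planning": ["vendor_a", "vendor_b"],  # preferred first, then fallback
        "faq": ["vendor_b"],
    },
)

print(router.complete("trip_planning", "3 days in Lisbon"))
```

Because application code only ever names a task type, swapping or repricing a vendor becomes an edit to the `routes` table instead of a change to every call site. That is the commercial leverage the abstraction layer is meant to provide.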

For organisations looking to replicate similar capabilities, the open-source landscape maps reasonably well to each layer. Rasa provides a self-hosted conversational AI framework for customer service automation, suitable for environments with compliance or on-premise requirements. Botpress offers a closer analogue to the Agentforce deployment: a low-code agent builder suited to inquiry deflection. For multi-agent orchestration, LangGraph from LangChain provides a graph-based workflow architecture, and several open-source LangGraph travel agent repositories demonstrate booking workflow patterns directly. For multi-LLM routing, LiteLLM plays a role comparable to the internal orchestration layer, with a unified API across OpenAI, Anthropic, Google, and dozens of other providers, along with cost tracking, fallback logic, and spend controls.

A practical implementation path starts narrow. Pick one high-volume, low-judgment customer service workflow — something like “where is my booking confirmation” or “how do I cancel a reservation” — and deploy an intent-based or LLM-assisted deflection layer against it. Do not attempt to automate complex disputes or multi-step itinerary changes in the first phase. The goal is a proven cost reduction in a controlled slice before expanding scope. Shadow mode is the discipline that makes this work: run the AI response in parallel with the human response for two to four weeks, compare accuracy and resolution rates, and only go live when the numbers justify it.
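A shadow-mode comparison can be as simple as logging what the AI would have done next to what the human agent actually did, then gating go-live on the agreement rate. The sketch below is one hypothetical way to do that. The `ShadowRecord` fields, the intent labels, and the 95% and 1,000-sample thresholds are assumptions, not figures from Booking Holdings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShadowRecord:
    ticket_id: str
    ai_intent: str     # what the AI would have done (never shown to the customer)
    human_intent: str  # what the human agent actually did


def agreement_rate(records: List[ShadowRecord]) -> float:
    """Share of tickets where the AI's action matched the human's."""
    if not records:
        return 0.0
    matches = sum(r.ai_intent == r.human_intent for r in records)
    return matches / len(records)


def ready_for_go_live(records: List[ShadowRecord],
                      min_rate: float = 0.95,
                      min_samples: int = 1000) -> bool:
    """Gate go-live on both sample size and agreement (thresholds are assumptions)."""
    return len(records) >= min_samples and agreement_rate(records) >= min_rate


sample = [
    ShadowRecord("T1", "resend_confirmation", "resend_confirmation"),
    ShadowRecord("T2", "cancel_reservation", "cancel_reservation"),
    ShadowRecord("T3", "cancel_reservation", "escalate_to_agent"),
    ShadowRecord("T4", "resend_confirmation", "resend_confirmation"),
]

print(f"Agreement: {agreement_rate(sample):.0%}")
print("Go live:", ready_for_go_live(sample))
```

Comparing intent labels is a simplification. Free-text answers usually need a human reviewer or a scoring rubric to judge whether the AI response would have resolved the ticket. The sample-size gate matters as much as the rate: four tickets at 75% agreement says almost nothing.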

Once deflection is proven on that first workflow, build a lightweight LLM routing layer before adding a second use case. LiteLLM or an equivalent abstraction layer should sit between your application logic and any model vendor. This is a commercial decision, not just a technical one: organisations that built directly on a single vendor API in 2023 are now negotiating from a weak position as pricing structures change. The routing layer gives you leverage.

The third step is the one most organisations skip: pair the AI rollout with a formal restructuring mandate so that savings self-fund reinvestment. Booking Holdings did this explicitly by filing the Transformation Program target with the SEC and tracking it publicly. Without that accountability structure, efficiency gains tend to disappear into budgets that were never reallocated. The open-source tools can replicate the technical architecture. The governance commitment is the harder thing to replicate, and arguably the more important one.

What should your team ask before starting?

Booking Holdings had five distinct brands, a $27B revenue base, and the leverage to build internal orchestration infrastructure that most organisations cannot justify on their own. The more useful question is not “can we do what they did” but rather: if we achieved a 10% reduction in cost per customer interaction this year, where exactly would those savings go, and who in the organisation is accountable for making that happen?

That question will tell you whether you have a technology problem or a governance problem.
