Goldman Sachs GS AI: How a Major Bank Built a Multi-Model AI Platform, Deployed Autonomous Coding Agents, and What the Productivity Numbers Actually Mean

A developer at Goldman Sachs opens a decades-old service and starts tracing dependencies through an internal codebase that has outlived several technology fashions. In another tab, a policy team member is condensing a long document into something usable before the next meeting. Both know public AI tools could speed this up. Both also know they cannot paste confidential bank data into them.

That is the lesson: in a regulated enterprise, the first AI problem is rarely model quality. It is controlled access.

What Problem Was Goldman Sachs Actually Solving?

Goldman Sachs has roughly 12,000 developers, about one-quarter of its workforce. That scale changes the shape of the problem. Slow codebase maintenance is not a minor annoyance when thousands of engineers live in it. Legacy-language migrations are not side projects when they become a recurring bottleneck.

The problem was broader than engineering. Knowledge workers were also spending time summarizing documents and drafting internal material. The obvious consumer AI tools were off limits because of data confidentiality rules, so the firm had a capability gap: high-value use cases were visible, but the default tools could not be used safely.

That is the context for “One Goldman Sachs 3.0”, the firm’s AI framework across six workstreams: client onboarding and KYC, vendor management, regulatory reporting, lending, enterprise risk management, and sales enablement. The point was not one chatbot. It was a controlled way to push AI into core workflows.

What Is GS AI and How Does It Actually Work?

Goldman’s answer was not to train its own large language model from scratch. It built GS AI Platform, an internal security and orchestration layer that sits on top of external models. That layer protects private data, routes requests across multiple model vendors, and does not send raw company data to public AI endpoints.

The model layer is multi-vendor by design: OpenAI’s GPT family, Google Gemini, and Meta Llama. That gives Goldman optionality. It can match use case to model and avoid tying the firm to a single provider.
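To make the idea of a security and orchestration layer concrete, here is a minimal Python sketch of the two jobs described above: scrubbing sensitive text before it leaves the firm, and picking a model by use case. Goldman has not published how GS AI Platform does either, so everything here is an assumption for illustration: the `ROUTES` mapping, the model labels, the regex patterns, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical routing table: use case -> model family. The vendor families
# come from the article; which use case maps to which model is invented.
ROUTES = {
    "code": "openai-gpt",
    "summarize": "google-gemini",
    "internal-draft": "meta-llama",
}
DEFAULT_MODEL = "meta-llama"

# Illustrative patterns only. Real data-loss prevention needs far broader
# coverage than two regular expressions.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[REDACTED_EMAIL]"),
]


@dataclass(frozen=True)
class RoutedRequest:
    model: str
    prompt: str


def redact(text: str) -> str:
    """Replace every match of each sensitive pattern with a placeholder."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


def route(use_case: str, prompt: str) -> RoutedRequest:
    """Choose a model for the use case and redact the prompt before sending."""
    model = ROUTES.get(use_case, DEFAULT_MODEL)
    return RoutedRequest(model=model, prompt=redact(prompt))
```

The design point is that callers never choose a vendor directly. They state a use case, and the layer decides both where the request goes and what it is allowed to contain.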

On top of that platform, Goldman deployed tools by role. For developers, GitHub Copilot was rolled out to more than 9,000 developers, about 80 percent of engineers, by mid-2024. Gemini Code Assist was also available to developers by March 2025.

For the broader firm, Goldman built GS AI Assistant on top of the same internal platform. Before its firmwide launch on June 23, 2025, 10,000 employees were already using it. The expectation was that nearly all employees would have access by the end of 2025.

By July 2025, the firm was processing 1 million generative AI prompts per month. That tells you something useful: this was no longer a lab exercise. It had become operating infrastructure.

What Did the Devin Pilot Actually Prove?

In July 2025, Goldman began piloting Devin from Cognition, an autonomous AI coding agent. The rollout started with hundreds of agents. The target use cases were practical: updating code to newer languages, handling multi-step engineering tasks, and reducing developer drudgery.

But what was the operating model? Human developers still supervised the work. They described the task, the agents executed portions of it, and humans reviewed the output. That matters because “agentic” can mean many things in headlines. Here, it meant delegated execution under human control, not unsupervised production changes.
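The describe, execute, review loop can be sketched in a few lines of Python. This is not Devin's API or Goldman's tooling. The `ChangeProposal` type, the `run_supervised` function, and the stand-in agent and reviewer are hypothetical names used only to show where the human gate sits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ChangeProposal:
    task: str
    diff: str
    status: Status = Status.PROPOSED


def run_supervised(
    task: str,
    agent: Callable[[str], str],
    reviewer: Callable[[ChangeProposal], bool],
) -> ChangeProposal:
    """The agent drafts a change; only the reviewer's decision can approve it."""
    proposal = ChangeProposal(task=task, diff=agent(task))
    proposal.status = Status.APPROVED if reviewer(proposal) else Status.REJECTED
    return proposal


# Stand-ins for a real coding agent and a real human review step.
def fake_agent(task: str) -> str:
    return f"# patch for: {task}"


def cautious_reviewer(proposal: ChangeProposal) -> bool:
    return "DROP TABLE" not in proposal.diff
```

In this shape the agent has no path to an approved state on its own, which is the difference between delegated execution and unsupervised production changes.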

CTO Marco Argenti said agentic AI such as Devin could boost productivity 3 to 4 times versus prior AI tools. Treat that carefully. It was a projection from an executive interview in July 2025, not a measured outcome.

So what did the pilot prove? That Goldman was willing to move from assistant-style AI to task-executing agents inside engineering, and to do it within a controlled governance model. What it did not prove was autonomous software delivery at scale.

Traditional Developer Workflow vs. Goldman Sachs AI-Assisted Engineering

Traditional Developer Workflow	Goldman Sachs AI-Assisted Engineering
Engineers manually inspect large internal codebases	Copilot and Code Assist support navigation, drafting, and code suggestions
Legacy-language migrations move slowly through scarce expert effort	Devin is piloted for updating code to newer languages under human review
Public AI tools are blocked because of confidentiality risk	GS AI Platform provides an internal secure path to external models
Model access is fragmented or unmanaged	GS AI Platform routes requests across OpenAI, Gemini, and Llama
Knowledge stays trapped in teams and documents	GS AI Assistant gives employees a firm-approved interface for summarization and drafting
Engineers complete multi-step drudge work directly	Human developers delegate portions of tasks to agents, then review results

What Are the Honest Limits?

Start with the most abused number. The “30% productivity gain in software development and customer service” was not Goldman measuring Goldman employees. It came from Goldman Research analyzing disclosures from other companies in those two narrow domains, as reported by Fortune in March 2026.

That matters because the figure is often repeated as if it were Goldman’s internal result. It is not. The same research also found no economy-wide AI productivity link.

The internal number that does get cited is narrower and weaker than headlines suggest. Goldman reported a 20% developer productivity improvement in the first year from its developer copilot. The figure is self-reported and comes from an executive interview in Observer in September 2025. There is no disclosed methodology, sample size, or independent audit.

There are other limits. Goldman has disclosed no public financial ROI, no cycle-time KPIs, and no DORA metrics. Devin remains a supervised pilot, with no confirmed autonomous production deployments in the public record. And Goldman’s security and compliance controls are expensive to build, which makes the architecture harder for smaller organizations to copy directly.

What Does Goldman’s Full AI Suite Look Like?

GS AI Platform is the control layer. It handles security, orchestration, and model routing across external vendors.

GS AI Assistant is the firmwide internal interface. It gives employees approved access to AI for tasks like summarization and drafting.

GitHub Copilot is the scaled coding assistant, rolled out to more than 9,000 developers by mid-2024.

Gemini Code Assist is an additional developer tool, available to Goldman developers by March 2025.

Devin is the agentic engineering layer in pilot form, used for supervised multi-step coding work and language modernization.

OpenAI GPT, Google Gemini, and Meta Llama are the current core model vendors underneath the platform.

Anthropic Claude has been reported as part of a 2026 collaboration around banking-task agents, but that should be treated as expansion, not the core production stack described here.

Microsoft is present through GitHub Copilot, but a full Azure dependency is not confirmed in primary sources.

Can You Replicate This With Open-Source Tools?

Yes, but not all at once, and not with the same compliance posture on day one. The closest open-source analogue to GS AI Platform is a model gateway such as LiteLLM, paired with observability tools like Langfuse or Helicone. That gets you the routing layer, basic control plane, and auditability you need before employees start sending prompts into multiple models.
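As a rough picture of what a gateway with auditability means in practice, the sketch below wraps a set of model providers behind one `complete` call and records every attempt. It deliberately does not use the LiteLLM, Langfuse, or Helicone APIs. The `Gateway` class, its ordered-fallback behaviour, and the audit record fields are assumptions made for illustration, and the providers are plain Python callables standing in for real model clients.

```python
import hashlib
import time
from typing import Callable, Dict, List

# A provider is any callable that takes a prompt and returns a reply.
Provider = Callable[[str], str]


class Gateway:
    def __init__(self, providers: Dict[str, Provider], order: List[str]):
        self.providers = providers
        self.order = order  # providers are tried in this order
        self.audit_log: List[dict] = []

    def complete(self, user: str, prompt: str) -> str:
        """Try each provider in order, logging every attempt."""
        for name in self.order:
            try:
                reply = self.providers[name](prompt)
            except Exception as exc:
                self._record(user, name, prompt, ok=False, detail=type(exc).__name__)
                continue
            self._record(user, name, prompt, ok=True, detail="")
            return reply
        raise RuntimeError("all providers failed")

    def _record(self, user: str, provider: str, prompt: str, ok: bool, detail: str) -> None:
        # Store a hash instead of the raw prompt so the log itself
        # does not become a second copy of sensitive text.
        self.audit_log.append({
            "ts": time.time(),
            "user": user,
            "provider": provider,
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "ok": ok,
            "detail": detail,
        })
```

Whether to log hashes or full prompts is a policy decision. Hashes limit exposure but make later investigation harder, so some teams keep full text in a separately controlled store.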

The second phase is role-specific tooling. For developers, Aider or Continue can stand in for commercial coding assistants. For agentic software work, OpenHands or SWE-agent are closer to the Devin category. If you need multi-agent workflows, LangGraph gives you a way to coordinate them without hardwiring everything into one brittle prompt chain.

The third phase is where most teams move too fast. You need policies for what data can be sent where, approval logic for higher-risk actions, and review steps that stay with humans. Goldman could move into agents because it first built the guardrails. Smaller firms should do the same, just with lighter components: gateway first, assistant second, agents third. Run the agents in shadow mode first, where they propose actions without executing them, and validate the results before letting them act.
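The three guardrails named here (a data policy, approval for higher-risk actions, and shadow mode) can be sketched together in Python. The data classes, destination names, and the `AgentRunner` class are hypothetical, and the rule that an unknown data class may go nowhere is an assumption chosen to fail closed.

```python
from typing import Callable, List, Optional

# Hypothetical policy: which destinations each data class may be sent to.
ALLOWED_DESTINATIONS = {
    "public": {"external-model", "internal-model"},
    "internal": {"internal-model"},
    "confidential": set(),
}


def may_send(data_class: str, destination: str) -> bool:
    """Unknown data classes are denied by default."""
    return destination in ALLOWED_DESTINATIONS.get(data_class, set())


class AgentRunner:
    def __init__(self, shadow: bool = True,
                 approver: Optional[Callable[[str], bool]] = None):
        self.shadow = shadow
        # With no approver configured, every high-risk action is denied.
        self.approver = approver or (lambda action: False)
        self.proposed: List[str] = []
        self.executed: List[str] = []

    def act(self, action: str, high_risk: bool = False) -> str:
        self.proposed.append(action)
        if self.shadow:
            return "logged"      # shadow mode: record the proposal, do nothing
        if high_risk and not self.approver(action):
            return "blocked"     # high-risk actions need explicit approval
        self.executed.append(action)
        return "executed"
```

Comparing `proposed` against what humans actually did during a shadow period gives a concrete basis for deciding when, or whether, to switch `shadow` off.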