Financial Services

Morgan Stanley AI Debrief and DevGen.AI: What 98% Advisor Adoption and 9 Million Lines of Legacy Code Reveal About Purpose-Built Enterprise AI

Morgan Stanley deployed two distinct purpose-built AI tools: Debrief for wealth advisor meetings and DevGen.AI for COBOL modernisation. Here is what the numbers actually mean, what was not disclosed, and what other enterprises can replicate.

Your best advisor is spending a third of every client meeting thinking about what to type into Salesforce. Your best engineer cannot touch the most business-critical codebase because only two people still understand the COBOL. These are not future problems. They are the problems Morgan Stanley chose to solve first.

What Problem Were Morgan Stanley Wealth Advisors and Developers Actually Trying to Solve?

For wealth advisors, the administrative drag is not a small inconvenience. A financial advisor managing dozens of high-net-worth relationships spends meaningful time after every client call reconstructing what was said, logging it accurately in the CRM, drafting follow-up emails, and flagging action items. The cognitive cost compounds: the more time spent on documentation, the less time spent preparing for the next call or deepening client knowledge. At scale across 16,000 advisors, that friction represents an enormous volume of professional capacity consumed by clerical work.

For the technology organisation, the problem runs deeper and carries more existential risk. Morgan Stanley, like most large financial institutions, carries legacy systems written in COBOL and Perl, some of which have been running for decades. The engineers who originally wrote those systems are largely gone. The ones who can read the code are a shrinking population. When a business-critical process lives in a codebase that almost no one can interpret, change management becomes slow and risky, and the institutional knowledge embedded in that code becomes progressively harder to recover.

These are distinct problems, which is exactly why Morgan Stanley built two distinct tools rather than one general-purpose assistant.

What Did Morgan Stanley Actually Build, and How Does It Work?

AI @ Morgan Stanley Debrief launched in June 2024. The architecture is deliberately narrow. When an advisor starts a Zoom call, the client hears a consent request at the top of the call. If consent is given, OpenAI’s Whisper model transcribes the audio. GPT-4 then processes that transcript and generates structured notes, a list of action items, a draft follow-up email, and a pre-filled Salesforce CRM entry. The advisor reviews all of this before anything is finalised or sent. Nothing leaves the session as authoritative until a human approves it.
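The flow described above can be sketched as a small pipeline with two gates: client consent before transcription, and advisor approval before anything reaches the CRM. This is a minimal illustration under stated assumptions, not Morgan Stanley's implementation. `DebriefDraft`, `run_debrief`, `commit_to_crm`, and the two stub functions standing in for the transcription and drafting models are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DebriefDraft:
    """Draft artefacts from one call; nothing here is authoritative yet."""
    notes: str
    action_items: List[str]
    follow_up_email: str
    crm_entry: Dict[str, str]
    approved: bool = False  # flipped only by the reviewing advisor


def run_debrief(
    audio: bytes,
    client_consented: bool,
    transcribe: Callable[[bytes], str],
    draft_from_transcript: Callable[[str], DebriefDraft],
) -> Optional[DebriefDraft]:
    """Return an unapproved draft, or None if the client declined recording."""
    if not client_consented:
        return None  # gate 1: without consent, nothing is transcribed
    transcript = transcribe(audio)
    return draft_from_transcript(transcript)


def commit_to_crm(draft: DebriefDraft, crm_records: List[Dict[str, str]]) -> None:
    """Write the CRM entry only after a human has approved the draft."""
    if not draft.approved:
        raise PermissionError("advisor has not approved this draft")  # gate 2
    crm_records.append(draft.crm_entry)


# Hypothetical stand-ins for the speech-to-text and drafting models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_draft(transcript: str) -> DebriefDraft:
    return DebriefDraft(
        notes=transcript,
        action_items=["Send fee schedule"],
        follow_up_email="Thank you for your time today.",
        crm_entry={"summary": transcript},
    )


crm_records: List[Dict[str, str]] = []
draft = run_debrief(b"Client asked about fees.", True, fake_transcribe, fake_draft)
draft.approved = True  # the advisor reviews and signs off
commit_to_crm(draft, crm_records)
```

Passing the models in as callables keeps the two gates testable without any vendor dependency, which is the part of the design an enterprise can replicate regardless of which models sit behind it.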

The compliance architecture matters as much as the AI architecture. Morgan Stanley holds a contractual zero data retention agreement with OpenAI: the firm’s client conversations do not train future models and are not stored on OpenAI’s infrastructure after processing. This is not standard practice, and Morgan Stanley negotiated it from its position as OpenAI’s named exclusive strategic wealth management partner (a designation formalised in March 2023).

DevGen.AI launched in January 2025 and operates on a fundamentally different premise. The tool was built in-house, trained on Morgan Stanley’s own internal codebase including proprietary code variants, not on a generic corpus of open-source repositories. It reads COBOL, Perl, and internal proprietary languages. Its primary output is not code. It generates English-language “functional maps,” plain descriptions of what a module does, what inputs it expects, what rules it applies, and what outputs it produces. A human developer then takes that specification and rewrites the module in Python. The AI is not the author of the replacement system. It is the interpreter of the legacy one.
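To make the idea of a functional map concrete, here is a minimal sketch of how such a specification could be represented and rendered as plain English. The four fields mirror the description above (what the module does, its inputs, its rules, its outputs). The class name, the `FEECALC` module, and its contents are invented for illustration; Morgan Stanley has not published the format DevGen.AI actually emits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FunctionalMap:
    """Plain-English description of one legacy module (hypothetical schema)."""
    module: str
    purpose: str
    inputs: List[str]
    rules: List[str]
    outputs: List[str]

    def is_complete(self) -> bool:
        """A spec with any empty section is not ready for a rewrite."""
        return all([self.purpose.strip(), self.inputs, self.rules, self.outputs])

    def to_spec(self) -> str:
        """Render the map as a text specification a developer can work from."""
        lines = [f"Module: {self.module}", f"Purpose: {self.purpose}"]
        sections = (("Inputs", self.inputs), ("Rules", self.rules), ("Outputs", self.outputs))
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


# Invented example module, not taken from any real codebase.
fee_map = FunctionalMap(
    module="FEECALC",
    purpose="Computes the quarterly advisory fee for one account.",
    inputs=["Account balance at quarter end", "Fee tier table"],
    rules=["Apply the tier rate matching the balance", "Waive the fee below the minimum balance"],
    outputs=["Fee amount", "Tier applied"],
)
print(fee_map.to_spec())
```

The `is_complete` check reflects the workflow in the text: the human rewrite starts from the specification, so a map with a missing section should go back for review rather than forward to a developer.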

DevGen.AI also performs regulatory code extraction and limited partial translation, but the documentation function is the core. That design choice is significant and we will return to it.

What Did the Deployment Actually Produce, and How Much Should You Trust the Numbers?

The headline figures for Debrief come from two sources, and it matters which is which. Don Whitehead, a Houston-based Morgan Stanley advisor who participated in the pilot, told CNBC on June 26, 2024, that he saves roughly 30 minutes per meeting using Debrief. That is a single named practitioner speaking from personal experience, not a controlled study. Ted Pick, Morgan Stanley’s CEO, said at the firm’s Annual US Financials Conference on June 10, 2024, that advisors save 10 to 15 hours per week. That figure comes from a named executive on the record, but no methodology was cited and no external body has audited it.

The 98% adoption figure appears in a Morgan Stanley press release and is self-reported. The definition of “adoption” was not disclosed. A firm can count a user who activated the feature once or a user who runs it on every call: these are not the same, and we do not know which definition applies here. What we can say is that a 98% figure across 16,000 advisors, if even roughly accurate, would be an unusually high adoption rate for any enterprise software rollout. The product design, which puts the output directly into the advisor’s existing CRM workflow without requiring them to switch context, likely drives that figure more than the AI itself.
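The gap between those two definitions is easy to see with a toy calculation. The sketch below uses invented usage data for five hypothetical advisors and computes "adoption" twice: once counting anyone who has ever used the tool, and once counting only advisors who use it on at least 90% of their calls. The 90% threshold is an assumption chosen for illustration, not a definition Morgan Stanley has disclosed.

```python
from typing import Dict, Tuple

# advisor id -> (calls where the tool ran, total client calls); invented data
usage: Dict[str, Tuple[int, int]] = {
    "adv_a": (1, 40),
    "adv_b": (40, 40),
    "adv_c": (20, 40),
    "adv_d": (0, 40),
    "adv_e": (38, 40),
}


def adoption_rate(usage: Dict[str, Tuple[int, int]], min_share: float) -> float:
    """Share of advisors who used the tool at least once and on at least
    `min_share` of their calls."""
    adopters = sum(
        1
        for used, total in usage.values()
        if used > 0 and total > 0 and used / total >= min_share
    )
    return adopters / len(usage)


ever_used = adoption_rate(usage, min_share=0.0)  # "activated it once"
habitual = adoption_rate(usage, min_share=0.9)   # "runs it on nearly every call"

print(f"Ever used: {ever_used:.0%}")  # Ever used: 80%
print(f"Habitual:  {habitual:.0%}")   # Habitual:  40%
```

The same population yields 80% or 40% depending on the definition, which is why an undisclosed definition limits how much weight a headline adoption figure can bear.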

For DevGen.AI, Mike Pizzi, Morgan Stanley’s Global Head of Technology and Operations, told the Wall Street Journal in its June 4, 2025 print edition that the tool processed 9 million lines of code between January and May 2025, with 280,000 developer hours saved. Pizzi is a named executive making a specific claim in a named publication. No methodology for the hours calculation was disclosed. The number likely derives from an internal estimate of time-per-module rather than a controlled experiment, but we do not know. Pizzi also said: “We found that building it ourselves gave us certain capabilities that we’re not really seeing in some of the commercial products.” This is the most substantive claim in the public record about why purpose-built outperformed off-the-shelf.

The broader financial context is worth holding alongside these figures. Wealth Management pre-tax margin hit a record 31.4% in Q4 2025, and Ted Pick referenced “one human team and one AI team” for document review on the January 15, 2026 earnings call, without naming either tool specifically. Whether Debrief contributed to margin improvement is not something the public numbers can establish. The causality chain between an AI note-taking tool and a margin point is too long and too crowded with other variables.

How Does the AI-Assisted Workflow Compare to What Came Before?

| Dimension | Pre-AI | AI-Assisted |
| --- | --- | --- |
| Meeting notes (Debrief) | Advisor writes manually post-call | Whisper transcribes, GPT-4 drafts, advisor reviews |
| CRM data entry (Debrief) | Advisor types entries from memory or rough notes | Pre-filled Salesforce entry generated from transcript |
| Follow-up email (Debrief) | Advisor drafts from scratch | Draft generated from call content, advisor edits |
| Time cost per meeting (Debrief) | 20-40 minutes post-call admin | Approximately 30 minutes reduction (single pilot report) |
| Data privacy (Debrief) | Standard CRM logging | Zero-retention contract, client consent per call |
| Legacy code comprehension (DevGen.AI) | Requires COBOL/Perl specialist | AI generates English functional spec from any module |
| Documentation quality (DevGen.AI) | Sparse, often outdated or absent | Structured functional map generated on demand |
| Code rewrite bottleneck (DevGen.AI) | Specialist reads code, rewrites manually | Developer works from English spec, writes Python |
| Model training basis (DevGen.AI) | Not applicable | Trained on proprietary internal codebase |
| Human authorship of new code (DevGen.AI) | Engineer authors from direct code reading | Engineer authors from AI-generated spec |

Where Are the Honest Limits of What Morgan Stanley Has Shared?

The 280,000 developer hours figure has no disclosed methodology. That number could be calculated conservatively or generously, and the difference matters if you are planning a business case for a similar programme. Hours saved is also not the same as value created: if the saved hours are redirected to higher-value work, the impact compounds. If they are absorbed into the same workload at reduced headcount, the story changes.
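One check an outsider can run is the ratio the two disclosed figures imply. The sketch below divides the reported hours by the reported lines of code, then converts the hours into full-time-equivalent years. The 2,000 working hours per year used for that conversion is an assumption for illustration, not a figure from Morgan Stanley, and none of this reconstructs the firm's actual methodology.

```python
LINES_PROCESSED = 9_000_000  # reported: lines of code, January to May 2025
HOURS_SAVED = 280_000        # reported: developer hours saved
HOURS_PER_FTE_YEAR = 2_000   # ASSUMPTION: rough working hours in one year

minutes_per_line = HOURS_SAVED * 60 / LINES_PROCESSED
lines_per_saved_hour = LINES_PROCESSED / HOURS_SAVED
fte_years = HOURS_SAVED / HOURS_PER_FTE_YEAR

print(f"Implied saving per line: {minutes_per_line:.2f} minutes")  # 1.87 minutes
print(f"Lines per hour saved:    {lines_per_saved_hour:.1f}")      # 32.1
print(f"FTE-years (assumed):     {fte_years:.0f}")                 # 140
```

An implied saving of just under two minutes per line of legacy code is the number to compare against your own engineers' experience of reading COBOL. If your internal estimate is far from it, the business case should use your estimate, not the headline.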

The 98% adoption rate for Debrief sits alongside a March 2025 announcement of 2,000 to 2,500 layoffs, roughly 3% of Morgan Stanley’s 80,000 employees, with Bloomberg reporting that some reductions were attributed to AI efficiency. The company said publicly that DevGen.AI would not cause headcount reductions. That statement and those layoffs coexist in the same quarter. We are not saying the company misled anyone. We are saying the tension exists and your board will ask about it.

The compliance dimension of Debrief deserves more scrutiny than it has received publicly. When AI-generated notes, reviewed and approved by a human advisor, enter the firm’s Salesforce system of record, a new question arises: what is the accuracy expectation for those notes, and who is liable when one is wrong in a material way? Morgan Stanley has not published accuracy rates for the Debrief output. This is not unusual for enterprise AI at this stage, but it is the gap your legal and compliance team will find first.

Finally, the AI Assistant (the 2023 RAG-based system over 100,000 internal documents) also carries a 98% adoption figure. Two separate tools reporting the same adoption percentage is a pattern worth noting. Either Morgan Stanley consistently achieves exceptional adoption (plausible given strong integration design) or the metric is being measured in a way that makes 98% easy to reach. The honest answer is that we do not know.

What Does Morgan Stanley’s Full AI Stack Look Like, and What Open-Source Options Exist for Everyone Else?

Morgan Stanley’s AI programme did not begin in 2024. Next Best Action, a machine learning system surfacing client recommendations for advisors, launched in 2018. The AI Assistant, a retrieval-augmented generation (RAG) system allowing advisors to query across 100,000 internal documents using GPT-4, launched in 2023. Debrief followed in mid-2024. AskResearchGPT, a GPT-4o-based tool giving institutional staff access to 70,000 Morgan Stanley research reports through natural language queries, also launched in 2024. DevGen.AI launched in January 2025. OpenAI is the exclusive strategic partner across the full current stack.

Jeff McMillan, Chief Analytics and Data Officer, governs the firmwide AI programme under four stated pillars: Senior Supervision (executive accountability for every deployment), Human in the Loop (no AI output becomes a system of record without human review), Robust Evaluation (ongoing performance monitoring before and after deployment), and Second-Line Oversight (independent risk and compliance review separate from the deploying business unit). This governance architecture is worth examining on its own terms, separate from the tools. It is the structure that makes a regulated institution willing to move at this pace.
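For teams that want to operationalise a framework like this, the four pillars can be treated as a release gate: a deployment goes live only when every pillar has a recorded sign-off. The sketch below is one hypothetical way to encode that. The identifiers and the `Deployment` structure are invented, and nothing here describes Morgan Stanley's internal tooling.

```python
from dataclasses import dataclass, field
from typing import List, Set

# The four pillars named above, as machine-readable identifiers.
PILLARS = (
    "senior_supervision",
    "human_in_the_loop",
    "robust_evaluation",
    "second_line_oversight",
)


@dataclass
class Deployment:
    name: str
    sign_offs: Set[str] = field(default_factory=set)


def missing_pillars(deployment: Deployment) -> List[str]:
    """Pillars with no recorded sign-off, in the framework's order."""
    return [p for p in PILLARS if p not in deployment.sign_offs]


def may_go_live(deployment: Deployment) -> bool:
    """A deployment is releasable only when no pillar is missing."""
    return not missing_pillars(deployment)


pilot = Deployment("meeting-notes-pilot", {"senior_supervision", "human_in_the_loop"})
print(may_go_live(pilot))      # False
print(missing_pillars(pilot))  # ['robust_evaluation', 'second_line_oversight']
```

Keeping the gate as a list of missing items rather than a single yes/no makes the blocking pillar visible to whoever owns the release decision.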

For the Debrief pattern, the closest open-source equivalent is Meetily (github.com/Zackriya-Solutions/meetily). It combines Whisper for transcription with Ollama for local inference, runs entirely on-device, and has Salesforce integration on its development roadmap. The local execution model means no data leaves your infrastructure, addressing the zero-retention requirement without a bespoke contract with a foundation model provider. For enterprises not yet ready to negotiate an OpenAI enterprise agreement, this is the lowest-friction starting point.

For the DevGen.AI pattern, two options sit at different points on the build-versus-buy spectrum. Microsoft’s Azure Legacy-Modernization-Agents (github.com/Azure-Samples/Legacy-Modernization-Agents) is an open-source multi-agent framework designed to translate COBOL to Java or C# and generate documentation. IBM’s watsonx Code Assistant for Z is the nearest commercial comparator for COBOL-heavy shops. In a documented case study involving Egypt’s National Organisation for Social Insurance, IBM reported a reduction from 8 hours to 30 minutes for a specific COBOL analysis task. That figure comes from IBM’s own published case study, so apply the same self-report scrutiny you would apply to Morgan Stanley’s numbers. The directional signal is consistent across both vendors’ accounts, but neither figure has been independently verified.

A practical implementation path starts with consent and data architecture, not the AI. Pick one meeting-heavy team and deploy per-meeting consent with Whisper and a long-context model, routing output to structured notes in your CRM. Run parallel human note-taking for four weeks before switching over. The four-week parallel period is not bureaucratic caution: it is the minimum data set you need to identify where the AI gets things wrong. The consent architecture and zero-retention contract with your AI vendor must be in place before the pilot launches, not added later when legal asks where the transcripts are going.
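During the parallel period, the useful output is a per-meeting comparison of what the human note-taker captured against what the AI captured. A minimal sketch follows, assuming action items have already been normalised to short labels so that exact matching is meaningful. That normalisation step is itself an assumption, and in practice it is likely to need human judgment. The function name and sample items are hypothetical.

```python
from typing import Dict, Set


def compare_action_items(human: Set[str], ai: Set[str]) -> Dict[str, Set[str]]:
    """Split one meeting's action items into agreed, missed, and AI-only."""
    return {
        "agreed": human & ai,
        "missed_by_ai": human - ai,  # the human caught it, the AI did not
        "only_in_ai": ai - human,    # possible hallucination, or a human miss
    }


# Invented example for a single meeting.
human_items = {"send fee schedule", "update beneficiary", "book review call"}
ai_items = {"send fee schedule", "book review call", "open new account"}

result = compare_action_items(human_items, ai_items)
for bucket in ("agreed", "missed_by_ai", "only_in_ai"):
    print(bucket, sorted(result[bucket]))
```

Both disagreement buckets need review: an item missed by the AI is an accuracy gap, while an item only the AI produced is either an invention or something the human note-taker overlooked, and only a reviewer can tell which.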

The DevGen.AI pattern starts with documentation, not code translation. Identify the five most business-critical legacy modules in your environment and run the AI against them to produce English functional specifications. Have your most senior engineers validate those specs against the known behaviour of the system. This step does not deskill your engineers: you are using the AI to reduce the COBOL expertise bottleneck without asking it to replace engineering judgment on the rewrite. The engineers who review those specs are doing higher-value work than reading opaque legacy code, and you are building institutional knowledge that does not currently exist in text form anywhere in your organisation.

The third step is what most enterprises skip, and it corresponds to what Pizzi credited when he said that building in-house gave Morgan Stanley capabilities it was not seeing in commercial products. Build your training data before you deploy. Whether through fine-tuning or a RAG corpus, the performance difference between a generic LLM and a purpose-built one comes largely from the quality and relevance of the internal data it has seen. Your internal code, architecture decision records, runbooks, and incident post-mortems are the training material that makes an AI genuinely useful for your systems rather than generically capable across everyone else’s. Neither tool should go live as the sole system of record until parallel testing confirms accuracy. Shadow mode is the period during which you discover that the AI handles, say, 92% of cases well and 8% in ways that would have caused a problem. Find that 8% before it finds you.
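Finding the problem cases in shadow mode is mostly a matter of recording a reviewer verdict for every case and then looking at where the failures cluster. The sketch below uses invented review data, sized so that the overall rate matches the illustrative 8% in the paragraph above. The category names and function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Review = Tuple[str, str]  # (case category, reviewer verdict: "ok" or "problem")

# Invented shadow-mode log: 25 reviewed cases, 2 of them problems.
reviews: List[Review] = (
    [("fee_calc", "ok")] * 10
    + [("settlement", "ok")] * 13
    + [("settlement", "problem")] * 2
)


def problem_rate(reviews: List[Review]) -> float:
    """Fraction of reviewed cases the reviewer flagged as a problem."""
    problems = sum(1 for _, verdict in reviews if verdict == "problem")
    return problems / len(reviews)


def problems_by_category(reviews: List[Review]) -> Counter:
    """Count flagged cases per category to show where failures cluster."""
    return Counter(category for category, verdict in reviews if verdict == "problem")


print(f"Problem rate: {problem_rate(reviews):.0%}")  # Problem rate: 8%
print(problems_by_category(reviews))                 # Counter({'settlement': 2})
```

The aggregate rate tells you whether you are ready to leave shadow mode. The per-category breakdown tells you what to fix first, which is usually the more actionable number.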

The Question for Your Team

Morgan Stanley took roughly seven years to move from a recommendation ML model in 2018 to a codebase-trained developer assistant in 2025, with a contractual OpenAI partnership and a four-pillar governance framework in between. How many of those foundational steps has your organisation actually completed, and which one is the real bottleneck right now?
