Model Landscape: AI model usage in project management has grown rapidly, with distinct models for specific tasks.
Meeting Summarization: Different AI models excel at summarizing meetings; model choice affects accuracy and detail.
Stakeholder Communication: AI helps structure content for stakeholder updates, but precision varies across models.
Scope Documentation: Claude is favored for detail-oriented scope tasks, especially where the model has to reason through task dependencies.
Process Documentation: ChatGPT is often better at organizing scattered information into coherent process documents.
We talk a lot about AI project management tools here at DPM — but we haven't yet turned the lens on the AI models themselves and how they're shaping PM work. Until now.
The model landscape has gotten crowded fast. According to Artificial Analysis' AI Adoption Survey, the average number of LLM families used or considered jumped from roughly 2.8 in 2024 to 4.7 in 2025. And T4 Atlas' 'Most Used AI Models' ranking shows which ones keep showing up across the market: GPT-4o, Claude 3.5 Sonnet, DeepSeek R1, Gemini 1.5 Pro, Llama 3, Copilot Model Stack, Grok, and Codestral.
If you've experimented with more than one, you already know they're not interchangeable — and treating them like they are could mean leaving real performance gains on the table. To find out how these models stack up across the most common PM tasks, we went straight to practitioners. Here's what they told us.
Meeting summarization and action items
Claude, Gemini, Read AI
Meeting summarization is one of the most common AI use cases in project management, and also one of the most sensitive to how a model's accuracy degrades over long transcripts. Joe Troyer, Chief Marketing Officer at Great Lakes Tiny Homes, draws a direct comparison: "Claude keeps action items accurate across hour-long transcripts where GPT starts dropping names and deadlines after the first thirty minutes." The longer the meeting, the more model choice matters.
But identifying the right model is only half the equation — knowing what good output looks like is the other. Troyer is specific: "Good output lists each item with the speaker's name and a single sentence on the commitment. Bad output adds extra tasks that never came up or merges two people into one owner."
Not everyone lands in the same place. Michael Gold, who manages projects across multiple clients, deliberately cross-references models rather than committing to one: "I ultimately end up using everything like Gemini, Claude, Chatty, etc., even when I've got Firefly. I take Firefly's transcript and put it into something else, because I don't trust Firefly." His approach treats model comparison as part of the workflow itself — not a setup step you do once.
Ryan Gilbreath takes a more structured approach, ranking the options he's tested: "I would say if I had to rate my tool list as far as AI note takers, Read AI right now has been a top one for me. Second would be Gemini just because I work in a Google workspace and it's easy to sync and collect all my notes all in one place. And then I would say Otter AI."
Stakeholder communication and status reporting
Claude Opus, ChatGPT-4
A useful status update does specific things well. As Guillermo Ginesta, Managing Partner APAC at Brinc, puts it: "A useful output has to separate facts, blockers, owner-specific actions, and political risk without inventing certainty." Ken Herron, co-founder at VCONify, frames it just as sharply: "A good output summarizes what changed, why it matters, who owns the next action, and where decisions are required. A bad output reads like a generic meeting recap, treats every issue as equally important, and loses executive context."
With that in mind, practitioners tend to split by what they're optimizing for. Herron reaches for Claude when communication quality is the priority: "When I need to turn meeting notes, email threads, and executive feedback into a concise stakeholder update, Claude [Opus] consistently produces clearer and more nuanced communications than other models I've tested."
Others make the case for ChatGPT-4. Bogdan Condurache, co-founder and CPO of Brizy, uses GPT-4 for stakeholder communication alongside Claude for scoping. He points to GPT-4's ability to shift communications between audiences: "If I need to explain a delay to leadership, simplify a product update for customers, or summarize technical debt for non-technical teams, it adjusts tone and depth well."
Scope documentation and requirements
Claude Opus, ChatGPT-4
When it comes to defining scope, the practitioner consensus leans toward Claude — particularly for complex, multi-dependency work where the cost of a missed assumption is high. Murli Pawar, Vice President at SunTec, describes how his team uses Claude Opus for exactly this: "We use Claude Opus for turning a client's scope document into an actual project plan by breaking it into tasks, sequencing them, and mapping dependencies. Our projects tend to have many interdependent steps, where one stage can't start until another is finished and clean, so this is the task where the planning has to be genuinely reasoned rather than just listed."
What distinguishes Claude in this context, in Pawar's view, is how it handles ambiguity: "When I hand it a vague brief, Claude is the most consistent at catching that step C secretly depends on step A and flagging where the brief is incomplete, rather than quietly inventing assumptions to fill the gaps. The others more often hand me a plan that looks complete but has sequencing that falls apart the moment you check it."
Pawar's definition of good vs. bad output here is worth holding onto: "A good output is a plan I can question. It surfaces its own assumptions. It can also group tasks into logical phases, and tell me where the brief was ambiguous rather than papering over it. I should be editing it, not rebuilding it. A bad output is an overconfident plan that's wrong underneath."
SOP and process documentation
ChatGPT
Process documentation puts different demands on AI than real-time meeting capture does — and the gap between a polished-looking output and a usable one can be especially costly here. The practitioners who do this work regularly tend to reach for ChatGPT, often because of its strength at restructuring scattered information into logical sequences.
Hien Nguyen, Co-Founder & Director at Happy Way, uses ChatGPT to consolidate operational knowledge that's "dispersed across various locations — meetings, Messages in Slack, Email, and from individual members of the Operation team." The model's value, in his experience, is its ability to take unorganized inputs and impose structure: "A good output will define the step-by-step process, and include decision points, to such detail that a new employee could use this document to execute the process without needing continuous clarification."
He identifies the core failure pattern clearly: "A bad output will look polished but will not include the practical details. If AI creates new steps, omits vital information about a process, and uses general terms to describe businesses without explaining how the work is completed, the document almost becomes unusable." For SOPs, surface-level coherence isn't enough — the specificity has to be there or the document fails in practice.
Copywriting and written work
Claude, Gemini
For writing tasks — whether drafting project communications, content, or copy — practitioners report a clear divide. Claude and Gemini emerge as the preferred options, with ChatGPT falling behind in direct comparisons.
Jennifer Goebel, Project Coordinator at Baker Marketing Laboratory, finds Claude the strongest writing tool when tone of voice matters: "Claude seems to be the tool that works best for communications, with less back and forth than ChatGPT and a more natural tone of voice in writing."
Yonelly Gutierrez, who manages project workflows across multiple tools, made a more decisive shift: she dropped her ChatGPT professional subscription entirely after noticing a performance decline. "I have noticed a huge difference, even between GPT models. I used to use ChatGPT all the time, but now I have noticed it hallucinates way too much." She now relies on Gemini for writing: "When it comes to actually writing, I prefer Gemini." Her secondary tool is Glean — not for its writing quality, but for its ability to pull relevant internal project context.
Matching the model to the moment
Across meeting summaries, status updates, scoping work, process documentation, and writing tasks, a consistent theme emerges: AI outputs fail not because the technology is wrong, but because the wrong expectations, or the wrong model, were applied to the task. The PMs getting the most value from AI aren't using one tool for everything. They're treating model selection as a deliberate part of their workflow and holding outputs to a clear standard of what good actually looks like.
