Key Takeaways

Human Judgment: AI demands stronger human judgment to prevent misleading outputs from derailing project success.

AI Collaboration: AI supplements human work by assisting in synthesis and execution across project stages.

Adversarial Review: Use AI as a reviewer to test assumptions and identify potential weaknesses in plans.

Flexible Systems: Lightweight, AI-augmented project delivery systems suit high-uncertainty, design-led work best.

Decision Focus: Project delivery is shifting from plan enforcement to building effective decision frameworks.

Evgeny Goncharov is a PhD mathematician who is now the Cofounder and Executive Director of Cantabrium Scholars, where he oversees product and project delivery.

In our conversation, he was emphatic that PMs must become more discerning as AI becomes increasingly sophisticated — not less. He says AI's errors can be just as disastrous, but they're becoming harder to catch.

From mathematics to product management

I’m a mathematician by training, and my current work sits across education-focused venture building and investment-related projects. During my PhD at Cambridge, I also worked as a technical consultant, splitting my time between long-horizon research problems and applied quantitative work — including high-performance computing, risk modeling, and optimization in financial systems. In both settings, progress depended less on rigid plans and more on clear assumptions and fast feedback.

Alongside my research work, I spent many years as a mathematics educator and mentor. After finishing my PhD, I began scaling that work more deliberately by building structured programs and thinking carefully about designing more complex educational projects. In parallel, I got involved in investment and infrastructure-related projects, which brought real decision pressure and sharpened how I think about execution under uncertainty.

Today, I’m Cofounder and Executive Director (Product & Engagement) of Cantabrium Scholars, a young educational and consulting venture that’s very much in build mode. My role focuses on owning and evolving the product and student experience across programs, from academic design and mentor quality to parent communications and retention. It also includes running multi-stage delivery across teams and geographies, and keeping complex projects coherent as they evolve. I’m also closely involved in forming partnerships and shaping the organization’s academic direction and overall delivery approach as we grow.

Where AI is most valuable in the delivery cycle

AI has been particularly useful for me as I’ve moved from purely research-driven environments into roles that require faster execution, cross-functional coordination, and real-world decision making.

In my current work, AI has become most valuable at two points in the delivery cycle:

Early on, it serves as a fast way to learn new domains and pressure-test ideas while decisions are still being formed. Many strands of the educational and investment-related projects I’m involved in require engaging with unfamiliar subject matter quickly. AI helps me explore options fast, before slowing down to interrogate the assumptions. Concretely, that means I use AI to clarify the structure of a field, highlight typical trade-offs, and act as a sounding board as I refine my own thinking. In practice, this often matters most for error reduction — catching weak assumptions or internal contradictions before they propagate through a project.

Then, once the main direction is clear, AI is particularly effective at reducing execution and coordination costs. I now spend much less time producing repeated versions of the same material — translating content into different formats, adapting it for different audiences or countries, or keeping parallel documents in sync. Maintaining a clear memory of prior decisions and context also makes it much easier to generate new versions of materials quickly without losing intent as projects evolve.

It has also been unexpectedly useful in negotiation-heavy contexts — quickly mapping both sides’ positions, surfacing likely constraints and non-negotiables, and testing where I can give ground versus where objections are principled. In practice, I’ll ask it to draft two or three ‘fair trade’ packages and then sanity-check them against my red lines before I walk into the call. That makes it easier to enter conversations with clear red lines, credible concessions, and a calmer understanding of what a fair trade actually looks like.

Evgeny's Notes

AI has also been unexpectedly useful in negotiation-heavy contexts — quickly mapping both sides’ positions, surfacing likely constraints and non-negotiables, and testing where I can give ground versus where objections are principled.

As a result, more of my attention goes into judgment-heavy work: deciding which questions actually matter, forming the initial structure of projects, and navigating the human side of delivery — alignment, commitment, and trust. Tasks like defining scope, weighing trade-offs, resolving disagreements, and committing to a direction remain firmly human, while AI supports synthesis, consistency, and recall across the delivery process.

I increasingly see AI as a collaborator in the early stages of thinking and a reliable accelerator later on, but not yet as a substitute for original judgment or genuinely creative execution from scratch. That boundary is shifting, but for now most leverage in my work comes from pairing strong human judgment with selective, well-formulated AI support.

Why stronger human judgment is the most important part of AI-enabled delivery

Interestingly, as AI outputs become more convincing, the role of human judgment becomes more demanding, rather than less.

At the current stage, a large part of the work is not just using AI, but actively controlling it: framing questions carefully, making assumptions explicit, and preventing AI from jumping to unstated conclusions too early.

In practice, this means the human has to act as a center of stability in the process. In our work, AI often converges quickly on clean, elegant structures that look internally coherent. When things look “too clean”, deliberate human intervention is needed to slow things down, revisit assumptions about student background, motivation, and workload, and avoid locking in something that looks attractive on paper but proves fragile in practice.

What makes timely human intervention especially challenging is that errors rarely show up as obvious mistakes. As projects develop, AI hallucinations tend to become subtler and more convincing, often taking the form of internally consistent narratives built on incomplete or implicit assumptions. In my educational work, this has shown up as program structures that initially looked sound but would have led to misaligned workloads or unrealistic background expectations if not caught early. When the surrounding text is fluent and well structured, those issues can propagate quietly unless someone deliberately intervenes.

Here's one practical guardrail we use: Any AI-generated structure that looks “clean” or converges too quickly is treated as suspect by default. Before committing, we force at least one pass that explicitly restates assumptions, explores failure modes, and tests the structure under alternative constraints. This has been the most reliable way to catch subtle but high-impact errors before they propagate.
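
As a rough illustration, this guardrail can be encoded as a prompt template that forces the review pass every time. Everything below is a hypothetical sketch: the checklist wording and the helper function are illustrative, not the team's actual prompts.

```python
# Hypothetical sketch of the "suspect by default" review pass described
# above. The checklist wording and helper are illustrative, not the
# team's actual prompts.

REVIEW_CHECKLIST = [
    "Restate every assumption this plan depends on, including implicit ones.",
    "Describe at least three concrete ways this structure could fail in practice.",
    "Re-derive the plan under alternative constraints and note what changes.",
]

def build_review_prompt(proposed_structure: str) -> str:
    """Wrap an AI-generated plan in an adversarial review pass
    instead of accepting a clean-looking answer as final."""
    steps = "\n".join(f"{i}. {item}" for i, item in enumerate(REVIEW_CHECKLIST, 1))
    return (
        "This plan converged quickly and looks clean, so treat it as suspect.\n"
        "Before we commit, work through these steps:\n"
        f"{steps}\n\n"
        f"PLAN:\n{proposed_structure}"
    )

prompt = build_review_prompt("Week 1: linear algebra. Week 2: probability.")
```

The point is not the specific wording but that the review pass runs every time, rather than only when an output already looks doubtful.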

Looking ahead, I expect this tension to increase. As models improve, it will become harder to challenge architectures that are internally consistent and persuasive on the surface. Maintaining that human role, slowing things down at the right moments, questioning assumptions, and taking responsibility for decisions may become one of the most important and demanding aspects of AI in professional services automation and delivery.

How to use AI as an adversarial reviewer

One simple shift can help with this: Use conversational AI as an adversarial reviewer rather than a content generator.

Instead of asking AI to produce finished outputs, I use it to interrogate proposed structures: to surface hidden assumptions, stress-test decisions under alternative scenarios, and articulate how a plan might fail if a single premise turns out to be wrong. This has been particularly valuable in design-heavy and investment-related work, where early decisions carry long downstream consequences and misplaced confidence is costly.

Used this way, AI compresses thinking cycles without collapsing nuance. It allows half-formed ideas to be externalized, challenged, and refined quickly, reducing cognitive load while improving decision quality. In practice, this has led to earlier identification of weak assumptions, fewer late-stage reversals, and more deliberate commitment once a direction is chosen.

The value of this use case depends less on the model itself and more on how the interaction is structured. Small changes in prompts can determine whether AI prematurely converges on a tidy solution or exposes meaningful trade-offs and failure modes. Returning to the same lines of questioning across stages of a project, and retesting assumptions as context evolves, makes this interaction part of the delivery process itself. Over time, that repetition compounds: decisions stay aligned, earlier reasoning remains visible, and overall coherence across projects improves rather than decays.

Here's a small but illustrative example. Asking AI, “Which option is best given these constraints?” tended to produce clean, confident recommendations that converged quickly. Reframing the prompt to, “Under what conditions would this option fail, and which assumptions would have to be false for that to happen?” produced materially different output: it surfaced operational bottlenecks, coordination risks with partners, and timeline sensitivities that weren’t obvious in the original framing. That shift in wording repeatedly changed which risks we addressed early and, in some cases, which options we ruled out entirely.

Evgeny's Notes

Asking AI, “Which option is best given these constraints?” tends to produce clean, confident recommendations that converge quickly. Reframing the prompt to, “Under what conditions would this option fail, and which assumptions would have to be false for that to happen?” produced materially different output.
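
That reframing can be sketched as two small prompt builders. The wording and option names below are hypothetical; the sketch only illustrates the shift from asking for a best choice to asking for failure conditions.

```python
# Hypothetical sketch of the two framings contrasted above; the exact
# wording is illustrative, not the prompts used in practice.

def recommendation_prompt(options: list[str], constraints: str) -> str:
    """The 'which is best?' framing that tends to converge too quickly."""
    return (
        f"Given these constraints: {constraints}\n"
        f"Which of these options is best? Options: {', '.join(options)}"
    )

def failure_mode_prompt(option: str, constraints: str) -> str:
    """The adversarial framing: ask what would have to be false."""
    return (
        f"Given these constraints: {constraints}\n"
        f"Under what conditions would '{option}' fail, and which assumptions "
        "would have to be false for that to happen?"
    )

best = recommendation_prompt(["run both cohorts", "stagger cohorts"],
                             "two mentors, one term")
fail = failure_mode_prompt("stagger cohorts", "two mentors, one term")
```

Running both prompts against the same model and comparing the answers is a cheap way to see which risks the "best option" framing was hiding.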

How AI is changing delivery rituals to focus on decision quality

AI has shifted our delivery rituals away from reporting activity and toward improving decision quality.

  • Defining scope has become both more explicit and more provisional. Because AI makes it easy to generate plausible plans quickly, we’ve learned to slow this phase down rather than accelerate it. Early outputs are treated as hypotheses to interrogate, not proposals to execute. We use AI to map the option space, surface constraints, and identify where uncertainty still sits before committing to scope.
  • Aligning teams now centers on shared reasoning rather than shared artifacts. Instead of circulating polished documents immediately, we often use AI to generate concise summaries of assumptions, trade-offs, and unresolved questions ahead of discussions. That equalizes context and allows alignment conversations to focus on judgment rather than interpretation.
  • Validating work has moved upstream. Rather than reviewing outputs against a checklist or timeline, we use AI to stress-test assumptions: asking how a structure might fail, which assumptions are carrying the most risk, or what changes under edge cases. This has made validation conversations more substantive and reduced late-stage surprises.
  • Managing execution has become lighter, but more deliberate. Rather than status updates, regular check-ins focus on decisions that need to be made, trade-offs that have emerged, and where human judgment is required. AI supports continuity and synthesis in the background, but ownership and accountability remain clearly human.

Overall, AI hasn’t removed the need for delivery rituals — it’s made them more intentional. The emphasis has shifted from tracking progress to making better decisions, earlier, and with clearer ownership.

What an AI stack for flexibility in project delivery looks like

My project delivery stack is deliberately lightweight, text-first, and AI-augmented rather than tool-heavy. That reflects the nature of my work, which tends to be high-uncertainty and design-led rather than execution-at-scale.

At the core are shared documents and spreadsheets via Google Workspace. Shared docs act as the primary source of truth for assumptions, decisions, ownership, and evolving structures, while spreadsheets are used for cost models, timelines, and scenario comparisons, especially in investment and location-planning contexts. Slack and WhatsApp are used for fast alignment and decision making, not for granular task tracking.

For project management, I use Notion and Linear as needed, focusing on decision logs and milestones rather than heavy task boards.

In practice, this means that over the past year, I've doubled down on shared documents and AI-assisted synthesis, while moving away from heavyweight project management tools, in favor of clearer decision logs and lighter coordination.

The result is fewer documents, clearer decision logs, and more explicit assumptions. LLMs sit on top of this stack as a synthesis and continuity layer, helping me keep decisions, assumptions, and parallel documents aligned as things change. I use them to explore options, pressure-test designs, summarize prior decisions, and maintain context as projects evolve across academic, operational, and partnership dimensions. Here's the breakdown:

  • ChatGPT (GPT-4-class models): Open-ended reasoning, option exploration, trade-off analysis, pressure-testing designs, and maintaining long-horizon context across documents.
  • Claude: Clean, structured drafts and careful rewriting with tone and constraints preserved.
  • Gemini (occasionally): Fast summarization of large inputs or alternative framings, especially when working across mixed document types.

In recent projects, this has translated into faster convergence on viable program structures, fewer late-stage reversals, and clearer decision handoffs across academic and operational workstreams. In practical terms, this has reduced the number of parallel documents we maintain, shortened iteration cycles between drafts, and lowered the coordination overhead of keeping academic, operational, and partner-facing materials aligned.

On the communications and design side, I’ve used Midjourney to align quickly with our designer. It’s been particularly effective for conveying tone and visual direction early, reducing the back-and-forth that often comes from trying to describe aesthetic judgment purely in words.

Looking ahead, a linked knowledge base such as Obsidian would make sense as the volume of material grows, particularly for long-term institutional memory across projects. At the current stage, however, shared documents combined with AI-assisted summarization and recall cover most of that need without introducing additional tooling overhead.

Overall, this stack works because clarity, adaptability, and judgment currently matter more than process optimization. As scale increases, parts of the stack will likely formalize, but for now flexibility has been more valuable than precision.

Why lightweight delivery systems work better in high-uncertainty, AI-enabled projects

We moved away from task-heavy, timeline-first project management toward systems that emphasize assumptions, dependencies, and decision points. In high-uncertainty, design-led work, early precision around tasks and deadlines often creates a false sense of progress while obscuring the real questions that still need answering.

I haven’t consciously “rebelled” against traditional project management — it simply stopped fitting the type of work I’m doing.

We opt for fewer rigid plans and more structured conversations. Instead of tracking granular tasks or enforcing artificial milestones, we focus on defining what must be true for a project to succeed, where uncertainty still sits, and who owns the next meaningful decision. Shared documents act as evolving sources of truth, with explicit sections for assumptions, open questions, risks, and ownership. Lightweight documentation replaces dashboards, and decision points replace fixed timelines.

ChatGPT supports this shift by helping us summarize changes over time, surface inconsistencies across parallel workstreams, and keep context aligned as projects evolve — without enforcing a single workflow or rigid process.

The result has been faster convergence early on and fewer painful rewrites later. We see less performative “progress” and more genuine clarity, particularly in projects where the problem itself is still being defined.

How to experiment cautiously with agentic AI

Agentic AI is not yet part of our general workflows because the nature of the work means judgment, accountability, and contextual awareness still need to remain very explicit — and human. Premature automation in those settings risks creating false confidence, and that's a risk we've been careful to avoid.

So, we’ve been cautious about anything that resembles end-to-end automation. Instead of asking, “What can we automate?”, the guiding question has been, “Where does orchestration improve visibility or reduce friction without obscuring ownership?” So far, that’s led us to keep experimentation exploratory, incremental, and tightly supervised.

In practice, that experimentation includes using AI for summarization, maintaining consistency across evolving documents, and comparing options or scenarios under different assumptions. These are areas where orchestration can reduce coordination costs without shifting responsibility away from humans.

Looking ahead, there are areas where more structured orchestration may make sense, particularly around coordination and feedback at scale. But even there, the goal wouldn’t be to automate judgment or replace human interaction, but to support continuity, reflection, and alignment as complexity increases.

Why project delivery will become less about enforcing plans and more about decision architecture

Over the next five years, project delivery will shift away from task coordination toward decision architecture. As AI increasingly handles synthesis, memory, and comparison across options, the human role will move decisively toward managing uncertainty rather than managing work.

In practice, this means fewer people focused on maintaining timelines and task lists, and more leaders focused on designing the conditions for good decisions: making assumptions visible, defining option spaces, clarifying decision rights, and identifying where judgment is still required. AI will compress the mechanics of delivery, but it won’t resolve ambiguity or own risk.

It's important to be explicit about what is still unknown, where commitments are irreversible, and who is accountable for each decision. Project delivery will become less about enforcing plans and more about maintaining coherence as understanding evolves.

Evgeny's Notes

It’s important to be explicit about what is still unknown, where commitments are irreversible, and who is accountable for each decision. Project delivery will become less about enforcing plans and more about maintaining coherence as understanding evolves.

Why judgment and responsibility still belong to humans

The biggest shift for delivery leaders right now isn’t learning new tools, but relearning where judgment belongs. AI is changing how quickly we can think, iterate, and coordinate — but it doesn’t change who should be accountable for decisions. Keeping that distinction clear is becoming a core leadership skill.

Here's my advice:

  • Don’t outsource judgment — outsource cognitive load. AI is extremely good at holding context, comparing options, surfacing inconsistencies, and reducing the mental overhead of complex projects. Use it to support thinking, not to replace it. The moment AI starts making decisions feel complete, leaders need to step in and ask what assumptions are carrying that confidence and whether they’re justified.
  • Make assumptions explicit before optimizing execution. AI makes it easy to generate polished plans early, which creates the illusion of progress. Resist that pull. The most valuable work early on is not task breakdowns or timelines, but making assumptions visible: what has to be true for this to work, where uncertainty still sits, and which decisions are irreversible. Projects fail less often from poor execution than from unexamined premises.
  • Treat AI as a thinking partner, not a shortcut. Small changes in framing can radically change what AI produces. Learning to prompt well is less about syntax and more about learning to ask better questions. That skill compounds over time, because it forces clearer thinking, sharper trade-offs, and more deliberate decision making.

The teams that will succeed over the next few years won’t be the ones that automate the fastest. They’ll be the ones that use AI to see more clearly, decide more deliberately, and take responsibility for the choices that actually matter.

Follow along

You can follow Evgeny Goncharov on LinkedIn, and take a look at his work in education at Cantabrium Scholars.

More expert interviews to come on The Digital Project Manager!

By Kristen Kerr

Kristen is an editor at The Digital Project Manager and a Certified ScrumMaster (CSM). She draws on more than six years of experience, primarily in tech startups, to help guide other professionals managing strategic projects.