Architect by Profession, Programmer at Heart
Project Management in the Age of AI
A colleague of mine, a software developer, recently asked if they should move into project management because they’re worried about AI replacing them. Diversifying your skills is always a good idea, I said. Broadening your perspective, learning how projects are run, understanding stakeholder dynamics, all of that makes you more valuable professionally. But if you’re moving into project management because you think it’s safer from AI disruption than software development, then you might be mistaken.
The State of PM
Project management as practiced today has largely devolved into PMO work: maintaining artifacts (Jira boards, status reports, RAG dashboards) and facilitating rituals (standups, retrospectives, steering committees). The actual hard part of project management (monitoring risk indicators, judging when to trigger mitigation actions, and executing those actions in a controlled manner) is largely ignored by the discipline.
The Falsification Principle
Karl Popper argued that science progresses not by confirming hypotheses but by falsifying them. You can never prove a theory correct; you can only fail to disprove it. Every surviving theory is one that hasn’t been killed yet. Confirmation is cheap. Falsification is informative.
Translate this to project management. A successful milestone tells you very little. Maybe your plan was right, maybe you got lucky, maybe you’re measuring the wrong thing. The watermelon project is a running gag in project management: green on the outside, red on the inside. A properly instrumented failure, by contrast, tells you exactly where your assumptions were wrong. That’s the information you need.
Two Modes, Two Transformations
Business As Usual
Thomas Kuhn called this “normal science”: puzzle-solving within an established paradigm. Most project management operates in this business-as-usual (BAU) mode: the destination is known, the path is roughly understood, and the job is execution and control.
For BAU projects, the rituals and artifacts are overhead to be minimized, and AI is well placed to do this. Scheduling, status tracking, risk register maintenance, stakeholder communication, action item tracking: these are structured, repeatable tasks that AI handles well. The BAU side of PM will be transformed: not augmented, but automated.
Most PM training and certification (PMP, PRINCE2, SAFe) teaches the artifacts and the rituals, not the judgment. The profession doesn’t distinguish between BAU and R&D, and applies the same tools to both. This is like using a ruler to measure the coastline of Britain. The tool works at one scale but is meaningless at another.
R&D: Paradigm Shift Mode
The projects that create new capabilities, enter new markets, or change how work is done operate in paradigm-shift mode. Here, traditional project management is harmful. It treats deviation from plan as failure to be corrected, when deviation is the signal that carries information.
Treat the project plan as a hypothesis to be tested, not confirmed. Every iteration is designed to expose where the plan is wrong, as early and cheaply as possible.
This reframes the entire practice:
- Risk management isn’t about listing risks and assigning probabilities in a register. It’s about designing experiments that would falsify your key assumptions before you’ve committed too many resources.
- Milestones aren’t progress markers. They’re falsification checkpoints: “If X hasn’t happened by this date, our core assumption about Y is probably wrong.”
- Status reporting shouldn’t ask “are we on track?” It should ask “what have we learned that challenges our plan?”
- A project that fails fast and cheaply has generated more value per euro spent than one that succeeds slowly and expensively while never testing its assumptions.
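A falsification checkpoint can be made concrete as a small record: the assumption, the signal that should appear if it holds, and a deadline. The sketch below is a minimal illustration only; the class name, its fields, and the example values are hypothetical and not part of any PM tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FalsificationCheckpoint:
    """Hypothetical record: 'If <signal> hasn't happened by <deadline>, <assumption> is probably wrong.'"""
    assumption: str  # the belief the plan rests on (Y)
    signal: str      # the observable event that should happen if Y holds (X)
    deadline: date   # last day on which X may still be observed

    def evaluate(self, today: date, signal_observed: bool) -> str:
        if signal_observed:
            return "survived"   # not falsified yet, which is not the same as proven
        if today > self.deadline:
            return "falsified"  # signal overdue: challenge the plan, don't just re-baseline it
        return "open"           # still inside the window


# Hypothetical example values.
checkpoint = FalsificationCheckpoint(
    assumption="Customers can onboard themselves without support",
    signal="20 pilot customers onboarded with no support ticket",
    deadline=date(2025, 3, 31),
)
print(checkpoint.evaluate(today=date(2025, 4, 1), signal_observed=False))  # falsified
```

Note the asymmetry: an observed signal only means the assumption has survived so far, while an overdue signal is a strong hint that the plan itself needs revisiting.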
In R&D, failure is the way to generate value. Success is just cost with a business case.
If you’ve done spike solutions, proof-of-concept phases, or risk-driven planning, you’re already doing some of this. The framework here makes it systematic and names the principle so that AI tools can be directed toward it.
AI’s Role in Each Mode
BAU: AI as Executor
AI automates the PMO layer. Artifact generation, status aggregation, schedule optimization, meeting scheduling, action item tracking, stakeholder updates. The human PM becomes unnecessary for these tasks. What remains is exception handling and escalation judgment, and even those may be automatable as models improve.
R&D: AI as Experiment Designer
In paradigm-shift mode, AI transforms PM in a different way. Rather than automating rituals, AI helps design the experiments that test assumptions: given the current risks, what is the cheapest experiment that would disprove the most critical assumption? What unstated beliefs are embedded in the project plan? What would need to be true for the project to fail, and are those conditions being monitored? Are there weak signals across status updates, team communications, and external events that connect into an early warning?
The human’s role shifts to deciding which experiments promise the highest value and interpreting results in context. The experiments aim at constructive failure: learning that changes direction before expensive commitments are made.
What This Looks Like in Practice
A well-designed falsification experiment has a realistic chance of failure, ideally something close to 50%. This runs against current management incentives, where the goal is to design plans that succeed. But an experiment designed to succeed isn’t an experiment; it’s a demonstration. It doesn’t generate new information.
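Why close to 50%? One way to make this precise, assuming each experiment is modelled as a binary pass/fail outcome and its information is measured as Shannon entropy, is to compute the expected bits learned as a function of the failure probability. The function name is hypothetical; the formula is the standard binary entropy H(p) = -p log2(p) - (1-p) log2(1-p).

```python
import math


def outcome_information_bits(p_fail: float) -> float:
    """Expected information, in bits, from a pass/fail experiment (binary Shannon entropy)."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    if p_fail in (0.0, 1.0):
        return 0.0  # a certain outcome teaches nothing
    p_pass = 1.0 - p_fail
    return -(p_fail * math.log2(p_fail) + p_pass * math.log2(p_pass))


for p in (0.01, 0.1, 0.3, 0.5, 0.9):
    print(f"P(fail) = {p:.2f}: {outcome_information_bits(p):.2f} bits")
```

The curve peaks at exactly 1 bit when P(fail) = 0.5 and is symmetric, so an experiment that is 90% certain to pass yields under half a bit, and one that is 90% certain to fail is just as uninformative. Real experiments have more than two outcomes, so treat this as an intuition, not a measurement.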
A company is building a global mobile app and needs to choose a technology: native development (iOS and Android separately), mobile web, or a cross-platform framework like Flutter. The team believes Flutter is the way to go. It promises one codebase for both platforms, faster delivery, and lower cost. But there are real risks. Flutter renders its own widgets instead of using the platforms’ native UI components, the framework may become obsolete (Xamarin, anyone?), and the user experience might end up the worst of both worlds. A traditional PM would commit to Flutter based on the business case, build the full app, and discover the problems in production.
Popperian PM asks: what would need to be true for Flutter to be the wrong choice? And how can we find out before we’ve spent the full budget?
The experiment: build a minimum viable version of a key user flow in Flutter. Roll it out. Instrument it with telemetry. Gather user feedback. Better yet, if an existing native app already covers the same flow, run an A/B test: keep the existing version as A, build the Flutter version as B, instrument both, compare. Measure what matters: performance, user satisfaction, crash rates, development velocity, maintenance cost.
If it turns out that users don’t love the Flutter experience, or that performance on lower-end devices is unacceptable, toss it. The experiment cost a fraction of the full build and saved the company from a multi-year commitment to the wrong technology. If the results are comparable, go ahead with Flutter, now with evidence rather than belief.
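One way to keep the A/B comparison honest is to write the kill criteria down before the data arrives, so the result can actually falsify the choice instead of being reinterpreted afterwards. The sketch below covers only three of the metrics mentioned above; the metric names, thresholds, and numbers are all hypothetical, not recommendations or real measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantMetrics:
    crash_free_sessions: float    # fraction of sessions without a crash, 0..1
    p95_startup_ms_budget: float  # 95th-percentile startup time on budget devices, in ms
    satisfaction: float           # e.g. mean in-app rating, 1..5


# Hypothetical kill criteria, written down BEFORE the experiment runs.
MAX_CRASH_FREE_DROP = 0.005   # B may lose at most 0.5 percentage points of crash-free sessions
MAX_STARTUP_SLOWDOWN = 1.20   # B may be at most 20% slower on budget devices
MAX_SATISFACTION_DROP = 0.2   # B may score at most 0.2 rating points lower


def verdict(a: VariantMetrics, b: VariantMetrics) -> tuple[str, list[str]]:
    """Compare variant B (Flutter) against baseline A (native); list every violated criterion."""
    violations = []
    if a.crash_free_sessions - b.crash_free_sessions > MAX_CRASH_FREE_DROP:
        violations.append("stability")
    if b.p95_startup_ms_budget > a.p95_startup_ms_budget * MAX_STARTUP_SLOWDOWN:
        violations.append("budget-device performance")
    if a.satisfaction - b.satisfaction > MAX_SATISFACTION_DROP:
        violations.append("user satisfaction")
    return ("kill" if violations else "proceed"), violations


# Made-up numbers for illustration only.
native = VariantMetrics(crash_free_sessions=0.995, p95_startup_ms_budget=1800, satisfaction=4.3)
flutter = VariantMetrics(crash_free_sessions=0.993, p95_startup_ms_budget=2600, satisfaction=4.2)
print(verdict(native, flutter))
```

A real A/B test would also check that each difference is statistically significant given the sample sizes; this sketch only shows the pre-registered decision rule. Listing every violated criterion, instead of stopping at the first, tells the team which assumption broke.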
AI’s role in this is not to make the Flutter-vs-native decision. It’s to help design the experiment: suggest what to measure, identify the user segments most likely to reveal problems, flag assumptions the team hasn’t tested (“you’re assuming consistent performance across device tiers, but have you tested on budget Android phones?”), and synthesize the results into a clear signal.
Each experiment builds on the previous ones. The goal isn’t to delay delivery. Think of experiments like going to the gym regularly: you don’t want the effort that goes in, you want the fitness that comes out.
A Second Scenario: The Database Migration That Wasn’t
Here, the experiment didn’t happen.
A team migrated from a commercial relational database to a document database. The rationale was sound on paper. The document model promised flexibility, the team wanted to move away from expensive licensing, and the new database was popular in the industry. The plan was to rearchitect the data model along the way.
Under time pressure, the rearchitecture didn’t happen. The team migrated the existing relational structures, including the logic of its stored procedures, into the document database. Tables became collections with table-like schemas. Joins became application-level lookups. The result: performance problems, awkward query patterns, and a data layer that combined the constraints of a relational model with none of the benefits of a relational engine.
In hindsight, the core assumption was: “we can migrate our relational data model to a document database and rearchitect it under project timeline pressure.” That assumption was testable.
The experiment: take one representative module, something with complex stored procedures and relational joins, and migrate it to the document database with the full rearchitecture the team envisioned. Time-box it. Measure how long the rearchitecture takes, what performance looks like, and how much of the original logic survives unchanged. In parallel, migrate the same module to an open-source relational database as a control: same schema, stored procedures ported with minimal changes, little rearchitecture needed.
If the document database migration takes three times as long as planned, or if the team ends up recreating relational patterns in a document store, that’s the falsification signal. The relational alternative becomes the pragmatic intermediate step: escape the licensing cost now, rearchitect toward a document model later when there’s time to do it properly.
If the document migration goes smoothly and the rearchitected module performs well, proceed with confidence.
The cost of this experiment would have been a few weeks of AI-assisted parallel work on one module. The cost of not running it was a full migration that delivered the wrong architecture.
The Human in the Loop
In both scenarios, AI can design the experiment, help execute it (generate the test code, set up the telemetry, build the alternative implementation), and assess the results from the data. What AI cannot do (yet) is decide which experiments are worth running in the first place.
That decision requires understanding which risks are the biggest in the context of the larger picture: the business strategy, the organization’s capacity and its tolerance for delay, the political landscape around the decision. A document database migration might be technically risky but politically unchallengeable because a senior leader championed it. A mobile framework choice might be technically straightforward but strategically critical because it locks the company into a platform for five years. Knowing which risk to test first, and how to frame the experiment so that the results are actionable rather than ignored, is judgment that depends on context that no model currently maintains.
The project manager of the future is the person who looks at a plan and asks: “What are we betting on, and which bet would hurt the most if we’re wrong?” AI helps answer half of that. The other half is where the human adds value.
The experiment produces real data regardless of who designed it. If the Flutter A/B test shows poor performance on budget devices, that finding is valid whether a human or an AI chose to test for it. The risk is not that AI designs a bad experiment; it’s that AI designs an experiment that tests the wrong assumption, and the team draws confidence from a result that doesn’t address the real risk. That’s why choosing which assumptions to test remains a human judgment call. The validation is in the results, not the design.
The Tragedy of Agile
This has happened before.
The Agile Manifesto (2001) was designed for the situation we’re describing: uncertain direction, willingness to act, learning through iteration. “Responding to change over following a plan” is at its core. Build something small. See if reality confirms or falsifies your assumptions. Adapt.
Within a decade, Agile devolved into a cargo cult. Scrum ceremonies replaced waterfall milestones. Jira boards replaced Gantt charts. Story points replaced time estimates. The fundamental mindset never changed. The artifacts were adopted, the philosophy was discarded. Organizations built the runway and the control tower but never understood why the planes didn’t come.
The same fate awaits any attempt to introduce “constructive failure” or “falsification experiments” as a new methodology. If organizations adopt these ideas as rituals without internalizing the reasoning, we will get:
- “Innovation sprints” that are just regular sprints with a different label
- “Experiment backlogs” maintained in Jira with the same obsessive tracking
- “Failure retrospectives” that quietly punish the people involved
- “Hypothesis-driven development” where the hypothesis is written after the fact to justify what was already decided
Corporate culture, especially in large traditional enterprises, is structured against this. Bonuses are tied to delivery, not to learning. Performance reviews reward green dashboards, not falsified assumptions. There is no color for “we killed an assumption and saved the company money.”
Solving the incentive problem requires changes to governance, reporting, and compensation structures that go well beyond project management. What a project manager can do is frame experiments in terms executives already understand: a small, bounded cost now that reduces the probability of a large, unbounded cost later. The database migration scenario is a good example. A few weeks of parallel work on one module would have cost a fraction of what the failed full migration cost in performance remediation, team morale, and delayed delivery. That’s not a philosophical argument. It’s a budget line.
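The budget argument can be written as a simple expected-value comparison. The sketch below assumes the experiment either detects a wrong assumption or misses it with some probability; the function names are hypothetical, and every figure in it is a made-up placeholder, not data from the migration described above.

```python
def expected_loss_avoided(p_assumption_wrong: float,
                          cost_if_found_late: float,
                          cost_if_found_early: float,
                          p_experiment_detects: float = 1.0) -> float:
    """Expected cost saved by learning early that a key assumption is wrong."""
    savings_if_wrong = cost_if_found_late - cost_if_found_early
    return p_assumption_wrong * p_experiment_detects * savings_if_wrong


def worth_running(experiment_cost: float, loss_avoided: float) -> bool:
    """A bounded cost now is justified if it is smaller than the expected loss it avoids."""
    return experiment_cost < loss_avoided


# Hypothetical figures for a database migration, in euros.
avoided = expected_loss_avoided(
    p_assumption_wrong=0.3,        # chance the rearchitecture won't fit the timeline
    cost_if_found_late=1_500_000,  # remediation, delay, and rework after a full migration
    cost_if_found_early=100_000,   # switching plans after one module
    p_experiment_detects=0.8,      # a one-module pilot may miss some problems
)
print(f"Expected loss avoided: {avoided:,.0f} EUR")
print("Run the experiment:", worth_running(experiment_cost=40_000, loss_avoided=avoided))
```

The point is not the precision of the inputs, which will always be rough, but that the comparison is stated in the currency executives already use.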
Why This Time Is Different
When Agile devolved into a cargo cult, the penalty was inefficiency and developer cynicism. Annoying, but survivable. Organizations that performed Agile without understanding it were still roughly competitive with organizations that didn’t adopt it at all.
The AI era changes this. AI gives every player the capacity to iterate faster, experiment cheaper, and learn more quickly from failure. Organizations that embrace constructive failure as a way of thinking will outpace those that merely perform it. The gap between “gets it” and “performs it” widens quickly, because the tool that amplifies good judgment also amplifies the cost of bad judgment.
The tragedy of Agile is a cautionary tale. Don’t introduce the new thing as a process. Introduce it as a way of thinking, and let the process emerge from that. Which is what Agile itself was supposed to be.
Advice to the Budding Project Manager
If you’re a developer considering project management as a safer career path, think again. AI will transform PM at least as deeply as it will transform development. The artifact-and-ritual layer that constitutes most PM work today is more exposed to automation than writing code is. Switching from one AI-disrupted field to another is not a strategy.
But the picture isn’t bleak.
As AI automates the BAU side of PM, the field will shift. The proportion of projects that need a human project manager will shrink, but the projects that remain will be the interesting ones: high-uncertainty R&D work where the direction is unclear, the assumptions are untested, and the value comes from learning fast. What’s left is the hard, cognitive work of guiding AI toward the right experiments, interpreting ambiguous results, navigating human dynamics under uncertainty, and making judgment calls that no model can make alone.
The value of management is proportional to the uncertainty. If your project is predictable, PM is overhead. If your project is uncertain, PM done properly is essential. The job requires thinking, not ceremony facilitation. It rewards curiosity, not compliance. It values the ability to ask “what would need to be true for this to fail?” over the ability to maintain a Gantt chart.
The advice, then, is not to avoid PM but to prepare for the PM that’s coming. And the single most important skill to develop is a sense of uncertainty.
This sounds simple. It is not. Most company cultures expect the expert to have answers, not questions. Saying “I don’t know” in a steering committee feels like failure. Consultants have built entire industries on projecting certainty, and organizations have learned to reward it. The result is that false certainty is everywhere: project plans that look solid because nobody tested the assumptions, risk registers full of low-probability items because admitting a high-probability risk would raise uncomfortable questions, technology choices that feel inevitable because a senior leader endorsed them.
Developing a sense of uncertainty means learning to notice when you feel certain and asking why. Is the certainty based on evidence or on consensus? On data or on authority? On tested assumptions or on untested ones that everyone has agreed to stop questioning? The best project managers, like the best scientists, are the ones who are most honest about what they don’t know.
From there, the practical skills follow:
- Learn to think in hypotheses and falsification, not plans and milestones
- Build the skill of designing cheap experiments that retire expensive risks
- Understand human dynamics: when to push, when to listen, when to escalate
- Practice asking “what would need to be true for this to fail?” in every project meeting, even when the question is unwelcome
So should my colleague move into project management? I believe the career move matters less than the skills you develop. The boundary between developer and project manager is dissolving. Both roles are converging on the same core: judgment under uncertainty. Pick whichever role gives you the most exposure to hard problems with unclear answers, and run toward them.