The worst AI roadmap is the one built backward: a vendor demo sparks excitement, a headline creates FOMO, and suddenly the team is retrofitting a product problem to justify a model purchase.
Strong PMs invert that sequence. They start from user and business pain, evaluate feasibility with clear-eyed skepticism, and only then ask whether machine learning or LLMs are the cheapest path to a durable outcome.
This lesson gives you a repeatable filter—not to kill ideas, but to kill the wrong ones early.
Not every valuable automation needs AI
If your problem is fully specified by rules humans can write down—eligibility checks, routing by category, formatting, deterministic validations—you should usually reach for rules, workflows, and traditional software first.
Rules are debuggable. They behave predictably. They are easier to explain to legal, support, and customers. AI shines when the mapping from input to output is too fuzzy, too large, or too expensive to encode by hand—not when someone is too impatient to document the logic.
Product implication: Before you fund discovery on a model, ask: “Could a junior analyst with a spreadsheet and a week of time specify 80% of this?” If yes, ship the boring version and measure. You might still add AI later—for edge cases, personalization, or scale—but you will not anchor the roadmap on novelty.
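To make “ship the boring version” concrete, here is a minimal sketch of a deterministic eligibility check; every field name and threshold below is hypothetical, for illustration only:

```python
# A minimal sketch of the "boring version": a deterministic eligibility check.
# Field names and thresholds are hypothetical, for illustration only.

def is_eligible(account: dict) -> tuple[bool, str]:
    """Return (eligible, reason). Every path is explicit and auditable."""
    if account["age_days"] < 30:
        return False, "account younger than 30 days"
    if account["chargebacks"] > 0:
        return False, "prior chargeback on record"
    if account["country"] not in {"US", "CA", "GB"}:
        return False, "unsupported country"
    return True, "all rules passed"

ok, reason = is_eligible({"age_days": 45, "chargebacks": 0, "country": "US"})
print(ok, reason)  # True, all rules passed
```

Every rejection carries a reason that legal and support can read verbatim, which is exactly the debuggability the paragraph above describes.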
Repetition without ambiguity is the sweet spot
A useful first screen: is the task repetitive for a human? If experts do the same judgment thousands of times—with similar inputs and comparable stakes—you might be looking at a pattern ML can learn.
But repetition alone is not enough. The task also needs a learnable signal. If “good” is purely subjective and inconsistent across experts, the model will inherit that chaos.
Product implication: Spend discovery time observing variance. If two trusted humans disagree often, your problem may be policy or definition—not modeling. Fix the definition before you fund training.
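One way to make that variance concrete during discovery is to measure how often two trusted reviewers agree on the same items. A rough sketch, with made-up labels, computing raw agreement and Cohen’s kappa:

```python
from collections import Counter

# Hypothetical labels from two trusted reviewers on the same 10 items.
rater_a = ["spam", "ok", "ok", "spam", "ok", "spam", "ok", "ok", "spam", "ok"]
rater_b = ["spam", "ok", "spam", "spam", "ok", "ok", "ok", "ok", "spam", "ok"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: the probability both raters pick the same label at random,
# given each rater's own label frequencies.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
expected = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a | freq_b)

kappa = (observed - expected) / (1 - expected)
print(f"raw agreement {observed:.2f}, Cohen's kappa {kappa:.2f}")
# Low kappa on a pilot sample is evidence of a definition problem, not a modeling one.
```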
Without training signal, you do not have a supervised product—yet
Many pitches skip straight to “the model will learn.” Learn from what?
Supervised learning needs labels, directly or indirectly. Sometimes labels exist in your logs (user clicks, outcomes, human edits). Sometimes you must create them—with annotation budgets, SME time, or implicit feedback loops.
If you cannot access labels at reasonable cost, your roadmap is not “build AI.” It is “build the data flywheel,” or it is an unsupervised or weakly supervised approach with honest UX framing.
Product implication: Add a discovery milestone: evidence of label feasibility. Not a slide that says “we will figure it out.” A pilot with real annotators or a proxy metric tied to business outcomes.
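If the hypothesis is that labels already live in your logs, the pilot can start by replaying them. Here is a sketch of deriving weak labels from logged human edits; the record schema and the similarity threshold are assumptions, not a standard:

```python
import difflib

# Hypothetical log records: the system's draft reply and what the human actually sent.
logs = [
    {"draft": "Refund approved, 5-7 days.", "final": "Refund approved, 5-7 days."},
    {"draft": "Please reset your password.", "final": "Your account was locked; call support."},
]

def weak_label(record: dict, accept_threshold: float = 0.9) -> str:
    """Treat near-unedited drafts as positives and heavy rewrites as negatives."""
    similarity = difflib.SequenceMatcher(None, record["draft"], record["final"]).ratio()
    return "accepted" if similarity >= accept_threshold else "rewritten"

for record in logs:
    print(weak_label(record))  # accepted, then rewritten
```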
“Mostly right” is a strategy decision—not a technical footnote
AI systems rarely achieve perfection. The question is whether “good enough” is good enough for the user journey you are designing.
If a wrong suggestion is low cost (user ignores it, one click replaces it), high recall with moderate precision can be delightful. If a wrong output is high cost (incorrect medical dosing language, wrongful enforcement), you need high precision—or a human gate.
Product implication: Write down the cost of false positives vs. false negatives in user and business terms. That document becomes your north star for thresholds, monitoring, and whether you ship at all.
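That document can be literal arithmetic. A minimal sketch of the expected-cost framing, using invented volumes, error rates, and per-error costs:

```python
# Hypothetical error economics for one candidate decision threshold.
volume_per_month = 100_000     # predictions served
fp_rate, fn_rate = 0.05, 0.10  # measured on an eval set at this threshold
cost_fp = 0.50                 # user dismisses a bad suggestion: cheap
cost_fn = 12.00                # a missed case escalates to support: expensive

expected_cost = volume_per_month * (fp_rate * cost_fp + fn_rate * cost_fn)
print(f"expected error cost: ${expected_cost:,.0f}/month")  # $122,500/month
# Recompute for each candidate threshold; the cheapest acceptable point wins.
```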
Human review is a feature, not an admission of failure
The mature pattern in many domains is AI drafts, human decides—or AI triages, human handles exceptions. That is not a compromise; it is often the fastest path to value with acceptable risk.
If you cannot route outputs to review when confidence is low, you are betting the model is always safe. That bet rarely ages well.
Product implication: Design workflows, not just models. Ask: who reviews, at what SLA, with what tooling, and how will reviewer burden be measured? If review does not scale, your automation target was wrong.
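A sketch of that routing decision, with a made-up confidence threshold; note the comment about calibration, which is the assumption this whole pattern rests on:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    output: str
    confidence: float  # assumed calibrated; verify before trusting it

REVIEW_THRESHOLD = 0.85  # hypothetical; derive it from your error-economics doc

def route(prediction: Prediction) -> str:
    """Auto-apply confident outputs; queue everything else for a human."""
    if prediction.confidence >= REVIEW_THRESHOLD:
        return "auto_apply"
    return "human_review_queue"

print(route(Prediction("approve refund", 0.93)))  # auto_apply
print(route(Prediction("deny claim", 0.61)))      # human_review_queue
```

The fraction of traffic landing in the queue is your reviewer-burden metric; if it will not fall over time, revisit the automation target.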
AI as a feature is visible; AI as infrastructure is often where margin hides
AI as a feature is what users notice: summarization, drafting, classification badges, recommendations surfaced in UI. Success is measured in adoption, satisfaction, and task completion.
AI as infrastructure is beneath the surface: search ranking, fraud scoring, routing, personalization, forecasting. Success is measured in conversion, loss reduction, latency, or operational cost.
Both are legitimate. They require different discovery, different stakeholders, and different risk conversations. Infrastructure AI often needs tighter partnership with security, compliance, and finance because failures are silent until they are catastrophic.
Product implication: Be explicit about which game you are playing. Feature AI needs crisp UX promises and visible failure handling. Infrastructure AI needs monitoring, explainability thresholds appropriate to regulators, and often a slower rollout strategy.
The trap of AI for its own sake
You will face pressure to “add AI” as a marketing checkbox. That pressure creates brittle products: bolt-on chatbots that duplicate help docs, vague “smart” modes that users distrust, and demos that crumble under real prompts.
Product implication: Tie every AI initiative to an outcome metric a skeptic would accept: time to complete a task, contact deflection with quality held constant, revenue lift with guardrails, incident reduction. If you cannot name the metric, you are not ready to name the epic.
Rules-based systems still win a surprising amount of the time
Teams underestimate how far you can get with:
- Deterministic workflows and validations
- Retrieval (search) plus templated responses
- Heuristics maintained by domain experts
- Classical optimization where the objective function is known
Sometimes the right “AI roadmap” is retrieval + templates until you have enough logged interactions to learn from.
Product implication: Ask your engineering partners for the cheapest credible baseline. If an LLM cannot beat that baseline on cost, latency, and quality in an eval harness, you should not ship it—no matter how impressive the demo.
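Concretely, “beat the baseline in an eval harness” can start as small as the sketch below; the template store, eval set, and metrics are stand-ins for your own:

```python
import time

# Stand-in baseline: retrieval (here, a dict lookup) plus templated responses.
TEMPLATES = {"how do i reset my password": "Use Settings > Security > Reset."}

def baseline_answer(question: str) -> str:
    return TEMPLATES.get(question.lower(), "See the help center.")

def evaluate(candidate, eval_set, cost_per_call: float) -> dict:
    """Score one candidate on quality, latency, and cost over a shared eval set."""
    hits, latencies = 0, []
    for question, must_contain in eval_set:
        start = time.perf_counter()
        answer = candidate(question)
        latencies.append(time.perf_counter() - start)
        hits += int(must_contain.lower() in answer.lower())
    return {
        "quality": hits / len(eval_set),
        "p50_latency_s": sorted(latencies)[len(latencies) // 2],
        "cost_per_1k_calls": cost_per_call * 1000,
    }

eval_set = [("How do I reset my password", "settings")]
print(evaluate(baseline_answer, eval_set, cost_per_call=0.0))
# Run the same evaluate() on the LLM candidate; ship only if it wins on all three axes.
```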
Current models are powerful—and still bounded
Models struggle with long-tail domain facts without grounding, with arithmetic without tools, with policies that change weekly, and with adversarial inputs designed to break guardrails.
Product implication: Pressure-test proposals against failure cases that matter in your domain, not cherry-picked prompts. Bring the messy tickets, the edge-case contracts, the ambiguous requests, the multilingual corner cases. If the team winces, that is useful information.
Build an AI opportunity assessment into discovery—before engineering starts
Treat AI like any other expensive bet: require a short assessment before backlog commitment. Keep it lightweight, but make it mandatory for anything touching ML or LLMs.
Suggested prompts:
- Problem clarity: What user or business outcome improves, and by what mechanism?
- Baseline: What non-AI solution exists today, and what bar must we beat?
- Data and labels: What evidence exists? What must we collect? Who owns quality?
- Error economics: What happens when the system is wrong? Can we detect it quickly?
- Operational load: Latency, cost per action, retraining, monitoring—who owns ongoing health?
- Governance: Privacy, consent, bias risk, third-party terms—what blocks launch?
Product implication: The output is not a “go/no-go” from the PM alone. It is a shared picture that lets leadership fund the right phase: prototype, pilot, or pause.
Competitive pressure is real—and still not a problem statement
You will hear: “Competitors launched an assistant; we need one.” That may be true strategically. It is still not a substitute for a user problem.
Parity features can be rational when distribution depends on perceived completeness. Even then, the work should define what job the assistant performs, for whom, and what outcome improves—otherwise you ship a checkbox that drains support and disappoints users.
Product implication: Translate competitive pressure into a concrete bet with metrics. If you cannot beat “search the help center” on task completion time, your assistant is theater.
The best opportunities often look unglamorous on a slide
High-value AI work is frequently internal: triage, routing, forecasting, anomaly detection. It does not screenshot well for marketing. It does show up in margin, incident rate, and cycle time.
Do not let the roadmap be skewed by what demos nicely in a board meeting. Let it be skewed by durable leverage—especially when the user-facing surface area is small but the operational payoff is large.
Product implication: Run two tracks in discovery: user-visible bets and operations bets. Require the same rigor for both. The internal bet often clears the bar first.
A one-page opportunity brief beats a twenty-slide science project
Discipline here means lightweight documentation: problem, user, baseline, proposed approach, evidence on data and labels, risks, and success metrics. The goal is a shared mental model—not certainty.
Review the brief with engineering, design, legal (when needed), and a domain skeptic. If the skeptic’s questions deflate the room, you saved a quarter.
Product implication: Make the brief a gate before sprint commitment—not as bureaucracy, but to prevent “we started building and discovered we had nothing to learn from.”
Good discovery makes the roadmap boring—in a good way
The goal is not to never use AI. The goal is to use it where its strengths match your problem and your risk budget—and to skip it where simpler tools deliver the same outcome faster.
Next in this track: treating data as a first-class requirement, because the prettiest model cannot outrun a hollow dataset.