Why Is AI So Smart and So Stupid?


Artificial intelligence has entered a strange and decisive phase, and Canadian tech leaders should pay close attention. Today’s best models can refactor massive codebases, discover software vulnerabilities, write production-ready applications, and automate multi-step workflows with stunning competence. Then, in the next breath, they can fail at simple commonsense questions that any human would answer instantly.

The Shift That Changed AI Coding

One of the clearest signals of AI’s recent leap came from software development. Andrej Karpathy described a sharp transition that many early adopters felt around December, when coding agents stopped being merely helpful and started becoming genuinely reliable for larger chunks of work.

Before that turning point, AI coding tools often produced partial solutions. A developer could ask for a function, a component, or a small utility and receive something usable, though imperfect. There was still a lot of cleanup. The human remained deeply involved in correcting logic, refining syntax, and stitching outputs together.

Then something changed.

Newer models paired with better agent “harnesses” began producing coherent, extended outputs that held together over longer workflows. Instead of isolated snippets, they could generate working sections of an application, and in some cases entire apps from end to end. That moment introduced what Karpathy famously called vibe coding.

For Canadian tech organizations, this was not just a developer productivity update. It was a structural shift in how software gets made.

The implication is enormous:

  • Software teams can move faster with fewer handoffs.

  • Founders can test product ideas with far lower development overhead.

  • Non-traditional builders can now prototype functional software.

  • The bottleneck moves away from syntax and toward judgment, architecture, and intent.

That is why Canadian tech executives should treat AI coding not as a novelty, but as a foundational business technology.

Software 1.0, 2.0, and 3.0: The New Computing Paradigm

Karpathy’s framework for understanding AI development is especially useful because it maps the evolution of software itself.

Software 1.0

This is the classical model. Humans write explicit rules in code. The machine follows instructions exactly.

If an organization wants software to perform a task, developers specify each step. Traditional enterprise systems, internal tools, transaction engines, and web applications all depend on this model.

Software 2.0

In this phase, software is “programmed” through data and neural network training rather than hand-written rules. Instead of specifying every instruction, teams build datasets, define objectives, and train models to learn patterns.

This was the paradigm behind many machine learning systems and computer vision breakthroughs.

Software 3.0

This is where large language models change the game. The model itself becomes a kind of programmable computer, and prompting becomes a new form of programming.

In Karpathy’s description, the context window acts like short-term memory, similar to RAM. The model weights act like the CPU, performing the actual computation. The human no longer writes every step of the process. Instead, the human steers the system through instructions, context, goals, and examples.

That is not a minor upgrade. It is a genuine computing shift.
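
To make the contrast concrete, here is a minimal sketch in Python. The Software 1.0 version hard-codes the rule; the Software 3.0 version expresses the same intent as a prompt, with a hypothetical `llm_complete` callable standing in for whichever model API a team actually uses.

```python
# Software 1.0: a human writes the rule explicitly; the machine follows it exactly.
def route_ticket_v1(ticket_text: str) -> str:
    text = ticket_text.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "password" in text or "login" in text:
        return "account_access"
    return "general_support"


# Software 3.0: the prompt is the program; the model weights perform the computation.
# `llm_complete` is a hypothetical stand-in for any chat/completions client.
def route_ticket_v3(ticket_text: str, llm_complete) -> str:
    prompt = (
        "You are a support-ticket router. Reply with exactly one label: "
        "billing, account_access, or general_support.\n\n"
        f"Ticket: {ticket_text}\nLabel:"
    )
    return llm_complete(prompt).strip()
```

In the second version, changing behaviour means editing the instructions and examples placed in the context window, not rewriting branching logic. That is what Karpathy means by prompting becoming a new form of programming.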

For Canadian tech companies, this means many established assumptions are being overturned:

  • Interfaces designed for humans may no longer be the primary interface.

  • Documentation written for people may become less useful than instructions written for agents.

  • The product may not be software in the old sense, but a combination of models, prompts, tools, and orchestrated workflows.

In practical terms, Canadian tech builders should start asking a different set of questions. Not “What code should users run?” but “What outcome should the agent achieve?” Not “How do we guide every step?” but “What context and permissions allow the system to complete the task intelligently?”

Why Prompting Is Replacing Many Traditional Instructions

One of the most striking examples Karpathy gave involved software installation. In the old paradigm, installing a tool across multiple environments often required increasingly complex shell scripts. These scripts grew large because developers had to anticipate every edge case, operating system difference, and dependency conflict.

In the new paradigm, that complexity can shift away from hard-coded instructions and into the intelligence of the agent.

Instead of writing a giant installation script, the developer can provide a short block of text that tells the agent what should be installed, what tools are available, and what outcome is desired. The agent then examines the environment, decides what actions make sense, debugs along the way, and completes the setup.
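
As a rough illustration of that shift (a sketch, not Karpathy’s literal example): the long procedural script collapses into a short goal statement plus an observe-act-check loop. Here `acme-cli` and `agent_step` are hypothetical placeholders, and a real deployment would sandbox commands and require approval before running them.

```python
import subprocess

# The "install instructions" shrink to a goal-level spec handed to an agent.
SETUP_GOAL = """
Goal: make the acme-cli tool available on this machine.
Constraints: prefer the system package manager; do not touch files outside this project.
Done when: `acme-cli --version` exits with status 0.
"""


def goal_met() -> bool:
    """The verifiable 'done when' condition from the spec."""
    try:
        return subprocess.run(["acme-cli", "--version"], capture_output=True).returncode == 0
    except FileNotFoundError:
        return False


def run_setup(agent_step, max_steps: int = 10) -> bool:
    """agent_step(goal, log) returns the next shell command the agent wants to try."""
    log = ""
    for _ in range(max_steps):
        if goal_met():
            return True
        command = agent_step(SETUP_GOAL, log)  # the agent decides the route
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        log += f"$ {command}\n{result.stdout}{result.stderr}"  # it sees outcomes and self-corrects
    return goal_met()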

This matters deeply for Canadian tech because it points to a broader transition from procedural software to goal-driven software.

In other words:

  • Humans specify the destination.

  • Agents determine much of the route.

That changes product design, internal tooling, onboarding, and enterprise operations. A Canadian SaaS company that continues building entirely for human-operated workflows may eventually look outdated compared with an agent-first competitor.

The Rise of End-to-End Neural Networks

Another major idea from Karpathy is that developers may still be underestimating how far end-to-end neural networks can go.

He used a compelling menu example. In an older software mindset, building an app that turns a photo of a menu into a polished visual output would require many steps:

  • Optical character recognition

  • Text extraction

  • Item parsing

  • Image generation

  • Layout rendering

  • Front-end display logic

That is a classic pipeline made of separate components and hand-assembled logic.

In the newer approach, a multimodal model can often take the menu image and a natural-language instruction, then directly return the transformed output. The neural network handles the entire chain as one broad task.
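
A schematic comparison makes the difference in shape clear. Every component below is a hypothetical stub, and the `mm_model.generate` call is illustrative rather than a real API.

```python
def menu_app_pipeline(menu_photo: bytes, ocr, parser, image_gen, renderer) -> bytes:
    """Old mindset: several hand-wired components, each a separate system to build and maintain."""
    text = ocr.extract(menu_photo)                         # optical character recognition
    items = parser.parse(text)                             # text extraction and item parsing
    pictures = [image_gen.create(item) for item in items]  # per-item image generation
    return renderer.compose(items, pictures)               # layout and display logic


def menu_app_end_to_end(menu_photo: bytes, mm_model) -> bytes:
    """Newer mindset: one multimodal model handles the whole chain as a single broad task."""
    instruction = (
        "Read this restaurant menu photo and return a polished, illustrated "
        "version of the menu as a single image."
    )
    return mm_model.generate(image=menu_photo, instruction=instruction)
```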

This reflects what AI researchers, following Rich Sutton, call the bitter lesson: over time, methods that rely on scale, compute, and learning tend to outperform systems built around handcrafted human rules.

Karpathy’s own background at Tesla gives this argument extra weight. Tesla moved from a mixed approach of neural networks plus human-written heuristics toward more end-to-end learning in autonomous driving. The result, by his account, was better performance and lower maintenance complexity.

For Canadian tech, the business implication is clear. Teams should be cautious about overengineering around today’s limitations if those limitations are likely to disappear under stronger end-to-end models.

This does not mean traditional code is dead today. It means the frontier is moving outward, and product decisions should account for that trajectory.

Why AI Feels Brilliant in One Domain and Ridiculous in Another

This is the question that animates the entire discussion: why can AI be astonishingly capable and embarrassingly clueless at the same time?

The answer Karpathy emphasizes is verifiability.

Traditional computers automate what can be specified. Large language models increasingly automate what can be verified.

That distinction is one of the most important ideas in AI today.

What does “verifiable” mean?

A task is verifiable when a system can determine whether the output is correct without requiring a human to interpret every step.

Examples include:

  • Code: Does it compile? Does it run? Do tests pass?

  • Math: Is the answer correct?

  • Structured transformation tasks: Does the output match the expected pattern or objective?

These domains are ideal for reinforcement learning because the model can receive a reward signal when it gets the right answer and iterate toward better performance.
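
A minimal sketch of why that matters: when a test can score an output automatically, every candidate the model proposes gets an unambiguous reward, which is exactly the signal reinforcement learning (or even simple best-of-n sampling) needs. Here `propose_solution` is a hypothetical stand-in for sampling from a model.

```python
def reward(candidate_source: str) -> float:
    """Score a candidate without a human in the loop: 1.0 if the checks pass, else 0.0."""
    namespace = {}
    try:
        exec(candidate_source, namespace)  # load the candidate implementation
        add = namespace["add"]
        assert add(2, 3) == 5              # verifiable checks stand in for a real test suite
        assert add(-1, 1) == 0
        return 1.0
    except Exception:
        return 0.0


def best_of_n(propose_solution, n: int = 8) -> str | None:
    """Sample n candidates and keep one that verifies; the same signal can drive RL training."""
    task = "Write a Python function add(a, b) that returns a + b."
    for _ in range(n):
        candidate = propose_solution(task)
        if reward(candidate) == 1.0:
            return candidate
    return None
```

A commonsense question has no equivalent automatic check, so the training signal there is far weaker and capability stays uneven.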

That is why AI appears so “spiky.” Its strengths are not evenly distributed. They peak in areas where outcomes are measurable and feedback is immediate.

This has major relevance for Canadian tech decision-makers evaluating where to deploy AI first. The best near-term opportunities are often not where tasks are broadest or most prestigious, but where outputs are easiest to verify.

That includes many high-value enterprise workflows in:

  • Software engineering

  • IT operations

  • Data transformation

  • Compliance checking

  • Testing and debugging

  • Structured document analysis

For Canadian tech firms serving regulated industries such as finance, healthcare, and public-sector infrastructure, this framework is especially important. AI adoption should start where correctness can be measured and audited.

The “Strawberry” Problem and Jagged Intelligence

Karpathy pointed to famous examples of jagged model behaviour, including the old failure where models could not reliably count the number of “r” letters in the word “strawberry.” That particular weakness has largely been patched in newer systems. But the phenomenon remains.

His updated example is even more revealing: if someone says the car wash is 50 metres away and asks whether they should drive or walk, top models can still recommend walking, missing the commonsense point that the car itself has to be at the car wash to be cleaned, so driving is the only answer that makes sense.

This is what makes modern AI so disorienting. It can identify vulnerabilities in software yet miss obvious real-world implications.

For Canadian tech leaders, this should temper both blind optimism and lazy skepticism.

The wrong conclusion is that AI is useless because it makes silly mistakes.

The equally wrong conclusion is that AI is generally intelligent because it performs elite tasks in narrow domains.

The truth is more operationally useful: AI is a jagged form of intelligence. It is highly capable in areas where data, rewards, and verifiability align. It is uneven elsewhere.

That is a far better lens for strategic planning.

Why Coding Has Become AI’s Killer App

There is also an economic reason coding stands out. AI labs are strongly incentivized to improve model performance in software development because the commercial value is obvious.

Code has several advantages:

  • It is highly verifiable.

  • It has abundant public training data.

  • Enterprise customers will pay significant amounts for better coding performance.

  • It can be used to improve the very tools that build the next generation of AI systems.

Incentives shape outcomes. Labs optimize where the rewards are largest.

This should matter to Canadian tech founders deciding where to build. If a startup is entering a highly verifiable domain already attractive to the frontier labs, there is a real risk that the platform providers will eventually absorb that capability themselves.

That does not mean startups cannot win. It means they need a sharper strategy.

What Founders Should Build and What They Should Avoid

Karpathy’s advice to founders is subtle but powerful. If a domain is highly verifiable, it may be tractable because teams can use reinforcement learning, fine-tuning, and strong evaluation loops to make systems better. But that also makes it easier for major labs to move in.

For Canadian tech entrepreneurs, the takeaway is twofold:

  1. Verifiable domains are technically promising. If a startup has proprietary data, strong workflow access, or unique customer distribution, it may still build a strong business there.

  2. Verifiable domains are strategically dangerous. If the only moat is generic capability in a domain the labs can easily optimize for, defensibility may be weak.

This is especially relevant in the Canadian tech ecosystem, where startups often need to be more capital-efficient than Silicon Valley rivals. A company in Toronto, Waterloo, Montreal, or Vancouver cannot assume that a generic AI layer is enough. It needs access, trust, workflow integration, distribution, or specialized expertise.

Karpathy also made a provocative point: almost everything may eventually become verifiable to some extent. That suggests many tasks that currently appear safe from automation may only be safe temporarily.

For business leaders, this is a wake-up call. Automation risk is not defined only by whether a task feels complex. It is defined by whether the task can be measured, rewarded, and improved.

Vibe Coding vs. Agentic Engineering

One of the most useful distinctions Karpathy drew is between vibe coding and agentic engineering.

Vibe coding raises the floor

It enables more people to create software, even if they lack deep programming expertise. Someone can describe an app, iterate with a coding model, and produce something functional. This opens software creation to a much larger group.

For Canadian tech, that means more experimentation inside startups, more internal tools inside enterprises, and lower barriers for product validation.

Agentic engineering raises the ceiling

This is not casual prompting. It is the discipline of using AI agents to accelerate professional software development while preserving the quality bar.

That includes:

  • Coordinating multiple agents

  • Maintaining security standards

  • Preventing vulnerabilities

  • Overseeing architecture

  • Managing deployment and testing workflows

  • Keeping output aligned with production requirements

This distinction should shape hiring and training decisions across Canadian tech organizations. The future role of engineers is not disappearing. It is changing. Teams need people who can orchestrate AI systems, enforce standards, and apply judgment at a higher level of abstraction.
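
One concrete reading of “preserving the quality bar” is an automated gate between agent output and human review. A minimal sketch, assuming a Python project; the specific commands (pytest, ruff, gitleaks) are placeholders for whatever checks a team already runs in CI.

```python
import subprocess

# Checks an agent-generated change must pass before a human reviewer even sees it.
QUALITY_GATE = [
    ("tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
    ("secrets", ["gitleaks", "detect"]),
]


def run_gate(workdir: str) -> dict[str, bool]:
    """Run each check inside the agent's working copy and record pass/fail."""
    results = {}
    for name, cmd in QUALITY_GATE:
        proc = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results


def accept_change(workdir: str) -> bool:
    """Only hand the change to a human reviewer if every automated check passes."""
    return all(run_gate(workdir).values())
```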

The Human Role: Taste, Judgment, and Oversight

Karpathy’s view is that humans still matter because they provide aesthetics, taste, judgment, and direction. Even if AI writes a large portion of the code, the human remains responsible for deciding what should be built, how quality is defined, and whether the result is elegant or brittle.

He also noted a common frustration with current model output: it often works, but the code can be bloated, repetitive, awkwardly abstracted, and fragile.

That is a key issue for Canadian tech teams deploying AI at scale. Functional output is not enough. Maintainability matters. Security matters. Reliability matters.

The open question is whether models will improve enough to internalize “taste” in engineering and beyond. Karpathy believes there is nothing fundamentally preventing that. If labs introduce the right reward structures, models may become more elegant over time.

But even if that happens, organizations will still need leaders who understand the problem deeply enough to define what good actually looks like.

You can outsource your thinking, but you can’t outsource your understanding.

This may be the most important management principle in the current AI era.

Executives can delegate analysis to AI. Teams can automate implementation. But understanding the market, the user need, the strategic context, and the reason the work matters cannot be fully delegated without risk.

Animals vs. Ghosts: What AI Actually Is

Karpathy has also described modern AI systems not as animals, but as ghosts. The point is not to be mystical. It is to emphasize that these systems are not biological intelligences shaped by evolution, intrinsic motivation, curiosity, or embodied experience.

They are statistical entities shaped by data, optimization, and reward functions.

This framing matters because it prevents category errors. AI may appear conversational, agentic, and even creative, but that does not mean it “understands” the world in a human way.

For Canadian tech firms deploying AI in customer-facing or mission-critical environments, this perspective is healthy. It encourages disciplined trust rather than emotional trust. Teams should evaluate systems by performance, failure modes, and operational boundaries, not by anthropomorphic intuition.

The Internet Is Being Rebuilt for Agents

Perhaps the biggest product implication is that much of today’s digital infrastructure is still built for humans, not agents. Documentation tells people what to click. Services ask for email addresses, credit cards, copied API keys, and manual setup steps. That friction becomes absurd in an agent-first world.

Karpathy’s complaint is simple and compelling: people should not have to do tasks that agents could do. Products should increasingly provide a single copy-paste instruction or a secure delegated workflow that lets the agent complete the setup.

This is a massive opportunity for Canadian tech.

Agent-first infrastructure could include:

  • Authentication systems designed for delegated machine access

  • Billing systems that allow agents to provision services securely

  • Developer tools with agent-readable setup flows

  • Communication systems built for agent-to-agent interactions

  • Enterprise software designed around machine operators rather than only human dashboards

For startups across the GTA and the broader Canadian tech ecosystem, this could become one of the most important platform transitions of the decade. The winners may not just be those building smarter models, but those rebuilding digital workflows around the fact that software agents will increasingly act on behalf of people and organizations.

What This Means for Canadian Businesses Right Now

The strategic lessons for Canadian tech are immediate.

  • Invest where outputs are verifiable. AI performs best where success can be measured.

  • Train teams in agentic engineering. Prompting is not enough for production use.

  • Design products for agents. Human-only workflows will become friction points.

  • Do not confuse narrow brilliance with general intelligence. AI can excel and fail at the same time.

  • Protect human judgment. Taste, oversight, and strategic understanding still matter.

This is especially relevant in sectors where Canadian companies can move quickly with focused domain expertise. The organizations that win may not be those with the loudest AI branding. They may be the ones that most clearly understand where AI is dependable, where it is jagged, and how to structure workflows around both realities.

The Smartest Way to Think About AI

AI is not simply becoming “more intelligent” in a smooth, human-like curve. It is becoming more useful in domains where feedback is strong, rewards are clear, and verification is possible. That is why it can look superhuman one moment and bafflingly shallow the next.

For Canadian tech, that is not a reason for confusion. It is a roadmap.

The businesses that understand this jaggedness will make better bets. They will automate the right workflows, build the right tools, hire for the right skills, and avoid false assumptions about what AI can and cannot do.

The future is not about replacing all human work overnight. It is about shifting the abstraction layer. Humans move upward into orchestration, judgment, and understanding. Machines take on more of the execution.

That future is arriving fast. Canadian tech leaders who grasp it now will be far better positioned to shape it.

FAQ

Why is AI so good at coding but bad at simple commonsense questions?

AI performs especially well in domains that are easy to verify, such as coding and math. A model can be rewarded when code compiles, tests pass, or a math answer is correct. Commonsense reasoning is often harder to verify automatically, so training signals are weaker and capabilities remain more uneven.

What does “verifiability” mean in AI?

Verifiability refers to whether the quality of an AI system’s output can be checked reliably. If a system can determine that an answer is right or wrong without extensive human interpretation, the task is more verifiable. Highly verifiable domains are easier to improve with reinforcement learning.

What is the difference between vibe coding and agentic engineering?

Vibe coding is about making software creation more accessible to more people. It raises the floor. Agentic engineering is about using AI agents in professional software environments without sacrificing quality, security, or reliability. It raises the ceiling.

What should Canadian tech founders build in the age of AI?

Canadian tech founders should look for areas where AI can create strong business value, especially where workflows are measurable and customers have real pain points. But they also need defensibility. If a domain is highly verifiable and generic, major labs may eventually absorb it. The strongest opportunities often combine domain expertise, integration, trust, and distribution.

Will software engineers become obsolete?

Not in the near term. The role is changing rather than disappearing. Engineers still need to direct agents, review output, maintain quality, enforce security, and apply architectural judgment. The most valuable engineers will increasingly be those who can orchestrate AI systems effectively.

Why does agent-first infrastructure matter for Canadian tech companies?

Many digital products are still designed around human-operated workflows. As agents become more capable, businesses will need tools, APIs, authentication flows, and services that allow software to act safely on behalf of users and organizations. This creates a major product and platform opportunity for Canadian tech.

Is Canadian tech ready for a world where agents write code, provision services, and negotiate digital workflows on behalf of businesses?

The answer may define who leads the next phase of the AI economy.
