How AI is reshaping agile software development — a two-dimensional framework spanning four modes of AI engagement and five organisational contexts
For two decades, agile software development — and Scrum in particular — has been the dominant paradigm for how software is designed, developed, and delivered. Its core assumptions are human-centred: small cross-functional teams, iterative delivery, continuous feedback, collective ownership of code, and emergent design through collaboration. These assumptions now face their most significant challenge.
AI is not merely adding a new tool to the developer's workbench. It is simultaneously changing what software engineers build, how they build it, the medium through which they build it, and — most disruptively — who (or what) does the building. Each of these four shifts interacts differently with agile principles, and each plays out differently depending on the kind of organisation and software being created.
Most commentary treats "AI and software development" as a single conversation. This study proposes a two-dimensional framework that disentangles it. The first dimension is the mode of AI engagement — four prepositions (of, with, through, to) that describe qualitatively different relationships between software engineering and AI. The second dimension is the organisational context — five categories that capture the radically different constraints, incentives, and cultures within which agile practices operate.
How has AI changed the agile methodology, and how must agile methodologies — especially Scrum — evolve in response across different modes of AI engagement and organisational contexts?
Establishing what AI is disrupting: the assumptions, practices, and roles that Scrum codifies
Scrum has become the de facto standard for organising software delivery work. Its adoption across industries — from startups to government — makes it the right baseline against which to measure AI's impact. But that baseline needs to be examined critically: Scrum was designed for a world where all productive work was done by humans, where code was written line by line, and where the primary bottleneck was coordination between people.
Every core Scrum element — roles, ceremonies, artefacts, and principles — now faces questions that its creators never anticipated. The study will examine each element through the lens of AI disruption.
Roles: Product Owner, Scrum Master, Development Team — designed for human-only teams. What happens when AI is a team member?
Ceremonies: Sprint Planning, Daily Standup, Sprint Review, Retrospective — paced for human velocity. AI-augmented velocity changes the calculus.
Artefacts: Product Backlog, Sprint Backlog, Increment — sized for human estimation. Story points and velocity metrics lose meaning if AI changes throughput non-linearly.
Principles: Self-organisation, cross-functionality, inspect-and-adapt — rooted in human cognition and social dynamics. How do they translate to hybrid human-AI teams?
The study will not take an uncritical view of Scrum. Agile practices were already under strain before AI — from "Scrum theatre" (going through the motions without embracing the principles), over-ceremonialisation, difficulties scaling to large organisations, and tension between agile's preference for emergent architecture and the needs of complex enterprise systems. AI arrives into this already-contested landscape, and in some cases it may resolve longstanding problems rather than create new ones.
An escalating spectrum of entanglement between software engineering and AI, from building AI systems to transferring engineering agency
SE of AI: The engineering discipline required to design, build, test, deploy, and maintain AI-powered software at production scale. Scrum meets stochastic behaviour, data-code entanglement, and ML experimentation culture.
SE with AI: Using AI copilots, code generators, and test assistants to enhance human developers' productivity. Scrum's velocity metrics, estimation, and review practices face recalibration.
SE through AI: Creating software by describing intent to AI systems that generate the implementation. Scrum's assumption that development is the bottleneck gives way to specification and validation as the critical path.
SE to AI: The transfer of software engineering agency to AI systems — from autonomous maintenance through to independent development. Scrum's human-centred model confronts non-human agency.
The same AI capability has radically different implications depending on what kind of organisation is building what kind of software
Corporate IT: Internal applications, systems integration, digital transformation. Scrum used for in-house teams and managed service providers. Often constrained by legacy, governance, and risk aversion.
ISVs: Independent Software Vendors building commercial products. Scrum at the core of product development. Competitive pressure to adopt AI first. Revenue tied to feature velocity.
Enterprise Software: Large-scale ERP, CRM, platform vendors. Massive codebases, complex release trains. SAFe and scaled agile frameworks. Ecosystem of partners and integrators.
Startups & SMEs: Small teams, high agility, resource-constrained. Scrum often informal or adapted. AI adoption driven by existential competitive pressure and the promise of "doing more with less."
Safety-Critical: Aerospace, medical devices, autonomous vehicles, defence. Regulated development processes that sit uneasily with agile. AI introduces novel assurance challenges around non-determinism.
How each AI engagement mode interacts with each organisational context — the core analytical structure of the study
| Context | SE of AI | SE with AI | SE through AI | SE to AI |
|---|---|---|---|---|
| Corporate IT | Building internal AI/ML features; MLOps immaturity; data governance as blocker. Scrum teams lack ML expertise; sprint cycles misaligned with experimentation | Copilot adoption for legacy modernisation and integration work; productivity claims vs. security risk. Velocity metrics disrupted; code review burden shifts | Business analysts generating internal tools via AI; citizen development at scale. Product Owner role transforms; sprint planning becomes specification review | Autonomous maintenance of legacy estate; AI-driven incident response. Team composition questions; what does an "agile team" manage vs. oversee? |
| ISVs | AI-native product development; evaluation and safety as product concerns. Sprint definitions of "done" must include AI-specific quality gates | Competitive pressure to maximise developer velocity; talent strategy implications. Estimation models broken; pair programming redefined as human-AI pairing | Rapid prototyping and MVP generation; AI-first product design. Sprint zero collapses; time-to-first-increment measured in hours not weeks | AI-maintained products; autonomous feature development from usage data. Product Owner as AI supervisor; continuous deployment becomes continuous generation |
| Enterprise Software | AI features in platforms (Salesforce Einstein, SAP Joule); scaling AI across product suites. SAFe meets MLOps; programme-level coordination of AI capabilities | Thousands of developers using AI tools; governance at scale; IP risk. Scaled agile ceremonies strained; cross-team AI-generated code dependencies | Platform-level code generation; AI-driven customisation and configuration. Partner/integrator ecosystem disruption; who configures vs. who builds? | Self-evolving enterprise platforms; autonomous patching and migration. Release trains become autonomous; human governance model for AI-driven evolution |
| Startups & SMEs | AI-native startups; ML as core product; lean teams building complex systems. Scrum informality helps, but testing and evaluation discipline often lacking | Force multiplication for small teams; founder-developers using AI extensively. "10x developer" thesis tested; dependency risk if AI tools change or pricing shifts | Non-technical founders building MVPs; solo developers creating complex products. Scrum may be unnecessary for one-person AI-assisted development; new frameworks needed | AI-maintained products after founding team moves on; autonomous SaaS. Business model implications; software as self-sustaining entity |
| Safety-Critical | AI in medical devices, autonomous vehicles, avionics; certification challenges. Sprint-based delivery in tension with V&V requirements and regulatory cycles | Cautious adoption; AI-generated code in regulated environments; audit trails. Traceability requirements intensify; every AI suggestion must be documented | Largely blocked by regulatory constraints; specification languages more viable than NL. Formal methods revival? AI generating provably correct implementations | Autonomous safety-critical systems: the hardest governance challenge. Accountability frameworks; certification of AI-as-engineer; regulatory evolution |
Each cell in this matrix represents a distinct research territory with its own evidence base, challenges, and implications for agile practice. The study will not treat all 20 cells equally — but it will identify which cells are most consequential and where the evidence is strongest.
Building AI systems — particularly those based on large language models, multimodal architectures, and agentic workflows — demands fundamentally different approaches to testing, deployment, versioning, and quality assurance. Classical software engineering assumes deterministic behaviour: given the same inputs, a well-built system produces the same outputs. AI systems are inherently stochastic. This single fact ripples through every Scrum practice.
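To make the testing implication concrete, a minimal sketch of how a team might adapt: where deterministic code is tested with exact assertions, a stochastic component can instead be sampled repeatedly and gated on an aggregate pass rate. Everything here (`passes_stochastic_gate`, the stubbed model) is illustrative, not drawn from any particular framework.

```python
import random

def passes_stochastic_gate(run_model, test_input, check_output,
                           n_runs=50, min_pass_rate=0.5):
    """Exact-match assertions fail for stochastic systems; instead,
    sample the system repeatedly and gate on an aggregate pass rate."""
    passes = sum(check_output(run_model(test_input)) for _ in range(n_runs))
    return passes / n_runs >= min_pass_rate

# Stubbed stochastic "model" that is usually, but not always, correct.
def fake_model(x):
    return x + random.choice([0, 0, 0, 1])

random.seed(0)  # pin the sampler so this sketch is reproducible
gate_ok = passes_stochastic_gate(fake_model, 2, lambda out: out == 2)
```

The design point is that the sprint artefact becomes a pass-rate threshold agreed with the Product Owner, not a binary green/red test — which is exactly the kind of quality gate a deterministic definition of "done" never had to express.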
The data-code entanglement at the heart of ML systems means the training data is as much a part of the system's behaviour as its source code. Scrum's definition of "done" must expand to include data quality, model evaluation, bias assessment, and safety testing — none of which fit neatly into a two-week sprint.
A critical tension exists between ML research culture (notebooks, experimentation, rapid iteration on model architectures) and production engineering culture (reliability, observability, incident response). Scrum sits uncomfortably between these two cultures, and many organisations have developed parallel processes rather than adapting Scrum itself.
Sprint Planning: Work estimation is fundamentally harder. Model training runs have uncertain duration; evaluation may reveal the approach is a dead end, invalidating sprint commitments. Spike-heavy planning becomes the norm.
Definition of Done: Must expand to include model evaluation benchmarks, bias testing, safety assessment, data quality validation — significantly lengthening the path to "done."
Retrospectives: Become the most valuable ceremony, as experimentation-heavy work generates more learning than delivery per sprint. Retrospectives must capture experimental findings, not just process improvements.
This is the strand receiving the most attention today, driven by explosive adoption of AI coding assistants — GitHub Copilot, Cursor, Claude Code, Amazon CodeWhisperer. The core proposition is augmentation: AI handles the mechanical aspects of development while humans focus on design, architecture, and judgement.
The evidence is substantial but contradictory. Studies show productivity gains of 20–55% on certain task types, but the gains are unevenly distributed. Junior developers often see the largest speedups on boilerplate; senior developers gain most on exploratory work and unfamiliar codebases. Documented risks include automation complacency (accepting generated code without scrutiny) and skill atrophy (eroding foundational understanding).
For Scrum, the "with" strand creates an immediate practical crisis: velocity metrics lose their meaning. If a developer who previously completed 8 story points per sprint now completes 20, has the team's capacity genuinely increased two-and-a-half-fold? Or has the definition of a story point shifted? Sprint planning, backlog grooming, and capacity forecasting all need recalibration — and most teams are improvising rather than working from a principled framework.
The Klarna problem is instructive: early claims of dramatic headcount reduction through AI augmentation were walked back as the organisation discovered the hidden costs of AI-generated technical debt, reduced system comprehension, and institutional knowledge loss.
Sprint Planning: Story point estimation breaks down. Teams need new calibration — perhaps measuring complexity-of-specification rather than implementation effort, or introducing AI-adjusted velocity baselines.
Daily Standup: "What did you do yesterday?" becomes partly "What did you prompt for and validate yesterday?" New norms needed around AI tool usage transparency and shared understanding of AI-generated code.
Code Review / Sprint Review: The review burden shifts dramatically. AI-generated code may be syntactically correct but architecturally incoherent, introducing subtle quality problems that only surface later. Reviews must become more architectural and less syntactic.
Team Composition: The optimal team size and skill mix may change. If AI augmentation makes each developer 2–3× more productive on implementation, do teams shrink? Or do they redirect freed capacity toward design, testing, and user research?
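One way a team might ground the recalibration described above, sketched under the assumption that completed story points per sprint are tracked and the sprint in which AI tooling was adopted is known. The function name and the numbers are hypothetical, not a standard metric.

```python
def ai_adjusted_baseline(points_per_sprint, adoption_sprint, window=3):
    """Compare rolling average velocity before and after AI tool adoption,
    reporting an adjustment factor rather than a single velocity figure."""
    pre = points_per_sprint[:adoption_sprint]
    post = points_per_sprint[adoption_sprint:]
    pre_avg = sum(pre[-window:]) / min(window, len(pre))
    post_avg = sum(post[-window:]) / min(window, len(post))
    return {
        "pre_ai_velocity": pre_avg,
        "post_ai_velocity": post_avg,
        "adjustment_factor": round(post_avg / pre_avg, 2),
    }

history = [8, 9, 8, 10, 18, 21, 20]  # hypothetical story points per sprint
baseline = ai_adjusted_baseline(history, adoption_sprint=4)
```

A large adjustment factor is a signal that the unit of estimation has shifted, not that capacity has simply multiplied: the principled response is to re-baseline what a story point means, rather than to forecast from the inflated number.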
This is the paradigm shift strand. Where "with AI" treats AI as a tool within an essentially unchanged process, "through AI" reconceives the process itself. Software is created not by writing code but by describing intent — through natural language, examples, constraints, and feedback — to AI systems that generate the implementation.
This includes prompt-driven development, agent-orchestrated engineering (AI agents that decompose requirements, generate code, write tests, and iterate autonomously), and specification-first development where formal or semi-formal specifications are implemented by AI.
For Scrum, this strand is the most disruptive because it challenges Scrum's fundamental assumption: that development is the bottleneck. If a working application can be generated from a detailed specification in hours rather than sprints, the entire sprint cadence, estimation model, and team structure require rethinking. The bottleneck shifts to specification quality, validation thoroughness, and architectural coherence across generated components.
The critical open question is ceiling complexity: at what scale does "through AI" break down? Evidence suggests it works well for small to medium applications but struggles with large-scale systems requiring deep domain knowledge, complex state management, and integration across organisational boundaries.
Sprint Cadence: Two-week sprints may be too long when generation is fast and too short when validation is thorough. A dual rhythm may emerge: rapid generation cycles nested within longer validation and integration sprints.
Product Owner Role: Transforms from prioritiser-of-features into specifier-of-intent. The PO becomes the primary "developer" in a meaningful sense — their specifications are the input that produces software. This demands far more technical fluency than traditional PO roles assumed.
Definition of Done: Shifts from "code works and is tested" to "generated system is validated, architecturally sound, maintainable, secure, and aligned with specification." Validation becomes the heavyweight activity, not implementation.
Team Structure: The classic Scrum team of 5–9 developers may give way to smaller teams of specifiers, validators, and AI orchestrators — or even solo practitioners who can generate significant systems independently.
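To illustrate validation becoming the heavyweight activity, a minimal sketch: an executable specification expressed as named properties, run against a generated implementation. The property names and the `generated_sort` stand-in are purely illustrative assumptions, not an established tool.

```python
def validate_against_spec(implementation, spec_properties, cases):
    """Check a generated implementation against an executable spec:
    every named property must hold for every test case."""
    failures = []
    for name, holds in spec_properties.items():
        for case in cases:
            if not holds(implementation, case):
                failures.append((name, case))
    return failures

generated_sort = sorted  # stand-in for an AI-generated implementation

spec = {
    "output_is_ordered": lambda f, xs: all(
        a <= b for a, b in zip(f(xs), f(xs)[1:])),
    "output_is_permutation": lambda f, xs: sorted(f(xs)) == sorted(xs),
    "length_preserved": lambda f, xs: len(f(xs)) == len(xs),
}

failures = validate_against_spec(generated_sort, spec,
                                 cases=[[3, 1, 2], [], [5, 5, 1]])
```

In this paradigm the specification and its property suite, not the generated code, are the durable team artefacts: regenerating the implementation is cheap, so "done" is whatever the validation harness can demonstrate.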
This is the most speculative and most consequential strand. "To AI" captures the transfer of engineering agency from human practitioners to AI systems. It ranges from the near-term (AI systems that autonomously maintain, optimise, and evolve existing codebases) to the long-term (AI as the primary "engineer" with humans in supervisory or governance roles).
The distinction from "through" is agency and autonomy. In the "through" paradigm, humans remain the initiators and decision-makers. In the "to" paradigm, AI systems possess sufficient context, judgement, and capability to act independently across the software lifecycle — identifying what needs to be built, making architectural choices, and managing trade-offs.
For Scrum, this strand poses an existential question: what is the role of a human development process when the development is not done by humans? Scrum was designed to coordinate human effort. If AI performs the engineering, the framework must evolve from a development methodology into a governance and oversight methodology — closer to an audit and assurance function than a delivery framework.
The workforce implications require honest analysis. If even a partial transfer of agency occurs, demand shifts from implementation skills to governance, oversight, domain expertise, and ethical judgement. The profession does not disappear but is fundamentally reshaped.
Fundamental Reframing: Scrum evolves from a delivery methodology to a governance methodology. Sprint Reviews become audit reviews. Retrospectives become governance assessments. The Scrum Master becomes a process assurance role.
Product Owner: Becomes the primary human authority — the person who defines intent, sets boundaries, and accepts or rejects AI-generated evolution. This is a more consequential and demanding role than today's PO.
The Team: "Development Team" gives way to "Oversight Team" or "Governance Team." Membership shifts toward domain experts, quality engineers, security specialists, and ethicists — people who can evaluate AI's work rather than perform it.
Cadence: Sprint cadence may decouple from development cadence entirely. AI may develop continuously while human governance operates on a review cadence — daily, weekly, or event-triggered rather than sprint-aligned.
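One way to picture that decoupling, as a hedged sketch (the risk score, thresholds, and routing labels are invented for illustration): autonomous changes are routed either to immediate human review or batched for the next scheduled governance review, so oversight is event-triggered rather than sprint-aligned.

```python
def route_for_review(change, pending_count,
                     risk_threshold=0.7, batch_size=10):
    """Event-triggered governance: escalate high-risk autonomous changes
    immediately; batch routine ones for the periodic review."""
    if change["risk_score"] >= risk_threshold:
        return "immediate_review"
    if pending_count + 1 >= batch_size:
        return "scheduled_review_now"
    return "queued"

decision = route_for_review({"id": "chg-42", "risk_score": 0.9},
                            pending_count=2)
```

The human cadence here is set by risk and volume, not by the calendar — which is the core of the shift from a delivery rhythm to a governance rhythm.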
Analytical lenses to apply across both dimensions of the framework
Several themes cut across all four strands and all five contexts. These are not separate categories but lenses that must be applied systematically to each cell of the matrix.
A phased approach to producing an authoritative, evidence-based analysis grounded in practitioner reality
Literature review: Systematic review of academic research, industry studies, and practitioner evidence. Priority: separating measured outcomes from vendor claims, particularly on AI productivity metrics.
Expert interviews: Semi-structured interviews spanning: enterprise Scrum Masters, ML engineering leads, ISV CTOs, safety-critical SE practitioners, agile coaches adapting to AI, and CS educators revising curricula.
Case studies: Deep-dive cases across the matrix: 2–3 per organisational context, selected to cover at least two AI engagement strands each. Documented evidence over self-reported narratives.
Practitioner survey: Quantitative survey of software professionals mapping: current AI tool adoption, perceived agile practice impact, workforce concerns, and organisational readiness — segmented by the five contexts.
Phase 1: Validate the two-dimensional framework (4 strands × 5 contexts) with 6–8 expert reviewers. Test the Agile/Scrum baseline assumptions. Refine the matrix — identify which cells are most consequential and where evidence is strongest. Develop interview protocol segmented by organisational context. Design survey instrument with context-aware routing.
Phase 2: Conduct 20–25 expert interviews across organisational contexts and AI strands. Deploy practitioner survey targeting 250+ respondents. Begin case study identification and data collection — prioritising cases where agile practices have been explicitly adapted in response to AI. Build evidence matrix mapping findings to framework cells.
Phase 3: Analyse findings per strand and per organisational context. Develop the "Agile Evolution Model" — a maturity framework for how Scrum practices adapt across the four strands. Draft case studies. Conduct cross-cutting theme analysis. Assess UK policy implications with specific attention to the 9% problem and public sector delivery.
Phase 4: Write the full study report structured around both dimensions. Produce the executive summary and practitioner-oriented recommendations. Develop the "Agile-AI Readiness Assessment" — a self-assessment tool for teams. Submit for expert peer review from both agile and AI communities.
Phase 5: Finalise report incorporating reviewer feedback. Produce derivative outputs: four-strand summary infographic, matrix poster, newsletter series (one per strand + one per context = 9 editions), policy brief, interactive web version, and presentation deck. Launch via Digital Leaders Network, Digital Economy Dispatches, LinkedIn newsletter, and targeted events.
Approximately 30,000–35,000 words of substantive analysis, structured to serve both strand-focused and context-focused readers