From Theory to Practice: How 20 Civil Servants Are Building AI Capability Across Government

This blog post was written by George Woodhams, AI Policy and Programmes Manager, Imperial Policy Forum. 

Nine months ago, twenty senior UK Civil Servants set out to answer a deceptively simple question: how can government harness AI effectively and responsibly? Through the Imperial Policy Forum’s AI Policy Fellowship, they have been developing solutions that matter—from pilot AI tools for the criminal justice system to mapping pathways toward sovereign AI capability.

Owen Jackson, Director of Imperial Policy Forum, speaks to our 2025 cohort of AI Policy Fellows.

Drawing on Imperial’s community of researchers working at the cutting edge of explainable, secure, and trustworthy AI, Fellows have grappled with real challenges: What is the public perception of government’s use of AI? How can AI applications be adapted to safety-critical systems? How do you build capability without creating new vulnerabilities?

At the programme’s final in-person session—one of four held throughout the Fellowship—participants presented their findings and wrestled with these questions alongside Imperial academics and government stakeholders. What emerged wasn’t just a collection of projects, but a shared understanding of the challenges and opportunities for adopting AI in public service.

Emerging Reasoning Capabilities

Opening the session, Professor Tom Coates delivered a presentation on the rapid evolution of AI reasoning models over the course of the nine-month Fellowship. He explored how advances in prompt engineering and reinforcement learning are enhancing the reasoning capabilities of large language models (LLMs), offering new ways to improve performance.

Participants then discussed how these emerging techniques could be responsibly and effectively applied within the public sector, considering both their potential and the challenges of real-world implementation.

Embedding Evaluation

As Fellows have discovered, adopting AI isn’t just about choosing the right tool. It’s about knowing whether it’s having the desired impact. Colleagues from i.AI, the government’s AI incubator, and the Evaluation Task Force—a joint Cabinet Office and HM Treasury team championing evaluation best practice—joined the session to provide their expert insights and guidance.

Professor Tom Coates is also the course lead for our AI Fundamentals programmes for civil servants.

Their message was clear: robust evaluation must be embedded throughout the AI adoption lifecycle, not bolted on afterward. The discussion yielded five practical principles that Fellows are already applying in their departments:

1. Establish clear benchmarks before deployment. Robust benchmarks are the foundation for measuring impact. Baseline data—such as processing time, staff effort, or error rates—should be captured before implementation to enable comparison. In the public sector, this can be difficult where no clear “ground truth” exists and processes vary. However, even benchmarks that acknowledge uncertainty remain essential for reporting savings and efficiencies transparently.

2. User feedback is complementary to, not distinct from, model evaluation. User feedback provides insight into usability, accessibility, and operational fit. Formal model evaluation provides quantitative evidence of accuracy, performance, and impact. Both are needed: user insights highlight implementation risks and unintended effects, while evaluation establishes whether observed changes can be attributed to the AI system.

3. Distinguish between monitoring and evaluation. Routine monitoring (e.g. data collection and trend analysis) is essential but not sufficient. Evaluation requires analytical assessment to determine whether observed changes were caused by the AI intervention and whether those changes represent value for money.

4. Use random sampling and experimental methods where feasible. Randomised approaches remain the most reliable way to assess causal impact. The data produced by AI systems can enable cost-effective use of controlled trials or sampling. Properly designed evaluations strengthen confidence in findings, particularly when estimating time or cost savings (see the illustrative sketch after this list).

5. Design user surveys with precise and measurable questions. Surveys can complement quantitative data but should focus on specific, time-bound measures (e.g. “How long did this task take last week?” rather than “How much time does this usually take?”). Questions about time spent are more reliable than those about time saved, which are often affected by perception bias.
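To make principles 1 and 4 concrete, here is a minimal sketch of what capturing a baseline and running a simple randomised comparison might look like in practice. It is a hypothetical illustration only, using invented case numbers and the Python standard library; it is not the tooling used by the Fellowship or by the Evaluation Task Force.

```python
# A hypothetical sketch: capture baseline metrics before deployment (principle 1),
# then estimate impact with a simple randomised comparison (principle 4).
# Uses only the Python standard library; all figures are illustrative.
import random
import statistics


def capture_baseline(case_durations_minutes):
    """Summarise pre-deployment processing times to benchmark against later."""
    return {
        "n_cases": len(case_durations_minutes),
        "mean_minutes": statistics.mean(case_durations_minutes),
        "stdev_minutes": statistics.stdev(case_durations_minutes),
    }


def randomise_cases(case_ids, seed=42):
    """Randomly split incoming cases into AI-assisted and business-as-usual arms."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


def estimate_impact(control_minutes, treatment_minutes, n_resamples=10_000, seed=42):
    """Permutation test: how often would a difference this large arise by chance?"""
    rng = random.Random(seed)
    observed = statistics.mean(control_minutes) - statistics.mean(treatment_minutes)
    pooled = list(control_minutes) + list(treatment_minutes)
    n_control = len(control_minutes)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_control]) - statistics.mean(pooled[n_control:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return {"minutes_saved_per_case": observed, "p_value": extreme / n_resamples}


if __name__ == "__main__":
    # Illustrative numbers only: a pre-deployment baseline, then the two trial arms.
    baseline = capture_baseline([42, 51, 38, 47, 55, 40, 49, 44])
    control = [42, 51, 38, 47, 55, 40, 49, 44]       # business as usual
    treatment = [30, 35, 28, 41, 33, 36, 29, 38]     # AI-assisted
    print(baseline)
    print(estimate_impact(control, treatment))
```

The point of the sketch is the shape of the exercise, not the code itself: a benchmark recorded before the tool goes live, random allocation of comparable cases, and a reported saving accompanied by an honest statement of how likely it is to be noise.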

Fellows’ Projects

To conclude the day, Fellows shared insights and learning from their projects. Presentations from the Department for Energy Security and Net Zero, the Department for Business and Trade, the Food Standards Agency, i.AI and the Department for Science, Innovation and Technology covered the following topics:

  • The economic opportunity for export and import of AI for health in the UK
  • The potential for explainable AI (XAI) to be tailored to safety-critical applications
  • How to adapt large language models (LLMs) for evidence synthesis in government
  • The potential impact that AI will have on the Civil Service workforce
  • How to holistically assess the UK’s AI capability and explore the potential of decentralised compute

For the Imperial academics who mentored these projects, the Fellowship has been equally valuable. By working directly with Fellows tackling live policy challenges, researchers gained new perspectives on how cutting-edge AI research can inform, shape, and strengthen government decision-making. Dr Ahmed Fetit, one of the mentors on this year’s Fellowship, observed that through his discussions with policy officials, he “began to see not only the immediate challenges of deploying new technologies in the NHS, but also their long-term consequences on the healthcare system”. He further shared that:

“the experience deepened my appreciation of the need to bridge technical innovation with the realities of policymaking through dialogue”.

What Comes Next

Although this marked the final in-person day of the 2025 Fellowship, the work continues. The Imperial Policy Forum will keep collaborating with our Fellowship alumni as their projects evolve, translating research insights into tangible policy impact.

It’s clear that the UK government must build AI capability that’s both ambitious and accountable, innovative and trustworthy. This Fellowship has demonstrated that it’s possible—but only when policy expertise and technical knowledge work together.

Applications are now open for the 2026 AI Policy Fellowship. Join a growing network of innovators at the intersection of AI and public policy and help shape the future of responsible AI in government.