We recently ran a pilot to explore whether an AI assistant could help people find government information and services online.
A smarter way to navigate
People come to Govt.nz to complete tasks. They’re looking for clear information about a range of government services, like how to apply for something, what steps to take or where to go next. Many are dealing with stressful life events and want help navigating government processes without having to jump between different websites.
Govt.nz already does a great job of bringing together content from across government, and users really value this. We wanted to build on that by exploring how a conversational AI assistant could improve the user experience.
We did this by testing whether it could help people navigate government information and services quickly and reduce cognitive load by:
- offering targeted information
- outlining clear next steps
- responding based on where someone is in their journey.
We started with what we already knew about our users
We knew from feedback and interviews that our users are task-focused, prefer to interact with government online and are increasingly using AI tools to find information. But they also face challenges, especially when navigating services during stressful life events. They want clear, reliable information in one place and they want it to be easy to understand.
Trialling an AI assistant
The conversational assistant was powered by a large language model (LLM), a type of AI that can understand and respond to questions in everyday language. To understand how an assistant could be useful to people, we explored how well 7 different LLMs provided information that:
- was in our voice and tone
- was concise, accurate and easy to understand
- included ‘next steps’.
These were Claude Sonnet and Claude Haiku, Gemini, Meta Llama, Mistral and 2 Jurassic LLMs from AI21 Labs. We tested the capabilities of each, then tested Gemini and Claude Sonnet extensively in the pilot.
The AI assistant used a method called Retrieval-Augmented Generation (RAG) to provide answers, meaning it generated answers only from our selection of government websites.
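For readers curious about the mechanics, here is a minimal sketch of the RAG pattern. The page list, URLs and function names are illustrative assumptions, not the pilot's actual implementation, and a production system would use embeddings rather than keyword matching.

```python
# Minimal RAG sketch (illustrative only, not the pilot's code):
# retrieve relevant passages from an approved set of government pages,
# then constrain the LLM to answer from those passages alone.

APPROVED_PAGES = [  # hypothetical corpus entries
    {"url": "https://www.govt.nz/example-passports-page",
     "text": "How to apply for, renew or replace a New Zealand passport ..."},
    {"url": "https://www.govt.nz/example-driving-page",
     "text": "Getting a driver licence, road rules and vehicle licensing ..."},
]

def retrieve(question: str, pages: list[dict], top_k: int = 2) -> list[dict]:
    """Rank pages by naive keyword overlap with the question.
    A real system would use embeddings and a vector index instead."""
    q_words = set(question.lower().split())
    return sorted(
        pages,
        key=lambda p: len(q_words & set(p["text"].lower().split())),
        reverse=True,
    )[:top_k]

def build_prompt(question: str, passages: list[dict]) -> str:
    """Bundle the retrieved sources into a prompt that requires citations
    and forbids answering from outside the supplied material."""
    sources = "\n\n".join(f"[{p['url']}]\n{p['text']}" for p in passages)
    return (
        "Answer using ONLY the sources below and cite the URL you used. "
        "If the sources do not cover the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```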
This pilot gave us valuable insights about what the technology can do, but also about what people actually need when interacting with government.
It showed us where AI can add value and where it needs improvement.
People want tools that make government easier to use
One of the clearest findings from the pilot was that there’s strong demand for tools that simplify finding information about public services.
People often struggle to find the right information, especially when it involves multiple agencies or complex eligibility rules. The AI assistant helped bridge those gaps by guiding users through the information and showing how it applied to their personal situations.
In these cases, using the assistant was quicker than searching online because people got the information they needed without trawling through lots of websites. 85% of study participants said that finding government information with the tool was more efficient than their previous methods.
The assistant did not just repeat what was on websites: it interpreted the content in a way that made sense to users. This conversational approach made the experience feel more natural and less intimidating, and users reported that the format made the information easier to use than other channels.
Conversation style helps people understand and engage
The assistant’s conversational design was one of its biggest strengths.
Users could ask questions in their own words and get clear, plain-language answers. The back-and-forth format let people cut straight to the information they needed without being overwhelmed by content that did not apply to them.
Our testers liked this approach. It made the experience feel more human and helped users feel supported. This shows that a well-designed conversational flow can make a big difference to how people understand and use government information.
An AI assistant is only as good as the content behind it
The assistant’s performance depended heavily on the quality of the content it used. If the source information was outdated, inconsistent or poorly structured, the assistant struggled to give accurate or helpful answers.
Even the smartest AI cannot overcome poor source material. To make AI tools useful, government content needs to be:
- cleaned up
- made consistent
- kept current.
That means working together to improve how information is written, labelled and organised. Agencies also need to work together to reduce content duplication.
Read more: Design content for AI and your users: what works and why
Clear structure helped the assistant give better answers
Accuracy of responses also depended on how content was organised. During testing, the assistant sometimes pulled in outdated or less reliable information when better content was available. This happened because there were no clear rules about which sources to trust over others.
To fix this so that AI can find the right information, we need to:
- give direction on how to prioritise trusted sources
- tag topics clearly
- structure websites well.
Strong AI performance depends on both good content and good metadata. That means having structured processes for tagging, metadata management and source prioritisation.
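As an illustration of what source prioritisation could look like in practice, here is a small sketch. The metadata fields (authority, last_reviewed), URLs and values are assumptions for the example, not the pilot's actual schema.

```python
from datetime import date

# Hypothetical source records: explicit metadata lets a retriever prefer
# authoritative, recently reviewed pages over stale duplicates.
SOURCES = [
    {"url": "https://www.govt.nz/example-topic", "topic": "example-topic",
     "authority": 1, "last_reviewed": date(2024, 11, 1)},   # primary source
    {"url": "https://old.example.govt.nz/topic", "topic": "example-topic",
     "authority": 3, "last_reviewed": date(2019, 3, 1)},    # stale duplicate
]

def rank_sources(candidates: list[dict]) -> list[dict]:
    """Lower authority number wins; newer review dates break ties."""
    return sorted(
        candidates,
        key=lambda s: (s["authority"], -s["last_reviewed"].toordinal()),
    )
```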
The assistant needs to understand intent, not just respond
LLMs are designed to respond confidently — and that can be a problem.
During testing, the assistant sometimes misunderstood what users meant or gave incorrect answers with too much certainty. This was especially concerning in sensitive or complex situations.
Thoughtful clarifying questions and building in safeguards against overconfidence helped to make the tool more accurate and safer. But more work is needed to help LLMs understand user intent.
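One simple safeguard of this kind is to ask before answering whenever a question maps onto more than one topic. A minimal sketch of that idea, with hypothetical inputs:

```python
def clarify_or_answer(question: str, matched_topics: list[str]) -> str | None:
    """If retrieval maps the question onto several distinct topics,
    ask a clarifying question instead of guessing (a basic guard
    against confidently answering the wrong question)."""
    if len(matched_topics) > 1:
        options = ", ".join(matched_topics)
        return (
            "I want to point you to the right place. "
            f"Are you asking about: {options}?"
        )
    return None  # unambiguous: continue to the normal answer flow
```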
Trust comes from transparency
Even when the assistant was mostly accurate, occasional mistakes (like made-up links or incorrect details) undermined user confidence. Transparency is key: users need to know where information comes from and be able to check it.
We found that showing sources, instructing the assistant to flag when it was uncertain and being honest about limitations helped users feel more confident using the assistant. This led to higher levels of trust, even when answers were not perfect.
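A sketch of how that transparency might be rendered, assuming the answering step already returns its source URLs and a rough confidence flag (both hypothetical):

```python
def format_answer(answer: str, source_urls: list[str], confident: bool) -> str:
    """Attach sources to every answer and flag uncertainty explicitly."""
    lines = [answer]
    if not confident:
        lines.append(
            "I'm not fully certain about this answer - "
            "please check the sources below."
        )
    lines.append("Sources:")
    lines.extend(f"- {url}" for url in source_urls)
    return "\n".join(lines)
```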
Trust is not just about getting things right; it’s also about being clear when things might be wrong.
Users need to understand the assistant’s limitations
Generative AI is powerful, but it’s not perfect. The assistant was helpful, but there’s always a risk of incorrect or misleading answers. To maintain trust, users need to understand these limitations.
This could involve using disclaimers, encouraging users to verify information and making it clear that the assistant’s responses are not guaranteed to be 100% correct.
When users understand the assistant’s strengths and weaknesses, they’re more likely to use it confidently and responsibly.
Simplicity and consistency improve user experience
Users preferred answers that were clear, consistent and easy to follow. Overly long or inconsistent responses made users less confident and more likely to disengage.
Simplicity and consistency are not just nice to have, they’re important for building trust and usability. To support this, we used structured response templates, embedded hyperlinks and relevance filters to keep answers focused and actionable.
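For example, a structured response template paired with a simple relevance filter might look like the sketch below. The template wording and the three-step cap are assumptions, not the pilot's actual design.

```python
RESPONSE_TEMPLATE = """{summary}

Next steps:
{steps}

More information: {link}"""

def render_response(summary: str, steps: list[str], link: str,
                    max_steps: int = 3) -> str:
    """Keep answers short and actionable: cap the number of next steps
    (a crude relevance filter) and always end with a source link."""
    bullets = "\n".join(
        f"{i}. {step}" for i, step in enumerate(steps[:max_steps], start=1)
    )
    return RESPONSE_TEMPLATE.format(summary=summary, steps=bullets, link=link)
```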
The assistant should guide, not replace, transactional tools
The assistant worked well when guiding users, but it sometimes tried to handle transactions itself, like filling out forms or assessing whether users met eligibility criteria. This caused confusion and duplicated existing tools on agency websites.
We learnt that AI assistants should support, not replace transactional systems. Routing logic and channel-aware responses can guide users to the right tools and services, without trying to do the job of those systems.
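A sketch of that routing idea follows. The intent keywords and destination URLs are illustrative assumptions; real routing would need proper intent classification rather than keyword matching.

```python
# Hypothetical routing table: map transactional intents to the agency tool
# that should handle them, rather than attempting the transaction in chat.
TRANSACTION_ROUTES = {
    "renew my passport": "https://www.passports.govt.nz",        # example
    "apply for a benefit": "https://www.workandincome.govt.nz",  # example
}

def route_or_fall_through(question: str) -> str | None:
    """Hand transactional requests off to the right service; return None
    for informational questions so the RAG answer flow takes over."""
    q = question.lower()
    for intent, url in TRANSACTION_ROUTES.items():
        if intent in q:
            return f"You can do this online. Start here: {url}"
    return None
```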
The assistant can help improve government content
One unexpected benefit of the pilot was discovering how the assistant could help improve content across government. It flagged issues like duplicate, outdated or contradictory information and highlighted inconsistencies in how the content was labelled.
Agencies could use the assistant as a feedback tool — not just to help users, but to test how their content performs in other AI-driven environments like Google’s AI summaries. There are opportunities to make government content more consistent, accessible and AI ready.
Prompt engineering improves tone, accuracy and referrals
How we instruct an AI assistant matters. By designing specific prompts and personas for Govt.nz, we saw major improvements in how the assistant responded: in early testing, prompt engineering improved LLM performance by between 77% and 115%.
These changes improved not just accuracy but also tone and the assistant’s ability to guide users to digital self-service options. This highlights the value of investing in prompt design and continuing to experiment to better meet user needs.
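To make this concrete, here is the general shape a persona prompt for a service like Govt.nz could take. The wording below is our illustration, not the prompt used in the pilot.

```python
# Illustrative system prompt only - not the pilot's actual wording.
SYSTEM_PROMPT = """\
You are an assistant for Govt.nz, New Zealand's all-of-government
information website.

- Use plain language and a warm, neutral tone.
- Answer only from the approved government sources you are given.
- Always offer a clear next step, preferring online self-service options.
- If you are not sure, say so and link to the relevant page instead of guessing.
- Never attempt to complete applications or transactions yourself."""
```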
Train LLMs to handle diverse and high-risk situations in a government context
Not all AI models are suitable for government use. In our testing, some LLMs maintained appropriate tone and persona during violence and aggression threshold tests. Other LLMs sometimes responded inappropriately, even encouraging harmful actions.
This highlights the importance of selecting models that can be trained to follow ethical guidelines and respond safely in high-risk scenarios.
Further research is needed to ensure LLMs can represent government appropriately across diverse situations.
Final thoughts
The pilot confirmed that LLMs can play a meaningful role in improving digital public services. But it also reminded us that technology is only part of the solution. What people value the most is clarity, relevance and ease of use.
The assistant is not a replacement for existing tools. It’s a complement.
By guiding users, surfacing relevant information and pointing them to the right services, it can help make government more accessible and easier to navigate. The goal is not just smarter technology — it’s a better experience for everyone.
Find out more
If you have questions about the pilot or the findings, email govt.nz@dia.govt.nz.