Voice experiences: 7 things to consider

Voice is quite possibly the most natural form of interaction for humans. So what should you think about when designing voice experiences for brands?

Talking is one of the most natural forms of communication. And it’s no wonder, as we’ve had so much practice. It’s believed talking – in a recognisable form – began almost 500,000 years ago with our Neanderthal predecessors. The link between modern humans and our ancient kin that leads to this conclusion is that we both share the hyoid bone. The hyoid is a horseshoe-shaped bone that sits high in our throat. It acts as an anchoring structure for the tongue and is thus essential for swallowing, coughing… and the creation of sounds. Which becomes important when designing voice experiences for brands.

We’ve grown up with language and speaking almost our entire lives, learning to speak around the same time we take our first steps and developing subtle nuances along the way. The manner in which we speak – as well as body language and physical actions – plays a big role in how we communicate. Which makes it hard to replicate in code.

But let’s face it, humans aren’t going to change how they communicate anytime soon, so for voice assistants to be successful we have to teach them to speak the way we do: human. Not robot.

Linguistic principles

When designing voice interfaces we refer to linguistic principles to ensure conversations flow naturally. One of the main areas we look towards when developing voice assistants is the Cooperative Principle suggested by Paul Grice, a philosopher of language. The Cooperative Principle generally means that when two people speak we assume they want to cooperate in order to bring meaning across. To ensure we bring meaning across effectively we have ‘maxims of conversation’ – a set of rules we stick to.

Quality maxim
Be truthful in your conversation.

Quantity maxim
Conversation should be as economical as possible, giving the maximum information with minimum effort.

Relation maxim
Be relevant: only say things that relate to the conversation.

Manner maxim
Give information clearly, without confusing the person we’re speaking with.

For conversation to work we rely on each other to be truthful, economical, relevant and clear. These principles are a strong set of guidelines for us to test our voice apps against to ensure they feel natural and authentic.

Taking turns

Another main aspect of spoken language is turn taking. Turn taking organises conversation, allowing participants to speak one at a time and alternate with one another. It plays an important role in constructing a contribution to a conversation, replying to a comment and passing the conversation over to a different speaker. However, this can happen with or without words. You might make eye contact, present a hand gesture, or simply change your position to ‘take the turn’ and allow the conversation to flow.

Voice assistants don’t have this mechanic available to them. Yet. So they have to be incredibly obvious in turn taking, which is a key area of communication where voice assistants feel less natural. To combat this we have to ensure we’re creating exceptional dialogue that progresses the conversation quickly and follows the Cooperative Principle.


Context is king. Even though voice assistants are widely available and accessible, a report by PwC in February this year reveals that 74% of consumers only use voice assistants in the home. Users want privacy when communicating with a voice assistant. This is something to be aware of when creating conversations between users and voice assistants: we have to make the experience as frictionless as possible by offering simple, natural and valuable conversation in socially safe environments. That’s not to say it will always be this way. If we look back to when mobile phones were invented in the early 70s, it took almost 15 years before they became commercially available and widely adopted, changing attitudes and behaviours towards mobile. So much so that mobile phones ended up becoming a status symbol: https://www.youtube.com/watch?v=ZdAM8sy2AqQ&feature=youtu.be&t=31s

Only time will tell if consumer behaviour changes to feel more comfortable using voice assistants in public places.

Describing the user

Developing personas has become a crucial part of the user experience process. It allows us to identify what a user’s goals, motivations and frustrations might be. With this information UX designers can craft positive user experiences that align with a given target audience. This process is incredibly important when designing a voice user interface.

When we think about developing voice experiences we need to consider the tone of voice and the use of language. Do we want to come across as funny and light-hearted, or have a more serious, mature tone? We judge people in a matter of seconds simply by listening to their voice, and we actually start to visualise that person’s characteristics such as age, gender, attractiveness, and personality. That means if you don’t design your personas you’re at risk of being associated with negative characteristics that misrepresent your brand. Most brands will already have tone of voice guidelines that they use as a guide for writing copy. We can make use of the same documentation. But it’s also important to review how your voice assistants deliver information to your audience. For example, if your voice application reads content from your blog, do your users want to hear the entire post, a snippet or perhaps a weekly round-up? What’s more important to your users when interacting with your brand through voice? Personas are crucial in helping to answer these questions and deliver an experience that aligns with users’ needs and expectations.


Written dialogue and spoken dialogue are very different. We need to remember that when designing voice interfaces, so we test our conversations out loud to see if they flow correctly. Any text we code into the experience has to be relatable.

Recovering from errors

People don’t like to experience errors when using interfaces. They feel it’s their fault and get angry or confused. However, the majority of users will try again so it’s important to ensure you have a solid repair strategy to pick up the conversation and rebuild trust. When developing voice interfaces a large part of the design goes into detecting errors, recovering from them, and getting the conversation back on track.

Amazon Alexa’s guidelines break this down into four key areas:

Provide guidance
If a user says something Alexa doesn’t understand, Alexa gives developers the option to re-prompt the user for a response. If the re-prompt is triggered, it gives Alexa the chance to ask the user for clearer information.

Offer a way out
Occasionally Alexa will mishear what a user says and may open up the wrong skill. In these situations users need to have an obvious way of exiting the skill during the conversation. It’s the same level of frustration users experience when a popup appears on a webpage but there’s no obvious way of closing it.

Don’t blame the user
Amazon recommends developers and VUIs handle errors gracefully with further questioning and not come across as too apologetic in the process. Otherwise users start to lose confidence in the skill or – worse – the brand.

Expect the unexpected
When designing for voice there are far more ways for the user to respond than there are on a touch display. Voice experiences must undergo a serious amount of stress testing to discover unexpected requests from users. It’s also crucial that a skill’s performance is monitored and customer feedback is taken on board to continually improve the skill and make it even more robust.
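The four areas above can be sketched in a few lines of plain Python. This is only an illustration of the repair strategy, not the Alexa Skills Kit API: the class name RepairDialogue, the exit phrases and the limit of two re-prompts are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical values chosen for this sketch, not taken from Amazon's guidelines.
EXIT_WORDS = {"stop", "cancel", "exit"}
MAX_REPROMPTS = 2


@dataclass
class Turn:
    speech: str
    end_session: bool = False


class RepairDialogue:
    """Toy dialogue manager: guide, offer a way out, don't blame the user."""

    def __init__(self, known_intents):
        # Maps a lower-case utterance to the reply the skill should give.
        self.known_intents = known_intents
        self.failures = 0

    def handle(self, utterance: str) -> Turn:
        text = utterance.strip().lower()

        # Offer a way out: exit phrases always work, whatever the state.
        if text in EXIT_WORDS:
            return Turn("Okay, goodbye.", end_session=True)

        # Understood request: answer it and reset the failure count.
        if text in self.known_intents:
            self.failures = 0
            return Turn(self.known_intents[text])

        # Expect the unexpected: anything else counts as a failed turn.
        self.failures += 1
        if self.failures > MAX_REPROMPTS:
            # Stop re-prompting rather than trapping the user in a loop.
            return Turn("Let's leave it there for now. Goodbye.", end_session=True)

        # Provide guidance without blaming the user or over-apologising.
        options = ", ".join(sorted(self.known_intents))
        return Turn(
            f"You can ask for: {options}. "
            "You can also say stop at any time. What would you like?"
        )
```

In a real skill the matching would be done by the platform’s language understanding rather than an exact string lookup; the point here is only the order of the checks and the cap on re-prompts.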

Cross-platform experiences

Voice assistants are integrating into millions of devices – from smart speakers to cars, our phones and kitchen appliances. Not only can we interact with voice assistants on a huge range of devices, but we’re also able to continue the conversation and experience from one device to the next using different actions such as text, voice, gestures and touch.

Designing for such a wide range of interactions on one system is known as Multimodal Interaction. For example, FX Digital recently built an Alexa skill for our client PrimeResi that has touch screen display features enabled for the Echo Show. The skill lets users ask by voice for PrimeResi’s daily news briefing. The results consist of news headlines and featured images for each article, displayed on a touch screen. This gives users two forms of interaction: touch, to swipe through the various news articles, or voice, to hear more information on a particular story. It’s important to think about all the different ways information can be displayed on the various devices available to users and how this will affect the way they interact with your experience.
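As a rough sketch of what that means in practice, the same request can produce a different response depending on whether the device has a screen. The code below is plain Python with made-up names (Article, Response, build_briefing, has_screen); it is not the PrimeResi skill or the Alexa display API, just an illustration of the branching.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    headline: str
    image_url: str


@dataclass
class Response:
    speech: str
    # Visual cards are only filled in for devices with a screen.
    cards: list = field(default_factory=list)


def build_briefing(articles, has_screen: bool) -> Response:
    """Build one briefing response for either a screen or a voice-only device."""
    if not articles:
        return Response("There are no new stories today.")

    if has_screen:
        # Screen device: keep the speech short and let the cards carry the detail.
        cards = [{"title": a.headline, "image": a.image_url} for a in articles]
        speech = (
            "Here are today's stories. Swipe through them, "
            "or ask to hear more about one."
        )
        return Response(speech, cards)

    # Voice-only device: everything has to be spoken, so read the headlines out.
    headlines = ". ".join(a.headline for a in articles)
    return Response(
        f"Today's headlines: {headlines}. "
        "Which one would you like to hear more about?"
    )
```

Keeping the decision in one function makes it easier to check that both kinds of device end on a clear prompt for the user’s next turn.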

The future of voice

Just as there are people today who have been brought up not knowing a life without a smartphone or touch screen device, the same shift will happen for voice assistants. Based on current adoption rates, they’ll become intimately integrated into our lives. Which is why it’s such an incredibly exciting time to be working on voice user interface design.