As developments growing out of Virtual Reality, Augmented Reality and Mixed Reality have been around for a considerable time.
A lot of experimenting has been done since Ivan Sutherland’s 1968 Head-Mounted 3D Display System and the first VR prototypes by Jaron Lanier.
Mobile phone games such as Ingress (2013) and Pokémon GO (2016) were based on real-life locations using geo-tags. Played both outdoors and indoors, these games mapped gameplay onto real places on earth and added 3D elements to reality, placing virtual objects in real landscapes.
As early as the 1980s, several sci-fi movies were pushing the cultural perception of Human-Computer Interfaces. These early 1980s and 90s movies introduced bright, often blue and orange UI elements on a HUD-like screen, where the real world is perceived, analysed and enhanced with data in real time.
Interestingly, this look of blue, green and orange lines on dark, often half-transparent displays has not changed much in 40 years. The resolution has improved, and today’s mock-up displays in movies are a lot busier than they were in the 90s. But overall, the look is still strongly retro-inspired, harking back to the early days of computing in the 1970s (the TRON look).
The 1995 movie “Johnny Mnemonic” explored VR elements when everyone’s idea of VR was that it would strongly distinguish itself visually from reality.
The User Interfaces in “The Matrix” do not show Augmented Reality. The entire world is a VR universe created by a computer, and the old-school displays supposedly represent “reality”. These UIs were deliberately retro-styled, with green and blue fonts and lines on dark screens. This perception, that real high-tech in the future must be extremely user-unfriendly, has persisted in movies ever since.
“Minority Report” used gloves as a device to interact with virtual displays, which were blending computer data with real-life views in a mixed reality environment.
The interfaces in “Oblivion” presented extremely dense data sorted into optical boxes. It’s an example of a UI where everything is visible simultaneously, which actually dilutes the information and makes decision making harder to navigate.
Another problem with the UIs in “Oblivion” is their practicality when screens are superimposed over real-life imagery. These displays are inspired by military fighter-jet HUDs, but they also show the problems of superimposing UI elements on real-life views with changing backgrounds and brightness variations.
The User Interfaces of “Iron Man” were mostly out of touch and didn’t seem to make any sense or serve any function at all, which was perhaps the point: to shroud Tony Stark’s genius in mystery. Again, we see predominantly blue and orange lines on dark background patterns on HUD interfaces.
“TRON: Legacy” was no exception to the blue and orange look. Taking a cue from its 1980s predecessor, the entire movie is built on glowing blue and orange lines.
“Blade Runner” (1982) was one of the first films to introduce the iconic “glowing lines on a dark background” look.
“Blade Runner 2049” introduced a patina and an even stronger 1970s retro look, reminiscent of the first Atari arcade games. But these screens are even more obscure and blurry, making it even harder to identify the information or purpose of any given element.
Of course, these UI screens look somewhat attractive and impress by making processes look highly sophisticated and complex. Part of the fascination with them is that we don’t really get what is going on.
But in real life, such complexity would ultimately be overwhelming, and throwing all known data at the user wouldn’t serve a good purpose in real-life circumstances, regardless of the era.
Billions have been invested in research by militaries worldwide. Pilots have been flying fighter jets with HUD helmets for more than 20 years. But in our culture and public perception, we are still at a stage where few people understand the difference between virtual, augmented, mixed or blended reality.
More than most changes affecting our interaction with computing devices, Augmented Reality is a transition from flat, screen-based interaction towards an emphasis on a flow state, driven by a steady exchange between the user, their surrounding environment and the device between them.
The challenge ahead is how we will develop the language of User Interfaces in Augmented Reality environments. Pop-cultural influences aside, there are real needs in learning and better understanding what happens in real-time perception, analysis and decision making. These are hard UX questions that need a lot more investment in research and development.
And yet, relatively little is known about the User Experience of Augmented Reality. The Nielsen Norman Group summarises in 2016: “AR technologies can impact the user experience by decreasing interaction cost, cognitive load, and attention switching.”
The most intriguing part of Augmented Reality user interfaces may be the intersection of what happens between the user’s perception and the device augmenting their reality. This is perhaps also the most challenging part for the development of modern-day, functional displays.
It is more about a mindset and fulfilled expectations in a flow state than about the User Interface and what happens on the screen itself. How the augmentation of reality works with users, and how it can assist them in fulfilling their goals and meeting their expectations, is the main question around which UX in Augmented Reality should evolve.
The UI of a “reality augmenting” device is not just a crutch. It becomes the mediator of time, space, expectations and events.
The true interface between reality, device and user lies in the user’s mind. As developers of UX for augmented reality, it is our job to look at the user’s perception, cognitive load, assumptions, learnings and expectations. It is our job to meet the user halfway, or all the way, to converse with them, and to anticipate what the user wants.
In his book “How Real is Real?” (1977), professor Paul Watzlawick, a philosopher and psychologist, describes reality as something that is only rooted in our perception. We have no proof outside our perception that reality exists, he argues, we can only rely on what we perceive and that may not be an accurate representation of reality in itself.
The human brain is built for decoding perception. From hunting and gathering to evaluating our surroundings and assessing danger, we have an evolutionary history of making sense of our environment. Our minds constantly use what we have previously learned to further decode and interpret the world around us.
Unfortunately, because our brains and minds prefer a linear narrative, we are also prone to deception, where something appears to be different from what it is, which can undermine sound decision making. From optical illusions to the cognitive overload of information, we are very vulnerable to interpreting things in ways that may have catastrophic results.
The line between enhancing and enriching our perception of reality and diluting or overloading it with superfluous information is blurry, but it is very important to consider when creating an AR-based HUD interface.
Driving a car and interpreting locations, distances or events incorrectly, or too late, may lead to accidents. Our understanding of reality is heavily influenced by bias and by cognitive load: how much of “reality” we can process at a given time.
Ideally, HUDs and Augmented Reality could support us by providing better guidance in real time to assist us with time-critical decision making.
The perception of our surroundings changes steadily. Especially as the driver of a car, we make new decisions based on exterior input every moment. We see street signs, other cars and objects, buildings and landmarks, and everything is registered as input for instant decisions.
A device employing AR should help us to respond contextually to new external information and account for changes to our environments. We should be able to use it with minimal to no input while driving. And it shouldn’t get in the way or restrict us from operating the vehicle.
Screen graphics in movies have injected a retro-inspired style into pop culture. In terms of usability, however, these User Interfaces are utterly useless. They are mostly obscure and the displayed information is not identifiable. Most obviously, virtually all information is present at all times, which runs counter to the advantages of an adaptive, conversational and anticipatory UI that is in steady flow and exchange with the user.
Augmented Reality interfaces today more closely resemble “non-command user interfaces”: tasks are accomplished with the help of contextual information collected by the computer system, not only through commands explicitly provided by the user.
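As a rough illustration of that idea, here is a minimal sketch of a non-command interaction. The context fields, thresholds and suggestion texts are entirely hypothetical; the point is only that the system derives suggestions from sensed context rather than waiting for an explicit command.

```python
from dataclasses import dataclass

# Hypothetical driving context, collected by sensors rather than typed in by
# the user; the field names and thresholds below are illustrative only.
@dataclass
class DrivingContext:
    speed_kmh: float             # from the vehicle bus
    fuel_level: float            # 0.0 .. 1.0
    km_to_next_fuel_stop: float  # from the navigation system
    heavy_rain: bool             # from a rain sensor or weather service

def contextual_suggestions(ctx: DrivingContext) -> list:
    """Derive suggestions from context alone, without an explicit user command."""
    suggestions = []
    if ctx.fuel_level < 0.15 and ctx.km_to_next_fuel_stop < 20:
        suggestions.append("Low fuel: add the upcoming fuel stop to the route?")
    if ctx.heavy_rain and ctx.speed_kmh > 100:
        suggestions.append("Heavy rain detected: consider reducing speed.")
    return suggestions

print(contextual_suggestions(DrivingContext(120.0, 0.10, 12.0, True)))
```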
Our immediate surroundings are full of tactile interaction points: a door handle, a seat adjustment, a steering wheel, the ignition button, the rear-view mirror. These are just a fraction of the real-life cues and interaction points we use to navigate our way through a user journey while trying to fulfil our intent.
When it comes to interacting with user interfaces, humans are more capable of adapting than we might assume. The developments of the past couple of years have shown how generations of users grow up with different environments, from Windows to macOS and iOS to Android, and quickly pick up the variances between these operating systems.
Humans have the ability to comprehend through learning; we can program our brains with learned experiences. We not only base our learning on what we already know but transcend it into the unknown, drawing conclusions and solving problems we have never encountered before.
This is why video games are easy to learn, even though they all follow different rules and lack the unified surface of an operating system’s expected patterns. Each game makes using its UI a game in itself. The same driver has been at work with other interfaces as well.
We can’t make things too hard for users. They need a chance to learn and adapt fluidly as they go. Micro-rewards, in the form of confirmation and signs that the user is getting somewhere, build trust: they make the user feel they are on the right path and making progress.
Particularly with HUDs whose user interface elements aid the driver by adding information to the driver’s view, it is difficult to keep the look and feel balanced, as light shining through the windshield may interfere with the bright UI elements on the HUD. Basically no chosen colour can counter the impact of direct sunlight shining through the windshield into the vehicle.
With a user interface in a practical AR application, the challenge lies in making the UI blend into the environment, while picking up on real-life cues and enhancing them, and standing out when it requires the user’s attention.
This is a delicate balancing act.
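One way to think about that balance is to adapt the brightness and opacity of HUD elements to the ambient light. The sketch below is only an assumption of how such a heuristic might look; all numbers are invented, and a real system would calibrate against the actual background luminance behind each element rather than a single ambient value.

```python
def hud_element_style(ambient_lux: float, needs_attention: bool) -> dict:
    """Pick brightness and opacity for a HUD element relative to ambient light."""
    # Roughly normalise between a dark cabin (~10 lux) and direct
    # sunlight through the windshield (~100,000 lux).
    t = min(max((ambient_lux - 10) / (100_000 - 10), 0.0), 1.0)

    brightness = 0.3 + 0.7 * t   # brighter UI against brighter scenes
    opacity = 0.5 + 0.3 * t      # more opaque when sunlight washes the HUD out
    if needs_attention:
        # Elements that demand the user's attention always stand out fully.
        brightness = min(brightness + 0.2, 1.0)
        opacity = 1.0

    return {"brightness": round(brightness, 2), "opacity": round(opacity, 2)}

print(hud_element_style(ambient_lux=80_000, needs_attention=False))
print(hud_element_style(ambient_lux=50, needs_attention=True))
```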
Intent is born out of an expectation: a person expects to find training shoes in a sports clothing or shoe store, so going there with the intent to get a pair of trainers is a direct result of that person’s set of expectations. But beyond “getting new shoes”, the expectation set may be more varied and complex: a certain but vaguely defined style, a certain array of colour preferences, an overall look inspired by previously owned models, what friends or spouses are wearing, and so on.
Expectations are always a precondition for intentions. Expectations are informed by what we already know and the experiences we’ve had, but also by advertising, by user tutorials, by how the store presents itself, its reputation and the experiences other people might have had: hearsay, comments on social media platforms, video and blog reviews…
In short, expectations are in steady flux and are a simplified result of complex input from various sources. User expectations are, by nature, strongly directed by bias. To know what we want, we first need to form an opinion, and this leads to bias: the idea that something must be as we expect it to be, with little room for difference.
But we are still able to accept and even anticipate deviations from our expectations based on what we’ve learned, because we have learned a language of what makes our expectations work.
The same principle applies to user interfaces: If my expectations are fulfilled and the UI guides and assists me in achieving my intent, over time my trust in this application will grow and I will rely on it to fulfil my expectations.
Google Glass was the first HUD user interface device poised to hit the mass market. Due to low market adoption and a disastrous marketing campaign, the product was withdrawn shortly after it had been introduced.
The main problem was that people couldn’t imagine how and why to use these things in public. Google Glass didn’t seem to integrate tightly into everyday life, and given the low capabilities of AI and machine learning at the time, no use case was convincing enough to persuade an audience to invest in the gadget.
As is nearly always the case with innovative products, it did attract an audience of geeks and early adopters. Feedback from this testing audience proved to Google that the product was not ready for the market and that people had not been waiting for glasses they would wear all the time, every day.
While driving, users looking at a HUD display have to decode both the depiction of reality in front of them (the street and the elements surrounding or moving on it) and the information added to the view (data points, questions, decision points, assisted wayfinding and guidance).
So adding information in a steady flow by augmenting reality increases cognitive load. It is crucial to balance how much can be processed in the real-time interplay between reality and the HUD user interface.
If we are moving forward while looking through a HUD user interface, we are experiencing events on a timeline. Our brain is already tuned to accept and process incoming information in rapid flow. It is constantly challenged to output reasonable perception and decision making.
Throughout time (the timeline as we move forward) and space (the space in which we move), our interaction is never static and screen-based only. Most of the processing happens internally, in our minds.
A conversational UI asks the user questions based on intelligent assumptions or guesses about what the user might want to know or do next. Chatbots have been trending in customer support, but they are just the tip of the iceberg of truly machine-learning-driven conversational user interfaces.
A conversational user interface frees the user from thinking about correct input or further processing. It allows users to act largely naturally and humanly, without having to adapt their input to match a given format.
Part of this is enabled by adapting to the user’s needs and intentions. Virtual assistants such as Alexa and the Google Assistant can not only understand initial questions but also draw context out of follow-up questions within a limited set of variables, based on machine learning and assumptions about contextual relevance.
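A toy sketch of what drawing context out of follow-up questions can mean, using an invented micro knowledge base: the conversation remembers the last entity mentioned, so a follow-up like “And where is it?” can be resolved without repeating the subject. Real assistants do this with far richer language models, not keyword matching.

```python
# Invented micro knowledge base, purely for illustration.
knowledge = {
    "eiffel tower": {"height": "330 m", "city": "Paris"},
    "empire state building": {"height": "443 m", "city": "New York"},
}

class Conversation:
    """Keeps the last mentioned entity so follow-up questions stay in context."""

    def __init__(self):
        self.last_entity = None

    def ask(self, question: str) -> str:
        q = question.lower()
        for entity in knowledge:
            if entity in q:
                self.last_entity = entity  # update conversational context
        if self.last_entity is None:
            return "Which place do you mean?"
        facts = knowledge[self.last_entity]
        if "tall" in q or "height" in q:
            return f"The {self.last_entity.title()} is {facts['height']} tall."
        if "where" in q or "city" in q:
            return f"It is in {facts['city']}."
        return "I'm not sure what you mean."

chat = Conversation()
print(chat.ask("How tall is the Eiffel Tower?"))  # sets the context
print(chat.ask("And where is it?"))               # resolved from the stored context
```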
Another example of adaptive user interfaces are feeds, such as those used by Netflix, Spotify, Instagram and Facebook. These feeds change constantly with every interaction you make. They don’t converse with you directly, but machine learning algorithms drive what comes up next in your feed, based on several variables you consciously or subconsciously influence. For example, if you are watching a video, how long you stay to watch influences which items are presented to you next in the stream.
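A deliberately simplified sketch of that feedback loop, with made-up topics and weights (real feeds combine far more signals and far larger models): watching most of a video strengthens its topic, bailing out early weakens it, and the next batch of candidates is ranked accordingly.

```python
from collections import defaultdict

preferences = defaultdict(float)  # topic -> learned weight

def record_watch(topic: str, watched_seconds: float, video_length: float) -> None:
    """Longer relative watch time strengthens the preference for a topic."""
    completion = min(watched_seconds / video_length, 1.0)
    preferences[topic] += completion - 0.5  # finishing boosts, skipping demotes

def rank_feed(candidates):
    """Order candidate (title, topic) pairs by the learned topic weights."""
    return [title for title, topic in
            sorted(candidates, key=lambda c: preferences[c[1]], reverse=True)]

record_watch("cooking", watched_seconds=170, video_length=180)  # watched almost fully
record_watch("gaming", watched_seconds=10, video_length=300)    # skipped quickly
print(rank_feed([("Speedrun highlights", "gaming"), ("Pasta basics", "cooking")]))
# -> ['Pasta basics', 'Speedrun highlights']
```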
An adaptive interface is the first step: it changes according to your actions and decisions. The anticipatory user interface goes one step further: it can not only adapt but also foresee what your next interaction or decision might be.
Anticipatory user interfaces are the holy grail of the upcoming revolution in AR interfaces. Users will experience interfaces that seem to have an almost magical, empathic quality, because they will be able to present information not only upon request but even before the user has thought of it.
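In its simplest form, anticipation can start with nothing more than learned transition frequencies between actions. The sketch below uses made-up action names and would, in a real system, be replaced by a model that also weighs time, location and sensor context.

```python
from collections import Counter, defaultdict
from typing import Optional

transitions = defaultdict(Counter)  # action -> Counter of actions that followed it

def observe(previous_action: str, next_action: str) -> None:
    """Record that one action followed another in a past session."""
    transitions[previous_action][next_action] += 1

def anticipate(current_action: str) -> Optional[str]:
    """Return the most frequently observed next action, if any."""
    following = transitions[current_action]
    return following.most_common(1)[0][0] if following else None

# Learned from past sessions: after starting the car, the user usually navigates home.
observe("start_car", "navigate_home")
observe("start_car", "navigate_home")
observe("start_car", "play_podcast")

print(anticipate("start_car"))  # -> "navigate_home", offered before the user asks
```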