
This year, every major phone manufacturer has shipped broken software.
Software that gets things wrong, doesn’t always work as expected, and can even endanger lives.
This is oddly similar to the launch of Apple Maps back in 2012, where all of the above was applicable. But instead of consumer outrage and swift apologies, generative AI is evangelised as the next big thing.
The original Apple Maps was varying degrees of broken. In some cases, towns and cities were misspelled. In others, they were missing entirely. One of the most egregious errors happened locally: Mildura was placed 64km from its actual location, in the middle of a national park. Australian police called this a “potentially life-threatening issue”.
In a statement, police said that some motorists located by police had been stranded for up to 24 hours without food and water.
Precedent set by Google Maps suggested we should be able to trust a navigation app. Apple broke that trust, and its navigation app is still often maligned, despite being very good now.
Jump forward to 2024, and every flagship phone ships with software you can’t trust. Generative AI has been the marquee feature for the Galaxy S24 family, the Pixel 9 family, and the iPhone 16 family.
While generative AI can be mostly harmless in the contexts of photo editing, transcription, or translation, there are times when it simply makes stuff up. I don’t think it’s acceptable to ship software that misleads users. Slapping on a beta tag doesn’t make it better.
Apple, Samsung, and Google all have various levels of guilt here.
Despite being low-stakes, Apple Intelligence has still demonstrated the potential to get it wrong in a very big way.
Notification summaries might seem like a beige use of generative AI, but Apple found itself in hot water with the BBC a few days ago. An Apple Intelligence summary claimed Luigi Mangione (the alleged killer of UnitedHealthcare’s CEO) shot himself. He had not.
On a more personal note, Apple Intelligence summarised messages from my partner and told me that our house had flooded and that the dog was in trouble. Thankfully I knew this wasn’t the case, but in other circumstances, that would be a troubling notification to receive.
Sure, expanding the notification summary to show the actual messages can clear up misunderstandings, but you shouldn’t have to do that. Why add additional labour?
Samsung also has the summary problem, but on a larger scale. While Apple Intelligence summaries are very short, which in some cases can limit how much they get wrong, Galaxy AI summaries are quite a bit longer by default, and have customisable length. This creates a lot more room for error, and I saw plenty of errors when testing the Galaxy S24 Ultra. That’s not great when Samsung has actively suggested students should use Galaxy AI summary tools to quickly catch up on readings they didn’t do.
Google is the worst offender, thanks to Gemini. While Gemini isn’t inherently a mobile phone issue, the chatbot is advertised as a flagship feature for the Pixel 9 family. As with all large language models, it’s unreliable and has a tendency to straight-up lie.
During a briefing for the Pixel 9 family, Google suggested you could show Gemini a concert poster and ask it to tell you if you were free that night. When I attempted that, it told me there was simply no way to tell if I was free, because there were no dates in the poster.
More problematic is Gemini’s tendency to fabricate. While generative AI can often do an okay job when you’re looking for a broad answer, it’s very bad when it comes to specifics.
For example, whenever I ask Gemini about the best mobile plans, it regularly makes up plans and discounts that don’t exist. If I made up phone plans, I’d have telcos on the line and readers blowing up my inbox. And given WhistleOut is a comparison site, it would be the kind of thing that could (rightfully) attract unwanted attention from the ACCC. But it’s cool when Gemini does it?
Gemini does come with a disclaimer: “Gemini can make mistakes, so double-check it”. That’s a bit of an understatement. My bet is very few people are fact-checking outputs from Gemini. What’s the point of using it if you need to double-check everything it tells you? And from my experience, users aren’t even aware of how big the issue is. I’ve chatted with people who use generative AI for creating summaries at work, and they seemed perplexed when I asked if they’re worried about the tech getting it wrong.
This is pretty concerning, because Google says seven of the top 10 uses for Gemini in Australia are around “getting things done”, with tasks ranging from information seeking to “looking for academic help”. The kind of stuff where you’d want and expect accurate information.
To come back to the original Apple Maps, it was pretty obvious when it got it wrong. You simply wouldn’t end up where you wanted to, and Google Maps was there as a clearly superior alternative.
There’s no such thing with generative AI. It’s all-new, and on a technical level, it’s really impressive. On a product level, why are you shipping software that tells your customers it’s safe to put glue on pizza?
While the glue example should be obvious to most, there are cases where you need to be a subject matter expert to tell when generative AI is lying to you.
Maybe generative AI will get better, maybe it won’t, but right now it’s a confident misinformation machine. That’s bad for a number of reasons (such as the political climate), but even if we ignore all that, it’s bad for users.
Let’s be generous and say generative AI is right nine out of ten times. Is that good enough? I don’t think so. If your GPS app took you to the wrong place on every tenth trip, you’d rightly be mad.
In the 2024 class of flagship phones, generative AI isn’t some optional experimental feature you can opt into. It’s the headliner, plastered across billboards.
It shouldn’t be controversial to say a phone shouldn’t lie to you. It wasn’t okay with the original Apple Maps. It isn’t now. Many generative AI features simply don’t work as promised, and foisting them upon your customers who depend on your tech shows a lack of respect for users in the pursuit of shareholder value.