
Commentary: If the future of interaction is via voice, it risks excluding many of the same people that companies need to buy into AI.
Jeff Carlson Senior Writer
Jeff Carlson writes about mobile technology for CNET. He is also the author of dozens of how-to books covering a wide spectrum ranging from Apple devices and cameras to photo editing software and PalmPilots. He drinks a lot of coffee in Seattle.
Expertise: mobile technology, Apple devices, generative AI, photography
The future is becoming awfully chatty, which is bound to make some people uncomfortable.
At Google's recent I/O event and Apple's Worldwide Developers Conference (WWDC), many new features involved interacting with AI by talking to it through your phone (or, in Google's case, through devices such as smart glasses). And with the new Siri AI, we also saw Apple presenters chatting up their iPhones during the WWDC keynote, explaining all the new ways people can interact with the virtual assistant.
This push toward a more voice-focused future sounds like progress, but it assumes that everyone is comfortable thinking out loud, which could further alienate people who may already be wary of AI.
One of the more notable AI advancements in recent years has been the capability to interact with large language models in a conversational way. We've moved from issuing direct commands to responding to loquacious replies from AIs that feel like they're trying too hard to also be your best friend.
In fact, one of the heralded achievements at Google I/O was Gemini's ability to parse our fragmented human speech patterns, including all the ums, ahs and broken sentences, to figure out what we're really saying. I can almost imagine a patient but frustrated AI waiting with a "just get to it already" expression on its virtual face.
But that's the point, isn't it? It's already easy to think of Gemini or Siri (or Claude and ChatGPT, which I use primarily via text) as individual entities and to approach them the way I'd talk to a friend while strolling along the sidewalk, bouncing ideas back and forth.
The difference is that when chatting with AI, I'm standing in public talking to myself.
You can argue that this isn't a big deal now. It's commonplace to see people on calls in public wearing Apple AirPods or other wireless earbuds. We've normalized the body language and specific pause-and-reply interaction of someone talking on a call without actually holding a phone up to their ear. Even if we don't see earbuds, we assume that's what they're doing. It wasn't too long ago that taking a mobile phone call in public was considered rude.
But not everyone is so verbal. As a writer, I've taken stabs at using dictation (including a spell where a broken collarbone didn't leave me much choice), but it's always been more natural to make words through my fingers. Speaking and writing are two separate disciplines, even if they share language.
Using one's voice as an interface isn't just great for stage demos; in many contexts, it's the better (or only) option. Hands need to stay on the wheel while driving, and smart glasses don't have keyboards. And for people who can't easily view screens, I imagine voice recognition and conversational LLMs are genuinely helpful.
This is also a social problem. It's bad enough when people use speakerphone in public for calls (too often with topics that should be private) with no regard for the people around them. Now will everyone need to be subjected to their party planning or attempts to secure a restaurant reservation? It's a further erosion of respect for people around us.
And it throws up another barrier against actual communication. If you see someone wearing an amazing outfit, you might politely ask them where they got it. Now, you can snap a picture and ask AI to identify it, losing out on a moment of human connection while simultaneously looking like a creeper sneaking a photo.
Norms do change with technology, so I'm sure there will be a level of (begrudging) acceptance of people chatting to seemingly no one as they interact with their devices.
But are we headed for a world surrounded by overlapping conversations where no one is talking to each other? Yapping at our phones, watches, glasses and AI pins sounds like a lot of noise at a time when people are already burnt out on AI.


