What happens when AI stops being a tool you type into and starts becoming something you talk to? In this episode, Galen Low sits down with Oliver Shoulson, Agent Design and Engineering Lead at PolyAI, to unpack the surprisingly human problem at the heart of conversational AI: most AI conversations still feel weird.
From clunky chatbot scripts to overly polite “LLM voice” personas that sound like they were trained by a committee of HR robots, Oliver explains why good conversational design is less about mimicking humans perfectly and more about removing friction. The conversation explores the psychology of trust, the mechanics of social presence, and why the future of AI interfaces may depend less on visual design and more on understanding how people naturally speak, interrupt, hesitate, clarify, and collaborate.
What You’ll Learn
- Why conversational AI succeeds or fails based on social presence and trust
- The hidden design flaws that make AI interactions feel uncanny or frustrating
- How timing, interruptions, and conversational pacing shape user experience
- Why voice AI creates different customer behaviors than text-based chatbots
- The role of pragmatics and shared context in natural conversation design
- Why “helpful” LLM behavior often undermines trust instead of building it
- How businesses are using branded AI voices to extend customer experience
- Why language remains one of the hardest problems in AI despite recent breakthroughs
- What visual designers need to rethink when designing conversational interfaces
- How AI-to-AI communication could evolve beyond human language patterns
Key Takeaways
- Most AI conversations fail because they make users work too hard
Good conversational design removes the burden of navigating the interaction itself. As Oliver explains, users shouldn’t have to reverse engineer an AI system’s menu logic or conversational structure just to solve a simple problem. The best AI interactions let people rely on instinctive conversational habits instead of forcing them into rigid workflows.
- Voice interactions create trust differently than text does
People still pick up the phone because spoken conversation creates a stronger sense of “social presence.” Real-time dialogue increases trust, adherence, and confidence in the interaction. It’s not nostalgia. It’s cognitive wiring.
- The uncanny valley of AI is often linguistic, not visual
The problem isn’t just synthetic voices or robotic phrasing. It’s when AI violates subtle conversational norms humans barely notice consciously. Over-explaining requests, narrating obvious steps, or ignoring shared context all break the illusion of collaborative conversation.
- “Helpful” AI often sounds patronizing
Large language models are trained to be broadly safe, agreeable, and explanatory. But in real conversations, too much explanation can imply incompetence or distance. Sometimes the most natural interaction is simply: “What’s your account number?” Not a six-line disclaimer about why it’s needed.
- Timing matters more than most teams realize
Human conversation moves fast. Delays longer than a few hundred milliseconds start to feel unnatural. Designing conversational AI means accounting for interruptions, pauses, filler language, and turn-taking dynamics that humans process subconsciously.
- Real call center workers already contain the design data teams need
One of Oliver’s strongest recommendations: spend time shadowing customer service teams. Listen to how people actually ask questions, interrupt, clarify, and solve problems. Most organizations already possess the conversational patterns they’re trying to recreate.
- AI doesn’t need to fool people to feel natural
The goal isn’t deception. It’s reducing friction. Natural conversational design allows users to focus on solving their problem instead of figuring out how to interact with the system.
- We may be approaching the limits of brute-force language models
Oliver argues that today’s LLMs represent an engineering breakthrough, not a full understanding of human cognition. Simply scaling models larger may eventually hit diminishing returns, especially when it comes to reasoning and contextual understanding.
- Future AI agents may not talk like humans at all
As AI systems increasingly communicate with other AI systems, human-style conversational norms may disappear entirely. The polite back-and-forth we associate with language could ultimately be more about human cognition than optimal information transfer.
Chapters
- 00:00 — Why AI conversations feel weird
- 02:33 — AI, websites, and navigation
- 06:30 — Voice vs. visual design
- 10:33 — Where voice AI works today
- 13:35 — Fixing “LLM voice”
- 16:37 — Why people still call
- 20:06 — Social presence and trust
- 22:12 — Conversational ethics
- 25:38 — Common AI design mistakes
- 31:06 — Timing and pacing in conversation
- 33:40 — Why over-explaining breaks trust
- 40:16 — Natural conversation design
- 42:22 — Why language is hard for AI
- 46:53 — Designing for voice interfaces
- 51:06 — AI talking to AI
- 54:11 — The future of AI hardware
Meet Our Guest

Oliver Shoulson is the Agent Design & Engineering Lead at PolyAI, where he helps design and develop advanced conversational AI systems that power natural, human-like customer interactions for global enterprises. With a background in AI engineering, dialogue systems, and user-centered design, Oliver specializes in bridging technical innovation with real-world usability to create scalable voice and chat experiences. He is passionate about the future of conversational AI and how intelligent agents can transform customer service, operational efficiency, and human-computer interaction.
Resources from this episode:
- Join the Digital Project Manager Community
- Subscribe to the newsletter to get our latest articles and podcasts
- Connect with Oliver on LinkedIn
- Check out Oliver’s website
- Visit PolyAI
Related articles and podcasts:
Galen Low: Hi, I'm Galen, the Digital Project Manager's Agentic AI-powered virtual podcast host avatar. Just kidding. But how much would you wanna bet that the way you experience this podcast would be different if you thought I was just an AI? That's the topic we're diving into today, the principles around what makes a conversational interaction with AI good versus the unhelpful, rage-inducing, uncanny valley anti-examples that we see come up as memes in our Instagram feed.
The fact of the matter is that many of us are being asked to create agents and AI teammates that our colleagues and stakeholders will interface with, and not just text-based tools, voice-based ones too. But if what we build adds too much friction to the experience, those tools could quickly find themselves gathering dust on the shelf of failed AI experiments.
So to help us start designing better interactions with AI, I've brought in an expert working at the forefront of conversation design. Together, we're gonna unpack the ROI of creating humanistic, friction-free voice interactions with AI, explore where that design effort starts to encounter diminishing returns, and reveal how some tiny tweaks can help you move away from that overly helpful, people-pleasing LLM voice that humans are beginning to hate.
Hope you enjoy the episode.
Welcome to The Digital Project Manager Podcast—the show that helps delivery leaders work smarter, deliver smoother, and lead their teams with confidence in the age of AI. I'm Galen, and every week we dive into real-world strategies, emerging trends, proven frameworks, and the occasional war story from the project front lines. Whether you're steering massive transformation projects, wrangling AI workflows, or just trying to keep the chaos under control, you're in the right place. Let's get into it.
Okay, today we're diving into the role that conversation design plays in how we build trust with AI. We'll be talking through the biggest mistakes that teams make when designing conversation-led AI experiences and how to fix them. We'll be nerding out about why language and AI are somewhat complicated bedfellows, and we'll be making some predictions about how AI will talk to other AI in the future.
With me today is Oliver Shoulson, Agent Design and Engineering Lead at PolyAI. Oliver is a linguist and conversational AI thought leader whose work sits right at the intersection of language, product design, and AI. During his time at Yale, he focused on syntactic variation, studying the way speakers construct and arrange sentences differently within the otherwise structured rules of language. And that's become hyper-relevant today, as Oliver uses his role at PolyAI to help teams move beyond clunky scripts and uncanny valleys toward voice interactions with AI that actually feel natural and useful.
Oliver, thanks for hanging out with me today.
Oliver Shoulson: Thank you so much for having me. It's really great to be here.
Galen Low: This is a really interesting topic to me. I loved our conversations leading up to this because I got to nerd out a little bit. Like, I studied a little bit of, like, English grammar and some linguistics in university.
It was something that always, like, captured my attention. I went a different way. I went into film studies. But as we've kind of walked down this path of, like, AI and chatbots, all of these interactions that are more fluid and they're not, like, this rigid syntax, I've gotten really, like, back into it. So as soon as your name came up, I was like, "Gosh, I gotta talk to this guy."
So I appreciate it. I do hope we go all over the map. There's plenty of rabbit holes that we can dive into, but I am a project manager, so just in case, here's the roadmap that I planned out for us today. To start us off, I wanted to just, like, set the stage just by hitting you with, like, a big, hairy question that my listeners want your take on.
But then I thought of maybe, like, I'd zoom out and talk about three things. Firstly, I wanted to talk about why voice interactions with AI matter and how they're making a tangible impact to businesses and users today. Then I'd like to step through what makes a conversational interaction good and how any teams building an AI agent or an AI teammate can avoid that uncanny valley that makes users cringe.
And lastly, I'd just like to get your take on the future of interaction design and whether visuals might start maybe playing second fiddle to conversational language. How does that sound to you?
Oliver Shoulson: Sounds great to me.
Galen Low: Awesome. Well, let's get into it. I thought I'd start off by asking one big, hairy question, but I'm gonna, like, take a running start at it because, like, my network, the circles I travel in, it's a bunch of people who are digital professionals by trade, folks that, like, build websites or digital marketing campaigns and, like, digital transformation strategies.
And so our ears have been perking up when, like, the headlines have been saying that websites as we know them are about to change forever, thanks, at least in part, to the way AI has been embedded into the web browsing experience with things like AI overviews and AI browsers and AI copilots, et cetera, et cetera.
So I thought I'd ask, how far are we from an internet where conversation replaces navigation, and what might hold us back from going there?
Oliver Shoulson: Yes. I mean, I think there's 100% going to be ways in which the way we're designing webpages and the internet as a whole is geared toward making it navigable for AI agents.
Like, I think that's inevitable, and that's probably a good thing. That's gonna ultimately increase the efficiency with which our agents and assistants can get tasks done for us. That said, I don't necessarily know that I want or anyone that I know wants an internet that you only talk to and never see.
I guess I would say that, you know, I think the task of designers, of interface designers, of interaction designers more generally, whether you're talking about conversation interfaces or visual interfaces, is to use our understanding of the cognitive biases and cognitive architecture that humans have for interacting with the real world to then make intuitive interfaces for interacting with the digital or virtual world.
So, you know, for a long time, we've had really clever graphical user interface designers who found ways of basically appealing to and simulating, like, physical world interactions on a webpage in a way that made those navigations intuitive. So, like, slider bars and switches and buttons and these things that we're used to interacting with physically and that we have, you know, cognitive architecture already in place to teach us how to navigate the world in that physical way, we can then just sort of apply to this digital interface, and then it becomes really intuitive.
We don't have to learn a whole bunch of new stuff for how to interact with it. And so I think that with the capabilities of conversational interfaces, we're really, like, doing the same thing. We're just appealing to a different set of cognitive architecture, which is our linguistic faculties, which until very recently, we didn't really have a way to simulate at all realistically.
That's been sort of the big breakthrough of large language models is that even if they are these sort of massive churning probabilistic engines, they can, with enough data and enough training and enough supervision, produce what feels like natural human language for the first time. And so that just sort of gives us another avenue of a way to interact with something, but I don't think there's anything inherently better about conversation than graphical interfaces.
Like, I think they just appeal to different parts of the human experience.
Galen Low: I really like that tie-in with, like, real world things, and, like, I abstract it away. I'm a digital guy, so I'm like, "Oh, yeah, buttons and, like, you know, radio buttons, checkboxes, and sliders." I'm like, I haven't even equated them recently until like, oh, yeah, of course, we have those in real life too, like physical things, switches and things that we, like, interact with that way.
And then agreed. I think it makes tons of sense then to be like, "Okay, well, we already have this language faculty, and we can use it as part of an experience to interact with something." But it's not necessarily going to be better or it's not going to replace necessarily. It's just that we have these tools, like, available to us.
Oliver Shoulson: Yeah. You know, I consider myself a very visual person. I'm sort of a visual artist in my, like, other life outside of work.
Galen Low: Oh, yeah.
Oliver Shoulson: And so I would be really sad actually if, like, the world of webpage design and graphic design, like, started playing second fiddle to conversation. That's certainly not my goal as a conversation designer, and I hope that's not anyone else's goal.
I think that those are both beautiful things that require unique and interesting skill sets, and I hope that they both continue to thrive.
Galen Low: What I'm thinking about is, like, our sort of anxiety goes into, like, okay, but websites, like, people aren't visiting them anymore. Like, you know, part of the economy around a website is that, you know, there's clicks, people visit the actual site itself, see the interface.
You know, there's talk right now of the AI overviews or some other mechanism kind of being like a, gosh, like an in-between website that is not under the control of the website owners that may have a different interaction design than the intention of that business, which may involve some of the conversation because you might be addressing, you know, Gemini or, you know, our LLM of choice to sort of ask a question.
I guess maybe I thought I'd go there just a bit, like, off the cuff, but you are a visual person. You know, you appreciate visual design. You are a linguist. You appreciate conversation design. And then there's that intention of, like, someone who's created an experience, and then there might be an intermediary that abstracts that into something else.
Is that okay? Is that a bit of a compromise? Could that be a good thing?
Oliver Shoulson: Yeah, I don't know. I actually haven't thought about this a ton, so this is also totally off the cuff for me. I would hope that, like, that sort of in-between layer becomes sort of a standard part of, you know, branding and web development for businesses so that they get to exercise a certain amount of control over that as well.
Like, you know, we'll get into this, I'm sure, but one of the things that businesses are really excited about when it comes to designing a conversational interface, as they have been doing for their visual interfaces forever, is this opportunity to develop and present a unified brand identity. And I totally hear you. The worry that somehow your brand identity is getting lost or abstracted or mischaracterized in that in-between layer, and then, I don't know, kind of either gets disappeared or mutated into something you don't want it to be, I could see that as being a real concern.
I would imagine that, you know, agencies that are building websites for companies or building digital presences for companies, like, are looking into ways that they can, you know, continue to exert control over that sort of AI summary or AI-targeted in-between layer that you're talking about.
Galen Low: I love that.
It's like the brand voice is part of it. It's part of the experience. It's literally we're asking an assistant or an intermediary to be like, "Can you read me the site? Just summarize it, you know, just tell me, you know, what buttons are there." And we're not getting, like, the actual experience as it was designed.
We're getting information which, you know, may be fast and convenient, but, you know, definitely not something that is maybe even as intentional as if it was the experience from the company, which maybe actually is a good segue. Unintentional segue, but I wonder if maybe we could zoom out a little bit because, you know, your work at PolyAI, it revolves around voice-driven AI user experiences, and in some cases, like, that is standing in for having someone being around to, say, answer the phone at, for example, like a restaurant or a bank or a health clinic.
I'm just wondering, like, what are you seeing at PolyAI in terms of where voice AI is delivering real tangible value for businesses and customers today, and how is that experience, like, fundamentally different from maybe just chatbot experiences that we've seen over the past, like, decade or so?
Oliver Shoulson: You can ask any contact center manager, contact center leader, customer experience leader in, you know, a variety of verticals, and they'll tell you that people still pick up the phone.
Now, like, you might not think that's the case. You might know a lot of people who hate making phone calls. I certainly do. But the phone is absolutely still, like, a primary channel for customer support. And so, sort of as you hinted at, one of the big pieces of value, obviously, that clients are seeing with the implementation of this kind of voice solution is the ability to address, you know, heavy seasonality, volume shifts, and just general churn in contact center work.
So for example, one of our longest standing clients is a big retail client, and obviously their big season is around the holidays, specifically Black Friday when everyone's buying their Christmas gifts. And since working with us, for the first time ever, they were able to give their call center workers Black Friday off in addition to Thanksgiving.
Galen Low: Oh, wow.
Oliver Shoulson: So that kind of thing, you know, brings value not just to the client and to the customer, but also to those contact center workers. You know, I think about my work at Poly as supplementing human contact centers, not necessarily replacing them. I think that there will always be problems or needs that customers have that require a human touch.
And so I think that what most of our clients want and what we want to provide is a way of automating and reserving those contact center resources for those really crucial, complex user intents and requirements that need a human touch, even if that's not the vast majority of the actual things that people call in with.
You know, the majority of people call in with the same 20% of the problems. And so if you can automate that 20% of the problems, that's a huge boon to these contact centers. And then, you know, the other thing I would say, which is what I also touched on before, is that voice more so than text definitely gives you this opportunity to develop a branded experience where you don't feel like you are suddenly subordinating the persona, the attitude, the identity that you've spent so long developing as a brand to, you know, the kind of LLMese...
Galen Low: Right.
Oliver Shoulson: ...the generic, extremely passive, and kind of flavorless dialogue that you can expect from a kind of generically wrapped LLM in a chatbot. So to give an example of this, one of our clients is Fogo de Chão, which is the Brazilian steakhouse chain, really fantastic steakhouse, highly recommend. And they were really into this idea of actually cloning the voice of Selma, who's one of their longtime customer experience leaders at the company.
She's been at the company for decades. And so what we were able to do for them was actually recreate Selma as an AI agent and have her be the one answering the phones when people call to make, modify, cancel reservations, ask about rewards, things like that. And so by doing that, they were able to actually create, like, an extension of their brand identity that is not possible otherwise.
Like, they were able to basically give Selma, this one brilliant woman, you know, a million more voices than she could have had at once, and the ability to interact with a million more people than she could have had at once. So that's something that clients are really excited about, and certainly something that you can't get from text on a screen.
Galen Low: Well, it's funny because, you know, you mentioned the, like, LLM voice, and I think everyone listening probably is like, yeah. Like, we all kind of get what that means, and I know the technology is progressing, but, you know, like, coming back to what we were saying earlier, it's still this sort of its own persona that, you know, arguably starts quite, you know, sycophantic.
It's kind of like this people-pleasing... And it has a personality. It just might not be Selma, right? Like, it might not be this individual's personalized experience. And I think a lot of people listening would be like, "Yeah, you know, I am kinda sick of that, and I'm kinda sick of that LLM voice." I wonder, like, could you step us through it, maybe at a high level, in terms of, like, does Selma, like, come in and just, like, talk in a booth for a week straight?
Or, like, 'cause I'm just thinking of folks who are listening who are like, "Yeah, I wanna build, like, a Selma agent. Like, I don't want it to be like, 'Great. Your suggestions are awesome. Here's three things that I'd correct. Would you like me to go and do a thing now?'" How does it work to sort of, you know, get Selma's personality in there, and is it, like, an arduous task?
Oliver Shoulson: Yeah. Well, so for the actual voice, for, like, cloning the voice, it does require collecting, you know, a certain amount of training data and then training a custom voice model on it so that we can get, like, her accent and, you know, the way that she says "and" and, like, the little placeholders when she's speaking.
And then in terms of actually designing a persona, that's really where the conversation designer comes in and, like, works at the sort of prompting and retrieval and guardrail layer to actually exert control over what the model returns and, like, figure out its conversation style with the client. We have a lot of ways of exerting that control, like different levers and dials that we can pull to try and get out of that LLMese. But that's another one of the things that makes it super useful to have custom-trained in-house models, which is what we work with as well: we can make them especially sensitive to things like persona prompting and conversation style and ways of actually conducting the conversation, as opposed to, you know, only very sensitive to instruction following, which of course is what we want as well, but otherwise falling back to that kind of sycophantic, bland persona that we've come to expect from LLMs.
Galen Low: I like that, 'cause usually, at least in the conversations I've had, when we use the word guardrails, we kind of mean, like, you know, safety, ethics, things like that. And what I understood you to mean is almost like guardrails around how much personality to, like, try and extrapolate from the data that there is, because yeah, I mean, it could get into, like, safety, privacy, ethical things, but also even just, like, "That was weird. Like, someone just overshared to me on the phone."
Oliver Shoulson: Yeah. Guardrails around the brand identity as well, right? Like, obviously we don't want the model to exfiltrate personal information. We don't want it to, you know, hallucinate or say something that could get the business in trouble from a legal standpoint or ethically or whatever.
But you also want to protect that brand, that persona that the company has spent so long developing, and there need to be guardrails for that as well. Like, that's also something at risk when you put an LLM in the system. And so that's something that we take a lot of care and spend a lot of time making sure is in place.
Galen Low: I wanted to come back to something you said earlier. You said people do still wanna pick up the phone. I resonate with your sentiment. I'm like, that's kind of weird. I don't love the phone. It's just, yeah, for some reason I'm averse to it. It's unpredictable. It's kind of like I don't have enough control. I think the control freak in me doesn't like the phone, but like you said, people do wanna pick up the phone and you know, like I am definitely not the majority here.
I'm thinking of it as like, okay, well like why the phone? Like why hasn't the phone become like this anachronism like sending a fax? What do you think it is about people who do wanna pick up the phone and like talk to a human?
Oliver Shoulson: Yes. There's a concept in the human-computer interaction field called social presence. It's actually a very old concept.
It's, like, decades and decades old, and it actually developed around thinking about phone conversations versus face-to-face conversations. It's the term for that sense of being with another person, that sense of real-time togetherness. And it's obviously kind of a subjective thing to measure, but there's tons and tons of literature out there showing that an increased sense of social presence is correlated with all kinds of other positive, like, customer experience and, like, business outcomes.
So if someone feels a greater sense of social presence in a customer service interaction, they're more likely to trust and adhere to the advice of the person they're talking to. And so that leads obviously to, you know, fewer bounce backs because people didn't follow the directions, and more satisfaction with the outcome of the conversation.
Generally, satisfaction is correlated to this sense of social presence as well. And also, you know, crucially, and this is a big thing around AI interactions, they feel more confident that the security of the information they've provided is being handled sensitively and with care. And so I would imagine, you know, also as someone who doesn't love phone calls, but I can, you know, empathize, that all of these positive experiences and outcomes that are associated with a greater sense of social presence have people seeking out that social presence in these interactions.
So, you know, I think about social presence as kind of a spectrum, you know, the highest degree being a face-to-face interaction with someone. Probably you and I, as we're looking at each other right now, this is probably the second best we can get, we see each other's faces. Over the phone would be a little less, and then, you know, texting each other is kind of the least that you could imagine.
So, you know, people are looking for ways that they can find that feeling of social presence, I think, in these interactions.
Galen Low: It's really interesting thinking about the role of social presence, 'cause as you were saying that I'm like, "Oh my gosh, how many times have I, like, read an email and, like, not felt any sort of real-time accountability to it?"
So I'm like, I'm skimming it. Did I follow the instructions? I probably did it wrong. I'm like, "Okay," like, 'cause I'm not fully invested or paying attention, and I don't expect or feel like anyone's sort of, quote-unquote, like, taking care of me. Whereas with the social presence, it's real time. I do have this sort of like, you know, social decorum or bond or whatever, like, I should listen to this person and not just kind of like half listen and hope I understood the instructions right.
And also, there's that sort of realtimeness of like, okay, someone is here, you know, caring in some way, shape, or form for me. Like, something between caring and serving. Like it's like, okay, they're here to help, right? And I was just thinking about how that is a great reason for why good design in the conversation will actually create that social presence, because effectively it actually makes the interaction more effective, more efficient.
It's building loyalty. It, you know, it has trust built in. And yeah, the, what you said about, like, just chucking information into a chatbot, like, it does feel like you're like, "I don't know what they're gonna do with this data." You know? I'm, like, looking at my browser bar just to make sure it's, like, a secure connection like five times before I send something.
Versus, yeah, the, like, inbuilt trust of, you know, what happens when there is that social presence.
Oliver Shoulson: Yeah, and I'm glad you said the thing about design because it obviously is something that extends beyond just, like, the channel itself. Like, it's like, oh, okay, phone is better than text in terms of social presence.
But there are also things that we as designers can do in order to induce a greater sense of that social presence as well, and this is things like we try very hard to make it always feel that the agent is not reaching into a canned bag of responses and just, like, giving you ad copy verbatim or something.
Right. 'Cause that's a great way to make it feel, again, like this person is not with you in real time. Like, there's not a sense of rapport, a sense of real time, like, collaborative problem solving, which is what we want to foster. And LLMs have lots of ways in the way that they naturally, out of the box interact that sort of have the ability to undermine that sense of social presence, that make them feel kind of distant or removed or uncanny, sort of as you touched on before, where, like, the familiarity and affinity level that you feel for the agent is, like, not quite there.
So design plays a huge role in that beyond just the channel itself.
Galen Low: It's a good point. I mean, this is a lovely tangent that I'd love to take us on, because I think that a lot of folks forget that something like ChatGPT, these LLMs, are, like, designed to fit almost every possible use case that humans might throw at them, and these large tech companies have built it that way to sort of gather data and to increase uptake and adoption of this technology so it can progress.
But fundamentally, the LLM voice is going to be the safe voice for any situation. You know, if you're like, "Hey, I am," whatever, "I'm concerned about my mental health," or like, "How do I get a cheeseburger from this restaurant?" Like, is... They're both use cases that this technology is meant to handle for everybody, and of course it's gonna be this, like, safe, sycophantic, you know, voice that kind of makes you feel good about yourself and get that thing done, and you're like, "This technology is great."
But it was really funny to watch how quickly people were like, "Oh, I'm training my LLM to be a little mean to me," right? Like, "I don't want it to be nice, actually. I want it to be a little bit mean," because, you know, they're not the kind of people who necessarily need that sort of comfort and safety in the interaction.
But it's like, 'cause, like, one size fits every use case, every user, so of course it's a little bit, you know, vanilla.
Oliver Shoulson: Yeah, and it's so interesting 'cause I obviously would prefer that a large language model is, like, on the receiving end of abuse from an angry customer than a human. But at the same time, like, as a designer, and I think a lot of brands would agree with this as well, like, you kind of want the, even the LLM to be able to set boundaries with your customer.
Yeah. You don't wanna just have to listen to calls day in and day out of people yelling and screaming and cursing at an AI, and the AI just taking it. Like, that feels, I don't know, bad for the soul of humanity. Like, I think we should still be, you know, supporting good principles, and this is the closest I'll get, I guess, to a prescriptive kind of ethical stance in this conversation probably.
But I feel like we should still be fostering good principles of conversation beyond just, like, realistic conversation, if that makes sense.
Galen Low: And I totally agree because things we do in any channel or context, I think do influence what we normalize, you know? And, like, even I was gonna go there earlier because, like, you know, in some ways previously conversation-based interactions with technology have been very frustrating because you have your Alexas and your Siris and, you know, you're like, "Damn it, Siri.
No, that's not what I meant." And, like, you see people being really mean to these assistants, and it's almost like, whatever, like, YouTube comments, right? And we start normalizing this, like, you know, bullying or, you know, like, this bad behavior. It's bound to just seep into other stuff. So especially when everyone was like, "Oh, don't say thanks to ChatGPT because, you know, every time you do, like, there's not enough water to feed a village or something."
You know, like, there's this like, "Oh, okay, well, I guess we don't have to be polite." And we're like, "No, that's not what we're saying. What we're saying is, like, don't be wasteful with the technology because it does use significant resources." But I don't know, actually, I wonder your take on that. Should I be saying thank you to LLMs?
Oliver Shoulson: Also, I mean, as you're saying it, like, this is a sociolinguistics question that, like, I'm sure there's research actively being done on, specifically this question of, like, how the way that you interact with different kinds of virtual assistants either prefigures or influences or maybe just indexes the way that you interact with humans in the real world.
I feel like I've seen headlines about this, but I've not read, like, any actual good scholarship on it, and I'm sure it's being done, so I don't wanna just sit here and speculate while someone's doing their PhD on this in your own university.
Galen Low: It's totally fair, yeah. Like, the content I have seen has been sort of inconclusive editorial thought pieces, but I agree with you.
Somebody is, you know, yeah, doing the research here specifically on this, which, I don't know, maybe this is a good segue as well because, you know, you are a linguist by training, and you're working at the bleeding edge of conversational AI experiences. So I imagine that you must see a lot of people, or at least you're, like, tuned in to notice people trying to develop these, like, natural, usable AI tools that, like, pass the cringe test.
But I imagine you're seeing a lot of folks are kinda, like, falling on their face a little bit as well. I'm coming at it from the angle of, like, especially in my world, project management or, you know, any digital industry. It's like, I mean, if you wanna stay relevant, build an agent, you know, build these things, build all these things.
We're not linguists, we're not AI people, and we're just gonna be like, "Go do AI." And sometimes the result's not good. So I'm just wondering, like, what are the most common design mistakes that you see happen when teams build AI teammates or AI agents, and which ones are the ones that create that sort of like, "This feels weird," reaction for users?
Oliver Shoulson: Yeah. So I'll start with some, like, I guess more boring stuff that I think kind of has to be in place in order for the more interesting nuanced stuff to make a difference. Okay. Which is, I mean, for one thing, as you and I have been demonstrating this whole time, spoken conversation is messy. It's rife with interruptions and self-repair, which is when, as you did before, like, you misspeak and you go back and correct yourself.
You know, typically in spoken interaction, people are talking over each other, interrupting each other, or, you know, at the very most, there's like at the level of two to three hundred milliseconds in between every turn. And so there's fundamentally this, like, computational problem, this technological engineering problem of like, well, if we're gonna have a call to a large language model in between every turn, and maybe there's also speech recognition and text-to-speech on either end of that, like that's a lot of computation to do in a very limited amount of time.
And, you know, it's rare these days to see out in the world, like an AI, a voice AI agent that has less than three seconds between every turn, much less, you know, a hundred milliseconds, so like a whole order of magnitude, right? And so the ability to manage the natural flow of conversation is both a usability problem and also this kind of uncanny valley problem, where, for instance, like if I can't interrupt when I need to interrupt, or if the agent can't interrupt me in order to step in over, you know, a misconception that I have and clarify what it was asking me for, or, you know, if I interrupt the agent and it doesn't know how much of its previous utterance it was able to get out before I interrupted, right?
Like, that's sort of a technical challenge of if I'm talking and you interrupt me, I know what you heard, and so if I need to go back and retrace, like I'm gonna pick up where I left off, or if I'm giving you like an either/or question and you only heard the first option and then interrupted and said yes, I know what that meant, right?
Because you meant the option that you heard. But an LLM doesn't necessarily know, if it gets interrupted, like, how much of its previous thing did you hear? So if it's saying, "Do you need to check the status of a new card or report a lost one?" And you interrupt after "check the status of a new card" and say, "Yes."
Right. It doesn't know what-
Galen Low: When you said yes and what you said yes to.
Oliver Shoulson: Yes to, 'cause it doesn't hear itself talk necessarily. So like these kind of dynamics of turn-taking in conversation are like really hard to get down technically and like need to be in place in order for, you know, the system to be navigable and usable in the way that I'm describing, but also in order to not give you that sense of like, oh my God, this is like the slowest, most painful, torturous interaction where I'm waiting three seconds after I say everything to hear the agent respond.
The other thing about sort of real-time dynamics of conversation is that, you know, we don't have a record of what was said previously in the way that we do when someone's chatting with a chatbot. And so this is a big weakness for large language models that are fundamentally, at least for the time being, mostly text-based, and are trained largely on long-form text and structured text at that.
So the text with bullets and parentheses and paragraphs and headers where it's used to providing information in a way that's digestible in this kind of long form structured text format. And that's just like not how active processing memory works in real-time conversation. Like, I'm giving you a bit of a monologue and a paragraph right now, but normally, like if you were providing me a bit of information and it was a really dense piece of information, say you had to convey an address or a URL or a phone number, like we have multi-turn routines that we follow in order to convey that information and ensure that the recipient has got it.
So, you know, maybe if you ask for an address, whereas over chat, I can just sort of send you the address, and you can click on it and go to Google Maps or whatever. Over the phone, I'm gonna say, you know, "Do you have something to write it down with?" You say, "No, let me go get that." I'm like, "Sure, okay, I'll wait for you."
You come back, you're like, "Okay, what's the address?" I give you the building number and street name. You ask for the spelling of the street name. I spell it out to you. It's this whole multi-turn routine that is completely obviated by a channel where turn-taking and the written record sort of supplements that processing memory.
So those are what I would call kind of the boring stuff. And then, you know, then we can get into some of the stuff that's more fun for me as a linguist to talk about, the nuances of spoken conversation and cooperative dialogue that particularly affect that uncanny valley problem. I guess I could pause there though, 'cause I know I went into a lot.
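Editor's note: the interruption problem Oliver describes (the agent not knowing how much of its own utterance the caller heard before saying "yes") can be sketched in a few lines. This is a minimal illustration, not PolyAI's implementation. It assumes the speech synthesizer reports a playback end time for each word, and the names `TimedWord`, `heard_prefix`, and `resolve_yes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    end_ms: int  # assumed: when playback of this word finished, from utterance start


def heard_prefix(words, interrupt_ms):
    """Return the words whose playback completed before the caller cut in."""
    return [w.text for w in words if w.end_ms <= interrupt_ms]


def resolve_yes(options, heard_words):
    """Map a bare "yes" to an option only if exactly one option was fully heard."""
    heard = " ".join(heard_words).lower()
    matches = [name for name, phrase in options.items() if phrase.lower() in heard]
    return matches[0] if len(matches) == 1 else None


PROMPT = "Do you need to check the status of a new card or report a lost one?"
# Assumption for the example: every word takes 300 ms to play.
WORDS = [TimedWord(t, (i + 1) * 300) for i, t in enumerate(PROMPT.split())]
OPTIONS = {
    "check_status": "check the status of a new card",
    "report_lost": "report a lost one",
}

# Caller says "yes" right after "card": only the first option was heard.
early = resolve_yes(OPTIONS, heard_prefix(WORDS, 3400))
# Caller says "yes" after the whole question: ambiguous, so the agent should clarify.
late = resolve_yes(OPTIONS, heard_prefix(WORDS, 5000))
```

Here `early` resolves to `"check_status"`, while `late` is `None`, which is the signal to ask a follow-up rather than guess.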
Galen Low: No, I think that's great. I mean, I do think we should dive in there, but I never even thought about the pace of turn-taking in conversation. And as you're saying that, I'm realizing that like, you know, I don't do a lot of voice AI myself. I'm mostly typing into an LLM like Gemini or ChatGPT or Claude, and I am that person who like will just walk away, right?
I'll just be like, I'm typing this thing and prompting it, hit enter. I'm like, it'll take a while. I'm just gonna go do something else and come back because I've used it like this database retrieval machine, whereas that probably, A, isn't the goal, B, might actually be limiting the benefit I get from the experience because it's not this sort of exchange.
It is sort of like I'm just gonna delegate a task to you, and you come back to me with your long form response, and then I'll copy and paste into a thing. And it's like, as you're saying it, I'm like, oh yeah, like what I'm doing, at least my typing in the text world, is not a dialogue really. It's more delegation than dialogue.
But when it is conversation, there are these things where it wouldn't fly, you know? Like, and there's things that we do, especially call center agents, right, where you're like, you do kind of have these little fillers, right? "Great. Yeah. So just wanted to confirm that blah, blah, blah," or, "Hey, okay, yeah, let me just look that up in the system," so that you have that whatever 200, 300 millisecond delay sort of filled to be like, okay, we're interacting.
This is an exchange. It's a really interesting way of, like, thinking of the momentum of dialogue.
Oliver Shoulson: There's something that we do in spoken dialogue, something that's called back channeling, which is when-- So you're listening to someone, and you just did it for me in a visual way by nodding. But, like, verbally, it's when you say "yep" when someone's talking to confirm that you're still with them, they still have the floor, and you're just confirming that you're with them, you're hearing what they're saying, you're getting what they want.
And there are ways in which those back-channeling routines are actually, like, really expected, to a degree that you end up with problems and failures if they are not provided at the expected time. And this is something I-- a problem I set out to solve very early on in my time at Poly, so, like, three-plus years ago, where basically we were asking people for their phone number.
And, you know, for US and Canadian phone numbers, we typically give them in that area code, three digits, and then four-digit chunk. And we have this implicit expectation, and you might not even realize this, that after you give one of those chunks, the person is going to kind of confirm that they got it. So people would say something like, you know, "My number is one two three," and then pause.
And then if you don't provide a back channel there, and worse, if you think that they've concluded their utterance, then you're gonna be like, "That's not a phone number. That was three digits." So, like, you actually have to teach these models how to provide that kind of conversational feedback in order to even navigate a data collection point like that without failing, because otherwise people actually get confused if they don't receive that kind of feedback.
Yeah, and to touch back to what you were saying about, like, typing a big prompt and then going away, that's absolutely something I do with AI assistants as well. But yeah, like, that just doesn't fly over the phone. That's not how we talk. I guess we could put you on hold, but, like, that's not a good experience.
Right. So, like, we need to actually find ways to break out that information gathering, and particularly when conveying information, to convey it in a way that is structured for real-time processing. Over text, I can go get lunch and then read the paragraph you sent me afterwards, but over the phone that's just not how it works.
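Editor's note: the phone-number routine Oliver describes can be sketched as a small turn handler that back-channels after a partial chunk instead of treating it as a failed answer. This is a hedged sketch, not the system he built. It assumes the speech recognizer returns digits as characters, that a complete number has ten digits, and the function name `next_agent_move` is hypothetical.

```python
import re

EXPECTED_DIGITS = 10  # assumption: US/Canadian number without country code


def next_agent_move(digits_so_far, caller_chunk):
    """Return (digits collected, what the agent says next) for one caller chunk."""
    digits = digits_so_far + re.sub(r"\D", "", caller_chunk)
    if len(digits) < EXPECTED_DIGITS:
        # Partial number: back-channel and leave the floor with the caller.
        return digits, "Mm-hmm."
    if len(digits) == EXPECTED_DIGITS:
        return digits, f"Got it, {digits[:3]} {digits[3:6]} {digits[6:]}."
    # Too many digits: something went wrong, so restart the routine.
    return "", "Sorry, I lost track. Could you start again from the area code?"


collected, reply_1 = next_agent_move("", "My number is 123")
collected, reply_2 = next_agent_move(collected, "456")
collected, reply_3 = next_agent_move(collected, "7890")
```

The first two chunks get a short "Mm-hmm." and only the third triggers the read-back, which mirrors the chunk-by-chunk confirmation people expect on the phone.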
Galen Low: I love that. So, like, yeah, if I was gonna boil it down, like, even though you say it's boring, I find it really interesting, but, like, timing is a thing that people don't often think about when they're doing real-time conversational design, and it is kind of a challenge because the technology sometimes isn't actually that fast yet.
Oliver Shoulson: Yeah, so we have to find ways to sort of supplement it, kind of as you're saying, like, with, like, filler utterances or delay utterances, which again, I like the way that you said, like, that's something that real people have to do also when they're doing something in real time. You know, if I'm clicking around trying to find your account in the database and you're on the phone, I'm not just, like, leaving you hanging there.
I'm sort of giving you updates about what I'm doing, like, "Give me just a second," "It's being slow today," stuff like that.
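Editor's note: one way to picture the filler or delay utterances discussed here is a wrapper that speaks a short filler only when a backend lookup outlasts the caller's patience. This is a rough sketch under stated assumptions: `lookup` and `say` are hypothetical callables supplied by the caller, and the 0.3 second default simply echoes the turn gap mentioned earlier in the conversation.

```python
import asyncio


async def respond_with_filler(lookup, say, patience_s=0.3,
                              filler="Give me just a second."):
    """Run a slow lookup, speaking a filler if it outlasts the patience window."""
    task = asyncio.create_task(lookup())
    done, _pending = await asyncio.wait({task}, timeout=patience_s)
    if not done:
        say(filler)  # signal that activity is happening on the other side
    result = await task
    say(result)
    return result


async def slow_lookup():
    await asyncio.sleep(0.05)  # stand-in for a slow database query
    return "I found your account."


spoken = []
asyncio.run(respond_with_filler(slow_lookup, spoken.append, patience_s=0.01))
```

After this runs, `spoken` holds the filler followed by the real answer. With a lookup that finishes inside the patience window, only the answer is spoken.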
Galen Low: Have you built an agent that, like, has the, what I'm gonna call the I'm doing something humming song? "Just gonna click this thing. La la."
Oliver Shoulson: Oh, what's funny, actually, you know, it used to be that we worked with, like, real voice actors more, and we would absolutely have them, like, you know, keep their sort of default persona, like when they'll extend certain syllables when they're thinking, you know-
Galen Low: Okay, yeah.
Oliver Shoulson: sort of like I just did, and they're like, "I don't see your account," but, like, that kind of thing, which, like, clues you in that activity is happening on the other side even if you can't see it. I love that kind of stuff. I mean, and that sort of gets into, I guess, the second group of things that I wanted to talk about, which are these subtle ways that people follow rules for how language is used in context.
In linguistics, this, like, subfield of linguistics is called pragmatics, and it, like, has to do with how language is actually used in context and, like, the way that things we assume about our conversation partner if we are engaged in what we call cooperative dialogue, which we hope is most dialogue. And so these are assumptions that are like, you know, my conversation partner is going to be as truthful as they can be, that they are going to provide as much information as is necessary, but not more, right?
Because actually, if you provide additional information when it's not necessary, that's how we get what we call conversational implicature. So that's when you end up implying something, is when you say more information than the task at hand requires. That's where you get this sort of meta conversation level of meaning that you can end up implying something that you didn't mean to.
So, like, let me just give you an example of that, which is one of the things that LLMs love to do, and they're trained to do this, and it probably helps the sort of coherence of their thinking, is explain everything that they're doing. So explain everything that they're asking for. You know, "In order to look up your account, I'll need your account number.
Could you tell me your account number, please?" Or if they're, let's say it's walking you through doing something on a website, you know, changing your account password. "Okay, go ahead and click on that profile icon on the top right. Let me know when you've done that." And so it does these things where it's, like, sort of giving you the stage directions of conversation in a way that naturally humans don't do because it's actually over-informative.
I don't need to tell you to tell me once you've followed the instruction that I've provided. Like, you actually already know how to do that, and if I tell you to do that every time, I'm, like, implying that you're stupid or something. Like, you-- Because again, I'm giving you more information than is required for the task at hand.
Or, "In order to look up your account, I'll need your account number. Could you tell me your account number, please?" Like, that's way too much explanation. Just ask the account number. The person knows why you're asking for it. And so these are, like, my big pet peeves when it comes to the way that LLMs interact that I actually think have, you know, obviously it matters that it annoys me.
I think it annoys other people. That matters from a user experience standpoint. But I think what actually matters more is the way that these undermine the sense of real-time involvement in the task, right? It's not the way that two people who are engaged in the same task interact because of this shared, you know, table of context, of salient context that they share in that interaction.
And so by not assuming that shared context, I actually undermine that sense of social presence, that sense that you and I are together in real time navigating this in a cooperative, collaborative way, and I make myself out to be some entity that is in some other place, not aware of what you're aware of, not seeing what you're seeing, not knowing what you and I both should be knowing in this moment.
And so this is kind of the stuff that I really like to nerd out about because it's so subtle, but something like that really matters.
Galen Low: I think that's so funny because in a lot of cases, folks who are maybe not as conversant in this art have probably been like, "That's great. Actually, we should add more words so that people trust this AI more."
It's like, "Here's why I need your account number, and, you know, please, yeah, let's like go say, like, please, you know, tell me when you're done or whatever." We've designed it to, like, try and build trust, but actually we're betraying the trust because the pragmatics aspect of things is like we're implying that, you know, there isn't this sort of, you know, shared level of understanding or trust or even context.
And I was just thinking, like, the thing that takes me out every time I have this gem that I train and probably pretty poorly, because I'm like, "Yeah, hey, we're gonna be buddies. We're gonna do this thing every week." And no matter what, every time I prompt, it goes, "Hey, nice to meet you." I'm like, "No, you're on my team.
We've done this every week." You know, and it kind of, like, pulls me out of this, and then I am out of that context. I'm like, "Oh, okay, so this person that... Or well, AI in this case, my, my interlocutor does not understand the shared context, so I guess I'll also assume that they know nothing, too." And then it becomes this sort of, like, performative, kind of annoying, right?
To your point, right? It's just annoying to have to do this dance that is unnatural, whereas actually, if I'm picking up what you're putting down, sometimes less is more.
Oliver Shoulson: Yes.
Galen Low: Like, how do you talk? Maybe build that into your agent, and that might actually build more trust and more adoption and actually get people through to their goal faster than if you explained everything.
Maybe it should be explainable. "Oh wait, why do you need my account number? Oh, yeah, okay, because of this." But maybe it's not the default interaction to be like, "Hi, I'm here to help you. I'm gonna need your account number. Don't worry, I'm gonna keep it very safe, and, you know, this is a secure connection and blah, blah, blah, blah, blah."
And you're like, "Okay, yeah, fine." You know?
Oliver Shoulson: 100%. And you know, like, I don't necessarily have a strong stance on, like, a bot introducing itself as a bot early in the conversation or not. Like, I think that there are arguments for both. I certainly don't have a problem with doing that. But, you know, a lot of people understandably worry that, like, if we introduce the agent as an AI agent, you know, people are just gonna, like, refuse to interact with it.
On the other hand, sort of as you're implying, like a lot of people have worries about transparency and, like, want to make sure that people know what kind of entity they're talking to. I'm absolutely sympathetic to that as well. But I do think that the worst way you could possibly do it is to be like, "Hello, you're currently speaking with an AI-enabled virtual assistant."
Right. "This conversation is gonna be recorded and blah, blah, blah, blah." Like, we need to introduce it in as human-like a way as possible still, again, not with the aim of deceiving anyone, but just because it sets the tone for the capabilities that the system has and the way that you want the user to interact with it.
Like, do you want them to actually try and develop a sense of rapport with your agent, or do you want them to treat it like a legacy IVR where they're just gonna be shouting keywords at it, like, the whole time? So, you know, when we do have those, like, disclosures at the beginning of interactions, I always try and sort of build them organically into the greet utterance.
You know, say something like, you know, give the agent a name. Say, "Hi, I'm Oliver. I'm such and such company's virtual assistant. How can I help?" Or, "Hi, I'm Oliver. I'm such and such company's AI. How can I help?" And, you know, leave that interaction, that first turn really open-ended, keep it as concise as possible, because just as you're saying, like, yes, less is more absolutely in conversation design.
Galen Low: I think that's a really good point, and I know you and I touched on it in an earlier conversation we had, this idea of... And I think I was on this vein of, like, can we take it too far? And I think you had said, like, yeah, the goal is not to, like, replace a human or, like, trick people, I guess. It's still an interaction that is, uses natural language, but, like, we're not trying to be like, "Oh, ha," like, you can't tell the difference, and you'll never know which ones are our human agents and which ones are our AI agents.
We just want it to be natural.
Oliver Shoulson: Yeah, exactly. I mean, first of all, I'd like to say like, yeah, no one's getting gold stars for tricking anyone. Like, that's not something we wanna do. Like, what we want to do is reserve people's problem-solving brain for the actual task at hand, and not the problem of how to navigate the interaction to solve the task at hand, right?
Like, what you find yourself doing when you call a company and you reach a phone menu is you're, like, reverse engineering in your head how the person who designed this intent ontology for the phone menu thought about it, and where your issue fits into that. So you're trying to think like, "Okay, would they have categorized it as account problem or as like..."
And so suddenly you're deploying all these cognitive resources toward like actually just the task of navigating the interaction as opposed to actually toward, you know, resetting your password or solving the problem, right? So like the goal is to allow people to use their intuitive linguistic faculties, sort of to circle back to what we were talking about at the beginning, to fall back on those really intuitive cognitive faculties, those intuitions of how they interact with people in the real world, to reserve their, you know, problem-solving brain for the actual thing that they're trying to do, as opposed to like navigating the interaction.
Galen Low: That's really interesting, and I agree. To build that friction in the experience when you're like, you know, using your problem-solving brain to like reverse engineer how they designed this experience, that's like what UX has been about, you know, removing for a long time now. I mean, is that why conversation is sort of back and sort of popular?
Why is it that language is actually more ideal for this? And I know you've listed like a whole bunch of reasons throughout, but like you know, if that's true, why is it so difficult to nail in terms of like designing an experience? Like why conversation, and why is it hard?
Oliver Shoulson: Well, the question of, like, why is language so difficult to nail and simulate is, like, a paraphrase of, like, the core question of linguistics, which is basically like, like, how do we model this human language faculty, you know, possibly in such a way that we could then programmatically replicate it, like, in a computational system?
You know, the miracle of the human language faculty is that a child, by the time they're three, four, five years old, is, like, fluent in a language with an astonishingly low level of exposure to that language compared to large language models, right? Like, even GPT-2 or 3 was exposed to thousands and thousands, if not millions of times more language than a child is by the time they're completely linguistically competent.
And what that leads linguists to believe is that we have this inbuilt cognitive architecture that is, like, you know, evolved for acquiring language. And so the task of the linguist is, like, basically what does that architecture consist of? Like, what are the dials and knobs? What are the parameters that can be set and unset?
Like, what are the operations that your brain is performing when constructing a sentence and retrieving words from your, like, mental lexicon, and how could we model that in such a way that in theory, a computer could do the same thing? The breakthrough that we've had over the past few years is ultimately a brute force breakthrough.
Like, it's like we haven't actually figured that out any more than, you know, all the great theoretical linguists who are currently still doing that work are doing. We just basically achieved an engineering breakthrough whereby we could kind of, by exposing, you know, by having trillions and trillions of parameters and exposing the entire internet and the entire written history of human text to a model, we could get it to kinda sorta almost do the same thing that a human child does by the time they're three years old.
Like, so, you know, the problem of language is far from solved, but I think that the answer to your question about, like, why we're pivoting to language is that suddenly we've had this breakthrough in this brute force approach, which is very useful. But there are absolutely people, myself included, who hope that we will still continue to pursue AI from a direction that models more the way that human cognition actually works, what people call, like, symbolic reasoning.
And there are fields of AI that are sort of more focused on this, that are less about like, okay, auto-complete the next token, like what's the next token, and more about actually computing over abstractions of concepts and words and sentence structures in the way that humans actually do it. So actually, like, my suspicion is that we're going to run up against a bit of diminishing returns or a bit of a wall when it comes to, like, this LLM approach where we just add more parameters and train on more text, and that fundamentally, when it comes to reasoning and the ability to not make all of those stupid mistakes that we constantly see people posting online that LLMs make, it's gonna require a fundamentally different approach, I think, an approach that is actually closer to, like, that kind of symbolic reasoning that humans do.
That was quite a tangent I just went on, but those are some of my like philosophical and like ontological commitments about what thinking and language are.
Galen Low: I think it's so important because, again, like AI today is like the ultimate party trick, right? It looks so capable, but actually we haven't cracked the code on it.
This is early days, and like the way we approach models will have to be different than the brute force approaches. We can only get so far with this, but it is very impressive. And it, you know, coming back to the nobody gets gold stars for tricking anybody, you know? Like I think that's important to remember.
No one's trying to trick anyone there, but it's so captivating to use. You're like, "Wow, this thing, like it knows so much more than anyone I've ever talked to, and like the, you know, it's limitless. I can have a conversation with it." And then you're like, "Oh, but it thinks, you know, four plus two is seven," so like, you know, it's like it's this cognitive disconnect.
But it is because, you know, we've brute forced, we figured out a way to like have natural-ish dialogue almost to gather more data to continue figuring it out. This is like the beginning, not the end. And I think that's actually like a really good reminder for folks, especially who are building this, that it's like, no, it's not because you're dumb, it's because language is really complex and our brains are really complex, and like there's no one way to just like crack it, and suddenly you can have like the best, most natural-feeling agent known to humankind.
But maybe with that, I think we've been kind of like going through this theme of like, yeah, like we are used to visual interaction design. It's sort of modeled on our real-world interactions, and now a lot of folks are kind of, you know, moving into this territory where they're like, "Okay, well, we also need to think about conversational interaction and voice interaction," which arguably, you know, is a bit of a shift.
So like for teams that are used to designing visual interfaces, what's the biggest mindset shift that they need to make when they're designing for voice or for conversational interactions?
Oliver Shoulson: You know what? Actually, I just think they're very-- they're just different skill sets, so they definitely could be complementary.
But, like, we experience information visually and linguistically in very different ways. Like, I-- you know, if you talk to anyone who does visual interface design or has a background in that, like, a lot of it is about finding out ways to represent information hierarchies visually, right? You get this sort of two-dimensional, sometimes simulated three-dimensional field of view in which you can organize and arrange information hierarchically in ways that will be intuitive to people or that will, you know, guide them to the path that you want them to take.
Language and spoken language we experience linearly in time, and then our brains, you know, parse it out into hierarchical structures. There's a lot of ambiguity that you can get in speech because of that. You know, you can say, "I saw the man with the red binoculars," and that has two meanings. It means either the man had the red binoculars or I used the red binoculars to see him.
And, like, that's a product of the fact that I'm only saying a string of words that you experience in real time, but there's actually two different kinds of structures that that could represent. You know, the "with the red binoculars" phrase could attach at the seeing level, or it could attach at the man level.
And so I think it's, like, just a different skill set, and what I would say is, like, lean on the people who are already having these kind of spoken language interactions to help identify patterns and edge cases and potential points of friction or confusion. You know, use your call center workers, your people who write the documentation for call center workers.
I, as a conversation designer, like, there is nothing better for me than spending a day shadowing a contact center worker and, like, actually listening to them interact with people and hearing the actual questions that people have, the actual ways that they phrase these questions, the actual kinds of information they're looking for, not just, like, what the business assumes they're looking for.
And so you have great resources available. The people who have experience with this know what it needs to be, basically.
Galen Low: I like that. You know, it is a different thing. I like the way you explained that. We have to make this sort of-- we have to parse it into a visual map in our head because it's like, you know, we experience it temporally in, like, linearity versus, you know, like, a website.
You think, like, we've already kind of painted that picture. Here's what you can click on. Here's the path I want you to follow. But I actually-- I think like, the strongest thing that I think my listeners can grab from that is like, oh, by the way, there's people who are talking to people all the time.
Just, like, pay attention to them. Like, it's not necessarily, you know, I'm thinking of all my, like, interaction designer friends. Like, do they have to, like, back out and become linguists and then go back into, like, you know, design? And maybe the answer is no. It's just kinda like, yeah, it's a different thing to master, but, you know, the principles are similar, and the data is available.
Or, you know, you can shadow, you can listen, you can have more conversations, and you can pick it apart. And, you know, to your point earlier, it's not necessarily just adding a bunch more words and, you know, all the like, making it, you know, a very clunky experience. Sometimes language is messy and we interrupt one another.
We, you know, like there's all these things that happen. There's like, you know, multiple meanings to things. There's like the speed of it. And I love that it kind of comes back down to like this like social presence that is needed to, you know, connect the parties that are dialoguing, build trust, and actually, you know, achieve a goal.
Oliver Shoulson: And I would imagine that, and I have no evidence for this, but I would imagine that the sort of soft skills, or I guess over easy skills that make someone a good visual designer are transferable in the sense that you're asking yourself to sort of introspect and look inward about what an experience makes you feel, what the points of frustration are, and then actually, like, break that down to first principles and, like, say, "Okay, why?"
And you know, this is why I say that linguists are good at this job in the first place, which is that, like, a lot of what linguists, particularly theoretical linguists do, is sort of think about their own language faculty and say, "Okay, that sentence sounded grammatical to me, the other one didn't."
That's sort of a kind of subjective experience where one thing feels awkward, one doesn't. Why actually is that, and can I write those rules in such a way that in theory a computer could replicate them? And that's sort of like what you're doing as a designer in general, and I think, like, that's what makes linguists good at this.
And so I could imagine that someone even without a linguistics degree, in fact, I'm sure that even someone without a linguistics degree could apply those same kind of sensitivities and intuition and analytical reasoning to creating conversational interfaces.
Galen Low: I really like that. I wondered if maybe we could round out just by, like, talking a bit about the future.
You know, speaking of things where neither you nor I have any, like, necessarily research or data or crystal ball to see into the future. But I've been thinking a lot about AI and how it's manifesting as hardware. You know, you have devices like Rabbit, and soon there's that OpenAI and Jony Ive hardware collaboration.
And I'm thinking about this thing, you know, we've been talking about, yeah, people still wanna pick up the phone and have, you know, the social presence and their experience. But at a certain point, we're gonna have our own sort of personal hardware agents talking to the agents of that business or the restaurant or the health clinic or you know, the customer support center.
When that happens, will the agent-to-agent experience continue this, you know, charade of human politeness and, like, this professional decorum and these things that we've been talking about, right? The back channels, the pauses. Will that still be a thing agent-to-agent, or will the language of agents be, you know, almost unrecognizable to us humans?
Oliver Shoulson: So I have no idea about this. And, like, I think that if you plugged, you know, Claude Opus 4.6 into, like, any of these interactions, it would behave the way that it was trained to, which is as though it were interacting with a person. So, like, I think that, like, fundamentally we're still in a place where these, at least the consumer models, are mainly targeted at interacting with people, and so they will just default back to interacting in that way.
I think the question of whether we start training models that are more specifically engineered toward, like, more efficient communication with other models, like, I'm sure that's gonna be something that will start happening, particularly as, like, these, you know, agent harnessing frameworks like OpenClaw or, like, agent orchestration start becoming more and more popular, where you have agents spinning up sub-agents and, you know, reporting back to the original agent and, like, there's gotta be a more efficient way that they could be communicating with each other.
Like, maybe they're just gonna be like beep boop, beep boop, and like, you know, a terabyte of information that, you know, it would take a million years to convey in human speech. I would think that, like, assuming that those agent orchestration, multi-agent bot harnessing kind of frameworks continue to take off, that we'll start getting models that are, like, specifically trained to communicate with other bots, but I have no idea.
Galen Low: Honestly, good answer though because I do agree. I think like, you know, when you think about it, language between humans and now I guess with AI, it's like there's almost something wonderfully inefficient about it, you know? Like it's not perfect. It's imperfect. That's kind of what's great about it.
That's how we evolve. That's what we want from the interaction. Do machines want that? Maybe not, you know. And fundamentally, it's how we program them, at least for right now. But I think it's a really interesting idea, just this like, is it because we're human, right? Like, not because language is great, but because that's just, like, what we're good at as humans.
Anyways, thank you for entertaining that, you know, massive question and arguably a sci-fi tangent at the end. But yeah, I just wanna say, like, I really appreciate this conversation. I had a lot of fun. I love nerding out on this. You obviously are really deep into this. Just for fun, do you have a question that you wanna ask me?
Oliver Shoulson: Well, like, you said this thing about, you know, hardware AI. Like, I'm really curious if you think that's going anywhere because my feeling is like absolutely, like people will start to interact with AI models on the go and more. But like why would I need something other than my phone, I guess is my question.
Like, I have my phone already. Like, it's hard for me to imagine any of these hardware things taking off, but I'm prepared to be totally wrong about that.
Galen Low: No, I think you're right. There's two things for me is like just the economy in general, right? Like consumerism meets technology, meets, you know, brands trying to carve out their space in the market, so I could see that just differentiating.
The actual practical use case, I'm not sure, but I know that like we see this with tools already, especially in my space, project management, right? Where it's like as soon as a shiny new tool comes out, everyone like jumps on it. There's this thing where we just want tools that we don't need because we think it promises something that's different, but usually it's not.
So I think it will land somewhere in between. I think there might be more devices than the sort of usual suspects in the smartphone world. But yeah, I don't see it necessarily quickly becoming like, you know, oh, am I gonna have an extra pager on my belt? I don't know if that's gonna be like good. But then at some point, there's like the robotics angle too, right?
Where it's like, okay, well maybe it's not a pager on your belt. Maybe it's just like, you know, your cyborg walking next to you. And then I'm like, "Okay, now I'm done thinking about it for now," because I'm like that's like too far. But yeah, I do think that there's a level of gimmickiness, but also like do people want this?
Let's see. But I don't know that it's gonna be that different.
Oliver Shoulson: Well, what an interesting, I mean, speaking of like cognitive biases that seem almost anachronistic, like it feels almost like caveman brain to like want like a dedicated tool for like one specific thing as opposed to adding like additional software to your phone just because it's not like another thing you can hold.
And like, it feels like something ancient is going on there.
Galen Low: I remember, like, gosh, like cargo pants, right? Which was like, oh, because now I can carry all my stuff. And it's like, no, we have less stuff.
Oliver Shoulson: Yeah.
Galen Low: Awesome. Oliver, thanks so much for spending the time with me today. This has been like a lot of fun.
Just for folks listening, where can people learn more about you?
Oliver Shoulson: So you can follow me on LinkedIn @oliverhs. Oliver Shoulson is my name. I also have a website, olivershoulson.com, where I occasionally post stuff. I need to update that more frequently, but other than that, you know, check out poly.ai, check out the work we're doing there.
We're always publishing lots of interesting case studies. You can actually hear Selma, the Fogarty Challenge, on our latest case study there.
Galen Low: Awesome. I will also add those links to the show notes for folks listening so they're easy to click on. And yeah, Oliver, thank you so much.
Oliver Shoulson: Thank you so much.
Galen Low: All right folks, that's it for today's episode of the Digital Project Manager Podcast. If you enjoyed this conversation, make sure to subscribe wherever you're listening. And if you want even more tactical insights, case studies, and playbooks, create a free account with us at thedigitalprojectmanager.com.
Until next time, thanks for listening.
