Trust in AI isn’t a vibe—it’s something you can intentionally design for (or accidentally break). In this episode, Galen sits down with Cal Al-Dhubaib to unpack “trust engineering”: a shared toolkit that helps cross-functional teams (engineering, UX, governance, risk, and business) talk about the same trust risks in the same language. They get into why “boring AI is safe AI,” how guardrails and human handoffs actually preserve trust, and why the biggest failures often aren’t the model—they’re the systems (and incentives) wrapped around it.
You’ll also hear real-world examples of trust going sideways—from biased outcomes to hallucinated “gaslighting,” to AI-assisted deliverables causing accuracy issues—and what project leaders can do to prevent finger-pointing when it happens.
What You’ll Learn
- What “trust engineering” is—and why it’s really a shared communication framework, not just a technical checklist
- Why “less AI” (strategic AI minimization) can be the most responsible and effective approach
- How to design AI experiences with expectation management, decision design, governance, and trust infrastructure
- Why clear ownership matters: shared responsibility still needs an accountable owner
- Practical ways to measure trust through incident prevention, observability, and user behavior signals
- How trust differs across cultures and contexts—and why there’s no single universal definition to “encode”
Key Takeaways
- Trust fails by default when nobody owns it. Cal’s blunt point: when trust is “everyone’s job,” it becomes “no one’s job.” Teams need a named owner who ensures the right questions get asked—especially when things get messy and blame starts flying.
- “Boring AI” isn’t a downgrade—it’s risk management. The safest AI is often the least flashy: narrowly scoped, grounded in controlled data, and designed with off-ramps. If your AI experience tries to be endlessly open-ended, you can overwhelm users and increase risk.
- Use “strategic AI minimization” to protect outcomes. In the student transcript processing project, 90% accuracy wasn’t the win—knowing which 10% would fail was. They routed higher-risk inputs (like certain international or non-standard transcripts) into a human queue instead of gambling on automation.
- Guardrails aren’t just technical—they’re experience design. The Behr Paint “Chat Hue” example shows how trust is built with intentional boundaries: it can recommend colors, but it hands off to humans for higher-stakes guidance (like mixing or application) and avoids unsafe or off-brand outputs.
- Governance is three questions, not a 200-page doc. Cal’s practical framing:
- What did you do upfront to reduce harm?
- How will you know it’s making mistakes in production (observability)?
- What do you do when it fails—and have you stress-tested that plan?
- Trust measurement can be surprisingly familiar. Beyond “avoiding incidents,” trust shows up in whether users can complete goals, whether they return, and where they drop off. If the tool isn’t useful, people won’t “trust” spending time with it.
- Sometimes the trust issue is human incentives, not the AI. When people’s bonuses or performance are threatened, they may be motivated to reject (or sabotage) AI recommendations. Trust engineering has to include organizational design, not just model design.
- There’s no single moral “truth” to encode. The self-driving ethics example lands the big idea: trust and ethics vary by cultural context. AI can’t magically “solve” trust better than humans if humans don’t agree on what trust should mean in the first place.
Chapters
- 00:00 – What Is Trust Engineering?
- 03:58 – Who Owns Trust?
- 05:27 – AI Incident Database
- 08:41 – Regulated Industry Risk
- 10:34 – Strategic AI Minimization
- 12:11 – Trust Engineering Defined
- 15:29 – Defining Fairness
- 17:10 – Four Pillars of Trust
- 18:03 – Behr Paint Case
- 23:15 – Voice AI Failures
- 25:53 – Governance in Action
- 31:31 – Avoiding Finger-Pointing
- 35:31 – Measuring Trust
- 37:46 – AI, Ethics & Culture
- 42:13 – Why Trust Is a Framework
Meet Our Guest

Cal is a globally recognized data scientist, entrepreneur, and innovator in responsible artificial intelligence for heavily regulated environments, including healthcare, energy, and financial services. He leads AI and Data Science at Further, a digital transformation partner that helps some of the world’s most recognized brands create smarter, more personalized customer journeys with data and AI.
Before joining Further, Cal founded and scaled Pandata, an AI design and development firm that partnered with organizations such as Cleveland Clinic, Progressive Insurance, and FirstEnergy and was acquired by Further in 2024. He is a frequent keynote speaker on AI innovation, governance, and literacy, reaching thousands of leaders at major industry events including MAICON, ODSC, AI Summit, TDWI, and DataCamp.
Resources from this episode:
- Join the Digital Project Manager Community
- Subscribe to the newsletter to get our latest articles and podcasts
- Connect with Cal on LinkedIn
- Check out Further
- AI Incident Database
Galen Low: What is trust in engineering and how does it lead to safe AI solutions and experiences?
Cal Al-Dhubaib: It's a shared communication toolkit to help various different personas who collaborate around the design and adoption of AI solutions to talk more productively about preserving trust. There needed to be some shared way for all these people to talk about the same thing and have this similar understanding of the ways in which trust can be violated with AI, and then the toolkits of how to like defend against that.
Galen Low: How can a project team share the responsibility of building trust into their solutions and avoid the finger pointing?
Cal Al-Dhubaib: An interesting thing about accountability is if it's everybody's responsibility, nobody owns it, and so you ultimately need an individual who is going to own this aspect. We actually have one assigned to everyone of these AI projects so that we can make sure that we're asking the right questions.
Galen Low: How long do you think it will take for AI to understand humans' concept of trust better than humans do?
Cal Al-Dhubaib: I really think that the definition of AI is so circular. How can AI understand trust better than humans? I don't know, but also—
Galen Low: Welcome to The Digital Project Manager podcast — the show that helps delivery leaders work smarter, deliver smoother, and lead their teams with confidence in the age of AI. I'm Galen, and every week we dive into real world strategies, emerging trends, proven frameworks, and the occasional war story from the project front lines. Whether you're steering massive transformation projects, wrangling AI workflows, or just trying to keep the chaos under control, you're in the right place. Let's get into it.
Okay, today we are talking about how trust can be engineered into AI powered experiences, why that's so important for organizations rolling out AI solutions and regulated industries, and what the role of the project team and especially the project leader plays in creating that trust.
With me today is Cal Al-Dhubaib, the Head of AI & Data Science at Further, a company that helps transform customer experiences with data and AI. Cal is an AI and data science expert, entrepreneur, prolific public speaker, and the new host of the Open Data Science podcast. He uses his deep experience in data science to help organizations bridge the gap between AI innovation and ethical responsibility so that their AI rollouts are aligned with their values and goals.
His work in the Cleveland tech sector has also seen Cal recognized among Crain's Cleveland Business' Twenty in their 20s, as a notable immigrant leader, and as a notable tech executive, while also being a four-time winner of the Cleveland Smart 50 Awards.
Cal, thanks so much for joining me here today.
Cal Al-Dhubaib: Galen, thanks for having me. I'm excited for this conversation.
Galen Low: I'm excited as well. I really have loved our conversations leading up to this. You're absolutely someone who knows your stuff. You've been doing the circuit. Congrats on becoming the podcast host for the Open Data Science podcast. You've got a lot of like great things going on.
I know that you are deep into this, you're passionate about it, and I hope this conversation will actually go to like unexpected places. But just in case, here's the roadmap that I've sketched out for us today. So to start us off, I just wanted to like set the stage by getting your hot take on like a big juicy question that my listeners want to know the answer to.
But then I wanna unpack that and just like talk about three things. Firstly, I wanted to talk about what it means to engineer trust into an AI solution and what the project team's responsibility is in upholding trust in the solutions that they're designing. Then I'd like to explore ways that trust can be measured and audited, especially within the framework of regulated industries, but you know, also beyond compliance.
And lastly, I'd like to just get your perspective on like the future and how our perspective on trust might evolve over the next few years as we sort of collectively gain a better understanding of how our data is being handled. And what it can be used for in the age of AI. That was a lot. But how does that sound to you?
Cal Al-Dhubaib: Oh yeah. We're gonna cover so much ground. It's ambitious and I'm here for it.
Galen Low: I love it. Let's dive in. I wanted to just start us off with like one big hairy question and one of the big topics that you talk about is this notion of engineering trust into AI solutions and experiences, particularly that users, businesses and regulatory organizations won't adopt AI fully if they can't trust what it will do.
So my hairy question is this. When building and rolling out AI solutions, who owns trust as a solution requirement, and what happens when that trust is absent from an AI driven experience?
Cal Al-Dhubaib: So the easy answer to that is by default, no one owns it, and that's the big problem. That's part of what I'm trying to solve for with building this course that I'm working on, Intro to Trust Engineering.
And frankly, it reminds me of the early days of data science where somebody in analytics could own it, or somebody in IT could own it, or somebody in finance could own it. It didn't really matter so much where it lived so long as you had that clear ownership of who's responsible and accountable for that capability.
So the same is true with engineering trust into these AI solutions, and you probably guessed it, if it isn't a part of the design process, ultimately the solution ends up resulting in some unintended consequence. So there's no shortage of examples. One of the resources I frequently cite and refer folks to is the AI Incident Database. I just pulled the latest numbers to look at the whole of 2025.
And unsurprisingly, the number of incidents has continued to rise year over year without fail.
Galen Low: That's interesting. Could you tell us more about how the database approaches or defines an incident?
Cal Al-Dhubaib: So I love it because what they do is they scour news sources and they look at an incident from multiple different coverage perspectives.
And so if multiple independent news sources have covered a particular story, they will organize all of that into a single incident and you can review all the coverage that happened around it. And some early examples of this were, for example, Apple launching their credit card, and it initially would give lower credit limits to women versus men, all else being equal.
And you know, another example in that incident database, a little bit more recently, is Anysphere. They are well-known for their agentic coding solution, and they had an issue with users being randomly locked out. And of course, their AI-powered chatbot would respond with a hallucinated answer, ironically. So that actually ended up confusing users.
They felt gaslit and then they ended up canceling their subscriptions. And then as recently as last fall, one of the big four consulting firms had submitted a particular deliverable. So this wasn't even an AI solution, this was just a deliverable that gen AI was used in the process of creating. And the deliverable was submitted to the government of Australia, and it contained significant accuracy issues that resulted from hallucinations.
And so we have examples that go back to 2016, 2017, early days of ML, and we have a growing number of incidents over the past few years. And so these are all logged and tracked within the AI Incident Database.
Galen Low: What I like about the three examples that you gave is that it kind of covers a bit of a spectrum of incidents. A, there's like sort of this almost bias that is probably programmed in, right? It's not sort of coming outta thin air, right? It's probably in the training data, our data.
Cal Al-Dhubaib: For sure it exists in our data.
Galen Low: Second one was, you know more on the, I guess, AI gaslighting, it's like the best word. I like that phrasing of it. It's like, no, you are the problem actually, and we're locking you out. And then it's oh, actually, I guess AI is smarter than me. Like maybe I should be locked out. Right? Wait no, I shouldn't be.
Cal Al-Dhubaib: Oh yeah, you're right. I hadn't considered that.
Galen Low: Yeah, exactly. And then I like the third one, right? It's kind of like where accuracy can go wrong. It plays into the hallucination, it plays into humans in the loop and trust and just doing our due diligence, I guess.
Cal Al-Dhubaib: A hundred percent.
Galen Low: I think it's like a really good launching point for this conversation because I hope we kind of cover all of those things and more.
A lot of it's more than just, Hey, we're being transparent about what we're doing with your data. The end, it's actually a lot more than that in terms of like what makes a solution sort of trustworthy. Maybe we can zoom out and get into this a little bit. You focus a lot on heavily regulated industries.
Things like healthcare, financial services, and defense. You know, these industries where ethical and responsible use of sensitive data is not just like the right thing to do, but there's also pretty severe consequences for being non-compliant. And then expanding on that and sort of notion of trust, like not necessarily just like, you know, data privacy and data security, but like the two ideas that I've seen you talk about in your work are the notion of trust engineering.
And also this idea that like boring AI is safe AI.
Cal Al-Dhubaib: I love it. We try to make it shinier than it is, and I learned this very early on in my career. The theme with working in these heavily regulated environments is also you have like the high cost of making a mistake and you get one guarantee with AI systems, by the way. They make mistakes.
Galen Low: Yeah, fair enough.
Cal Al-Dhubaib: And so early on, I remember working with some hospital systems and we would show them how we could use machine learning to build various different predictive models. And then we'd start talking about, well, we need to make sure that these models perform well across all demographics.
You don't want it to be underperforming, for example, in certain minority groups and resulting in potential issues. What would end up happening is that in those early days, I'm talking 2016, '17, '18, the response sometimes was, wow, that sounds like a lot of liability, and we're not sure that the lift we would even get across the whole population is worth going through this project.
And so a lot of projects ended up being put on the shelf 'cause the risk tolerance was just not there, and there wasn't a satisfactory way of being able to say, 1, 2, 3, 4. Here's how you handle this.
Galen Low: Talking with some others, I had Lauren Wallace on the podcast and we were talking about this notion that businesses that are operating in heavily regulated industries, they have this risk muscle built in.
They have tolerances defined, and it doesn't surprise me really, that they're like, okay, this goes beyond our risk tolerance, our risk threshold. We can't really proceed. That's just how we roll. It's not that the technology is bad, it's not that anyone did anything wrong, but like, yeah, we can't run this risk until the technology is ready, until our data is ready, until our policies are ready, until like legislation is ready.
And I actually think it's like, it's kind of admirable. It's safe, it's boring, right? A lot of people would be like, oh gosh, what a failure, what a waste. But it's still a step in the right direction and a pause for the right reasons to not just fly into it.
Cal Al-Dhubaib: A hundred percent. And a funny statement that I heard that really resonated with me is sometimes the best AI is less AI.
So one of the things I talk about is strategic AI minimization, but to give you a recent example, we were working with a higher ed institution on processing student transcripts, and we were trying to go beyond just your traditional robotic process automation and actually use LLM-based extraction to give them higher accuracy.
These transcripts come in so many different formats. And the best we were able to get was 90% accuracy, which is pretty good for an AI-based solution in this context. And the first thing they ask is like, okay, which 10% of the time is it wrong? That makes sense. Otherwise, you're just going to the lotto and then you're rolling the dice, and then one in 10 times you're gonna get a mistake, and they can't run a business off of that.
And so what we ended up doing was studying when the AI was most likely to be wrong. And so there were transcripts that came from international institutions. There were transcripts that came from the trimester system versus the quarter system. And what we ended up doing is coming up with a rule list of, well, if the transcript comes from any of these sources, we're going to have that go into a manual human queue.
And then we're gonna have AI process the rest. And so it's not automating a hundred percent of the workflow, but it's automating enough of it in an area where it's safe and you're more likely to have a more accurate result in the column that you trust. And so that's just one example of trust engineering and practice.
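To make the routing idea concrete, here is a minimal sketch of what a rule list like the one Cal describes could look like. This is not code from the project; the field names, risk rules, and queue labels are hypothetical, and a real system would derive its rules from an error analysis like the one he mentions.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    institution_country: str   # e.g. "US", "DE" (hypothetical field)
    term_system: str           # "semester", "quarter", "trimester"
    source_format: str         # "standard_pdf", "scanned_image", ...

# Conditions observed to correlate with extraction errors (illustrative only).
HIGH_RISK_RULES = [
    lambda t: t.institution_country != "US",              # international institutions
    lambda t: t.term_system in {"trimester", "quarter"},   # non-semester term systems
    lambda t: t.source_format == "scanned_image",          # low-quality scans
]

def route(transcript: Transcript) -> str:
    """Return 'human_queue' for high-risk inputs, 'ai_pipeline' for everything else."""
    if any(rule(transcript) for rule in HIGH_RISK_RULES):
        return "human_queue"
    return "ai_pipeline"

# Example: an international, trimester-system transcript goes to a person.
print(route(Transcript("DE", "trimester", "standard_pdf")))  # -> human_queue
```

The point is the shape of the design: a small, explicit, auditable set of conditions decides when automation is allowed to run at all, and everything else defaults to a human.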
Galen Low: Maybe we can go in there 'cause I love that you are in the process of building a course for LinkedIn Learning on trust engineering. Can you just like give us the definition, like what is trust engineering and like how does it lead to safe AI solutions and experiences?
Cal Al-Dhubaib: So it's a shared communication toolkit to help various different personas who collaborate around the design of AI solutions and adoption of AI solutions to talk more productively about preserving trust.
And so I noticed that you'd have different individuals approaching this trust problem with different lenses, and they weren't all sharing the same language. You have AI engineers, many of whom now are familiar with the statistical issues, like statistically harmful biases and data sets, but you also have risk management folks and auditors who look at it from the lens of like, all right, what's the exposure to our organization?
You have governance folks who are thinking about how do we document assumptions? You have the user experience designers who are thinking about how do we create the ways in which individuals interact with the system. And you have the business stakeholders who are defining the requirements and the business problem.
And there needed to be some shared way for all these people to talk about the same thing and have this similar understanding of the ways in which trust can be violated with AI, and then the toolkits of how to like defend against that.
Galen Low: I love that it's a shared language and a framework to kind of diagnose and almost work through some of these problems.
What I love about it is like earlier you were talking about data science and the role of the data analyst, and I know in certain spaces, especially customer experience where you know, they're like, oh, we'll just have a data person come in, they'll just solve everything. And it's their responsibility. It's centralized.
They're like responsible for data cleanliness, data accuracy, this, that and the other. And everyone quickly finds out that it can't just rest with one. It's a team effort. This is a collaborative thing and you know, I love that this is like a cross-functional framework to get people working together.
Cal Al-Dhubaib: A hundred percent.
Galen Low: Tying back to what you were saying, strategically minimizing AI where it doesn't have to be all or nothing. Like is this workflow a hundred percent AI or not AI, I think it just plays right into this idea of the collaboration between AI and humans as it stands right now today. It's that we need to understand AI, everybody, right, and have a shared vocabulary to like be having conversations that are meaningful in the design and implementation process.
And also we need to understand one another. And I think what's so relatable about that, in my head, I was like. Project managers, we nerd out about methodologies. We're like, oh, okay, which methodology is best? What's gonna be best for this project? And sometimes we get too deep into the weeds about like all the fine details and like the specifics.
But fundamentally, it's meant to be a way to just kind of get on the same page about working together and delivering a thing. And I'm like, dammit, I need this thing. Or like translating between stakeholders so that we're all talking about the same thing. Because like, gosh, you know, I'm sure everyone has been involved in some kind of project where like it's just lost in translation.
We all have different specializations.
Cal Al-Dhubaib: It's easy to assume, oh, that's covered.
Galen Low: And that's what I like about this. It kind of makes it a team sport. Here's a framework for trust engineering. We're all in this together. I imagine it probably does go into privacy, but I think that's the other thing where it comes up is like accessibility and privacy and things that like no one person can own in any complex solution, even simple ones.
Cal Al-Dhubaib: No, and like a simple example of that is like, let's talk about the notion of fairness in building a model for student admissions, for example. This is something that we dealt with pretty recently. And part of the requirement was making sure that the model met quality expectations across various different demographics.
And so the client asks us, so how are you gonna assert that? Well, we come back to them with, we have to understand how you define fairness, because there's 22 different ways of calculating fairness. And frankly, that's outta scope for the engineer. I can tell you what the formulas are and how to calculate them in various different ways.
I'm gonna need to lean on you to say, what is your policy? How are you approaching this definition and what is correct for you?
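To illustrate why the engineer has to push that question back to policy, here is a small, self-contained example (our own, with made-up numbers, not from the admissions project) comparing two common fairness definitions, demographic parity and equal opportunity, on the same hypothetical predictions. The two definitions can disagree on the same model, which is exactly why choosing one is a policy decision rather than a calculation.

```python
# Hypothetical admissions data for two demographic groups (made-up numbers).
# y_true: actually qualified (1) or not (0); y_pred: model recommends admit (1) or not (0).
group_a = {"y_true": [1, 1, 0, 0], "y_pred": [1, 1, 0, 0]}
group_b = {"y_true": [1, 1, 0, 0], "y_pred": [1, 0, 1, 0]}

def selection_rate(g):
    """Demographic parity compares the share of positive predictions per group."""
    return sum(g["y_pred"]) / len(g["y_pred"])

def true_positive_rate(g):
    """Equal opportunity compares admit rates among the truly qualified per group."""
    qualified = [p for t, p in zip(g["y_true"], g["y_pred"]) if t == 1]
    return sum(qualified) / len(qualified)

for name, metric in [("selection rate", selection_rate),
                     ("true positive rate", true_positive_rate)]:
    print(f"{name}: A={metric(group_a):.2f}, B={metric(group_b):.2f}")

# Output: the selection rates are identical (demographic parity holds),
# but the true positive rates differ (equal opportunity is violated).
# Which gap matters is the client's policy call, not the engineer's.
```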
Galen Low: You know, there's a lot of, I don't know if I can call them AI naysayers or like doom and gloom folks about how AI is just gonna like send us in this direction without our sort of influence.
But I think the other thing is happening in reverse. It's forcing us to look at things we've been doing for decades. Are they ethical? Are they fair? Have we just been making off the cuff decisions? Now we need to decide. Just like with autonomous vehicles, pilots have been deciding for a long time whether to crash into a school or crash into a field where there's like two houses.
It's not a comfortable decision, but there is policy around it and we're like, oh yeah, it's a judgment call, case by case, blah, blah, blah, and we kind of soften it for ourselves, but.
Cal Al-Dhubaib: And it can get that severe, but it can also be something a lot easier, a lot more pedestrian, if you will. A simple example of this is these systems, they tend to be so open-ended, generative AI systems in particular.
You can ask almost anything of this solution and what we've seen is that can overwhelm the consumer.
Galen Low: Interesting.
Cal Al-Dhubaib: Actually, it's nice to have a limited range of what can I do with this tool? And so I've got these four pillars in trust engineering, it's expectation management, decision design, we've got governance, and then how do you manage trust infrastructure is the last one. And so that's your tooling, that's your risk. That's like how do you engineer this stuff? What's in the toolkit? But going back to this open-endedness and managing decision design, how do you prompt the users with the right amount of information and the right amount of guardrails so that they know how to use the system well?
Galen Low: Can we go into some examples? I love those four pillars, expectation management and decision design. It's a really interesting thing. If we can dive into an example and then maybe even just back out of it in terms of who would need to be involved to arrive at that level of trust in the experience.
Cal Al-Dhubaib: So I'll give you a very simple example because it's easy to talk about life or death, or we're talking about patient readmissions and I wanna give you an example that is something that we publicly can talk about 'cause it's not a case study.
And it's with a company that you might know, Behr Paint.
Galen Low: Yeah.
Cal Al-Dhubaib: So really well known. You go to Home Depot, like you got the big display. They found that one of the biggest drop off points in consumer consideration was getting overwhelmed with color selection. I happen to be a color-challenged individual despite all of my experience learning about paint for the past couple years.
But my color palette is maybe about 10 colors. I get very overwhelmed if I have to like pick, oh my God, which version of Eggshell White do I want here? Right? So they decided to build a generative experience around product selection. And this is actually not as trivial as it sounds because it has to respect color science.
It has to respect style guides. It also has to be grounded in products that actually exist. And there were some decisions that we had to make along the way about what type of advice was acceptable and not acceptable. And so selecting a color is fine. Figuring out how to formulate it, or how to mix paint, or how to apply it, was something that they decided, for example, that they wanted to have a higher level of quality control over. So if at any point in the conversation it starts to veer into that direction, it quickly prompts the user with, here's our hotline for help with this particular inquiry.
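As a rough sketch of that handoff pattern (a generic illustration, not Behr's actual implementation; the intent labels, keyword check, and handoff message are invented), a guardrail layer can classify each user turn and divert higher-stakes topics to a human channel before the generative model ever answers:

```python
# Sketch of a scope guardrail: classify the user's intent first, and only let the
# generative model answer topics that are explicitly in scope. Everything else
# gets a graceful human handoff. All labels and messages here are hypothetical.

IN_SCOPE = {"color_selection", "coordinating_colors", "product_lookup"}
HANDOFF = {"paint_mixing", "application_technique", "warranty_claim"}

def classify_intent(user_message: str) -> str:
    """Placeholder for a real intent classifier (an LLM call or a small model)."""
    if any(w in user_message.lower() for w in ("mix", "thin", "spray", "apply")):
        return "application_technique"
    return "color_selection"

def generate_grounded_answer(user_message: str) -> str:
    """Placeholder for a retrieval-grounded generation step over the real catalog."""
    return "Here are three coordinating colors from the current catalog..."

def respond(user_message: str) -> str:
    intent = classify_intent(user_message)
    if intent in HANDOFF:
        # Higher-stakes advice is routed to people, preserving trust in the bot.
        return "That's a great question for our experts - please call our help line."
    if intent in IN_SCOPE:
        return generate_grounded_answer(user_message)
    return "I can help you explore colors and products. What room are you painting?"

print(respond("How do I mix this color for a sprayer?"))  # -> human handoff
```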
Galen Low: Okay, that's interesting.
Again, that's sort of like 90-10, like, stop here, don't keep going. Don't try and make something up because, you know, poor AI, under so much pressure to have all the answers, right? And tell it, it's okay. And actually let's send it down a human path.
Cal Al-Dhubaib: And then on top of all that, we had to make sure that no matter what, it doesn't recommend competitor paints. It also doesn't get bullied into saying things that you don't want it to say and get screenshot and put all over the internet, right?
Galen Low: Yes.
Cal Al-Dhubaib: And so I think this is a great example. Some of these elements of trust engineering, where we have everything from decision design, what is the decision that we're ultimately trying to influence?
And what are the guardrails that go into that? How do we manage the expectations of the user of how they can use this tool and how they can't use this tool with appropriate governance on the backend? And of course, all this communicated to our stakeholders in a user friendly system card.
Galen Low: And the interface was conversational, is that right?
Cal Al-Dhubaib: Yeah. So if you go to behrpaint.com, free promo.
Galen Low: There you go.
Cal Al-Dhubaib: The chatbot like pops up and it says, hi, how can I help you? It's called Chat Hue.
Galen Low: Okay. I like that. It's a really good one because it's a great example of all those things you mentioned, right? You had mentioned color theory. There's a specialist on the project team and going like, you know, you couldn't put these two together for folks who have a limited spectrum of color in their vision, but also not so judgy that it's like, ugh.
No purple and green are not gonna go together. What are you thinking?
Cal Al-Dhubaib: If you try to do that, it'll gently coach you into here's some coordinating colors that you might wanna consider.
Galen Low: And then it's like there's this sort of a copywriter or a linguist and it's like, okay, well how can we like explain.
Cal Al-Dhubaib: The brand voice?
Galen Low: Yeah, it's the brand voice. Yeah. And I'm imagining. You train it on Behr's catalog of colors.
Cal Al-Dhubaib: Yeah. All this depends, of course, on up-to-date digital product data, and so the big part of this initiative is also making sure that everything was suitable and updated and mastered and owned. We're working on a similar type of experience, but more for navigating an internal knowledge base for one of our other financial services clients.
A big part of the solution isn't just architecting. Here's how the AI model retrieves information, but also having human led processes for who owns that database or that knowledge base, and how it gets updated over time.
Galen Low: That's really cool.
Cal Al-Dhubaib: Big part of this is also having those human led processes that keep the data fresh, that the AI system depends on.
Galen Low: And it is that sort of cooperation, and it doesn't have to be all or nothing where, you know, AI maintains itself and, you know, fixes itself and does its own like testing and QA and updates. There's still a role for humans here.
Cal Al-Dhubaib: Some of these people that I think should know better are out there promoting this whole, like AI is gonna be running everything with AI agents for AI agents, and frankly, all the production use cases I've come across require a healthy dose of human intervention and rebalancing where AI is directed in what humans have to do on top of it.
Galen Low: And I think it's like realistic and you know who's to say where the technology will go, but I'm like, oh, because agents that manage agents, that's an idea that sells because usually an organization's main expense, it's payroll, so it's like, hey, you don't need staff.
Just kidding. You actually, you'll still need staff, you know, potentially. It has the power to sort of, you know, get you there. It's like that's the marketing hype.
Cal Al-Dhubaib: It's very alluring.
Galen Low: It's very alluring. Right. I wonder if we can dive into some like anti examples of maybe some experiences. We don't have to name names, you don't have to call anyone out, but are there some anti examples of AI solutions out there, maybe that we actually use, that a lot of us use, that actually don't really go outta their way to establish trust, that are a bit of a black box?
Like what are some elements of an experience that, you know, trust hasn't been engineered into it?
Cal Al-Dhubaib: So I love referencing these incidents in the AI incident database. I mentioned a couple of them early on, but you know one that might be fun to pick at is McDonald's had some issues with their voice ordering and their drive-throughs, and they had been working with a particularly well-known software consulting company.
And they finally last year had to put the system to rest because they couldn't consistently get it to place accurate orders. And I don't think that this was a failure in terms of the technology so much as a failure to put the right guardrails around the technology. So you have Domino's on the other hand, where 80% of their call-in orders are now being handled by an AI system.
Galen Low: Oh, wow.
Cal Al-Dhubaib: And what I really like about contrasting these two stories against each other is there was an attempt to fully make it AI, versus an attempt that has individuals supervising live calls, taking over when the AI system isn't getting an accent or isn't understanding an individual. There's appropriate auditing measures in place to make sure that orders are continuing to match up with what the user said.
So I think failures like this come about not because the technology didn't work. We know that AI is gonna have some inherent error rate. The failures come about because the right systems weren't designed around it.
Galen Low: I sort of like this notion of, you know, again, like what you had said, it's not all or nothing in some ways.
I got my project manager hat on, right? Imagine being the person who like, you know, wielded this multimillion dollar budget to do voice ordering for McDonald's. And then eventually we were like, no, it's not good enough. And it's on a shelf gathering dust somewhere. I think it's an important point that it's not a failure.
Probably the wrong idea in terms of where the, you know, as you say, the technology is going. The wrong idea would be to go, oh, AI doesn't work. Let's just go back to a hundred percent human call center agents, order takers. Like, let's just go human; AI isn't gonna work. Versus, and we see it in the CX space a lot, right?
Even decades of chatbots now and then hand over to a human if it's like really just not going where we want it to go. And having that hybrid model, but still training the technology on that data. Right. Like you said, like accent data. We probably have a lot of it, don't get me wrong, but not as much as however many people order McDonald's.
Cal Al-Dhubaib: Sure.
Galen Low: We're gathering so much more data now that we can then process and eventually the technology should get there to the point where it's like we have enough data, we have enough guardrails, we've designed it in a way that, you know, we can actually make this viable.
Cal Al-Dhubaib: In some of the design practices that we see where you're considered de-risking the AI system, the easiest is, Hey, you know what?
We're just gonna equip call center agents with access to gen AI and maybe that gen AI is accessing live transcripts or within the chat and able to synthesize recommendations. Or maybe it's something that they can just query. And so they're doing it on behalf of the user and they're able to make an expert judgment.
Now, they're still able to work faster and more effectively, but they're that safety layer. There's times where you could have a filter of like, is it appropriate to use AI, or do you fully launch an AI experience carte blanche? And then you have some maybe triggers of when things get redirected, but there's always gonna be some balance of humans in the loop.
And you know, whenever I talk to clients about like their governance practice, I say you wanna answer three basic questions when it comes to these AI systems. What have you done upfront to test and assert that you've minimized the potential harm or potential risks inherent to this AI solution? Two, once it's in production, how are you going to know when it's making a mistake?
So this is observability. What are your criteria for those guardrails? What are your diagnostics and how are you going to know when it's not meeting expectations? And then the third question, and this is the most important question that often gets overlooked, is what do you do then? How do you get it done?
Have you actually stress tested that plan? So let's say you have this hypothetical call ordering system just to continue picking on our McDonald's friends, and you've now kind of said, okay, once certain triggers happen, we're gonna have a human override. How able is that human to then quickly get up to speed on what has transpired.
Galen Low: Right.
Cal Al-Dhubaib: Whether there are the right experiences and interfaces and controls to be able to read quickly what has happened, diagnose the potential issues, and are you able to do that gracefully in a way that preserves consumer trust?
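Cal's three governance questions map fairly directly onto production monitoring. Below is a hypothetical sketch of how questions two and three might be wired together: a threshold agreed upfront, an observability check over recent interactions, and an escalation path that hands a human the context they need to catch up quickly. The metric, threshold, and notification function are assumptions for illustration, not anything described in the episode.

```python
import statistics

# Guardrail criteria agreed upfront (question 1): acceptable error in production.
MAX_ERROR_RATE = 0.05   # hypothetical threshold
MIN_SAMPLE = 50         # don't alert on tiny samples

def error_rate(recent_outcomes: list) -> float:
    """Share of recent interactions flagged as wrong (question 2: observability)."""
    return 1 - statistics.mean(recent_outcomes)  # outcomes: True = correct, False = wrong

def notify_on_call_team(reason: str, session_context: dict) -> None:
    """Placeholder for paging / queue assignment in a real incident process."""
    print(f"ALERT: {reason}; handing off with context keys {list(session_context)}")

def check_and_escalate(recent_outcomes: list, context: dict) -> str:
    """Question 3: what do you do when it fails, and has that path been rehearsed?"""
    if len(recent_outcomes) < MIN_SAMPLE:
        return "ok: still collecting data"
    rate = error_rate(recent_outcomes)
    if rate > MAX_ERROR_RATE:
        # Hand the session to a human with the context they need to take over fast:
        # the running transcript, what the AI has already committed to, and why
        # the alert fired. This handoff is the part worth stress-testing.
        notify_on_call_team(f"error rate {rate:.1%} above threshold", context)
        return "escalated to human"
    return "ok"
```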
Galen Low: In a way, it's actually the biggest pressure test to put it in a quick serve restaurant. Right, because call center agent, right? You're calling about your whatever insurance policy. Like it's acceptable in the experience to be like on hold for three minutes.
Cal Al-Dhubaib: I don't think I wanna be on hold for three minutes for a Big Mac.
Galen Low: Well, exactly right. Like you have to, and then, you know, we're demanding so much of a workforce that is quite junior usually.
Stereotypically, you know, quite a junior workforce in some cases. Preparing food and taking orders at the same time and trying to get up to speed on like a transcript from an AI conversation from the drive-through within like a minute. Whenever I visit a McDonald's or a quick serve restaurant, I'm always looking at that screen, you know, that has a countdown.
I'm like, oh man, they only have 15 seconds before their order is out of tolerance. You know, it's very high pressure and it's a very real time use case for this sort of thing. Part of me is like, I'm happy folks are trying this, and yeah, it's not always gonna work. You mentioned earlier that like in this incident database, incidents are going up, but I mean, would you say that it's because the technology just isn't working or it is getting worse?
Or is it just because more of us are using it and actually implementing it and therefore there's more mistakes and there's more stories and it's not gonna be perfect?
Cal Al-Dhubaib: So I think it's a function of two things. It's getting easier to build these types of tools and systems, and so the surface area has grown.
Even if our own ability to manage the risks has gotten better, it's actually a pretty good sign that it's not rising as exponentially as the adoption of AI is. And then the second thing, I think we're getting a little bit more experienced with spotting AI issues. I think we've exited the era of inherent trust in digital information.
DeepFakes a few years ago were very concerning and they're still concerning today, for example. But the average person has a heightened sense of like, oh, maybe that's AI. And you can go to almost any social media thread and you click on the comments and then you'll see "AI." The average person is starting to be very skeptical of information.
Galen Low: And I like that. It's like that used to be taken for granted that we trust digital experiences. You know, we trust our Google and what have you, news media. We trust our news outlets and blah, blah, blah. Now everything's a little bit different and maybe not in a bad way. Right. We're being a little bit more critical of it. We're thinking critically.
Cal Al-Dhubaib: And so we're able to spot these issues.
Galen Low: Yeah. And can improve it. And then, you know, tying it all the way back, we can actually identify and have conversations about when it's actually maybe like just a human policy that was unfair to begin with.
I wonder if we could actually dive into that because we're talking about, you know, like trust is like this kind of shared, cross-functional responsibility and I think the teams building these, I mentioned earlier like, okay, yeah, maybe there's a linguist on board, maybe there's, you know, a team of data scientists on board.
The shape of these teams is actually changing. I like the idea of the trust engineering toolkit, but I also, you know, I think there's different dynamics now in a team and different expectations, and I guess maybe what I wanted to ask is just like how can a project team share the responsibility of building trust into their solutions, avoid the finger pointing, and like throwing folks under a bus? Because I can imagine it, right? Where it's like, oh, like that McDonald's project. Yeah, it's in the news in a bad way and everyone's gonna like start pointing. Well that's because you know, data science didn't do the thing. Or like the linguist didn't do the thing or it wasn't coded right, or blah, blah blah.
Like what are some of the elements of a team dynamic that kind of makes it actually like a shared responsibility where they all feel some kind of ownership and accountability rather than going straight to blame.
Cal Al-Dhubaib: An interesting thing about accountability is if it's everybody's responsibility, nobody owns it, and so you ultimately need an individual who is going to own this aspect.
Something that we've invested in at Further that I'm particularly excited about is credentials for AI governance. So we're members in IAPP, the International Association of Privacy Professionals. They were one of the first large scale associations with credibility in the space. They had certifications for privacy professionals, privacy engineers.
They launched an AI governance professional certification, recognizing that individuals might come from a background in law. They might come from a background in engineering. They might come from a privacy or security, cybersecurity perspective, but they needed to learn the rest. And this was in part what inspired my framework for trust engineering.
But going back to these AI governance professionals, we actually have one assigned to every one of these AI projects so that we can make sure that we're asking the right questions.
Galen Low: That was gonna be my next question. In a lot of cases, especially with digital projects, you know, it's the project manager that kind of ends up holding the bag on something because they are, you know, we're typically in the center of a bunch of, you know, very smart specialists doing their thing.
But that coming together of things, or like the understanding of the requirements or the understanding of the risks often falls to us. But now it's ballooned, right? Like the responsibility, I like the idea of a company having a chief privacy officer. It does have that ownership bit to it. And then what you just said there is like having someone on each project that is the sort of governance owner, regardless of the background.
Like that's what we really want to have so that the buck stops somewhere, even if it's a shared language and like a cross-functional responsibility or conversation around trust, there's somewhere for that to sort of be held accountable.
Cal Al-Dhubaib: A hundred percent. And this is not like, I feel like sometimes when we have these conversations, clients can hear that and then they think, oh my God, that sounds like way too much effort.
And the goal actually is for this to reduce the total amount of effort. The goal for this is to have that thoughtful friction where we stop, we say, where is this likely to go wrong? Let's actually look at lookalike projects. Let's look at some potential failure modes, and then determine how likely is it to go wrong?
What is the cost of that, if it does go wrong? And where do we need to invest the most in mitigating these trust risks?
Galen Low: That's what I love about, I never thought I'd say this, but that's what I love about regulated industries is that, you know, they do have this notion of helpful friction and it's okay to pause, you know, versus some of like agency startup world where it's like, go.
If we've paused, we're failing. Versus an organization that's like, let's like be thoughtful about this and let's see what the value is in aggregate. Versus the risk in aggregate, not just delivering the thing and getting it live to our customers, but also beyond that, right? Are we going to be saddled with like lawsuit after lawsuit?
Are we going to be noncompliant? Are we gonna get penalized? Are people just gonna like hate on us on Reddit? You know, like, there's these, you know? Yeah.
Cal Al-Dhubaib: You don't want that.
Galen Low: I mean, you can see where trust, right? It's just written down and I think that is the question, right? It's like, okay, well I can't possibly staff a person to own governance on every project.
Probably what will happen more immediately is like, great, you own governance on 70 projects here, we're just gonna spread you thin across, you're gonna take that 40 hours, you're gonna spread it thin. But I'm hoping, I dunno if you're seeing this, but I'm hoping that the appreciation of the value of that individual will rise over time.
Cal Al-Dhubaib: For sure, and I think it's a new job category. I think AI control, AI quality assurance, AI risk management, are the growth job functions as we see more AI adoption anyways, and the good news is it really is a role where you're asking questions, and so the art of the role is asking the right questions so that everyone else can get aligned around that.
Galen Low: Love that, love that.
I'm just wondering, you know, we're talking about trust and how it's a kind of a requirement. Someone can own it, can it be measured? Can a project team know whether they've achieved a trustworthy experience? Can they measure along the way? Like what does that look like?
Cal Al-Dhubaib: I think that's a great question.
It's really interesting because some of this is like in the service of preventing an incident from happening. And so you can kind of claim, hey, number of incidents that occur is maybe a potential metric. But there's also some really interesting things you can do with user behavior analysis, for example, especially if you're embedding it into an application.
And the simplest form of that is, is the user accomplishing your goal? With our Chat Hue example, are they actually getting to the call to action where they're clicking on learning more about a particular paint spec, for example? So that's one way of actually seeing is your solution actually being effective?
Are individuals completing the interactions and are they coming back to the tool?
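Those signals are things most delivery teams already know how to instrument. Here is a rough sketch (our own; the event names, goal definition, and log format are invented) of turning interaction logs into the trust signals Cal describes: goal completion, return rate, and drop-off.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, event) pairs captured by the application.
events = [
    ("u1", "session_start"), ("u1", "asked_question"), ("u1", "clicked_product"),
    ("u2", "session_start"), ("u2", "asked_question"),
    ("u1", "session_start"),  # u1 came back later
]

GOAL_EVENT = "clicked_product"  # the call to action that counts as "goal reached"

sessions = defaultdict(int)
goal_reachers = set()
for user, event in events:
    if event == "session_start":
        sessions[user] += 1
    if event == GOAL_EVENT:
        goal_reachers.add(user)

users = set(sessions)
goal_completion = len(goal_reachers) / len(users)                      # did they get value?
return_rate = sum(1 for u in users if sessions[u] > 1) / len(users)    # did they come back?
drop_off = 1 - goal_completion                                         # where trust may be leaking

print(f"goal completion: {goal_completion:.0%}, "
      f"return rate: {return_rate:.0%}, drop-off: {drop_off:.0%}")
```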
Galen Low: I like this too 'cause they're actually things that we're quite familiar with, right? Risk profile, right? Like what is the risk profile of this? What could go wrong? Are we gonna be in that database, you know, that Cal always draws from.
And then the other thing is just like, yeah, goal and success, right? Like the user experience design. Is the user getting value from it? And where is there friction? And might that be because of trust, because you know the AI's not explainable, or because we haven't told them how to use it, or because it's just overwhelming in terms of options, and like circling back on that. And I like those both because...
Cal Al-Dhubaib: Or we're simply not useful, because that's also a part of trust.
Galen Low: Right? Yeah, that's fair. I'm not gonna trust my time with this 'cause it's not actually gonna generate the result that I want. You know? I'll order my Big Mac from a human. Thank you very much. I thought maybe we could like round out by looking at the future, the like not so distant future.
I know everything's changing so fast, you know, in other conversations I've heard you in, you've mentioned like a few stories where it's like, not about whether the AI can be trusted, but whether, you know, the humans in the loop can be trusted. You had mentioned this one thing about like a platform where decisions were actually tied to an individual's bonus, right?
And so like there was like inherent bias, but like of the most self-serving kind and maybe even like invisible, right? Maybe subconscious, and maybe I'm being too kind.
Cal Al-Dhubaib: Yeah. They were motivated to not trust the recommendations of the AI.
Galen Low: Yeah. Because their bonuses tied to it.
Cal Al-Dhubaib: Yeah. And they didn't trust the AI.
Galen Low: And like, you know, my takeaway from that was like, you know, humans value trust, but sometimes we really suck at it. Like we're like bad at trust, we're bad at being trustworthy. I guess I wasn't sure if I was gonna ask it this way, but maybe I'll just try it out. How long do you think it will take for AI to understand humans' concept of trust?
Better than humans do, and what would the ramifications be if that happens?
Cal Al-Dhubaib: So I really think that the definition of AI is so circular, especially when you start thinking about how certain folks are defining artificial general intelligence. So this idea that it's an AI system that can do anything that a sufficiently trained human could do.
And we're a tool using species. My ability to get work done today and craft a LinkedIn post in part is influenced by my ability to use AI. And so now my human capability is defined with my own subject matter expertise plus my AI tool. So it's this very circular thing. And so how can AI understand trust better than humans?
I don't know, but also, trust is so subjective. I think it looks different in different contexts and different moral frames of reference. A great example that I introduced in my class is a study that was conducted out of MIT. They were looking at the ethics of driving decisions for self-driving cars.
And so Moral Machine is the name of the website, and you can go to it and it gives you scenarios. Take no intervention or make the car swerve. Either way, you're gonna harm certain individuals. And there's different combinations.
Galen Low: Ah, trolley problems.
Cal Al-Dhubaib: Yeah, exactly. It could be a grandma crossing the street on one side and then three kids playing on the other crosswalk. Or it could be somebody who's jaywalking versus somebody who's being lawful.
Galen Low: Interesting. Yes.
Cal Al-Dhubaib: So like a whole bunch of different scenarios, and they did a really interesting study of this data once they started to get a larger sample from different global perspectives. So what they found was there were these different clusterings, there were shared values in Western countries, there were shared values in South American countries, and then there were shared values in Asian countries.
And the moral frameworks were all correct in those areas, but were inherently incompatible. So for example, in the Western countries, they tended to value, on average, saving a larger number of lives. In the Asian cultures, if there was one elderly individual, that mattered more than any number of individuals on the other side.
Galen Low: Interesting.
Cal Al-Dhubaib: And that's their moral frame of reference that was most common. And again, individual preferences vary, but there's no single universal moral truth for what trust looks like. I know this very well because I'm originally from Saudi Arabia, my mother's American. I grew up between two cultures that are often incompatible with each other, and so how do you get an AI system to then navigate this when humans aren't even able to navigate it, and it's so subjective?
Galen Low: It's actually a really interesting place to land, right? Where it's actually not so much about trust and having a single definition of what is trustworthy. The question should have been maybe, when will AI understand human culture better than humans do? And maybe they don't have that far to go. 'Cause you know, like there's culture clashes everywhere I look and maybe that is a defining characteristic.
Cal Al-Dhubaib: It is. Well, I think that's the point. Like we have to put in the work to understand that it's never black and white. You're gonna have to make some assumptions no matter what. You have to make assumptions.
Galen Low: What do they say? AI is a great mirror. It's a mirror and a magnifying glass at the same time. It's forcing us to look at ourselves.
Trust is a part of that. It can be about users just getting to their goals without being frustrated. It can be about defining an acceptable threshold of harm or risk.
Cal Al-Dhubaib: Yeah. You have to define what is correct, what assumptions are you going to assert in interactions or not assert, or maybe divert back to the user to choose intentionally between one option and the other.
But these are all design choices that can happen as a part of trust engineering.
Galen Low: And that's what I like about the trust engineering as a toolkit and a framework is that we're not gonna arrive at it in a vacuum. We have to have conversations about this.
Cal Al-Dhubaib: Absolutely.
Galen Low: Cal, thanks so much for being on the show. For folks who wanna learn more about your course when it comes out, where can they go? How can they learn more about you?
Cal Al-Dhubaib: Absolutely, so I'm very active on LinkedIn. That's probably my best platform. You can find me there, but I'll be announcing more details on that very soon.
Galen Low: Fantastic. I will definitely include a link to your profile and also grab that link from the database of incidents afterwards as well. Definitely worth a good scan. Great conversation piece. And Cal, great conversation today. Thanks again.
Cal Al-Dhubaib: Yeah, thanks for having me, Galen.
Galen Low: Alright folks, that's it for today's episode of The Digital Project Manager podcast. If you enjoyed this conversation, make sure you subscribe wherever you're listening. And if you want even more tactical insights, case studies and playbooks, head on over to thedigitalprojectmanager.com.
Until next time, thanks for listening.
