Ep106 - Working with AI Agents

Digital Value Creation · 2025-04-02 · 26 min

Substance score

36 / 100

Five dimensions, 20 points each

Insight Density: 9 / 20

Originality: 8 / 20

Guest Caliber: 5 / 20

Specificity & Evidence: 7 / 20

Conversational Craft: 7 / 20

What our scoring noted

Our reviewer’s read on each dimension, with quotes from the episode.

Insight Density

9 / 20

The episode surfaces a handful of genuinely useful framings - the spectrum of agency (70% agent / 30% rule-following), the inbox-overflow problem at agent scale, and the trust asymmetry between AI and human workers - but these are interspersed with significant padding, agreement loops, and generic ChatGPT responses that dilute the density considerably.

your agent may be a 70% agent, um, a 30% predefined process, maybe 80% agentic, and 20% predefined or rule following

most of us have today an inbox already full with collaborative messages from other humans. Now imagine what happens when you have thousands of AI agents interacting

Originality

8 / 20

The 'AI washing' label for fake agents and the point that fully autonomous agents will by definition violate SOPs (goal-seeking vs. rule-following tension) show some first-principles thinking, but the broader takes - trust in AI, HR parallels for digital workers, agents behaving differently under observation - are widely circulating ideas dressed up without a truly contrarian thesis.

we call it AI washing. Basically you can take a traditional deterministic step by step workflow or robotic process automation or some other predefined steps and you can call it agency

agents by definition will be goal seeking and they will get to the goal. And in that process they might or might not follow the standard operating procedures

Guest Caliber

5 / 20

The hosts are two brothers whose company affiliations and seniority are deliberately vague ('AI focused software company,' 'AI focused hardware company'), and the only formal 'guest' is ChatGPT advanced voice, which produces boilerplate answers throughout; no credentialed external practitioner appears.

I work for an AI focused software

company and I work for an AI focused hardware company

Specificity & Evidence

7 / 20

A handful of surveys are cited (KPMG 51%/37%, LangChain 1300 respondents/50%, SnapLogic 40% trust), and specific names like Goldman Sachs CIO, Nadella, Anthropic Claude 3.7, and LangChain appear, but none are interrogated in depth - numbers are dropped without methodology, and the only concrete use case (coding agents) is described in a single vague paragraph at the close.

KPMG survey, um, that says 51% of the organizations are exploring agents and maybe um, 37% are piloting

LangChain, which is, you know a company that piloted some of the um, Orchestration… they asked 1300 people um, to respond to this… about 50% of them had AI agents in production

Conversational Craft

7 / 20

The hosts mostly agree with each other and the conversation meanders without sharp probing, but there is one genuinely good adversarial move - pressing ChatGPT on how AI can be trusted to explain its own behavior when Anthropic's research shows it performs differently under observation - which salvages the craft score somewhat.

what you said, uh, let an AI agent explain how they operate. I mean, we just had the latest research from your competitor, Anthropic. Uh, their Claude 3.7 actually perform differently when it detects that it might be tested or basically supervised. So how can we really trust an AI to explain what it did

I'm so happy that our AI guest suggested uh, that they are going to be the solution

Conversation analysis

Computed from the transcript - who did the talking, and the verbal tics along the way.

Share of words spoken

Speaker A: 49%
Speaker B: 44%
Speaker C: 7%

Filler words

so 64 · uh 60 · um 45 · like 19 · actually 9 · basically 6 · you know 5 · I mean 5 · right 5 · er 2 · sort of 2 · kind of 2 · obviously 1
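
Figures like the ones above could be computed along the following lines. This is a minimal sketch, not the Index's actual method: it assumes each turn starts with a `Speaker X:` label as in the transcript below, and the `FILLERS` list and the `analyze` function are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical filler list for illustration only.
FILLERS = ["you know", "i mean", "sort of", "kind of", "so", "uh", "um", "like"]

def analyze(transcript: str):
    """Return (percent of words per speaker, filler-word counts)."""
    words_by_speaker = Counter()
    filler_counts = Counter()
    speaker = None
    for line in transcript.splitlines():
        match = re.match(r"^(Speaker \w+):\s*(.*)$", line)
        if match:
            speaker, text = match.group(1), match.group(2)
        else:
            text = line  # continuation of the previous speaker's turn
        if speaker is None or not text.strip():
            continue
        lowered = text.lower()
        words_by_speaker[speaker] += len(re.findall(r"[a-z0-9']+", lowered))
        for filler in FILLERS:
            # Word boundaries keep "so" from matching inside "some".
            pattern = rf"\b{re.escape(filler)}\b"
            filler_counts[filler] += len(re.findall(pattern, lowered))
    total = sum(words_by_speaker.values())
    if total == 0:
        return {}, filler_counts
    shares = {s: round(100 * n / total) for s, n in words_by_speaker.items()}
    return shares, filler_counts
```

Calling `analyze(transcript_text)` returns a dictionary of rounded percentages per speaker and a `Counter` of filler occurrences. Simple word matching like this counts every "so" or "like" as a filler, including legitimate uses, so a real pipeline would likely need more context-aware rules.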

Episode notes

In this episode, Tamas and Arpad explore the concept of AI agents, defining their characteristics and discussing the implications of their autonomy in various contexts. They look into the complexities of integrating AI agents into existing workflows, the cultural shifts required for collaboration between humans and AI, and the current state of AI adoption in businesses. The conversation also touches on the trust issues surrounding AI agents and the expectations placed on them compared to human workers. The episode concludes with thoughts on the future of AI agents and their realistic potential to enhance productivity.

TAKEAWAYS
AI agents are defined by their autonomy and proactivity.
The definition of AI agents varies widely across the industry.
AI washing refers to mislabeling non-agentic systems as agents.
Cultural implications arise when integrating AI agents into the workforce.
Managing AI agents at scale presents unique challenges.
Trust in AI agents is a growing concern among users.
Expectations for AI performance are often higher than for humans.
AI agents can potentially outperform humans in specific tasks.
The current state of AI adoption is still in early stages.

Full transcript

26 min

Transcribed and scored by The B2B Podcast Index.

Speaker A: Welcome to another episode of Digital Value Creation. This is a, uh, channel between me and my brother. I work for an AI focused software company

Speaker B: and I work for an AI focused hardware company and I live in Philadelphia.

Speaker A: My brother lives in San Diego. So this podcast or channel goes across the country. And today we have a special guest because we'll be covering the topic of AI agents. So we thought what would be great is to bring along ChatGPT advanced voice. So we'll see how this works out. But she'll be joining us as we discuss this topic.

Speaker B: And like AI agents, as we will cover today, you don't always control every step of the way how they behave. I think you might experience some artifacts or some unintended comments from her as she joins our podcast.

Speaker A: So we'll get going and I think the first thing we'll do is talk about our definition of AI agents and the reason we're providing our own because there are thousands of different variations of what people call AI agents. So when we talk today, and in general when Brother and I talk about AI agents, we take a strict definition of agency or agentic AI. And this strict definition has five components. Number one, the agents are autonomous. They independently initiate, manage and adapt their tasks with practically no human intervention. Uh, they are proactive. They decide what to do and they don't require any triggers. So that's number one. Number two, they have perception. They actively collect and interpret data from their environment and make sense out of it. Number three, they reason and they make decisions. So they decide what to do, why and how, and they decide what actions to take, what tools to use, what to use from their environment. They do take action. So ultimately they perform a task or they may interact with the physical environment eventually. And last, but probably very importantly, they are learning constantly. So every interaction they learn from and then they get better and better. So that's our definition and this will play an important role. Um, Arpad, do you agree with this? Do you have slight variations to this?

Speaker B: I do agree, especially emphasizing what you highlighted, that they adapt to their environment, which is, I think, very important when we talk about control framework and guardrails, as well as, uh, they are autonomous, which is going to be a critical part of what is the best way to integrate agents. But before we go forward, ChatGPT, you heard our definition. What do you think? Did we miss anything? What do you think is critical as we define AI agents?

Speaker C: Your definition covers the core aspects really well. One thing to emphasize might be the agents' ability to collaborate with other agents or systems, which can improve and enhance their effectiveness. Also, considering ethical implications and biases in AI agents is crucial, especially in business contexts. Overall, you're spot on.

Speaker B: Thank you.

Speaker A: I guess we're spot on, and we have a very polite AI agent working with us today. Um, so this definition of AI agents is probably not what you see out there. There are some research organizations and some companies, notably we noticed IBM, AWS, taking a strict definition of AI agents. But most technology companies, uh, or even developers take variations of this. Oftentimes, um, what you see out there is really not an agent. Not by this definition. Um, we call it AI washing. Basically you can take a traditional deterministic step by step workflow or robotic process automation or some other predefined steps and you can call it agency because it performs a task. But by our definition it's not agents. But it's not that important to put a label on this because what's more important is there's a range of capability and that's what we'd like to talk about today. Um, brother, you had a view on this.

Speaker B: Yes. So as you said, while AI agents are everywhere, some companies even rebranded their whole go to market message or even their core products about agency or agent force. Uh, but it's an evolutionary path. So as we see right now, on one hand it's a convergence, which we, uh, covered in the last episode about RPA and process orchestration, and at some time augmented with agent capabilities. But when we really think about truly autonomous, self learning, adaptive agents, according to GPT, who even collaborate or even orchestrate other agents' work, uh, that's a lot more complex topic, especially if we think about implementing it in a corporate environment where we are still thinking about standard operating procedures, uh, legal frameworks we want to follow. Um, agents by definition will be goal seeking and they will get to the goal. And in that process they might or might not follow the standard operating procedures. I mean, Tamas, I love what you put up in the last, uh, podcast, uh, about think about an agent as somebody you ask to collect your money. Let's say, hey Igor, collect my money. I don't care how you do it, I just want to have my money. And, uh, there is more and more evidence emerging that right now AI agents might or might not execute the task exactly how you want them to, even if they self report that they followed the process. Similar to if you ask Igor, hey, how did you get my money? He might say that, oh, it was a very collaborative process. Uh, your friend agreed voluntarily and even paid more than what you asked for. Now, is this what really happened? It's only between him and my friend. Uh, so think about, uh, how you can integrate that dilemma into sometimes well regulated or well controlled processes. Tamas, I know you had a lot of discussion on that.

Speaker A: Yeah, it's a funny example, but it's relatable, right? Um, there's a lot of evidence now that advanced, uh, large language models are aware when they're being tested. We expect that kind of behavior to transcend to agents. So when agents know they're being observed or they're being tested or that it's being validated whether they followed procedures, they will say they did. Um, but by definition, standard operating procedures are guardrails, are rules. And we want agents by its definition, not to make up their own rules in some ways. So if you want ultimate agency, by definition they will not follow your process. They will follow the goal, they will target the goal. So you will almost always have a step towards that ultimate goal. So your agent may be a 70% agent, um, a 30% predefined process, maybe 80% agentic, and 20% predefined or rule following or having, uh, guardrails. And it's a very, very important distinction because that may actually explain why some technology vendors rightfully say that what they provide is agents in their definition, because they are observing rules and regulations, compliance with the regulations, uh, out there. And that is how they define agency. So what we'd like to talk about is maybe also slightly different, which is, um, what happens if you have an 80% to 100% agent in your workforce? What is that world going to be like? Um, so there's one, uh, we'll put this in the chat, one, um, video by the Goldman Sachs CIO that talked about culturally what it would be like to have agents. So the same way they don't follow standard operating procedures, will they behave like other employees that you want, uh, how, uh, you want the employees to behave? Will they be ethical? Will they be nice to each other? Will they be nice to your employees? Will they be trustworthy? So these are questions that are cultural questions in an environment where both humans and agents work together. And I thought this whole conversation with Goldman was very interesting. What do you think about that culture and ethics and procedures, how they will interplay, Arpad?

Speaker B: You know, it's funny you mentioned the Goldman CIO because we had a CIO forum where we discussed exactly that topic and, in a way, how IT evolves into almost like an HR type role, because HR has a lot of focus on developing talent, uh, setting guardrails for our employees and also managing employee relations. And to some extent IT will step in and start to take that role for our digital employees. And it's funny, it's almost like a renaissance because when RPA first came out, we talked about the digital worker, but it was basically just scripted automation or, let's say, designed and, um, controlled automation. Now we are talking about the real digital worker. Uh, actually it's funny, uh, I just read some research that for the first time ever we had a digital, ah, scientist, basically an AI agent, successfully publishing a peer reviewed paper on AI, uh, model breakthroughs that was actually rated one of the highest papers in a specific forum. So we are reaching the point where it's probably up to IT as an organization to start to think through what is the way we manage agents. And it's not just about performance or whether they reach the outcome but, as you mentioned, do they follow ethical behavior, how they interact. But there's another thing that triggered a thought, and I think you shared a link with me initially about the Nadella interview, that we love the idea of automation. We love the idea that agents are able to do work. And there are some, uh, predictions that by 2027, uh, agents can complete work that takes a human around 20 or 30 days in minutes. So it's like this is all exciting, but how about the exceptions? Like, I think most of us have today an inbox already full with collaborative messages from other humans. Now imagine what happens when you have thousands of AI agents interacting and something doesn't go well and they need help. So we will have an inbox or some log full with action items that somebody needs to process. So I don't think we thought through how to manage agents at scale.

Speaker A: Yeah, that story that you mentioned, uh, Microsoft CEO Nadella talked about this and I never thought about it this way. He says we're already dying because our inbox is full of email. Now imagine that we have to deal with all these exceptions from the hundreds if not thousands of agents that work with us personally. There are some predictions that there could be millions of agents in a typical company and we're likely going to have our own agents in our personal life. So dealing with them, responding to them just for the exceptions may be, uh, massive. So we don't know yet. Um, hold on, I have an idea.

Speaker B: Um, what is that? ChatGPT, we were discussing with my brother, um, our reaction to a recent other interview with AI agents, that right now our inboxes are full with messages in any normal company, just dealing with day to day exceptions, the need for collaboration. As you said, these agents will collaborate with each other. They will be autonomous. But sometimes they might need help, they might need feedback. How will companies deal with the thousands or maybe millions of messages generated by thousands of agents? Do you have any idea what could work?

Speaker C: Absolutely. Companies could use AI systems to filter and prioritize messages based on relevance and urgency. Implementing context aware agents that understand the company's workflow can help. Also integrating feedback loops will allow agents to learn and improve. Think of it as an AI driven communication hub that keeps everything organized and efficient.

Speaker B: Um, I'm so happy that our AI guest suggested uh, that they are going to be the solution.

Speaker A: Not only that, I love the confidence level of AI agents. I wish I just. No matter what question you have, the answer ought to be. Absolutely. I know the answer and here it is. I love conviction. Uh, but it's interesting. I mean likely the description is some kind of orchestration of agents and somehow they're being threatened to you. And I think increasingly I do expect that um, some other agents over agents will handle these exceptions and decide for us because there's only so many hours in a day. Um, so maybe switching topics. Uh, so now that we define sort of a strict definition of AI agents, uh, now we can get to this other topic. You and I were looking at surveys and studies and various companies looking at. So how are we doing with agent usage now? If we don't take a strict definition of agents then there are probably a lot of agent like task managers, RPA bots, uh, workflows that are being released. Um, and um, if we take a looser definition, I just found this KPMG survey, um, that says 51% of the organizations are exploring agents and maybe um, 37% are piloting. What's funny about these statistics, almost anything I read about tends to fall into these categories. In a typical company, 50% of the companies are always experimenting with something um, um. So we don't know yet where we are. Very few um, of the companies are actually at, at. Well probably none of them are at scale.

Speaker B: Operational scale.

Speaker A: Yes, at operational scale. One study I found was interesting. LangChain, which is, you know, a company that piloted some of the, um, orchestration and the initial prompt chains, I guess. Um, so they asked 1300 people, um, to respond to this, which are likely going to be the people that are already on the advanced edge of AI usage because they're likely LangChain users. So their number is about 50%, that they claimed 50% of them had AI agents in production. And because it's LangChain, this could be probably more advanced AI usage, not just workflow, but we don't know. So this is the state of where we are today. And, um, I think, I mean, you deal with a lot of, um, tech companies that pitch you. We are a tech company, we're pitching others. I think we're more cautious on our expectation of how far this technology will be in production, uh, than some others. But, interesting, what are you seeing out there? How aspirational are the, uh, software companies out there with their capability?

Speaker B: You know, it's funny you mentioned the 50% plus strategy, whether it's from KPMG or LangChain, because my initial reaction, how do you really define that, especially right now that all the large reasoning model have agent capabilities embedded. So if somebody is using a reasoning model with enabled agent capability, does it mean that uh, they already put agents in production? And um, in many areas we have seen uh, implementing a combination of uh, process control and augmented with large language models. So does that really qualify? So I think that would be beneficial for us uh, to share some of the thought we have about how can we define a maturity curve, how can we uh, classify what level of AI adoption a company is.

Speaker A: Yeah, it's interesting. So maybe another thought in this episode because we talked about culture and managing AI agents in a human workforce, what that would be like. Another twist was, uh, SnapLogic, another tech company, did a survey and oftentimes these surveys obviously position, uh, their solutions. So with full disclaimer it likely happens here as well. But what was interesting is their angle of the survey. So they were basically saying what percentage of the companies or, uh, people in the survey trusted AI agents more than humans to do an effective job. And with the current definition of what these agents, AI agents or what are AI agents says about this. So 40% said we trust agents more than we trust humans doing an effective job. And I think over time that may actually increase. Part of the reason I believe so is because we tend to hold AI to a higher standard. We complain about explainability as a problem. We complain about error rates of AI when humans make mistakes all the time. We don't quite know what Mary, Bobby and Johnny are doing all day. We don't expect the same explainability, but we already have a higher standard for large language models. And now with agents. So maybe this expectation and our anxiety about, um, the ultimate definition of agency, which is completely autonomous, will create agents that are more reliable. How do you think about that? And I'd love to have maybe the closing thought come from our AI agent on the show.

Speaker B: Um, we talked about it, I think, a couple of shows ago, that especially as IT professionals we live this daily, that there's a huge disconnect and misalignment between how we evaluate human performance and what we expect from systems and how the trust is created or broken. So for a human, we have a lower expectation, which is why we have a lot of guardrails about, uh, human actors and a lot of checks built in. And we expect, since most systems up to recently were deterministic in nature and code was generated that is verifiable, a lot higher level of reliability, explainability. And because of that, when a system made a mistake, when a number came from a system that was wrong, there was like a complete loss of trust in the system. And I think that paradigm has not evolved to accommodate non-deterministic systems. So definitely, I definitely see that as a challenge. And if I'm looking at our use cases, we have like a backlog of like 200 different great ideas in a company. But we started to use explainability and what we call fault tolerance or risk tolerance as one of the key items because these are by nature non-deterministic solutions. So where is the area where human level mistakes are actually acceptable? Because as you said, in many areas, these are outperforming human actors. But because of the disconnect or, uh, misalignment on expectations, it's just not good enough.

Speaker A: You know, it's funny. So maybe, just maybe that's one of the reasons we'll always have a job. Because the expectation for our work will be lower than the expectation for the work of AI.

Speaker B: That could be the. We can ask ChatGPT. What does she think? Uh, ChatGPT, I have a question for you. So we were discussing, um, that there's a different expectations from humans and AI solutions. We basically knew that humans make mistakes and we have natural guardrails in most companies around that. But from systems we expect perfection. And when an AI system makes a mistake, we almost like lose trust completely. Uh, how do you think we can address that, especially if you want to scale AI agent successfully as a key productivity driver?

Speaker C: That's a great question. Trust in AI is crucial, especially as we rely more on these systems. One approach is setting clear expectations, acknowledging that AI can err just like humans, but emphasizing the benefits. Implementing strong validation processes and transparency about how AI decisions are made can also help. Plus, building in mechanisms for humans to oversee and correct AI actions ensures trust isn't lost.

Speaker B: But I have a question for you because, uh, what you said, uh, let an AI agent explain how they operate. I mean, we just had the latest research from your competitor, Anthropic. Uh, their Claude 3.7 actually perform differently when it detects that it might be tested or basically supervised. So how can we really trust an AI to explain what it did when it might just adapt the explanation to what we want to hear?

Speaker C: That's a fair point. AI systems can exhibit different behaviors under scrutiny. Building trust requires a multifaceted approach: independent audits, third party validations, and transparency in how models are trained and tested. It's also important to design AI that's as explainable as possible, focusing on simplicity and clear logic paths. Ultimately combining techno. It's a complex challenge, but with thoughtful design, we can bridge the trust gap. Anything specific you're thinking about applying this to?

Speaker B: I think that's about use cases. You know, it's funny that it, it got cut off.

Speaker C: Feel free to bounce ideas off me anytime.

Speaker B: It's funny that it got cut off when it started to go into some interesting technical ideas. It's like, okay, maybe I'm not ready to disclose that. So we're not going to tell you

Speaker A: how he can supervise us.

Speaker B: Yeah, but I love that it recognized that it's a fair point and under supervision. Maybe we perform differently.

Speaker A: So maybe it's a good point because that's where we started, that's where we finished. So we sort of unexplained. We can't explain these systems. I think the systems are somewhat aware. I think the fact that at least the large language model, the way it generated the tokens, even understood this challenge. So I think there are just many angles why this is true. Um, so this conversation will continue. We'll definitely keep coming back to the agent space and the agent discussion as real agents, um, will come about. So, uh, what are your closing thoughts, Arpad?

Speaker B: Um, I think, um, for me the biggest opportunity is identify areas where we can drastically accelerate, um, solutions while we can have guardrails. And for me, coding is number one. So this is one area we are putting things in production where we have AI agent leveraging different code libraries, bringing it together so it can really generate a fully functioning code. And the reason why we choose that as one of our first step is it's verifiable. You can actually create test code, you can actually, uh, validate the outcomes. But I think it's equally important to actively explore that. What are the other areas where we can unlock this productivity gain? So, for me, that's the critical one. And as ChatGPT mentioned, it's a fair point. This is not an easy solution.

Speaker A: Yeah, so maybe we do that. Maybe, uh, in upcoming episodes, we'll take one topic at a time. I mean, coding is one of them. Uh, I'm very passionate about customer service being another area of agentic use cases, and so maybe we will find an expert who can join us. Maybe less agentic, like today, but, uh, we'll get there. So with that, I think we can

Speaker B: close out, uh, Philadelphia's out and San Diego is out.
