A Conversation about Designing Human-AI Collaboration Playbooks

Team Leaders' Audio Digest · 2026-05-26 · 24 min

Substance score

34 / 100

Five dimensions, 20 points each

Insight Density: 10 / 20

Originality: 7 / 20

Guest Caliber: 2 / 20

Specificity & Evidence: 11 / 20

Conversational Craft: 4 / 20

What our scoring noted

Our reviewer’s read on each dimension, with quotes from the episode.

Insight Density

10 / 20

The episode contains a handful of genuinely useful research-backed ideas (confidence miscalibration, capability-matched AI casting, the Carnegie Mellon name-tag study), but they are buried under heavy conversational padding, repeated analogies, and a slow-moving summary format. The insight-to-runtime ratio is mediocre for a 24-minute episode.

over the life of a project, a human worker's confidence in the AI starts to aggressively drift away from the AI's actual objective accuracy

Zhang found that injecting this highly capable AI actually pulled down the performance of already strong teams. It literally made them objectively worse at their jobs.

Originality

7 / 20

The episode is almost entirely a secondary summary of other researchers' work (Zhang, Chong, Song Xu and Luo, CMU, Toyota, Xiao) and adds no independent framework, contrarian argument, or first-principles reasoning. The apprentice-vs-power-drill analogy is used at least three times and the closing paradox question is a well-worn AI concern.

Governance and design are the exact same thing in this space. They need to be treated as core design contributors from day one.

They lock the apprentice in a soundproof box, and they only slide a piece of paper under the door when they need a screw driven.

Guest Caliber

2 / 20

There is no guest whatsoever; two hosts narrate a summary of a third party's published playbook. No practitioner, operator, or researcher is present to be interrogated, challenged, or asked for unreported detail.

we are pulling from this really comprehensive playbook by Dr. Jonathan H. Westover. It's called From Tools to Designing Human AI Collaboration Playbooks. A fantastic resource. It really is.

Specificity & Evidence

11 / 20

The episode names real researchers, institutions, and specific techniques (surrogate modeling at Toyota, the CMU drone-design study, Woebot in mental health, Daphne in aerospace), which is above average for the format, but it never gives hard metrics, dollar figures, team sizes, or study sample details, leaving most claims at illustrative rather than evidential strength.

Toyota uses a heavily narrowed, highly specialized AI, specifically surrogate modeling with neural networks, just to predict the drag coefficients of cars based on 3D renderings

They tracked human design teams working on an AI-assisted drone design, and they found that when the human engineers had explicit name tags for the AI's role, they absolutely thrived.

Conversational Craft

4 / 20

This is a scripted two-host performance masquerading as dialogue; every question is a pre-written setup for the co-host, there is zero pushback on any claim, and affirmations like 'Oh, do tell' and 'That is so counterproductive. Right.' dominate. No practitioner is present to be challenged, and no claim is ever stress-tested.

Oh, do tell.

That is so counterproductive. Right.

Conversation analysis

Computed from the transcript: who did the talking, and the verbal tics along the way.

Filler words

like 40 · right 40 · so 30 · actually 19 · you know 12 · I mean 4 · basically 4 · literally 3 · honestly 3 · um 1 · er 1 · kind of 1 · obviously 1

Episode notes

This research outlines a transition from viewing artificial intelligence as a mere utility to integrating it as a deliberate teammate within professional innovation. Effective human-AI collaboration requires moving beyond simple procurement toward a structured design approach that clearly defines the machine's role, initiation methods, and cognitive functions. Research indicates that while AI can significantly boost team productivity and creativity, poor implementation can lead to eroded judgment and performance regressions if trust and transparency are not carefully managed. To succeed, organizations must cultivate multidisciplinary development teams and adaptive governance models that prioritize mutual situation awareness and ethical stewardship. Ultimately, the research argues that the value of AI is not found in the technology alone but in the intentional architecture of the partnership between humans and machines.

Full transcript

24 min

Transcribed and scored by The B2B Podcast Index.

Um, so think back to, like, five years ago. Oh, the absolute panic era. Right. We were all just terrified, you know, convinced that artificial intelligence was going to completely wipe out our jobs. Everyone was worried about headcount and automation. Exactly. But today, if you look at the data, it is showing us something way stranger and, honestly, a lot more complicated. Yeah. The conversation has definitely shifted. It has, because it turns out that handing a team of, like, brilliant, highly trained engineers an incredibly advanced AI can actually make their work significantly worse. Which is wild to think about. It is wild. So welcome to the deep dive. Today we're unpacking this really fascinating phenomenon. We want to know how you, as a human worker, and an AI can produce results together that neither of you could achieve alone. The whole super team concept. Right. And we are pulling from this really comprehensive playbook by Dr. Jonathan H. Westover. It's called From Tools to Designing Human AI Collaboration Playbooks. A fantastic resource. It really is. Yeah. So our mission today is to shortcut your path to mastering this. We're going to figure out how to stop treating AI like just some generic software tool and start treating it as a design teammate. And I'd say that is the single biggest hurdle companies are facing right now. Oh, absolutely. Because the narrative has completely flipped from human versus machine. To human and machine. But the problem is organizations are spending millions, I mean, literally millions, buying the absolute most powerful AI models on the market. Yeah, they just buy the biggest one they can find. Exactly. They plug them into their workflows and then they just stand back, expecting this miracle to happen. And it doesn't. It doesn't. The massive, really expensive failures we're seeing all boil down to the fact that just buying a smart model does not guarantee you get a super team.
The magic, and honestly, the failures, too, entirely depend on how the collaboration itself is actually designed. I see. If you treat a highly capable AI like a, you know, a simple spreadsheet application, your output is going to suffer tremendously. Okay, let's unpack this, because I feel like we really need to define our terms before we try to build one of these super teams. Definitely. Because obviously not all AI is, well, playing the same sport. No, not at all. To understand the playing field here, we have to look at the massive divide between augmentation and collaboration. Okay, so augmentation first. Right. So augmentation is unidirectional. The AI basically just extends your physical or cognitive reach. Like a calculator. Exactly. Like a calculator extending your ability to do long division. It's highly directable. It requires you, the human, to initiate the action, steer the process, and then interpret the final result. Right. It doesn't do anything until you tell it to. Precisely. But collaboration is fundamentally different because it is bidirectional. Meaning it talks back. Yeah. Essentially the AI is outputting data. Sure. But it's also sensing the context of your work. It's adapting to your specific situation. And sometimes it's actually the one initiating the action. Yeah. It might even suggest a pivot before you realize you need one. And this brings us to this concept of hybrid intelligence. Hybrid intelligence, okay. Yeah. It's basically the system-level superpower that emerges when a human and a machine work together deliberately. Okay. So it's like the difference between someone handing you a laser-guided, incredibly expensive power drill versus, like, hiring a highly skilled apprentice. That's a great way to look at it. Right. Because the power drill is amazing. It's going to help you build a cabinet way faster than a manual screwdriver ever could. Right. But it's still just a piece of metal and plastic.
Like, you have to pick it up, aim it, pull the trigger. It has no agency. Exactly. But collaboration is like having that skilled apprentice in the wood shop with you. They anticipate what you need. They hand you the sander before you even, you know, open your mouth to ask for it. Yeah. And more importantly, they might look at your blueprint and go, hey, if we adjust the joint right here, this cabinet will be structurally sounder. And see, to build on that wood shop analogy, the reason we are seeing such a high failure rate in corporate AI implementations right now is that companies are going out, they're hiring that brilliant apprentice, but then they are treating them exactly like the power drill. Oh, wow. Yeah. They lock the apprentice in a soundproof box, and they only slide a piece of paper under the door when they need a screw driven. That is so counterproductive. Right. They don't set up any structure for the AI to communicate context or adapt to the workflow. And that creates this massive, crippling role ambiguity. Meaning the human doesn't know what to do. Exactly. The human workers are sitting there wondering, am I supposed to be pulling the trigger here or am I supposed to be taking advice from this thing? Okay, let's put you, the listener, right into the middle of this. Let's look at the actual real-world stakes here. Because the pendulum just swings so wildly depending on how you handle this. It really does. So imagine you're an aerospace engineer and you are designing Earth observation spacecraft missions. High-stakes stuff. Very high stakes. So the playbook highlights this AI assistant used in this exact field named Daphne. Ah, Daphne, yes. Yeah. And in aerospace, you have to do these things called trade studies, which basically means running massive, incredibly tedious mathematical comparisons, just crunching numbers endlessly. Right. Tweaking the thrust, adjusting the mass, calculating orbital mechanics over and over and over just to see what works.
Right. Well, Daphne takes that entire mathematical burden right off the human's plate. What used to be weeks of grueling, brain-numbing calculations is compressed into literally hours, which is incredible. It's the absolute dream scenario. It frees you up to step back, look at the big picture, and do the high-level architectural thinking you were actually hired to do. And that right there is a textbook example of achieving that hybrid intelligence we talked about. You're offloading the heavy analytical lifting to the machine so the human mind can focus entirely on synthesis and, you know, creative problem solving. But then, and this is where it gets scary, you look at the counterevidence and it just completely derails that dream. Oh, yeah, it's not all sunshine. Not at all. There's this study by a researcher named Zhang that looked at introducing AI advice into engineering design teams. Right. And Zhang found that injecting this highly capable AI actually pulled down the performance of already strong teams. It literally made them objectively worse at their jobs. Yeah, which sounds completely backwards. It does. And when they looked at novice engineers, it was even worse. The novices basically just relied on the AI to provide finished, polished CAD solutions. They just let the AI do the homework. Exactly. Because the AI just handed them the final, beautiful 3D model, those novices never learned the underlying structural mechanics of their own jobs. They skipped the struggle. Right. They didn't understand load-bearing principles because they never had to struggle through the math. The AI just gave them the answer and their domain expertise just completely eroded. What's fascinating here is the psychological mechanism driving that failure. Oh, do tell. The Chong studies refer to it as confidence miscalibration. Confidence miscalibration. Yeah.
So over the life of a project, a human worker's confidence in the AI starts to aggressively drift away from the AI's actual objective accuracy. Oh, so they either trust it too much or too little. Exactly. It goes in one of two directions. Either the humans drastically overtrust the outputs, you know, blindly accepting structurally flawed designs because they just assume the computer is a flawless oracle. Right. Or they undertrust it. The AI makes one minor error early on and the human just decides it's totally useless, ignoring perfectly good time-saving advice for the rest of the project. It's exactly like using a GPS that, like, once tried to drive you into a lake. Oh, that's a perfect comparison. Right? You get the confidence miscalibration on the overtrust side, where you just blindly drive your car into the water because the glowing screen told you to turn right. Or you get the undertrust side, where you throw the GPS out the window and start trying to read a giant paper map on the highway, even though the GPS was, you know, perfectly fine for the other 99% of the trip. That is exactly what the drift looks like. And the critical finding from the research is that this drift happens entirely because the users do not know what the AI's actual role is supposed to be. Okay, I have to pause on this because it feels a bit paradoxical. How so? Well, wait, are we saying that giving incredibly smart, highly educated engineers a hyper-advanced tool actively makes them worse at their jobs? Like, is this just human nature? Are people just getting lazy and falling asleep at the wheel? No, I mean, it's really not about laziness at all. It is a fundamental design failure. Design failure? Yeah. When the AI's role in the workflow is left ambiguous, humans just naturally misjudge how much authority they are supposed to give it. Oh, I see.
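The confidence-miscalibration idea described above can be made concrete as a gap between how much a team trusts the AI and how accurate the AI actually is. This is a minimal sketch under stated assumptions: the episode gives no metric, so `human_confidence` (approximated here as the share of AI suggestions the team accepted), `ai_accuracy`, and the 0.1 tolerance are hypothetical choices, not values from the Chong studies.

```python
from dataclasses import dataclass


@dataclass
class ProjectPhase:
    # Hypothetical measurements for one phase of a project.
    name: str
    human_confidence: float  # assumed proxy for trust: share of AI suggestions accepted (0..1)
    ai_accuracy: float       # share of AI suggestions later judged correct (0..1)


def calibration_gap(phase: ProjectPhase) -> float:
    """Positive gap means the team trusts the AI more than its accuracy warrants."""
    return phase.human_confidence - phase.ai_accuracy


def classify_trust(phase: ProjectPhase, tolerance: float = 0.1) -> str:
    """Label a phase as over-trust, under-trust, or calibrated (tolerance is an assumed threshold)."""
    gap = calibration_gap(phase)
    if gap > tolerance:
        return "over-trust"
    if gap < -tolerance:
        return "under-trust"
    return "calibrated"


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the episode or the studies it cites.
    phases = [
        ProjectPhase("kickoff", human_confidence=0.75, ai_accuracy=0.72),
        ProjectPhase("after one visible error", human_confidence=0.30, ai_accuracy=0.80),
        ProjectPhase("late project", human_confidence=0.95, ai_accuracy=0.70),
    ]
    for p in phases:
        print(f"{p.name}: gap={calibration_gap(p):+.2f} -> {classify_trust(p)}")
```

Tracking the gap per phase, rather than a single trust score, mirrors the hosts' point that the problem is drift over the life of a project in either direction.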
If the AI is presenting itself like this all-knowing oracle, handing down finished, uneditable solutions, humans just assume it has already considered all the variables, because why wouldn't they? It looks perfect. Exactly. If they don't understand that the AI was only designed to be, like, a rough recommender, they just stop exercising their own critical judgment. The erosion of skill happens because the boundaries of the relationship were just never defined. Okay, so if throwing a super smart AI into an undefined role is actively destroying team performance, how do we fix it? We need a playbook. Right? How do we actually build the super team? We turn to the first two practices of Dr. Westover's playbook: role clarity and capability matching. Okay. Role clarity? Yeah. Song Xu and Luo proposed this brilliant framework for classifying AI roles to eliminate this exact ambiguity we're talking about. How does it work? You have to define your AI teammate along three very specific dimensions before you ever deploy it. Okay. First, who initiates? Is the human prompting the AI? Or is the AI monitoring the environment and proactively tapping the human on the shoulder? Right, who starts the conversation? Exactly. Second, what is the intelligence scope? Is this a highly specialized agent designed for one hyper-specific task? Or is it a general intelligence meant for broad brainstorming? Like a specialist versus a generalist. You got it. And third, what is the cognitive mode? Is it analysis-oriented, just hunting for patterns in massive data sets? Or is it synthesis-oriented, generating entirely new concepts? Okay, I grasp the theory there, but what does this actually look like on the ground? Sure. Like, if I'm an engineer trying to build something complex, let's say a drone, how do I actually assign these roles without it just feeling like, you know, pointless corporate jargon? Let's look at a multi-year study done by Carnegie Mellon University. Oh, okay.
They tracked human design teams working on an AI-assisted drone design, and they found that when the human engineers had explicit name tags for the AI's role, they absolutely thrived. Name tags, like actual labels? Yeah. For example, the engineers were explicitly told, hey, this specific AI is a component recommender. Its only job is to suggest parts. Oh, I like that. Or this AI is a trade-off analyst. Its job is solely to evaluate weight versus battery life. So the expectations are totally set. Exactly. Because the humans knew exactly what the AI was supposed to do, they knew exactly how much trust to give its outputs. That makes so much sense. But in the control groups where the AI was just presented as a general smart helper, the teams completely floundered. They asked the recommender for complex structural analysis, got bad answers, and failed. But it goes beyond just naming the role. Right, because you also have to explicitly match the AI's underlying capability to the task itself. Yes, and the Toyota Research Institute provides just an absolute masterclass in this. What do they do? Toyota uses a heavily narrowed, highly specialized AI, specifically surrogate modeling with neural networks, just to predict the drag coefficients of cars based on 3D renderings. Okay, let's break down that tech for a second, because surrogate modeling sounds incredibly dense. It does. Why does Toyota use that specific method instead of, you know, ChatGPT or something? Because they intentionally stripped away the AI's general brain. Oh, really? Yeah. Instead of using an AI that has been trained on the entire Internet, that knows human history and conversational language, this neural network is trained exclusively on physics data. Wow. It doesn't know what a car is. It doesn't know what a road is. It just looks at the depth rendering of a shape and predicts wind resistance. That is so narrow. Very narrow. It is an incredibly specialized technical analyst.
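Song Xu and Luo's three dimensions (who initiates, intelligence scope, cognitive mode) can be written down as an explicit role card, which is one way to give an AI the kind of name tag the CMU teams used. This is a sketch under assumptions: the `RoleCard` type, its field names, and the `accepts` check are hypothetical, and the dimension values assigned to the example roles are illustrative guesses, since the episode does not state them.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    HUMAN = "human prompts the AI"
    AI = "AI proactively flags the human"


class Scope(Enum):
    SPECIALIZED = "one narrow task"
    GENERAL = "broad, open-ended work"


class CognitiveMode(Enum):
    ANALYSIS = "finds patterns in existing data"
    SYNTHESIS = "generates new concepts"


@dataclass(frozen=True)
class RoleCard:
    # Hypothetical "name tag" describing an AI teammate along the three dimensions.
    name: str
    job: str
    initiator: Initiator
    scope: Scope
    mode: CognitiveMode

    def accepts(self, request_mode: CognitiveMode, request_is_narrow: bool) -> bool:
        """Return True if a request fits this role's declared cognitive mode and scope."""
        if request_mode is not self.mode:
            return False
        # A specialized role should only receive narrow requests.
        return request_is_narrow or self.scope is Scope.GENERAL

    def describe(self) -> str:
        return (f"{self.name}: {self.job} "
                f"[{self.initiator.value}; {self.scope.value}; {self.mode.value}]")


if __name__ == "__main__":
    # Illustrative dimension values; the episode does not specify them.
    recommender = RoleCard("Component recommender", "suggest parts", Initiator.HUMAN,
                           Scope.SPECIALIZED, CognitiveMode.ANALYSIS)
    analyst = RoleCard("Trade-off analyst", "evaluate weight versus battery life",
                       Initiator.HUMAN, Scope.SPECIALIZED, CognitiveMode.ANALYSIS)
    print(recommender.describe())
    print(analyst.describe())
    # A broad structural-analysis request falls outside the recommender's declared scope.
    print(recommender.accepts(CognitiveMode.ANALYSIS, request_is_narrow=False))
```

Making the scope check explicit is the design choice that matters here: the control-group failure the hosts describe (asking a parts recommender for complex structural analysis) becomes a request the role card rejects rather than one the AI silently answers badly.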
It is not a general ideation partner. They aren't asking it for marketing slogans. Here's where it gets really interesting. Interesting to me. Yeah. If you think about building a team like casting a movie, you wouldn't hire a dramatic, Oscar-winning lead actor to do your stunt driving. Right. I mean, they might be incredibly talented, they might be the smartest person on the set, but they are absolutely going to crash the car completely. So Toyota didn't overbuy general world-knowledge AI for their drag predictions, because adding unnecessary capability just introduces entirely new ways for the system to hallucinate or fail. Like, having an AI that knows the entire history of the Roman Empire trying to calculate wind resistance is actually a liability. You cast the exact right actor for the specific role to minimize your surface area for errors. That is exactly the mindset shift required. You are casting a teammate. But casting the right AI is really only half the battle. Of course. Once you have the perfect actor for the role, that AI needs to know how to interact with its human co-stars without, you know, making them want to pull their hair out. Which is a perfect pivot, because if the interaction is clunky, it just doesn't matter how smart the AI is; the human is going to abandon it totally. How do we prevent this from feeling like you're just submitting a rigid support ticket to a robot? You have to explicitly design for interaction, and the playbook calls these interactive attributes. Interactive attributes. Okay. For instance, if the workflow requires the human to initiate the process, you must prioritize an attribute called directability. Meaning you can steer it. Yes. The AI's output must be easily guided, tweaked and refined by the human. Consider industrial designers using image generators like Stable Diffusion to conceptualize products. Yeah, those are huge right now. Huge.
But for those tools to actually be useful in a professional workflow, the designers need handles. Handles? Yeah, they need to be able to iterate quickly using sliders, masking specific areas or, you know, tweaking text prompts slightly. And if those handles aren't there, then it ceases to be collaboration entirely. If the AI simply spits out a finished, beautiful, but totally uneditable image, the human designer is just backed into a corner. It's like a vending machine. It becomes exactly like a vending machine. You put in a text prompt and you get a snack. If you don't like the snack, you can't ask the machine to just change the flavor a little bit. Right? You either eat it or throw it away. Exactly. You either blindly accept the output, which erodes your own skill, or completely reject it, throw it in the trash, and start over from scratch. Vending machines aren't teammates. But there is another layer to this interaction, and I think it might be the most delicate one. It deals with trust. Trust. If you are going to treat this software as a teammate, you have to actually trust it. And in human-AI collaboration, trust is a designed property, meaning you can't just hope for it. Right? It is not a vibe, it is not a feeling that develops organically over coffee in the break room. It has to be engineered into the interface. And the most dangerous trust enabler we try to engineer is empathy. Empathy from a machine. I mean, it almost sounds like an oxymoron. It is a total minefield. How so? The playbook cites a researcher named Xiao who looked really closely at AI empathy in mental health chatbot systems like Woebot. Okay? The overarching assumption in the tech world was that a warm, empathetic-sounding AI, one that says, oh, I am so sorry you are feeling sad today, would automatically build trust with a patient seeking help. Because that's what a human therapist would do, right? But Xiao's research showed the exact opposite can happen.
AI empathy can severely erode trust if it fails to generalize across different cultural or demographic groups. Walk me through the mechanics of that failure. Like, why does saying something nice make the human trust the machine less? Because just programming in warm phrasing isn't empathy. If a user expresses distress and the AI responds with this generic, polite warmth but clearly fails to grasp the specific cultural nuance or the gravity of the user's situation, it feels fake. It feels incredibly hollow. It feels patronizing. The human instantly realizes, oh, you don't actually understand what I'm going through. You're just executing a script, and boom, trust is gone. At that moment, trust drops below zero. I want you, the listener, to really think about this for a second. Put yourself in that exact scenario. Can an AI ever truly earn your trust if you, the human, know objectively that it doesn't actually feel anything? It's a tough pill to swallow, right? Like, if you know deep down in your bones that its empathy is just a complex series of coded text predictions, does that fundamentally limit how much you can rely on it as a partner? This raises an important question, and the research actually addresses this directly. The answer is quite relieving, honestly. Okay. Trust in an AI isn't about believing the machine has a soul. Okay? Trust is built on two very tangible pillars: interpretability and reliability. Interpretability, right. Trust requires transparency. The AI must explicitly tell the user its limits. It needs to show its work and say, hey, here's the data I was trained on. Here is my mathematical confidence level in this exact answer. And here is exactly where I am likely to make a mistake. So it's just radically honest? Exactly. When you pair that level of transparency with a reliable track record, you don't need the machine to have a soul. You have a trustworthy teammate, simply because you understand its mechanics.
Okay, so let's say an organization has done everything right. The perfect setup. Yes. We've cast the perfect AI for the job. We've defined its role as a recommender. We've designed it with handles, so it's highly directable. And we've built in deep transparency to foster that trust. Sounds great. The system is flawless on day one, but what happens on day 300? Real-world conditions change. The market shifts, the data the AI relies on gets totally outdated. How do we keep this super team from decaying into a huge liability over time? This brings us to the final piece: adaptive governance and building long-term capability. Adaptive governance. Because deployments drift. It is completely inevitable. The AI that was predicting market trends perfectly on day one will start to degrade as new edge cases accumulate in the real world. Makes sense. So organizations need what is called a roles and rules registry. What does that actually look like in practice? Like a spreadsheet? Think of it as a living, breathing constitution for your AI agents. It explicitly tracks what each AI is currently authorized to do, what databases it is allowed to read, and, crucially, who the human in the loop is when a decision is made. Okay, so total accountability. Right. You also need explicit trust repair playbooks. Because it's going to mess up eventually. Because the AI is eventually going to hallucinate; it is going to make a mistake. And when it gives a team a bad piece of advice, how quickly and transparently the organization diagnoses and explains that error to the human workers will completely dictate whether those workers ever use the system again. Which means this isn't just an engineering problem to solve. Not at all. Like, you can't just throw this to the IT department and expect to get a super team out of it. The literature absolutely screams that software engineers cannot build these collaborative systems alone. They need help.
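The roles and rules registry described above can be sketched as a small data structure that records, per AI agent, its authorized actions, the data sources it may read, and the accountable human in the loop. Everything here is hypothetical: `RegistryEntry`, `RolesAndRulesRegistry`, `revoke_action`, and the example agent, database, and owner names are invented for illustration, because the episode describes the registry only in prose.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    # One row of a hypothetical roles-and-rules registry.
    agent: str
    authorized_actions: set[str]
    readable_sources: set[str]
    human_in_the_loop: str  # person accountable for decisions this agent informs


@dataclass
class RolesAndRulesRegistry:
    entries: dict[str, RegistryEntry] = field(default_factory=dict)

    def register(self, entry: RegistryEntry) -> None:
        self.entries[entry.agent] = entry

    def check(self, agent: str, action: str, source: str) -> tuple[bool, str]:
        """Return (allowed, accountable human) or (False, reason) for a proposed action."""
        entry = self.entries.get(agent)
        if entry is None:
            return False, f"{agent} is not registered"
        if action not in entry.authorized_actions:
            return False, f"{agent} is not authorized to {action}"
        if source not in entry.readable_sources:
            return False, f"{agent} may not read {source}"
        return True, entry.human_in_the_loop

    def revoke_action(self, agent: str, action: str) -> None:
        """Narrow an agent's authority, for example after drift is detected."""
        self.entries[agent].authorized_actions.discard(action)


if __name__ == "__main__":
    registry = RolesAndRulesRegistry()
    registry.register(RegistryEntry(
        agent="market-trend-forecaster",  # hypothetical agent name
        authorized_actions={"recommend"},
        readable_sources={"sales_history"},  # hypothetical database name
        human_in_the_loop="analytics lead",
    ))
    print(registry.check("market-trend-forecaster", "recommend", "sales_history"))
    print(registry.check("market-trend-forecaster", "approve_budget", "sales_history"))
```

Returning the accountable human alongside every approval keeps the "who is in the loop" question answered at decision time, and `revoke_action` gives governance a concrete lever for narrowing an agent's authority as its deployment drifts.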
If you want sustainable human-AI collaboration, you have to build multidisciplinary teams. You need psychologists in the room to understand the cognitive load on the human workers. Right. You need anthropologists to understand the cultural interactions and prevent those empathy failures we talked about earlier. You need human-centered designers co-creating the interface from the ground up to ensure it is sociotechnically sound. And those multidisciplinary teams have to build in continuous learning loops for both sides of the equation, because it's not just the machine learning. Yes. The AI needs to be continuously updating its models based on the times a human expert overrides its advice. Right. But the human workers also need continuous, explicit training on where the AI's capability boundaries are moving. If the AI gets a massive upgrade, the human needs to know exactly how its role just changed. It's mutual adaptation. Hybrid intelligence relies entirely on both parties evolving together in real time. So what does this all mean? Synthesizing all of this for you, the listener: if you are leading a team or just trying to navigate this in your own career, you have to fundamentally shift how you view the so-called red tape. Absolutely. Treating AI governance, you know, your legal team, your ethics board, your compliance officers, like a roadblock that you just have to hurdle over before launch is a massive mistake. It's a recipe for disaster. It is. Governance and design are the exact same thing in this space. They need to be treated as core design contributors from day one. Right. Because they are the ones helping you build the transparency and the boundaries that actually make the super team possible. Designing the future of work is not a procurement problem. It is not about issuing a massive purchase order for a fancy new large language model and just hoping for the best. Right. Crossing your fingers. It is a pure team design problem.
If you skip the heavy lifting of defining the roles, engineering the interactions and setting up the governance, you aren't building a super team. You're just buying software. You are just collecting expensive software models without ever achieving the collaboration you actually paid for. It really is the difference between buying a pile of power drills and actually taking the time to hire and integrate a skilled apprentice into your wood shop. Exactly. And that leaves us with one final, kind of provocative thought to mull over today. We've spent this entire deep dive talking about how much humans need to explicitly guide the AI. Right. Setting the boundaries. Yeah. We need to understand its roles, monitor its outputs and maintain our own critical judgment so our skills don't erode like those novice designers. Very important. But as these hybrid intelligence systems become more seamless over the next decade, which they will. They definitely will. And as the AI becomes an incredibly intuitive, proactive teammate that just anticipates your needs perfectly, what happens to our own human intuition? Oh, that's a big question. Right. Think back to the panic from five years ago. Yeah. We were so worried about a robot taking our physical job. Our literal tasks. Exactly. But the real question moving forward might actually be this: if an AI teammate perfectly anticipates your every creative need and hands you the perfect idea before you even ask, do you become a better, faster innovator? Or do you slowly, over time, lose the ability to generate that initial spark yourself? It is the ultimate paradox of the perfect apprentice. Thank you for joining us on this deep dive.
