Greg Whalen (CTO, Prove AI): Why Most AI Projects Fail Before Production

Liftoff with Keith · 2026-06-09 · 43 min

Substance score

41 / 100

Five dimensions, 20 points each

Insight Density: 10 / 20

Originality: 8 / 20

Guest Caliber: 11 / 20

Specificity & Evidence: 5 / 20

Conversational Craft: 7 / 20

Greg Whalen, CTO of Prove AI, discusses why most AI projects fail to reach production, emphasizing the gap between compelling demos and production-ready systems. He explains that founders underestimate the complexity of building enterprise-ready AI, confuse capabilities with product-market fit, and lack proper observability and telemetry practices throughout development.

Key takeaways

Founders conflate technical capabilities with actual product-market fit and underestimate the operational complexity required for production AI systems.
Generative AI projects require fundamentally different development approaches than traditional software - teams must start with blank slates, avoid over-attachment to prototypes, and iterate rapidly based on changing customer sentiment.
Most AI projects fail because teams move too fast without establishing telemetry and observability infrastructure early, making it impossible to backtrack or troubleshoot when problems arise.
Customer needs and sentiment in AI move in monthly cycles rather than annual cycles, requiring faster feedback loops and decision-making structures that most organizations are unprepared for.
Traditional software development practices like pushing features to users quickly fail with AI because AI systems can appear done at 30% completion, leading teams to confidently march toward dead ends.

Guests

Greg Whalen

Topics in this episode

Multi-agent systems, Prove AI, Generative AI production systems, Foundation models, AI observability and telemetry, Enterprise AI governance, AI system debugging, Non-determinism in AI systems, Product-market fit in AI, AWS

What our scoring noted

Our reviewer’s read on each dimension, with quotes from the episode.

Insight Density

10 / 20

The episode contains a handful of genuinely useful operational observations - particularly around AI systems looking complete when they are not, the shortened customer-sentiment cycle, and team-size advice - but these are diluted by heavy conversational filler, repetition of the same points, and vague meta-commentary that fills large stretches of airtime.

AI is very different than traditional software. Right. You get a result when you're 30% done.

Your customer needs and sentiment might change in one month, in four weeks, literally.

Originality

8 / 20

A few takes push against conventional wisdom - critiquing token-maxing as a lazy crutch and explicitly retiring the Amazon two-pizza-team model for AI prototyping - but most of the episode recycles familiar startup advice (pivot fast, talk to customers, don't over-engineer) with AI-flavored language rather than genuine first-principles reasoning.

it's sort of a crutch to, you know, it's, it's sort of a, it's sort of a lazy way of doing this

instead of spinning up like, you know, the Amazon 2 pizza team. Right. So. Right, don't do that anymore.

Guest Caliber

11 / 20

Greg Whalen is a genuine practitioner - AWS background, current CTO of an AI infrastructure startup - who speaks from real operational pain rather than thought-leadership talking points; however, his seniority tier and the scale of his work are not exceptional, and Prove AI is an early-stage company without demonstrated enterprise traction discussed in the episode.

From AWS, a little company you may have heard of, um, to Zendit, maybe a smaller company...to now the CTO of Prove AI

the type of observability that you're looking at doesn't map very well to monitoring host loads, to error rates, to latency

Specificity & Evidence

5 / 20

Almost no concrete data, named customer examples, dollar figures, or measurable outcomes appear anywhere in the transcript; assertions like 'customer sentiment changes in four weeks' and 'one person should prototype in a few weeks' are stated as fact without supporting evidence, and even the company's own product capabilities are described only in abstract terms.

I wake up, I want to be building new stuff, but I can't because there's always some vague nondescript outcome that's not going well. And I don't even know, like, I don't even know if it's a problem yet.

I think that the, uh, you know, what I can, what I can guess is that, you know, I'll be in the camp who, who think that models will become, you know, more of a commodity

Conversational Craft

7 / 20

The host sets up topics adequately but repeatedly leads witnesses, reframes rather than interrogates, and pivots away from interesting threads before they are fully developed; there is no meaningful pushback, no request for evidence behind assertions, and the lightning round substitutes novelty for depth.

Are you, are you seeing that founders are underestimating just how hard production systems really are?

You're kind of, you're kind of describing this new era of enterprise readiness. It feels like, yeah, absolutely.

Conversation analysis

Computed from the transcript - who did the talking, and the verbal tics along the way.

Share of words spoken

Speaker B: 86%
Speaker A: 14%

Filler words

you know: 212, right: 196, um: 130, so: 92, like: 71, uh: 59, I mean: 18, kind of: 12, sort of: 10, actually: 9, anyway: 3, er: 2, literally: 2, obviously: 1

Episode notes

In this episode, Keith sits down with an AI infrastructure leader to unpack what is really happening behind the current AI boom. While most conversations focus on models and hype, this discussion goes deeper into the engineering realities of building reliable AI systems at scale. The conversation explores why observability, telemetry, debugging, and system architecture are becoming critical as enterprises move from experimentation to production AI. They also discuss why terms like “agentic AI” are often misunderstood, why many AI projects end up being rebuilt from scratch, and why the future competitive advantage may come from the tooling built around models rather than the models themselves.

Topics covered include:

Why most AI systems fail due to poor observability
The real engineering challenges behind enterprise AI
Why “agentic AI” may be the industry’s most overused term
The shift toward commoditized foundation models
Why telemetry and debugging will define reliable AI systems
The underrated founder skill of ignoring distractions
What the AI stack could look like by 2028

This episode is a practical conversation about the difference between AI demos and real AI infrastructure.

Full transcript

43 min

Transcribed and scored by The B2B Podcast Index.

Speaker A: The sponsor of Liftoff with Keith is the one and only Compass Strategic Advisors.com, an experienced partner to help you navigate everything from cap tables to stock option and compensation plans, and all types of backroom and marketing services. There is no better friend to the startup CEO than Compass. Check them out at Compass Strategic Advisors. Okay, let's get started. Liftoff. And that's what we're about to do. I mean, look, thousands of, uh, startups are building AI products. Very few are building AI systems that enterprises actually trust. There's a massive difference between a compelling demo and a production-ready company. Good thing for you guys listening today. My guest today, Greg Whalen, has spent decades operating at the intersection of AI, infrastructure, payments, enterprise systems, large-scale engineering and more. From AWS, a little company you may have heard of, um, to Zendit, maybe a smaller company. I don't know, about the same. Maybe, um, to now the CTO of Prove AI. Greg's seen what happens when emerging technology collides with real-world complexity. So today we're gonna talk about what founders are getting right, what they're dangerously underestimating, and why the future of AI may depend less on intelligence and more on proof. This is Liftoff, and I want to welcome Greg Whalen to the show. Greg, what's shaking in Boston today?

Speaker B: Yeah, yeah. Uh, thank you so much for the intro. Great to be here. Nice warm day here in the Boston area. Um, look forward to the discussion.

Speaker A: Yeah, man, when it gets warm and you're walking around the Charles River and waving to all those, uh, folks at Harvard Square, I mean, that's a tough, that's a tough visual to beat. But, um, yes, Greg, let's get into it a little bit. I'm so anxious to talk to you. I really appreciate you taking the time. You really have worked across, uh, all areas. What patterns repeat across every major technology wave that you've been part of?

Speaker B: Yeah, yeah. So, um, you know, I think that, um, some of the patterns here that, you know, that I like to focus on, um, I'm particularly interested in solving and helping with, um, at least in AI, is, you know, the, the fact that, um, AI making AI portable, provable and easy to debug. Right. Easy to deploy and run at scale has been something that's, uh, you know, been keeping me up and has been a constant shoulder pain, you know, for the last several decades. Right. So one of the reasons I joined up with Prove AI was just, you know, as, as AI, um, explodes into more flavors than there were, you know, before. As it becomes even more critical, right, for, for any business and any technologist. Um, you know, I was sort of getting the hope that a lot of these problems around, you know, how do we share and how do we collaborate on AI, how do we debug and troubleshoot AI, right, in an efficient way? So obviously it's not like we, we can't do that, but, you know, doing it at scale, doing it efficiently, doing it without, um, you know, a bunch of unexpected outcomes, I think is, is something that's always been ripe for, for solving. Uh, yeah, it's just, you know, as, as we look into some of the problems out there, I mean, these are things that I think are repeating themselves, right? There's a new technology wave out there. Um, people are suddenly becoming interested, immensely interested in one particular type of AI, generative AI, uh, agentic AI. Um, you know, and I just look at this, and every time this type, this type of thing happens, right? The, the move from the need to have a specialist to, to debug, to look into all of those things and treating that as, okay, typically goes away, right? It typically becomes something that, um, now everybody has to do and everybody has to do at scale, even if they don't necessarily want to put in the time, right. Or the thought to, to really figure out how to do this.

Speaker A: Right?

Speaker B: And that's what really excites me here because. Right, that's, that's when things are ripe for, you know, the tool sets to suddenly start exploding for, for, you know, for, for really the infrastructure and the tooling to become mainstream enough for everybody to do this.

Speaker A: Are you, are you seeing that founders are underestimating just how hard production systems really are?

Speaker B: Absolutely. Yes, absolutely. So, yes, they're absolutely underestimating, um, the amount of work that goes in, but they're also underestimating the differences in approach that are required. Um, you know, that you can't run it like a typical technology product. You can't stay hands off on it. I mean, there's a lot of pitfalls, um, to generative AI that people are, you know, stepping right into. Um, you know, so, so there are some similarities to passwords, but there's also some new, you know, some new ones. Right? So, you know, the fact that generative AI is, is relatively easy to understand from, from an external perspective. Like, so if back to like, you know, the, you know, CI/CD wave, right? And that whole thing, that's less flashy, right? It's less, less something that somebody says, hey, cool, I want to get in on that. Um, generative AI, like everybody, I think will could. That resonates with them. Right. Um, you know, and it's something that they want to be a part of. Right. So the more people coming to the table, more people, you know, wanting to, to play around with it, which is great. Right. But you know, on, on the flip side, right. It also means that the unprecedented number of people talking about it, experimenting with it, um, you know, causes a whole host of problems that, that are, that do not look similar to some of the big technology waves that we've had in the past.

Speaker A: Then what separates the interesting AI projects from the real businesses?

Speaker B: You mean separates the, those, those companies

Speaker A: that you're working with that are addressing the problem versus more standard businesses?

Speaker B: Oh, yes, absolutely. So I think that the ones that are doing it, right are the ones who are really going deeply, right, and really just clearing the table and saying, let's think about this from first principles, like, let's build up from nothing. Um, let's figure out what we have to get great at. So I think that those are the two sort of fundamental things that I advocate internally is one is start with a blank slate. Right. Don't. Yes, there's baggage. Everybody has baggage. Put it aside. Right. There's no, should not be any dependencies. There's not ways you have to do. There's not governance things to worry about. Just clear the table and just say, all right, where do you want to get, where do you want to get with your application? Um, and now let's try to get there. Uh, let's put aside all of these other concerns that we don't have answers to. Um, focus on the build, focus on getting something out there. Um, and you know, feel free to just rethink everything. Ah, rethink how you're going to deliver this. Because really it shouldn't look any, anything similar like you've done in the past. So. Right. That's one sort of caveat. Uh, that's one characteristic of successful companies out there. Right. The other one, though, is to get super specific about what you do need to be good at. Um, so the ones that are really specific about the problem they're solving and the technology, the domain they need to get familiar with in generative AI are the ones that tend to do better. Um, you know, it's, it's like, yeah, this, this matches up with a lot of my experience as well in highly complicated areas. Um, you know, being a generalist and saying, yeah, I kind of understand the AI trends. Cool. I'm going to lead this is not the best approach, right? It should be. Yeah. You know, ignore 90% of what's going on out there if it's not useful to you, focus on getting great at one or two things and do that. Right. That's what you should be going for.

Speaker A: Do you think, um, technical founders have it in a way where they kind of get confused between capabilities and product market fit?

Speaker B: Yeah, absolutely. So they'll confuse a bunch of things here. Right. Um, and the root cause of it is really not thinking through like what specific problem do I have to solve. Um, put everything aside, you know, who is my customer, who is my user, what do they want? Right. And that's one thing that prove AI has, uh, we found difficult, difficult as well is that talking to potential customers. Right. So if you are trying to get to product market fit, um, you know, you shouldn't make the same assumptions about your customers or your potential users that you would, would have maybe five years ago. Right, five years ago. I think you'd expect users to have a pretty linear progression in their own maturity, in their own needs, um, relatively stable population, relatively sensible. Um, what we're seeing now and you know, we see people, you know, with demands for AI, right. Start something, get into it for two months and then throw it all away. Right. Start over. Um, and that's, that's largely a function because there's so many external dependencies that are changing quickly, eg, foundation models, uh, moving in step. And you know what it means for product market fit though is that the age old way of doing things where you talk to customers, you do it for a couple of weeks, maybe a month, you get feedback, you think critically about what the shoulder pain or the problem is, you figure out what the solution is and then you start to build. It doesn't work very well anymore in this environment. You're going to find, um, a month into the process that your customer sentiment has changed entirely. Right. And it's not, hey, there's a little tweak. It's, oh no, we gave up on that and we did something else. Uh, this is, it's very hard to work.

Speaker A: Excuse me, you're describing something that I've heard described as the demo culture versus production reality. I mean most companies are still in that experimental mode, but this is also a time where you can make some money. So you said Gen AI systems never make it from pilot to production. Why is that?

Speaker B: Yeah, so a lot of reasons. Um, you know, there's a whole host of reasons out there, but I think if I, if we root cause it and do like a five whys exercise. I, I think what you'll come down to, right, is, is figuring out that the, you know, ultimately, right, it's, it's that people have failed to evolve and to pivot quickly enough, right, in response to things. So they let it go too wrong, um, too long and they let it go down a path that was, was, was pretty clearly not the best path, but was the way to get it to release. Um, and it turned out though, and you say, well, it's probably going to be good enough. We just need another month. But the problem is you find out in another month that now, um, it's not good enough anymore. It's, it's not even releasable, right? It's not something we feel comfortable releasing with. So, so it's. So the root cause here mostly you just go back to that. You know, people are still taking too long to develop their prototypes and then they're still too attached to them. Right. And this is a, this is I think a, a symptom of, of how people have typically approached software, right? It's like, um, you'll, you know, identify the pain. Yes, it's going to take a few iterations. No, we're not afraid to fail. You know, we're okay to fail for some things here. But you know, the, the iteration cycle I think is moving really quickly and customer sentiment is moving much more quickly so that you have to move some of that decision making down to individuals. You have to, you know, split up teams so teams are smaller. You have to empower people. You have to put in different types of guardrails and mechanisms, right. To encourage people to throw away stuff, right? To say, hey, last week this was fine. Two weeks later we just got some feedback. You know, we're, we're hurtling down towards a dead end and we can all see it. I just, you know, what's happening is people don't speak up. They're like, well, we'll get the pilot out the door.
And, and the, the silent thing in the room, the elephant in the room is yeah, we're, we're hurtling towards a dead end. Right? This is, yes, it's going to work. It's a dead end. We're going to have to throw away three quarters of it. But uh, you know, we're just collectively a little afraid to bring this up. And you know, we'd like to at least see it get to, into some user hands because we figure some feedback is better than, than none. And that's, that's what's fundamentally not true anymore. Right. You should have gotten the feedback earlier, um, you know, in order to avoid going down that dead end, because it's, it just, um, too many people get bought into it. They think, oh, well, we're 90% done. Surely a few course corrections at the end is fine. And what nobody's expecting is. Yeah, but really it's, it wasn't 90% done. It was, it was 30% done. It just looked done. Because AI. AI is very different than traditional software. Right. You get a result when you're 30% done.

Speaker A: You're kind of, you're kind of describing this new era of enterprise readiness. It feels like, yeah, absolutely.

Speaker B: Yeah, yeah. It's. Yeah. Enterprise readiness and how you govern AI and how you consume prototypes and expectations around what you're looking at. All need to change. Right? It. None of it, none of it's the same anymore. And it's really jarring for, for, you know, enterprises who, who have gotten good at developing software and they understand the software development life cycle, they understand startups, uh, as so many of these, so many nuances change. And I think a lot of people look at it and say, I don't understand why, why this is different. Uh, and they start to get annoyed.

Speaker A: Yeah. You know, a lot of people would say the industry is moving too fast. Um, it lacks some sort of operational maturity and experience. How do you land on that? And what are some of the things founders probably don't understand or, um, maybe some hidden complexity in the whole mix that they should consider?

Speaker B: Yeah, I mean, I don't think that it's necessarily moving too fast. Right. I think everybody needs to. I mean, it's going to move as fast as it can move and it should. No reason to pump on the brakes here. Uh, you know, that's never worked anyway historically. Right. If there's, if there's a good, groundbreaking technology out there, you know, pumping the brakes.

Speaker A: Well, yeah, and I'm not really arguing with you except to say if you're, if you're rushing to. Putting extra pressure on yourself to speed, sometimes you do take those shortcuts and make some mistakes.

Speaker B: Absolutely. Right. And that's that. So that is a mistake. And I think that. Right. Rushing, what people are rushing through is the ability to backtrack, the ability to troubleshoot, the ability to observe.

Speaker A: Right.

Speaker B: Because they are under pressure to deliver. Right. To get to the end state. Whereas there probably is. No, no, the end state is probably not known for most generative AI projects right now. And that's Fundamentally, the dissonance. Right. It's like, you know, we're pretty sure we've got a good idea here. We think we know the end state, the end states in six months from now, but guess what? In six months from now, everything's going to change.

Speaker A: Is that really different though, than what was, uh, the old software industry, where it seems like there's always a rush to hit a date and a production schedule and a rollout and a launch, Right?

Speaker B: Yeah, yeah. It's just, it's just customer sentiment, I think, was customer sentiment and their needs was more stable. So, like, if there was pressure to deliver, you're right, that was always there. Hey, we got to get something into customers hands in six months, right? That's, that's always been something out there. I get it into people's hands quickly. Absolutely. Um, you know, what was assumed though, and what was, what was always there and what's different now is that, that that customer's need didn't materially change between now and six months from now. Right. How they ran their business did not materially change. Um, their, their dependencies didn't materially change. Right. And that was one of the reasons why you wanted to get it down to six months or a year or whatever, because if you let it go for longer than that, it would change. Um, and then you, you know, then you really would miss the boat. The, the cycles within anybody consuming generative AI are much shorter. Right. They can be a month. Right. Your customer needs and sentiment might change in one month, in four weeks, literally.

Speaker A: Yeah, that's what's blowing me away in this whole AI era. Yeah. So, Greg, we talked a little bit earlier about the telemetry problem. You can't, uh, improve what you can't see. Let's talk about that for a second. Um, we're doing it, we're doing telemetry. Wrong. Explain that.

Speaker B: Yeah, yeah, sure. So when I, when I say wrong, I, I mostly mean that it's, it's too much of an afterthought. Right. So we go back to the need to backtrack, the need to troubleshoot, the need to remediate. Right. Avoiding, you know, knowing full well you're going to go down a few dead ends. Right. Your job, our job is to backtrack as quickly as possible and get out of it. Um, so I, you know, I think, I think of it as a backtrack type of exercise. And what people will not do is they won't lay down that breadcrumb trail, right. They'll just say Look, I just gotta charge forward and I'm just gonna bash through a bunch of doors and I'll get to something and then we'll iterate from there. You know what they're not doing, though, is they're not, you know, thinking from the perspective that, like, no, no, there's going to be some doors you can't actually get through. You're going to have to go backwards. You're going to have to understand what your AI system is really doing under the hood. What pieces of it work, what pieces of it don't work. Because if you don't do those things, you're going to end up starting over, right? You're going to end up saying, you know what, let's switch M. Let's, let's switch foundation models. Let's, let's just switch how we do this. Let's, let's move from single agent to multi agent. These are the things that burn up a lot of time and, um, you know, wreck progress, right? And fundamentally, it's that people are pushing really hard to get to the end, right? And saying, yes, it doesn't have to be perfect. I just need to get there. Which was a good approach for traditional software. Like, I just need to get the idea into people's hands as quickly as possible. Um, you know, what they're seeing now though, is that when you're wrong, you can't go back to like an intermediate state because you skipped all of the telemetry, observability, debugging and et cetera, tooling. That was part of the thing. 
So you end up throwing away your whole AI pipeline. You end up saying, all right, well, we made a few, you know, design choices. If we had to do this differently, what would we do? And the answer comes back to, well, we do it over. Like, let's start over. And that, that's, that's what's wrecking people. Because not only does that waste time, you know, the developers time, the engineers time, but it also erodes, um, you know, erodes confidence from the people that you're working with, whether that be customers or internal stakeholders, right? Because somebody hears that and says what? Uh, but it looks like it's working. What do you mean you're starting over? Like what? You know, this is not what people are used to, right? They're used to, all right, it needs a new interface. How hard can that possibly be? What is that, another month? Right? I know. Instead the answers are coming back now. We just, we built too fast and anthropic and OpenAI have done a few things that are different and now we're doing, we want to do it as a multi agent system and not a, you know, start over. Right. It's just unusual to hear that from people who are coming from software which, which you know, which ultimately both wastes time as well as erodes confidence. Right. And those types of things are very, very detrimental to AI projects.

Speaker A: Do you, I mean we talked a little bit about telemetry, so maybe we should have you explain it in plain English and why startups, you know, particularly the startup CEO should care about it before they have real uh, enterprise customers.

Speaker B: Mhm. Yeah, absolutely. So like, like anything in observability, telemetry, debugging, replay, these are things that engineers do not typically do well by themselves unless they're told to do it. Um, that has always been the case. Right. So. Right. And that's why as a CTO or an engineering leader, you'd put in metrics, you'd put in guardrails, you'd put in safety. Right. To make sure that your operational excellence is, is up to par. Right. Because by default it won't be right. By default people cut corners. By default people don't do that. You know, they don't really do their, you know, their P99 latency graphs. Right. They don't keep them inspected, they don't, you know, have tickets auto-cut. So like the behavior, behavior is fundamentally the, you know, the same. Yeah. It's just that um, there's not a good. Because the, the type of observability that you're looking at doesn't map very well to monitoring host loads, to error rates, to latency, um, or to availability. You know, when, when I say okay team, please go ahead and make sure this system is observable and it's, and it's ready, naturally what, what an engineer is going to do is to say all right, well I guess that means I just need host latency and a graph and I guess we'll be okay. And I'll, uh, monitor some error rates. Right. Those things work fine for traditional software when you have unit tests and all this type of stuff. But there's a whole bunch of non-determinism within, you know, multi-agent systems that, that exponentially explodes and gets really nasty. Um, you know, as, as problems and unless you are explicit with the people building this stuff that hey, this stuff will bite you, they won't do it. Right. It's just, it means that your, your old runbooks of how to look after operational excellence don't work anymore. So that's what a CTO needs to do.

Speaker A: Now you, um, along those lines, said that if you can't prove what AI is doing, you can't, you're not controlling it. What do you mean by that? And why is governance so, uh, so omnipresent in these discussions?

Speaker B: Yeah, absolutely. Yeah. So, so, you know, our company name prove can be many things, you know, within AI. It could be improved governance, it could be right. Um, it could also just prove AI works right, prove that it's functioning right. And you know, anything, anything along provability. Right. And knowing what your AI system is doing so that you don't have to waste time fixing it is the thing to rally around. So, um, you know, um, I mean it includes governance, but really it's just a matter of when something breaks. You don't want to have to spend a full day figuring out did something break? Should I look at something today? And if I'm looking at something, is this a one hour problem or is it an eight hour problem? And if I need to loop in additional people, are we going to spend the day debating about whether or not there is in fact an issue? Because those are the things that will kill you and those are the things that I see every day. And uh, you need some sort of observability trail. You need some sort of suite of troubleshooting systems in order to avoid all of those problems. The life cycle. Imagine I'm an AI engineer and I have a system. I wake up. First thing I want to do that I can't do with today's tooling is answer a simple question like, should I look at something today? Right. Because yes, there's evals out there, right? But that's not going to necessarily tell me if there's something I should really be looking at. Right. Um, you know, do I really have enough, you know, visibility into my tool to, to have it guide me to something to say, hey, it is something drifting off the rails in terms of my outcomes, of my outcomes. Right. Hard problem. But you know, that's something that, that I'm not given for free. If it is drifting, then what's drifting? Did something change? Does it correlate with, with, you know, with the drift, you know, what's going on here? 
Um, you know, um, I don't want to look only at things like model drift and things like that. I want to look more holistically: are our outcomes correlated with something? Did something change? Um, otherwise I'm going to go on a five hour, you know, witch hunt, trying to find something that may or may not have changed. And it's going to, you know, waste my day. I'm going to literally, you know, blow my day on it, you know, and you know, the rest, you know, the rest is when I find something, is this, uh, is this something that I can solve, you know, over morning coffee, is it something that's going to take the rest of the day, is it going to be something I need help with? Um, you know, all of these, all of like this is the, this is one of the biggest kind of shoulder pain moments that we've heard from, from users out there. And it's nondescript, right. It's almost a meta problem. But it's just the meta problem of I wake up, I want to be building new stuff, but I can't because there's always some vague nondescript outcome that's not going well. And I don't even know, like, I don't even know if it's a problem yet. And I waste half my day figuring out is there a problem. And then once I figure out there is a problem, I end up chasing all sorts of crazy things until the end of the day to figure it out because so many people are touching stuff. Um, there's so many possible, you know, problems out there, so many dependencies and there's non-determinism in the mix. So you know, recreation of things like replay, the ability to snapshot and replay AI systems doesn't exist for me anywhere. So like, you know, when we say prove, that's what we mean by it. It's just like, you know, prove the outcomes, prove that it works. Right. Prove what? You know, it can be anything in this case. Right.
But there's a whole class of stuff around not really understanding what's getting you to these outcomes that I want proof over, because that'll save me immense amounts of time.
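
To make the "should I look at something today?" question concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Prove AI's tooling: the names `ChangeEvent`, `outcome_drift`, and `changes_near` are hypothetical, the outcome metric is assumed to be a per-day score such as task success rate, and the drift test is a simple comparison of means against the baseline's standard deviation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, stdev


@dataclass
class ChangeEvent:
    """A recorded change to the system (hypothetical: prompt edit, model swap, deploy)."""
    when: datetime
    description: str


def outcome_drift(baseline: list[float], recent: list[float], threshold: float = 2.0) -> bool:
    """Return True when the recent mean sits more than `threshold` baseline
    standard deviations away from the baseline mean."""
    spread = stdev(baseline)
    if spread == 0:
        # A flat baseline: any difference in means counts as drift.
        return mean(recent) != mean(baseline)
    return abs(mean(recent) - mean(baseline)) / spread > threshold


def changes_near(events: list[ChangeEvent], drift_start: datetime,
                 window: timedelta = timedelta(hours=24)) -> list[ChangeEvent]:
    """List the changes logged within `window` before the drift was first seen."""
    return [e for e in events if drift_start - window <= e.when <= drift_start]
```

The morning check then becomes two calls: `outcome_drift(...)` answers whether anything needs attention, and `changes_near(...)` narrows the search to changes recorded shortly before the drift, instead of a five-hour hunt. The threshold and the 24-hour window are arbitrary defaults that a team would tune.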

Speaker A: Yeah, there are the morning-coffee discussions and problems, and then there are the single malt scotch kinds of discussions that are probably better saved for later at night. When it comes to AI making decisions in regulated industries, that seems like one of those single malt questions. How do you feel about that? What's the level of comfort, with hallucinations on one side and efficiencies on the other? How do you start to look at that as a topic?

Speaker B: Yeah, I'm still very bullish on AI everywhere. I think it's really up to the individual, the engineer in this case. It becomes their job to put in bulletproof policies and management of their AI systems in order to comply with regulation. That becomes their new job. The hard problem is how you describe that to an AI. We don't get to be hands off and say, well, the AI is going to do what it's going to do. Certainly not. It's my job as a technology professional to manage this non-deterministic system. And there's precedent here. There are plenty of other non-deterministic systems out there, and we found ways to manage them in very dangerous settings like industrial applications, aerospace, et cetera. So it's not like this is unprecedented. There are people whose lives are at stake in response to systems like geology and various other things that are very unpredictable. So it comes down to the individual to say: do I really understand the regulation well enough, and have I done the legwork to make sure that I understand how to control my AI with sufficient visibility and sufficient force to get the job done? The pitfall there is people again taking a somewhat hands-off approach, saying, well, I don't know, it's non-deterministic, so what do you want me to do? I guess I'll get it out there, we'll see how it works, and then we'll put out the fire. And that's the bad decision right there. That was the wrong decision, because your job is not just to worry about getting into production.
It's your job to put in those guardrails, to put in the policy, to put in whatever is needed to make sure that you never get to that state in the first place. And a lot of people aren't equipped to do that because they've never done it before. They've taken a policy as text, encoded it into a deterministic system, and said, all right, we have audits and we have checks and we're sure it will never happen. They've never figured out how to instruct an AI, and what the right metrics and checks are to make sure that the AI never goes off the rails. They've never done that before, which is the pitfall. That's why this problem exists.
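
One way to picture "putting in those guardrails" is a deterministic check that every model output must pass before it is released. The Python sketch below is only an illustration: `Guardrail`, `GuardedResult`, `apply_guardrails`, and the two example policies are hypothetical names and rules, not anything described in the episode, and real regulatory policies would need far richer checks than string matching.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Guardrail:
    """A named policy check; `check` returns True when the output is acceptable."""
    name: str
    check: Callable[[str], bool]


@dataclass
class GuardedResult:
    """The output if it passed every check, otherwise None plus the failed policy names."""
    output: Optional[str]
    violations: list[str] = field(default_factory=list)


def apply_guardrails(output: str, rails: list[Guardrail]) -> GuardedResult:
    """Run every guardrail; block the output if any policy is violated."""
    violations = [rail.name for rail in rails if not rail.check(output)]
    return GuardedResult(output=None if violations else output, violations=violations)


# Hypothetical policies for illustration only.
RAILS = [
    Guardrail("no_account_numbers", lambda text: re.search(r"\b\d{10,}\b", text) is None),
    Guardrail("no_guarantees", lambda text: "guaranteed return" not in text.lower()),
]
```

Because the model itself is non-deterministic, the point of the design is that the release decision is not: the same output always produces the same verdict, and the list of violations doubles as a metric to track over time.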

Speaker A: Greg, let me turn our attention to a little bit of futuristic stuff, not that far down the road. What does the AI stack look like in, say, 2028, and which layer creates the most enterprise value? Simple question, right?

Speaker B: That's an excellent question. Uh, I don't know exactly. What.

Speaker A: Come on, you could share.

Speaker B: I can guess. I'll be in the camp who think that models will become more of a commodity. Models will become commoditized, and you'll end up with a bunch of really interesting engineering tooling out there for code generation as well as for maintenance. So we have to take things like Claude Code, coding assistants, as the de facto norm, which also implies that people may or may not be making clear-cut technical decisions about what types of AI frameworks are actually being used. It may be something that's given to them, or something that they leave to the coding assistants to suggest and ultimately run. But I think you can assume that the thinking today, that there's going to be a choice of models out there and that modeling is going to be largely commoditized, is most likely right. There will be lots of emphasis on the harness: creating the tooling and the technology around those models to control them, to maintain them, and to make sure that they do what you need them to do. What that actual stack is will probably be materially very different by 2028 from anything that exists today. I think that's something to watch. Codex is of course already going in that direction, so I wouldn't be surprised if they were somehow playing in the space. Some of the big incumbents out there around model chaining and model observability, I think, will still likely be in the mix. Your LangChains, et cetera.
And observability. Observability and data telemetry collection: OpenTelemetry, or OTel, I would bet will still be in the picture as well, because the need to store a whole host of new telemetry and outputs will be a necessity and won't go away. But the day-to-day tooling for how you manage the actual systems and development will probably be stuff by 2028 that may or may not exist today.

Speaker A: Yeah, that optimization. The other thing I like to do when I get a person with your kind of background on the call is to have them share some counsel for early-stage founders, particularly those in this space. Let's just call them AI tech founders. So what kind of advice do you give first-time founders entering AI right now, and what should they focus on before raising money? Do you mind sharing thoughts?

Speaker B: Sure, sure. So I have some thoughts. Whether they're right or not, I don't know. I know limitation of liability, but I, I'll try my best.

Speaker A: Yeah. Okay, you're absolved.

Speaker B: Thank you. So one is that, as a CTO or as a founder, building and getting hands-on with this stuff is really important. It takes a lot of time, and I think the obstacle is not actually a lack of interest; it's a big time investment. But this is not the time to read a few PowerPoint slides and a few thought leadership pieces and say, hey, I kind of get it. We could pull that off with a few of the big technology waves in the past. You'd say, hey, I get what they're saying, I don't really need to go and do this myself, I don't need to spin up a Kubernetes cluster myself. That is not the case here. This is the time to really understand it: if you were to build something today, what would that look like? Do it yourself, I think, is number one. You'll get an understanding of what's ready, what's not ready, what works and what doesn't. Because if you listen to marketing and thought leadership exclusively, you're going to be misled. So that's a key one: leaders should be in the details here. Also, if they're going to manage a team, I think the structure for how they manage early-stage prototypes is different as well. You've probably heard about token maxing: people saying that everybody needs to be a 20x engineer, or whatever we're going to call them, and I'm going to keep track of how many tokens they're consuming with their coding tools. In principle, a good idea. In practice, I wouldn't do that. I don't think it's the right thing. It's an easy metric to track.
But it's sort of a crutch, a lazy way of doing this. What you're trying to do is make sure that your engineers are really able to push things out and get feedback quickly.

Speaker A: Right?

Speaker B: So what I prefer, specifically, is not to spin up the Amazon two-pizza team. Don't do that anymore. I don't think that's necessary anymore, because one person should be able to deliver a prototype in a few weeks. That becomes the expectation, and the metric then becomes: are they passing through gates as I would expect? Are they hitting a three-to-four-week prototype timeline? Are they iterating? Are they getting something into people's hands that they can look at? Are they getting feedback, and are they then putting out something materially different? That implies that they're going to be consuming a lot of tokens anyway, because it's one person, not a team of people. You know exactly what they're doing, and you know that they're actually outputting working software. So what I'm advocating for at an early stage is: don't spin up a team. Spin up a research lab, in many ways. Expect people to put out prototypes of their own, and expect each engineer to be responsible for a full end-to-end idea. I'm sure there are going to be times when you want two or three or four people on it, and of course as you get some traction and figure out what you want to do, having one person do everything is not going to be the way forward. But the biggest pitfall is just: hey, we're going to do an AI project, let's figure out the customer pain, let's throw 10 people on it, and let's get started. It doesn't work very well, because you don't get your first prototype in two weeks, which is what you'd want.
You get a bunch of planning, a bunch of back and forth, a bunch of discussion, and it stalls things. You don't want to do that. At the very beginning, you actually want a bunch of stuff that you're going to throw away. You want to tease out the idea, show it to people, and then you can start consolidating. It's very different from how you would approach traditional software. Amazon's famous PR/FAQ process and the two-pizza team are something I'm familiar with, and that does work very well for launching startups within an enterprise at scale. It doesn't work very well here. You're almost going back to a different way of working, and that's probably the biggest material advice I would give to a startup CTO. One is go deep: figure out what you need to be great at in the tech and be hands-on about it. If you only read marketing, you're not going to get there. And then when it comes time to spin up a team, do not throw 10 people on it, do not throw 5 people on it. They're going to step all over each other and get nothing done for three weeks. Really, you're going to have to cut it down to one person: get me a prototype in this number of weeks, then consolidate from there and build. It's very mind-jarring in the beginning because it goes against all the traditional wisdom that I think most CTOs have relied on for the past couple of decades.

Speaker A: Yeah. Greg, I've got a bunch more questions for you, but if you don't mind, I'm going to jump over to the lightning round just to change the pace a little bit. These are short answers, but I won't stop you. As a technology industry, we love jargon. What's the most overhyped AI term today?

Speaker B: That's a good question. I think it's probably still agentic, agentic AI. It's one of those terms that could mean too many things. There's a lot of that in here. I'll pick on that one just because when we say it, do we mean it's a multi-agent system? And if it is multi-agent, what are we talking about? What is the agent? Do we mean multi-foundation-model, or what? So I'll pick that one.

Speaker A: That's a good one. What's the most underrated founder skill?

Speaker B: Oh, I'd have to go with the ability to ignore things. A lot of great founders, and this could mean myself as well, are not good at everything. In fact, there are a lot of things that I'm not good at. There are a lot of things that some really fantastic founders are terrible at. But what they are is fantastic at, like, two things, and one of their key skills is learning to ignore stuff. It's just: it doesn't matter. All I need to do is this. And concentrating on that one thing is a very underrated founder skill. Founders who get distracted and want to be master of everything, who get caught off guard and get upset when they get a question like, oh, have you heard of this other startup that just came out? If they get distracted by that, it's actually a very bad thing, especially in AI, because every other day there's going to be something else that you haven't heard of. At some point you just say, it's not my priority right now. Being comfortable with that and able to do that is a very underrated skill.

Speaker A: It's like prioritization, specialization. I like that one. Startup mistake founders repeat endlessly.

Speaker B: Fundamentally, it's the pivot: pivoting quickly. If you had asked this question a while ago, even before AI, that would still be an answer. In this case it just means the pivot needs to happen even earlier in the cycle. It's still something that people miss out on. You can even see that with a lot of the AI tools out there: some that are widely loved and widely used, but for some reason don't have that big, gigantic valuation or the ability to transact. There are a lot of odd transactions out there. A bunch of them have great products, maybe open source, maybe free open source, but they made a bunch of design decisions early on, like: are we going to commit to single-agent systems? Is every AI going to be like a chatbot and have a human operator in the mix? Those are some of the big assumptions that some really big AI properties made early on and that turned out to be false two years later. I look at them and say, I'm sure they know, I'm sure I'm not teaching them anything here. But you get yourself into a sticky situation there, because the market has moved on, and the signal was probably there. Again, missing that ruthless pivoting, that expectation that people will change, is one of the biggest pitfalls out there.

Speaker A: Okay. Let me put you on the spot and ask you for an AI company that you really admire. They just keep nailing it, or keep doing really cool stuff, or keep breaking the mold, and doing it repeatedly.

Speaker B: Yeah, sure. I particularly like a lot of the stuff that's coming out of Codex at the moment. It's also especially relevant to this pivot thing: the existential pivot that they had, where they were closely bound to some particular AI providers, some model providers. And just a real willingness to stay close to their customer base, with frank and open discussion around: hey, yes, there's trouble, here's what we're doing. The ability to release under those conditions is really hard. So what I'm respecting here is people who are able to see something coming and say: we didn't expect this to happen. I think in their words, if I'm remembering one of their code releases, it's Code Red, we're in trouble, and we're still going to do this. So the clarity of communication, the ability to keep releasing, to keep a team delivering under those conditions, is extremely hard. And those things together, with no concern around making a really hard pivot, a really hard turn anyway.

Speaker A: Right.

Speaker B: Being able to quickly craft what we're going to do about it, then turn your team in that direction and continue executing, is immensely hard. I'll throw that one out there; I was particularly impressed with how that went.

Speaker A: Good one, Greg. And then for a final question: what should founders spend less time on or more time on? I know we danced around this in a couple of ways, but I'm re-asking the question; maybe you could put a finer point on something.

Speaker B: Yeah, sure. I talked about what founders should spend more time on, which is going deep. As for what to spend less time on: I see diminishing returns from spending too much time on future vision and thinking through those two-to-three-years-out types of situations, just because so many of the predictions are wrong at the moment. So trying to design around, oh my gosh, what would happen if Anthropic did thing X, or what would happen if Google decided to get into this space? There are like a thousand what-ifs out there.

Speaker A: Yeah.

Speaker B: That is just a waste of time. You don't know what any of them are going to do, and no line of reasoning can predict their behavior right now. And there are still a lot of founders out there who think that way: I know exactly what Google's going to do with Gemini, and logically Anthropic would do this and then this, and then Microsoft would jump in. I've heard this line of reasoning from at least two or three people. They're going to be there, ready to pounce with their product, when all of these big players do exactly as they think they will. I don't think that's a good idea, and I think that thought exercise is a waste of time right now.

Speaker A: That makes sense to me. Well, this has been a master class from somebody who has actually been there, done that. I think it's been fantastic. Greg Whalen, Prove AI, off to the races with your own company. And thanks for sharing some thoughts with us as well here. Any last comment before we send you back to the pits?

Speaker B: Yeah, sure. Just check us out at proveai.com, and we also have a Multi-Agent Systems Engineering group on Discord. We're always happy to have people come and chip in on what types of debugging and provability challenges they're having. We're actively building a community out there to talk about it. So, yep, check us out there: the Multi-Agent Systems Engineering group on Discord, or proveai.com.

Speaker A: Greg, that's fantastic. In the fun era that we're in right now, it's extremely challenging, and I think community is a great way to help lift all boats. So thanks again for your time. Really appreciate it.

Speaker B: Thank you. All right.
