FinOps and AI

Next in Tech · 2026-06-23 · 30 min

Substance score

41 / 100

Five dimensions, 20 points each

Insight Density: 10 / 20

Originality: 8 / 20

Guest Caliber: 7 / 20

Specificity & Evidence: 8 / 20

Conversational Craft: 8 / 20

This episode explores FinOps and AI cost management, discussing how financial operations principles developed for cloud infrastructure are now being applied to AI spending, with emphasis on tokenomics, model routing, and the challenge of measuring AI value and ROI as agentic workflows and widespread organizational AI adoption drive unpredictable costs.

Key takeaways

Model routing by persona - assigning cheaper, less capable models to non-critical functions like marketing copywriting while reserving premium models for customer-facing agents - is emerging as a practical FinOps approach for controlling AI costs.
Agentic workflows with reasoning loops and retry storms are causing organizations to exceed their AI budgets significantly faster than expected, forcing model providers to shift from unlimited consumption models to seat-based licensing with token-based overage pricing.
The fundamental challenge in FinOps for AI is the opacity of token consumption across different models and the lack of clear measurement of cost-per-business-outcome, making it difficult to assess ROI compared to traditional IT investments.
Organizations are allocating up to 80% of their FinOps efforts to managing AI spend, and the FinOps Foundation launched a companion Tokenomics Foundation partnering with open-source AI communities and ITAM groups to address AI cost management across departments.
Cost prediction for AI is becoming critical as surprise bills are exponentially larger in the AI era compared to the cloud era, requiring integration of tokens, infrastructure, agentic server calls, and business outcomes into unified cost accounting.
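
The persona-based routing in the first takeaway can be pictured as a small lookup policy. The following is a minimal sketch only: the persona keys, tier names, and per-token rates are hypothetical placeholders rather than anything quoted in the episode, and real routing would sit in a proxy or gateway instead of a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float  # assumed blended rate, not a real price


# Hypothetical tiers; names and prices are illustrative only.
PREMIUM = ModelTier("premium-model", 15.0)
ECONOMY = ModelTier("economy-model", 0.5)

# Hypothetical persona policy: premium for customer-facing agent work,
# cheaper models for non-critical functions such as marketing copy.
PERSONA_POLICY = {
    "customer_facing_agent_dev": PREMIUM,
    "marketing_copy": ECONOMY,
}


def route(persona: str) -> ModelTier:
    """Return the model tier for a persona, defaulting to the cheaper tier."""
    return PERSONA_POLICY.get(persona, ECONOMY)


def estimated_cost(persona: str, tokens: int) -> float:
    """Estimate spend in USD for a token volume under the persona's tier."""
    tier = route(persona)
    return tokens / 1_000_000 * tier.usd_per_million_tokens
```

Defaulting unknown personas to the cheaper tier is a design choice made here for cost safety; an organization could just as reasonably deny unknown personas outright.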

Guests

Jean Atelsek, Melanie Posey

Topics in this episode

Agentic workflows, Snowflake, Databricks, Red Hat, FinOps Foundation, Tokenomics Foundation, Model routing, Token-based pricing, OpenShift, SUSE AI proxy

What our scoring noted

Our reviewer’s read on each dimension, with quotes from the episode.

Insight Density

10 / 20

There are a handful of genuinely useful observations - retry storms in agentic workflows, persona-based model routing, and the shift to seat-license-plus-token pricing - but these are surrounded by heavy stretches of recapping, metaphor-stacking, and filler that dilute the useful content per minute considerably.

There is reasoning, there are retry storms, there are loops and so on. And companies were blowing their budgets using agents.

it's something like cost per outcome. So that would actually be the fully loaded cost across the tokens, the infrastructure, the sort of back and forth agentic MCP server calls.

Originality

8 / 20

The episode is largely a conference debrief that surfaces practitioner observations rather than original analysis; framings like 'building the plane as we fly it' and comparing AI billing shock to cloud egress sticker shock are recycled analogies, and there is no genuinely contrarian or first-principles argument made throughout.

What AI basically does is turn a simple mathematical equation into a quadratic equation in terms of the numbers of variables

In some ways FinOps starts to look a lot like technology business management, which heretofore has been seen as a different thing

Guest Caliber

7 / 20

Both guests are S&P Global research analysts reporting second-hand observations from attending FinOps X; they are thoughtful and knowledgeable but are not operators who have personally implemented AI cost management at scale, which limits the practitioner depth of the conversation.

the practitioners that were speaking and that we spoke to, they said really the easiest way to do that is by Persona

We started hearing it last year and we heard it a whole lot this year

Specificity & Evidence

8 / 20

A small number of concrete details appear - the 80% of FinOps effort on AI, the token-as-three-quarters-of-a-word convention, named organizations like the PyTorch Foundation and ITAM Forum - but there are no real dollar figures, no named enterprise case studies, and no comparative vendor data to anchor the claims.

Tokens are defined as being as amounting to roughly three quarters of a word

as much as 80% of their finops effort is now going to AI spend management

Conversational Craft

8 / 20

The host is clearly informed and asks some substantive follow-ups around model routing vendors and ROI measurement, but his questions frequently arrive as extended multi-sentence monologues that answer themselves before the guest can respond, and there is no meaningful pushback or challenge to any claim made by either analyst.

Is this getting back to the thing we've always struggled with, which is understanding what that delivered value for any particular IT project is, or are we getting to that quagmire?

Were you seeing any of that and are they talking about third parties or is this something where we ought to expect to be able to integrate model routing?

Conversation analysis

Computed from the transcript - who did the talking, and the verbal tics along the way.

Share of words spoken

Speaker A: 43%
Speaker B: 40%
Speaker C: 17%

Filler words

so 33, like 24, uh 18, actually 15, right 10, kind of 7, basically 6, um 3, er 2, you know 1, sort of 1

Episode notes

We're in the early days of cost impacts for AI applications. While there are some cautionary tales, current spending seems to be a small fraction of what's to come. Analysts Jean Atelsek and Melanie Posey return to the podcast to talk about what they heard at the FinOps X conference with host Eric Hanselman. The need for cost management in AI is seen as so great that the FinOps Foundation, a project of the Linux Foundation, is talking about morphing its conference into Tokenomicon and pivoting into token economics. The portmanteau of tokenomics is sweeping across cloud and AI services providers, as well as IT vendors, as enterprises wrestle with dueling forces of AI acceleration and management constraints for access and cost. Unlike FinOps for cloud operations, the costs and metrics for AI are fairly opaque. Some enterprises are trying to manage costs by limiting access, but that risks stifling the innovation and democratization that is supposed to come with AI transformation. Request routing is promising, but it requires understanding the nature of the request and the suitability of available infrastructure to fulfill it, something that is not well understood by many.

Full transcript

30 min

Transcribed and scored by The B2B Podcast Index.

Speaker A: Welcome to Next in Tech, an S&P Global podcast, where the world of emerging tech lives. I'm your host, Eric Hanselman, Chief Analyst for industry research at S&P Global. And today we're going to be talking about an aspect of AI that enterprises are just starting to wrestle with: cost management. We think about what's happened: so many of the impacts of cost in AI are only just starting to be felt by organizations. But with me to discuss this are two of the analyst team who were just at the FinOps X conference, taking a look at a lot of what had been the financial operational management pieces that have come out of cloud and that are now starting to find a home in the AI world. With me are Jean Atelsek and Melanie Posey. Welcome back to the podcast, you both.

Speaker B: Thanks a lot, Eric. Always great to be here with you.

Speaker C: Yeah, thank you.

Speaker A: So can you give us a little background on what the FinOps X conference has been about? Really, what has it been wrestling with? Because it's now, I don't know, how many years has it been going? A few now, but it's been driven by concerns about cloud cost, right?

Speaker B: All right. The FinOps foundation was started about six years ago when they got the first FinOps X conference to get a bunch of practitioners together to talk about issues related to cloud costs and techniques for cloud optimization and rightsizing and all of those things. And over the years, it's become a combination of a practitioner conference and a, uh, vendor conference, in the sense that a lot of companies have sprung up with different tools that basically help organizations wrestle all their cloud billing data to the ground, find ways to categorize it, run analytics on it, to get a really deep granular view into what's being spent on cloud overall. Who is that spending with, what is that spending on? And who is responsible for driving that spending? And what's developed over the last couple of years is how do we deal with the whole issue of AI costs? And now that the whole topic of tokenomics is out there in the wild with everybody talking about it like your grandmother knows what tokenomics is now. So I think that's why the FinOps foundation has decided to kind of get there first and be the one that shapes the conversation around tokenomics, um, and finops for AI, essentially, so that they're renaming next year's conference Tokenomicon. I think that's it. And to take a deeper dive into this whole thing and the interesting dilemma that we're seeing in this space right now is a lot of organizations haven't yet gotten a handle on their in quotation marks, traditional cloud infrastructure costs yet. And over the years the FinOps foundation has added different, what they call scopes, different silos of cloud spending. So that brings SaaS into the picture like how do you account for your spending on Snowflake and Databricks and some of those other software platforms and how do you do on premises infrastructure cost, how do you bring that into the mix? And now how do you bring AI into the mix? 
So lots of related but somewhat different problems that involve different personas within the organization who are responsible for driving that spend and different vendors who could jump in and help.

Speaker C: I'll just add that part of that effort is that the FinOps Foundation has launched a companion foundation called the Tokenomics Foundation, and that's going to be working with the IT Asset Management (ITAM) Forum, which is also part of that same group. And it brings in a lot of the AI-native open source community with open Secure Software foundation, the PyTorch Foundation, the Cloud Native Computing Foundation. In other words, it's something that's really moved beyond just cost management, because AI touches so many departments that they really need to figure out how to corral those costs, and it's not something that's going to be handled just by finance people and engineers.

Speaker A: So this is really taking a step, one operating in parallel on the AI side, but taking a big step from what had been some of those original goals from the financial ops, financial optimization kinds of capabilities, the FinOps, that core capability, and now really moving into entirely new spheres. I guess to some extent much the same way they had done with cloud, because the original challenges with FinOps were all that issue of you're moving from what had been the historical IT approach of spending on capital to now shifting that. Because you'd go out and you'd buy infrastructure and you'd set it up, you'd run it, but you'd spent upfront and then you'd work hard to maximize utilization of it. But that upfront cost had been the financial model, and that shifted to an operating expense model when we started to move to cloud, in which you are now renting or leasing the capabilities you're working with. And for many organizations that shift from capex to opex was a real jolt, in some industries, of course, that are driven to optimize capital performance. In fact, that was really complicated. We got into all sorts of nuances. Melanie, as you were saying, that went from figuring out what the differences are between SaaS apps like the Snowflakes and the Databricks and the Salesforces of the world, to managing your cloud capabilities, of how big an instance do you need when you happen to be in Amazon or Azure or GCP or OCI, to make sure that the instances were sized appropriately so that you weren't getting this great big powerful machine to do tiny amounts of work, to now really shifting to a very different environment, which is trying to drift towards that idea of AI tokens as being this uniform measure of AI work that gets done, and at the same time starting to do a lot of the things that we had to do with other cloud capabilities, which is manage policy. As, uh, you were saying, who gets access to this? How do you manage the allocation of the cost? 
It seems, though, that this pivot to AI is a pretty significant one in a whole set of procedural ways, and that many organizations are still just now starting to really understand how that impacts and integrates with their operations.

Speaker B: That's definitely a good point, Eric, because one of the most obvious things that is different from cloud infrastructure versus AI usage is that unit of value, right? Like tokens are a unit of measure here, and it's a very imperfect unit of measure, but it's the best we've got so far. But it's a lot more variable than some of the things that can happen in a cloud infrastructure environment to throw costs completely out of whack. Because first of all, you've got a plethora of different models that folks can use for AI queries or for agentic AI operations. So that causes huge amounts of variability in the cost, right? So how do you optimize the types of models that are used in AI and agentic operations? Is there a way to route queries or agentic workflows to the, in quotation marks, best model based on cost, based on performance, based on latency, based on throughput? So what AI basically does is turn a simple mathematical equation into a quadratic equation in terms of the numbers of variables. Like you've got X, Y and Z as the variables and there aren't any constants in the equation. This is part of the difficulty in developing the tools that can help organizations get a handle on these AI costs. Because a lot of times when people talk about IT costs, the real answer is, well, it depends. And you get this exponential "it depends" response in AI, along with "it's complicated" as well.

Speaker A: So is this something where we should be using large language models to figure this out? And how much are we going to pay for it?

Speaker B: Exactly. That's exactly what we heard quite a lot about at the event. We started hearing it last year and we heard it a whole lot this year where folks are talking about FinOps, uh, for AI, so applying the principles of cost optimization to AI. And then there's AI for FinOps, where a lot of the FinOps providers are integrating AI in the way that they deliver value to their customers. And one thing that's going to be really important going forward in finops for AI is this ability to predict costs. Because if you had surprise bills from your cloud provider during the cloud era, the frequency and probably overall size of those surprise bills is exponentially bigger in the AI era.

Speaker A: Well, and you raise actually a really important point there, which is there was the whole cloud sticker shock. And we went through the various phases of cloud sticker shock. First just the cost of individual instances in the cloud, then egress fees for data. That was its whole revelation when suddenly volumes of data got big enough to be expensive. This is something where we're stepping into an area where we like to talk about the democratization of AI and the employee coder and all of the work we're doing to extend AI, uh, capabilities to everyone in an organization. And guess what? When everyone in the organization has the ability to start kicking up bills of potentially significant size for AI, uh, how do we manage that? Because yes, we want everybody to have access, but holy cow, do we want to pay for everybody to have access or at least to have unlimited access? Gets us into some complicated areas.

Speaker C: I was just going to say AI was a big topic also at last year's conference. But what changed between last year and this year was the agentic workflows, which are not just launching an agent that, uh, does its job and then retires back into its spot. There is reasoning, there are retry storms, there are loops and so on. And companies were blowing their budgets using agents. And as a result the model providers changed their pricing models. So when it was a big land grab situation, you could buy a premium subscription and have an extremely high cap on your usage. And that led to a great deal of abuse. So now instead of an all-you-can-eat model, there is, uh, in many cases a seat license with a certain modest amount of usage and then token-based pricing on top of that. And, uh, that is something that, you know, these companies have got to get a handle on. And the FinOps personnel who had managed to really do a lot of automation and optimization in terms of getting the cheapest rates for their resources and also using fewer resources, all of a sudden they were back at square one. So that's the problem with that.
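
The pricing shape described in this turn, a seat license with a modest included allowance plus token-based pricing on top, can be worked through with simple arithmetic. The sketch below is illustrative only: every rate and allowance is a made-up parameter, and pooling the allowance across seats is an assumption, since providers may meter each seat separately.

```python
def monthly_bill(
    seats: int,
    seat_price_usd: float,
    included_tokens_per_seat: int,
    tokens_used: int,
    overage_usd_per_million: float,
) -> float:
    """Seat fees plus token overage, with the allowance pooled across seats.

    All parameters are hypothetical; no figure here comes from a real price list.
    """
    included = seats * included_tokens_per_seat
    overage_tokens = max(0, tokens_used - included)
    seat_fees = seats * seat_price_usd
    return seat_fees + overage_tokens / 1_000_000 * overage_usd_per_million
```

With 10 seats at 30 USD, 1 million included tokens per seat, and a 5 USD per million overage rate, usage of 8 million tokens stays inside the allowance, while an agentic workload that burns 50 million tokens adds 40 million tokens of overage to the same seat fees.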

Speaker A: It sounds like the various AI providers out there are having an all-you-can-eat shrimp moment. With some interesting spin beyond this, in that they of course need to ensure they protect themselves, but they also have an incentive to keep their customers happy and give their customers ways to limit those bills as well. Because of course if suddenly they get skyrocketing bills, customers are going to shut off use, and presumably it is in the AI providers' best interests to help their customers manage that use. So as you're identifying, per seat, token based, starting to put limits in. We saw this kind of evolution in cloud services, although there, because the folks actually doing the consumption were a much smaller audience, you could do things like alarms and limits, things that didn't have to go quite so far. Now that we've got everybody in our organization potentially using these capabilities, it seems like there just have to be a lot more controls, and the complexity of those controls has to be significantly greater to be able to accommodate all these different use patterns.

Speaker B: That's exactly it, Eric: there weren't that many people within organizations who were spinning up cloud instances and then leaving them idle or getting a bigger instance than was actually required for the workload.

Speaker A: It was all those darn developers.

Speaker B: You can always blame developers for things. And continuing into the AI era you've got developers still being front and center because they are building agentic capabilities into external customer-facing websites. Think retail chatbots and AI personal shoppers and that kind of thing. But at the same time you've got the developers and the IT folks building internal AI capabilities that are used for back office and middle office types of workloads. So there are a lot of places where costs can spin out of control. And I think where we are in the AI era right now, we're building the plane as we fly the plane. Whereas in the cloud era FinOps came along quite a few years after the cloud came along. So the thing itself was built. Now you're building a control and navigation system to keep everything under control. And we don't actually have that luxury of first this, then that. In the AI era it's all happening at the same time.

Speaker A: A recipe for complexity, certainly; maybe not one for success. Although, I don't know, what's your take on how people are trying to apply this? Because, uh, as you're pointing out, the ground is shifting under everybody's feet around this. It's not just how the models are evolving, it's the data pipelines, it's how we're actually building consumption. This whole environment's moving pretty quickly.

Speaker C: I'll say one interesting phrase that we heard quite a bit at this year's FinOps X was model routing. And in most cases the practitioners that were speaking and that we spoke to, they said really the easiest way to do that is by persona. So they are going to give the developers that are working on the customer-facing agents the latest, greatest models. But if you're in marketing and you're writing copy for a catalog or something like that, you're automatically getting a cheaper, less capable model. So that's one way that they're doing it. And there are other techniques for model routing, but that's the one that seems the, uh, easiest and most popular at this point.

Speaker A: I'm curious because we've been hearing about model routing from a bunch of different corners of IT, and it's everything from the big infrastructure providers, the Dells, the HPs, uh, the Lenovos of the world are identifying that they've got capabilities to be able to do some of this. We've also got it on the software provider side. Um, SUSE has got their AI proxy, Red Hat has got a routing capability within OpenShift. Were you seeing any of that, and are they talking about third parties, or is this something where we ought to expect to be able to integrate model routing? Where does that fit and who are the players in that area?

Speaker C: I would say it is a policy discussion right now. The enterprises who are applying it, uh, the persona-based way, are basically saying we are cordoning off this particular model for these particular functions in the organization. And that's not a very sophisticated way of doing it, but as of now it seems like a popular FinOps way of doing it.

Speaker A: So if you want to get to that trillion-parameter model, you better have a really good application. But yet that still leaves open this question: we've come through the waves of training consuming most of the work that's being done, but now we're getting to this next stage of actually getting to inferencing, where we're delivering the work with the models. But it seems like we've still got some gaps of understanding, that is, is the cost of what we built to do the inferencing actually worth what we're paying for it? And that starts to get to that ROI of, wow, we're spending umpteen zillion dollars on tokens for this customer service app that in fact doesn't actually generate any revenue or sales. How do we calculate ROI? How much do we spend on this? And it's actually costing twice as much as our customer support center actually used to cost. So was that a good thing? There's some big questions out there.

Speaker B: Yeah, that's the missing piece of all of this is how do you get to value and how do you get to a point where your AI consumption metrics, it's not just about overall number of tokens used, it's something like cost per outcome. So that would actually be the fully loaded cost across the tokens, the infrastructure, the sort of back and forth agentic MCP server calls. Getting all of that data in one place and then assigning it to a particular business outcome, that's going to be the tricky piece of things. And once we get to that, people won't be talking about tokens as such. Tokens are just part of the AI workflow. They're an input rather than some intrinsic unit of measure that relates to cost or value or something else.
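
The cost-per-outcome idea in this turn amounts to summing the fully loaded cost (tokens, infrastructure, agentic tool calls such as MCP server calls) and dividing by the number of business outcomes delivered. A minimal sketch, assuming the three cost buckets have already been attributed to one workflow; the class and field names are hypothetical, and what counts as an "outcome" (a resolved ticket, a completed order) is left to the organization.

```python
from dataclasses import dataclass


@dataclass
class WorkflowCosts:
    token_usd: float           # model usage billed per token
    infrastructure_usd: float  # compute, storage, networking attributed to the workflow
    tool_call_usd: float       # e.g. agentic MCP server calls
    outcomes: int              # business outcomes delivered in the same period


def cost_per_outcome(costs: WorkflowCosts) -> float:
    """Fully loaded cost divided by outcomes; fails loudly when there are none."""
    if costs.outcomes <= 0:
        raise ValueError("cost per outcome is undefined without any outcomes")
    total = costs.token_usd + costs.infrastructure_usd + costs.tool_call_usd
    return total / costs.outcomes
```

For example, 1,200 USD of tokens, 600 USD of infrastructure, and 200 USD of tool calls spread over 400 resolved tickets comes to 5 USD per ticket, a number that can then be compared with what the same outcome cost before.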

Speaker A: Is this getting back to the thing we've always struggled with, which is understanding what that delivered value for any particular IT project is, or are we getting to that quagmire?

Speaker B: Uh, actually, in a way it's nothing new, it's just more complicated with AI. Because you've heard for years, like back in the dark days of what we used to call strategic outsourcing, there were always questions about if you spend billions of dollars with Deloitte or IBM, what does that get you at the end of the day? And there were lots of pricing models being bandied about that basically boiled down to some kind of revenue sharing or some kind of sharing in the cost savings, where a percentage of the cost savings is basically what you pay for the solution in the first place. I think that's another part of this, that people have talked about these outcome-based, I don't know, value metrics, but how you get to that's going to be a lot of work. And in some ways FinOps starts to look a lot like technology business management, which heretofore has been seen as a different thing, but we'll definitely see some convergence there.

Speaker A: That is a really interesting point because what that says is that much like the outsourcing or services model, that what happened is we got to outcome based capabilities or services that were outcome based in terms of how they were measured, maybe this is something where AI has taken us up the technology stack a little and in fact we have to treat this like a managed service. It seems like all we need now is for the AI providers to actually package it that way, but details.

Speaker C: Exactly.

Speaker B: Yeah, well, another thing that kind of needs to happen as well is there's a certain amount of opaqueness in how many tokens you end up using for a particular query or for a particular workload. Tokens are defined as amounting to roughly three quarters of a word. I think that's the convention that's out there right now. But how do you decide that the reasoning that takes place inside of this or that model means it takes this many tokens to spit out the response, where it might take a, uh, larger or smaller number of tokens to get more or less the same thing from another model? Do the models really show their work to a sufficient extent that organizations can make informed decisions about whether the juice is worth the squeeze using this model versus that model? Those are really complex questions. And again, it's calculus, not algebra, when it comes to trying to figure all this stuff out.
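
The three-quarters-of-a-word convention mentioned in this turn gives a rough way to size a request before it is sent. The sketch below applies only that rule of thumb; the function name is hypothetical, actual counts depend on each model's tokenizer, and, as the speaker notes, reasoning tokens generated inside a model are not captured by it.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule-of-thumb convention, not a tokenizer measurement


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count, rounded up."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    return math.ceil(word_count / WORDS_PER_TOKEN)
```

Under this convention a 750-word prompt is about 1,000 tokens; the same text can tokenize differently from model to model, which is part of the opacity being described.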

Speaker A: Well, Jean, back to your point on that issue of model suitability: just simply constraining which models get used by which users is a fairly crude approach, and getting to something where we start to actually assess what the request is and route it to the right model, because we know that this model works better with this type of request in this area or in this field. Wow, that does get really complicated really quickly.

Speaker C: Yeah, I'd say that the FinOps practitioners have developed a lot of muscle in terms of optimizing for opex spend on compute resources. And now the FinOps Foundation has changed its messaging to say they want their practitioners to manage technology value, not just cloud value, so they're expanding their remit at the same time. And some of the people who spoke at the conference were saying that as much as 80% of their FinOps effort is now going to AI spend management. So they are putting in the effort to get a handle on this, despite the opacity of what a, uh, token is even from model to model, and what the value is once you come out. One thing that some of the vendors are doing, however, is they're using event correlation between token usage and observability to get a better idea. It's almost back to square one. They're working on allocating the cost to different teams, different projects and so on, and then going from there. And that seems to be where a lot of the users and a lot of the vendors are at right now, putting that together.

Speaker A: It seems like everybody can see the horse dashing for the gate and they're just desperately trying to get to the gate before the horse gets out. But, um, I guess we'll have to see if we can actually get there. These are great perspectives. Thank you both. I guess the answer for now is stay tuned, buckle up, and hopefully you can get your arms around this in terms of managing it.

Speaker B: Yeah, I think that lest anybody come away thinking that this is an unsolvable problem as far as FinOps for AI, it's not, because FinOps itself for public cloud infrastructure spending has laid a great foundation, as Jean was saying, around basically doing the basics on who's using these resources, how much of these resources are being used, and what are they doing with the resources. Those fundamentals are already well established on how to figure that out in your cloud FinOps world. What we'll start seeing with FinOps for AI, which I like better than the term tokenomics, is building upon that and accounting for some of the quirks and the different, let's say, run times of AI workloads as opposed to traditional workloads. But the fundamentals are there. I think we'll see more of an, uh, entire industry ecosystem effort around tackling the AI cost beast to the ground. So look forward to seeing a lot of different types of companies being involved in this new kind of, I don't know, do we call the Tokenomics Foundation like a subreddit of, uh, the FinOps Foundation? Maybe, but you'll have a lot of new faces and a lot of old faces involved in that effort as well.

Speaker A: So it sounds like there are a lot of lariats heading for that horse, but we'll have to see. But I think, Melanie, that's a good, more positive note to end on: we have the tools, we have the basics, we understand what we're getting towards, with more work to do. But I guess this is clearly an area where we get a lot of people focusing on really how to practically manage where this is going. But this has been great. Thank you both. And that is it for this episode of Next in Tech. Thanks to our audience for staying with us and thanks to our production team, including Sophie Carr, Franmiya Deeshan, Kira Smith and Dylan Scheibel on the marketing and events teams. If you like this episode, please subscribe or like us. I hope you'll join us for our next episode. Because there is always something next in tech.
