How Precisely Is Closing the AI Data Integrity Gap

Tech Talks Daily · 2026-06-24 · 26 min

Substance score

47 / 100

Five dimensions, 20 points each

Insight Density: 11 / 20

Originality: 8 / 20

Guest Caliber: 12 / 20

Specificity & Evidence: 9 / 20

Conversational Craft: 7 / 20

Dave Shuman, Chief Data Officer at Precisely, discusses the gap between organizations' perceived AI readiness and their actual data infrastructure capabilities. The conversation centers on findings from Precisely's State of Data Integrity and AI Readiness Report, which found that while 87% of leaders claim to be AI-ready, half still cite infrastructure as a major challenge. The takeaway is that success depends on data quality, governance, and semantic layers rather than tooling alone.

Key takeaways

Organizations must prioritize data integrity and quality foundations before deploying autonomous AI agents, as AI amplifies both good and bad data at scale.
Governance programs should produce embedded automation and controls in code and pipelines rather than static policy documents that cannot scale with AI deployment.
Successful AI ROI measurement requires tying agent outcomes to business fundamentals (revenue, cost, service, risk) rather than vanity metrics like token usage or query volume.
Pilot projects often fail in production because they operate in controlled environments with curated data, while real-world agents face broader datasets and lack the observability needed for autonomous operation.
Organizations with existing data governance programs that fold AI governance into them are significantly more successful than those building separate governance frameworks.

Guests

Dave Shuman

Topics in this episode

Model Context Protocol (MCP) · Snowflake · MLOps · Precisely State of Data Integrity and AI Readiness Report · Semantic layers · Data fabric · Extract Load Transform (ELT) · Data catalog · Autonomous agents

What our scoring noted

Our reviewer’s read on each dimension, with quotes from the episode.

Insight Density

11 / 20

The episode contains a handful of genuinely useful framings - pilot-to-production decay, governance-as-automation-not-documents, and folding AI governance into data governance - but these are interspersed with long passages of standard data management advice and conceptual throat-clearing. The ideas-per-minute rate is modest.

87% of the respondents said their infrastructure is ready, but half still cited infrastructure as a key challenge

governance has to produce automation, not documents. Policies don't scale AI, automated controls, quality checks, uh, they're embedded in the pipelines

Originality

8 / 20

Most of the thinking is familiar data-management wisdom rebranded for the AI moment - ELT, semantic layers, data products, and 'garbage in, garbage out' are industry staples. The 'below the waterline effort' framing and the MCP-server critique are minor fresh angles but don't constitute genuinely contrarian arguments.

below the waterline effort. So that's lineage, it's the data definitions, it's profiling, it's enrichment, and that's quiet and tedious work

arc up an MCP server, go at the system view of the world is kind of, um, short sighted

Guest Caliber

12 / 20

Dave is a working CDO at a real data-integrity vendor and demonstrates genuine technical fluency with ELT, semantic layers, and MCP servers. However, the conversation is inherently vendor-positioned and lacks the operator war-stories or hard-won lessons that would push the score higher.

I lead our data strategy, which is governance, analytics, now, AI enablement, and the data engineering and architecture teams

we use a methodology called ELT, extract, load and transform. Um, for me, that was a really transformational change in the industry when we went from ETL to ELT

Specificity & Evidence

9 / 20

The episode offers one survey stat (87% / 500+ respondents via Drexel LeBow) and names specific tools (Snowflake, Databricks, Copilot, Claude), but there are zero named client examples, no dollar figures, no timelines, and no case-study outcomes - just the vendor's own proprietary survey and generic technical architecture descriptions.

we did the study in conjunction with the Drexel LeBow, uh, college and uh, one of the interesting things surveyed over 500 IT professionals

we have a hypothesis, we're going to go test this out. We have a tool, uh, that we're going to work with and we build this well contained pilot

Conversational Craft

7 / 20

The host asks reasonable scene-setting questions but never challenges a single claim - the glaring internal contradiction in the 87% stat is noted briefly and then dropped rather than probed. Questions are largely pre-scripted 'tell me about X' prompts with no productive disagreement or meaningful follow-up.

what are organizations underestimating most when it, when it comes to becoming AI ready, what are they forgetting or underestimating?

was there anything in the report that you found surprising that caught you off guard

Conversation analysis

Computed from the transcript - who did the talking, and the verbal tics along the way.

Share of words spoken

Speaker B: 63%
Speaker A: 37%

Filler words

so 55 · uh 45 · um 19 · right 16 · sort of 13 · like 9 · you know 3 · actually 3 · kind of 2 · er 1 · I mean 1

Episode notes

Can organizations really call themselves AI-ready if their data foundations still have gaps? In this episode of Tech Talks Daily, I sit down with Dave Shuman, Chief Data Officer at Precisely, to discuss the findings from the company's latest State of Data Integrity and AI Readiness Report. Drawing on insights from more than 500 senior IT leaders across the US and Europe, Dave explains why many organizations are confident in their AI readiness while simultaneously identifying infrastructure, data quality, and governance as their biggest obstacles. Our conversation focuses on what Dave describes as the AI data integrity gap, the growing disconnect between ambitious AI initiatives and the quality, consistency, and context of the data powering them. We explore why successful AI projects often perform well in controlled pilot environments before struggling when deployed at scale, and why many organizations continue to underestimate the importance of data lineage, semantic layers, governance, and observability. Dave also shares why he believes data governance and AI governance should be treated as a single discipline rather than separate initiatives.

Full transcript

26 min

Transcribed and scored by The B2B Podcast Index.

Speaker A: The leading issue of agentic AI in businesses right now is ensuring agents act within compliance guidelines. And Denodo applies guardrails across your entire data estate. By, uh, aligning your company's data infrastructure under one system, these guardrails perform consistently across your platform. So start scaling your business and start with Denodo. Simply visit denodo.com to learn more. What if the biggest barrier to AI success has less to do with the model that we're all obsessing over and far more to do with the data that is feeding that model? Well, my guest today is Dave Shuman. He's the Chief Data Officer at a company called Precisely, and they are an organization focused on data integrity and helping organizations make better decisions from data that they can trust. And my guest describes himself as a data whisperer, which feels incredibly fitting for a conversation that goes beneath the surface of AI readiness. Because while many leaders are under pressure right now to show fast returns from those AI projects, the reality inside many organizations is sadly far messier. Platforms have been bought, pilots have been launched, and confidence is being projected. But the data foundations, these are often still full of gaps. So in today's episode, we will discuss Precisely's latest State of Data Integrity and AI Readiness Report. There are so many big stats in there, but one of the standout findings is the disconnect between perception and reality. Many leaders say they're all ready for AI, but their infrastructure, their skills, their governance, and data quality, all these things remain persistent obstacles. So Dave will explain today why successful AI depends on quality, governance, enrichment, context, and semantic layers.
And he'll also share why pilot projects often look impressive in those nice and safe controlled environments, but they always struggle when they move into production, where agents face broader data sets, real users, and far fewer guardrails. So we'll talk about all this: ROI, why many organizations are measuring the wrong thing, and why AI value must be tied back to revenue, cost, service, and risk. So if autonomous AI amplifies whatever data it is given, whether it be good or bad, is your organization spending enough time fixing the foundation before thinking about asking AI to make decisions? Well, enough from me. It's time to bring Dave onto the podcast now, and I cordially invite you to listen in with me. So thank you for joining me on the podcast today. For everyone listening, can we begin by just telling them a little about who you are and what you do?

Speaker B: Neil, thanks for having me on. Uh, my name is Dave Shuman. I'm the Chief Data Officer at Precisely. And you say, what is Precisely? Precisely is a global leader in data integrity. We help organizations ensure their data is accurate, consistent and contextual so they can trust their data to make better decisions. Uh, at Precisely, I lead our data strategy, which is governance, analytics, now AI enablement, and the data engineering and architecture teams that sit underneath it all. Um, I've spent my career listening to what the data is and isn't saying and the signals below the surface. And if I had to go back and boil that down to one pithy little statement, I would call myself a data whisperer.

Speaker A: Oh, I like that. The data whisperer. That's got a real ring to it, hasn't it?

Speaker B: Also not a horse whisperer. I have proven that emphatically, but I'll go with data whisperer.

Speaker A: Okay, sounds like there's another podcast episode right there. But one of the reasons that, uh, put you on my radar, and why I was excited to speak with you, was having flicked through the recent State of Data Integrity and AI Readiness Report. And again, for people hearing about that for the first time, tell them a little bit about what it is and maybe summarize some of the key findings in there. Because there's some pretty big stats, isn't there?

Speaker B: It is. Uh, so we did the study in conjunction with the Drexel LeBow, uh, college and, uh, one of the interesting things, surveyed over 500 IT professionals, mostly in senior leadership roles in the US and Europe. And really the question we came back to, and what the headline for me was around this, was a disconnect, um, between how leaders feel and how ready their data actually is. And so I think that's, I called it the AI data integrity gap. It's where AI value gets stuck. And so what we saw in this is that AI ready is often interpreted as we bought the platforms, uh, we ran the pilots, but the real readiness is operational capabilities, ownership, lineage, metadata, lifecycle management, monitoring, and KPI-linked outcomes. And so 87% of the respondents said their infrastructure is ready, but half still cited infrastructure as a key challenge. And so that really tells us the gap isn't hardware or software. It was a full stack capability problem. The skill distribution showed balanced shortages across all capabilities. So there's really no single hire or role that would fix this.

Speaker A: And um, if we just double click there on that disconnect between leaders perception of AI readiness and the reality of some of the obstacles in the way. I'm, um, curious, from your perspective, what are organizations underestimating most when it, when it comes to becoming AI ready, what are they forgetting or underestimating?

Speaker B: I think that the perception is that AI is about the tooling and how we can build our agents and how we can transform the organization around them. And I think the real disconnect is that the foundation isn't there. It's things that I call below the waterline effort. So that's lineage, it's the data definitions, it's profiling, it's enrichment, and that's quiet and tedious work. But that's what makes AI trustworthy. And often it has not been funded in these organizations. It's been seen as something that, as we do a large acquisition or we build out a new set of systems, becomes the sort of technical debt that we leave, uh, in the organization and say, "we can fix that in post." Uh, and post is here, the time is now, because AI is now looking at this data. It's finding, uh, data that's in our graph, it's going in and accessing data that's in our systems like Snowflake or Databricks and coming back, and we're like, where did that come from? Where are these insights? Where did you find that data? Uh, and that's really getting back to the fundamentals, that data preparation and data quality lead to data integrity.

Speaker A: And what would you say the biggest obstacle is to AI success right now? And do you see this changing in the immediate future? As we look towards the end of 2026, 2027 and beyond, uh, do you see this changing? And what is that big obstacle?

Speaker B: I think it truly is down to the data quality and the building out of high quality data with a strong semantic layer. So what I've been seeing is we do a lot of these pilots now. We build out, we have a hypothesis, we're going to go test this out. We have a tool, uh, that we're going to work with and we build this well contained pilot or proof of concept that sits within its walled garden. We curate the data for it very carefully. It's highly observed by all the participants that are in the process. And it's deemed like, this is a successful pilot, we're ready to go to production. And production means setting it up in the wild. Now it doesn't have that continuous observation that was going on during the pilot phase. It needs to move more into its autonomous nature, and all the tooling isn't there to keep it within its bounds. And so we'll come back to it and say, oh, well, now I'm getting disappointing results. And we start to poke into that. And it was now in a broader set of data, or a different set of users who are coming at the model, you know, 90 degrees off of where we thought we were coding for. Uh, and our agents are suddenly returning very different results. And so I think from an organizational perspective, it's an investment in the maintenance of a model, um, an agent, once it goes into production, and the diligence it takes to build and curate the data set in context for it.

Speaker A: And for people listening, your words are really resonating with them right now. How can organizations address the critical data integrity gaps that we're talking about today? Because they keep persisting. I suspect we've got people around the world nodding in agreement here, but what should they be doing to address this?

Speaker B: I think of this like a layer cake. We start off with our source systems that are emitting the data that we want to be using. Often in this day and age, it's now SaaS applications. It used to be that we'd have on-prem and cloud that we were working with, but each of those is building its own sort of data silo. Uh, so first of all, we have to have a plan on how we create a cohesive data fabric, where system one and system two interchange their identities so that I can watch a transaction flow from one to the other and see the sort of, um, totality of that record. So within Precisely, we start with those source systems. We use a methodology called ELT, extract, load and transform. Um, for me, that was a really transformational change in the industry when we went from ETL to ELT. It meant that we could get the data in its native form and transform it after the fact and make it fit for purpose. We do those typically in zones. We'll land at raw. We'll build out our cohesive data, um, component data within there. And at the end of that, we build out the sort of golden layer. This is where our data products live. And a data product is really fit for purpose. Um, how do we look at our opportunity and pipeline data? How do you look at bookings, how do you look at the, uh, customer engagement? Each of those becomes a data product that we then create its own data catalog on. And the catalog helps document what the assumptions were, what the meaning, what the origins, the lineage of the data that's in the catalog. And that's kind of where we had gotten to when we were building out for BI. As we move into AI, the missing layer on there was a semantic layer, and that's really where I see us building now. If I'd say, where are we? If you look at the map of the world right now, we are at building out semantic layers so that our agents have context to be able to apply this consistently, so that we can get to autonomous agents.
And so without that context layer, without those semantics built in on top of this catalog, we're going to have agents that are operating autonomously, but are, uh, using their own intuitive nature of what they think the data should be, rather than the context that we're applying to it for the organization.
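
The zoned ELT flow described above (land raw, build cohesive data, publish a golden data product with a catalog entry and semantics) can be sketched roughly as follows. This is a minimal illustration only: every field name, the "closed won" rule, and the catalog structure are hypothetical assumptions, not Precisely's actual implementation.

```python
# Hypothetical sketch of raw -> cohesive -> golden zones in an ELT flow.
# Field names, the bookings rule, and the catalog layout are assumptions.

def extract_load(source_rows):
    """Extract + Load: land source records in the raw zone unchanged."""
    return [dict(row) for row in source_rows]

def transform_cohesive(raw_rows):
    """Transform (after loading): normalise names, types, and casing."""
    return [
        {
            "account_id": str(row["AccountId"]).strip(),
            "amount": float(row["Amount"]),
            "stage": row["Stage"].strip().lower(),
        }
        for row in raw_rows
    ]

def build_bookings_product(cohesive_rows):
    """Golden zone: a fit-for-purpose data product (booked amount per account)."""
    totals = {}
    for row in cohesive_rows:
        if row["stage"] == "closed won":
            totals[row["account_id"]] = totals.get(row["account_id"], 0.0) + row["amount"]
    return totals

# Catalog entry documenting origin, lineage, assumptions, and semantics,
# so an agent reads the organization's meaning rather than guessing it.
CATALOG = {
    "bookings": {
        "origin": "crm.opportunities",
        "lineage": ["raw", "cohesive", "golden"],
        "assumptions": "only 'closed won' opportunities count as bookings",
        "semantics": {"amount": "booked value", "account_id": "CRM account key"},
    }
}

source = [
    {"AccountId": " A1 ", "Amount": "100.0", "Stage": "Closed Won"},
    {"AccountId": "A1", "Amount": "50", "Stage": "closed won "},
    {"AccountId": "B2", "Amount": "75", "Stage": "Prospecting"},
]
raw = extract_load(source)
bookings = build_bookings_product(transform_cohesive(raw))
print(bookings)
```

The raw zone keeps the source values untouched (including the stray whitespace), which is the point of transforming after loading: the cleanup rules can change later without re-extracting from the source system.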

Speaker A: And as a data whisperer with such great experience in this field, I'm curious, was there anything in the report that you found surprising, that caught you off guard, that you didn't expect to see? You've probably seen so many trends over the years, but anything particularly surprised you this time?

Speaker B: I did think that the disconnect between sort of the posturing... When we asked the organizations, are you AI ready? 87% came back and said, we are AI ready and we have all the tools and we have the infrastructure. And then we turn that question right back around and say, what's your biggest barrier? And they say, well, it's the tools, it's the infrastructure. So there's this confidence that we're projecting to our external entities. But when we come back to how we get down and execute on it, that's where we see that sort of data integrity gap. The other thing that was, I think, heartening for me, uh, out of the study was the finding that organizations that had existing data governance programs and folded AI governance into data governance were far more successful than those who were building out separate governance programs. And so I think there's a real lesson in this: data governance and AI governance are not two separate entities out there. They're really part of the overall use of data and how we make that autonomous. And those two programs should fold together.

Speaker A: Uh, many leaders listening now naturally expect to see fast ROI from those AI projects. And tech does love a good acronym or two, right there. But I mean, relatively few organizations have clear metrics that are tied to business KPIs. So why do you think measuring AI value is proving so challenging for some organizations? And how are they currently approaching this wrong? Uh, are they not measuring the right things? What's happening here?

Speaker B: I think there's truly a disconnect because many of the metrics that I see organizations discussing are volume metrics. How many queries did we run how many new agents did we release? How many tokens did we use? I can't believe that token maxing is a term we're thinking about as a measure of success and it's not tied back to uh, the four fundamentals. Is it improving our revenue? Is it decreasing costs? Is it improving service? Is it decreasing risk? And by tying those back to the actual outcome metrics, that's where we're looking at that maturity gap and I'm not seeing it. We're designing these sort of agents, we're throwing them out there and they're not tied back to real business metrics that make a difference.

Speaker A: I guess the big question, which is almost a uh, podcast episode entirely on its own, and it's impossible to answer properly, I guess, but how can a business get the best out of those AI investments and how can they ensure that AI delivers that measurable value?

Speaker B: I think it's going back and it's co-creating. So when you have your business leaders and your technology leaders working together to co-create, you start with what is the business outcome that we want to achieve, and design the entirety of that agent and that experience around how we prove that we did what we said we were going to do. And that's, you know, ensuring that we have the right inputs into the agent, that we're collecting the right components on there, and that we build things like observability into our agents that allow us to measure both the relevance and coherence and correctness of the agent itself, but also the outcomes that come out of that. And that is, I think, a step that we're missing in the process. Right now we're very focused in on how do we get this agent to achieve what it did, and not how do we collect the data that's going to allow us to prove that it's operating with integrity, that it's achieving the outcomes that, um, we're looking to achieve with the model. And that's a missed step in the process. I think we're trying to get to the flashy outcome without all the diligent work along the way.

Speaker A: And I guess it's slightly unsurprising that organizations with strong data governance, they're the ones that are reporting higher trust in their data. But what is the real world impact of that trust when it comes to AI outcomes? And how do you think agentic AI will affect this? Because again, big topic this year.

Speaker B: Yeah, I think, to go back to the early Internet years, we're in the 1.0 experience, right? Uh, we're building summarization agents. It's sort of very human oriented, human initiated AI. Where this is going to go is going to be, uh, autonomous AI. And so I almost think of this as, we're going to build out an agent as an employee, we're going to give it a job, we're going to, um, give it supervision over that, but expect that to work, uh, behind the scenes and report back when it needs help. So that pivots our view of how we're going to engage with AI, from this sort of summarization and contextualization agents to the autonomous agents operating in the background and us being able to confidently observe their outcomes. And the tech's not there yet; the stack doesn't exist that allows us to govern and manage and observe. We're so focused in on the build and the agent component right now.

Speaker A: And um, in a world where many leaders and indeed workers and as consumers we're bombarded with so much information it can feel um, incredibly overwhelming. So I always try and give everyone listening a few valuable takeaways. So what would you say are a few governance best practices that organizations must be prioritizing right now because it's easy to get distracted by the shiny big features and things. But what should they be prioritizing now in terms of data governance, do you think?

Speaker B: Here's what I tell my team: governance has to produce automation, not documents. Policies don't scale AI. Automated controls, quality checks, uh, they're embedded in the pipelines, access rules, lineage tracking, privacy enforcement, baked into CI/CD, uh, like this whole concept around MLOps, that's what actually moves the needle. And so when I see a governance program that's focused in on, you know, we have a 100 page document all about our AI policies in there, like, that's got to be baked into the code. Uh, when governance is in code it accelerates that innovation rather than slowing it down. And I think that's the mindset shift that we need to make.
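
One way to read "governance in code" is a gate that runs inside the pipeline and blocks a batch that breaks a rule, rather than a policy document someone has to remember. The sketch below is a hedged illustration: the rule names, the `customer_id` and `ssn` columns, and the `QualityGateError` class are all hypothetical, and a real pipeline would likely use a dedicated data-quality or CI/CD tool.

```python
# Hypothetical governance gate embedded in a pipeline step.
# Rules, column names, and the exception type are illustrative assumptions.

class QualityGateError(Exception):
    """Raised when a batch violates one or more governance rules."""

def check_not_null(rows, column):
    """Quality rule: every row must carry a non-empty value for `column`."""
    missing = sum(1 for r in rows if r.get(column) in (None, ""))
    return missing == 0, f"{column}: {missing} missing value(s)"

def check_no_restricted_columns(rows, restricted):
    """Privacy rule: restricted columns must not reach this zone."""
    exposed = sorted({c for r in rows for c in r if c in restricted})
    return not exposed, f"restricted columns present: {exposed}"

def run_gate(rows, checks):
    """Run every check; raise if any fails so the pipeline stops here."""
    failures = []
    for check in checks:
        ok, message = check(rows)
        if not ok:
            failures.append(message)
    if failures:
        raise QualityGateError("; ".join(failures))
    return rows

CHECKS = [
    lambda rows: check_not_null(rows, "customer_id"),
    lambda rows: check_no_restricted_columns(rows, {"ssn"}),
]

good = [{"customer_id": "C1", "region": "EU"}]
bad = [{"customer_id": "", "ssn": "000-00-0000"}]

run_gate(good, CHECKS)  # passes silently
try:
    run_gate(bad, CHECKS)
    gate_result = "passed"
except QualityGateError as err:
    gate_result = f"blocked: {err}"
print(gate_result)
```

Because the gate raises instead of logging a warning, a failing batch cannot flow downstream to an agent, which is the difference between an enforced control and a written policy.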

Speaker A: And I think over the last 12 months we've all heard the phrase no data, no AI. So how can organizations better ensure that they provide that highest quality data for their AI implementations? Any tips there?

Speaker B: I think you have to look at our landscapes, and most of us have a, uh, fairly complicated data landscape that's out there: multiple tools that are, uh, specific for the functional purpose, SaaS applications that are functionally isolated, uh, but that we need the overall visibility to. So the first step in that is, how do you build that sort of cohesive data landscape where your apps are performing the tasks that they need to, but you're building that sort of data catalog off of that? If you haven't built out that inventory, if you haven't looked at how the systems, uh, are identifying the same entity and building that sort of cross reference between those, if you haven't built that into your integration, that's really an area where you need to start. Because most of the data sets that we want to work with are not singular in their nature. Uh, I call it the power of "and". We want to combine the data from our CRM system with the data from our ERP system. And so the two of those together means that we have to have a complete exchange of identities between the systems, managed in a consistent fashion. That sort of data catalog and building out the fundamental architecture allows you to get to the next step. For a lot of the agents that we're working with today, they're actually working off established data products from our data catalog rather than with the native data that sits in the application. So I hear a lot of pressure now: can't you just spin up an MCP server, let me get access directly into the CRM data, or I want to get into the ERP data and I'm just going to go at it. And what, uh, we focus on is building out that catalog experience so that we're already creating the cohesive, denormalized view of those two things with enrichment in it, with governance baked into it, and observability so that we can see where the data is aligned, and it allows the agent to work with clean data.
And so I arc up an MCP server, go at the system view of the world is kind of, um, short sighted.
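
The "exchange of identities" between a CRM and an ERP can be pictured as a cross-reference table that the integration layer maintains, from which a combined, denormalized data product is built for agents to read. This is a rough sketch under stated assumptions: the identifiers, field names, and the `build_customer_product` helper are hypothetical and not taken from any real system.

```python
# Hypothetical identity cross-reference between two source systems.
# All IDs, fields, and the helper function are illustrative assumptions.

CRM_ACCOUNTS = {"crm-001": {"name": "Acme Ltd", "open_pipeline": 120000}}
ERP_CUSTOMERS = {"erp-9001": {"legal_name": "ACME LIMITED", "invoiced_ytd": 45000}}

# One row per real-world entity, linking its key in each system.
XREF = [
    {"entity_id": "ent-1", "crm_id": "crm-001", "erp_id": "erp-9001"},
    {"entity_id": "ent-2", "crm_id": "crm-002", "erp_id": "erp-9002"},
]

def build_customer_product(xref, crm, erp):
    """Join CRM and ERP records through the cross-reference.

    Returns the combined records plus the entity IDs that could not be
    resolved in both systems, so gaps are visible rather than silent.
    """
    product, unmatched = [], []
    for link in xref:
        crm_rec = crm.get(link["crm_id"])
        erp_rec = erp.get(link["erp_id"])
        if crm_rec is None or erp_rec is None:
            unmatched.append(link["entity_id"])
            continue
        product.append({
            "entity_id": link["entity_id"],
            "name": crm_rec["name"],
            "open_pipeline": crm_rec["open_pipeline"],
            "invoiced_ytd": erp_rec["invoiced_ytd"],
        })
    return product, unmatched

customers, unmatched = build_customer_product(XREF, CRM_ACCOUNTS, ERP_CUSTOMERS)
print(customers)
print(unmatched)
```

An agent pointed at `customers` sees one consistent record per entity, while the `unmatched` list gives the observability signal for where the two systems do not yet align.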

Speaker A: And, um, we will have people listening that subscribe to tech newsletters, listen to podcasts, continuously involved in forums and tech discussions. And as I said, it can get incredibly overwhelming with so much information. So if anyone listening is from an organization that wants to make big changes, and if they could remember one big takeaway from our discussion today and indeed the findings from your report, what do you think that should be?

Speaker B: Um, data integrity comes first. Uh, autonomous AI amplifies whatever data it's fed, good, uh, or bad. So you have to start your AI programs with understanding where in your data landscape you have quality curated data in context, and ready your semantic layer to make that consistent. If you don't have that, you're going to have poor, um, results from your agentic experience. The other thing I would say is define clear ownership. Um, assign responsibility for the data inputs, the model behavior, the oversight, and build those guardrails into your data products. Um, they have to scale with the new regulations and with the fast changing use cases. Uh, so I think from an organizational perspective you have to balance speed with control. We're constantly getting bombarded with new components, new capabilities. Uh, I used to say that six months ago our organization was learning to spell AI. Now every single one of my users has access to, uh, whether it's Copilot or whether it's Claude or other models, and they're looking to unlock the value out of that. And I think that as we move with speed and we're exploring into these new use cases, we have to ensure that what we're doing has those safeguards, uh, to prevent bias and to protect us from risk.

Speaker A: Well, thank you so much for sitting down with me today. For everybody listening, I'm going to include a link in the show notes to the 2026 State of Data Integrity and AI Readiness Report. I do urge people listening to go check that out. So many big stats in there. It's a great read. I'll also include a link to your LinkedIn and indeed the Precisely website. But more than anything, just thank you for joining me today and bringing all this to life.

Speaker B: Thank you so much, Neil. I appreciate it. Thanks for having me on.

Speaker A: I think the big takeaway here is that AI success begins long before anyone launches an agent or tests a new model. Because Dave brought the conversation right back to the work that many organizations would rather avoid. Data definitions, profiling, enrichment, governance, semantic layers, observability, all these things, they don't make the headlines, but they do ultimately decide whether AI can be trusted in the real world. And I also liked his point about governance needing to produce automation rather than just documents, because a 100 page policy will not scale with autonomous AI. And controls need to be built into the pipelines, quality checks, access rules and privacy enforcement, and indeed the way that data products are created and monitored. So for business leaders listening today, there's a clear warning here. Measuring token usage, query volume, or the number of agents launched, all these things sound productive, but those numbers mean very little unless they connect to real business outcomes. So over to you. Is AI improving your revenue? Is it reducing cost, improving service, or reducing risk? This is where the conversation has to go. Dave's message today is quite clear. Data integrity comes first. Autonomous AI will amplify the data it receives. And that ultimately means poor data will create poor outcomes faster than ever before. Or to put it more bluntly, yeah, garbage in, garbage out. So I'll include links to Precisely, Dave's LinkedIn profile, and the 2026 State of Data, uh, Integrity and AI, uh, Readiness Report in the show notes. And you can find a blog post associated with this episode over at techtalksnetwork.com. So please, I invite you to let me know what you think. Are you and your business moving too quickly with AI agents before truly understanding the quality and context of the data behind them? How are you getting around this? What are you doing? Let me know.
And while you marinate on that, I'm going to walk off into the sunset. But I will return again bright and early tomorrow. Thanks for listening. Speak to you then. Bye for now.
