The AI Guide Everyone Needs in 2026 ft. Gaurav Pathak

Bringing Data and AI to Life · 2026-04-02 · 24 min

Substance score

27 / 100

Five dimensions, 20 points each

Insight Density: 8 / 20

Originality: 5 / 20

Guest Caliber: 4 / 20

Specificity & Evidence: 6 / 20

Conversational Craft: 4 / 20

What our scoring noted

Our reviewer’s read on each dimension, with quotes from the episode.

Insight Density

8 / 20

A handful of useful conceptual distinctions emerge (grounding data vs. training data, synthetic data for long-tail scenario coverage) but they are buried in vendor-speak and high-level generalities that a data-literate operator would mostly already know. There is little per-minute density of non-obvious ideas.

for every single mile of real life data they had 20 million more miles of synthetic data for every single mile that they generated using a synthetic data generation techniques

the function of governance is not to thwart innovation, it's not to stop it, it's to establish clear, transparent controls so that the innovation is even faster

Originality

5 / 20

The episode leans entirely on recycled analogies and talking points already saturating AI discourse—the Linux moment, the space race, governance-as-guardrails—with no first-principles or counterintuitive arguments offered. The 'master before they master us' line epitomises the platitude density.

I wonder what's going to be the moon landing moment for AI

I think we all need to master before they master us

Guest Caliber

4 / 20

The guest is an internal Informatica product VP being interviewed by an Informatica sales VP; this is a company promotional conversation rather than an independent practitioner sharing battle-tested external experience. No external operator credibility is demonstrated in the transcript itself.

He's a Vice President of Product Management for AI and Metadata at Informatica

I've been at Informatica for almost three years and I like to consider Gaurav my partner in crime

Specificity & Evidence

6 / 20

A few named references (Waymo, DeepSeek, OpenAI O3, Grok 3, Hugging Face, Apache license) provide some grounding, but the Waymo statistics are presented confusingly with no clear source, and there are zero named customer examples, ROI figures, or project timelines anywhere in the episode.

read somewhere that they were using like 20 million miles of real life data to train their self driving agent

for every single mile of real life data they had 20 million more miles of synthetic data

Conversational Craft

4 / 20

The host exclusively asks broad, leading questions ('What are the key hurdles?', 'What are you excited about?') with zero follow-up probing, zero pushback, and constant affirmation; the episode is further interrupted by a live promotional ad, cementing its nature as a marketing vehicle rather than substantive dialogue.

I am so excited to have you on today. Thank you so much Gaurav for being here today

That's a great question

Conversation analysis

Computed from the transcript - who did the talking, and the verbal tics along the way.

Share of words spoken

Speaker C: 76%
Speaker A: 21%
Speaker B: 3%

Filler words

so: 56, like: 23, right: 20, you know: 6, kind of: 6, actually: 2, I mean: 1, basically: 1

Episode notes

AI isn't just a technology trend. It's fundamentally reshaping how enterprises manage, govern and activate their data to power intelligent systems. In this episode of Bringing Data and AI to Life, host Amy Horowitz, GVP Solutions Sales and Business Development at Informatica, sits down with Gaurav Pathak, VP of Product Management for AI and Metadata at Informatica, to explore how organizations can land on the winning side of this transformation.

What You’ll Learn

How to shift from manual, code-driven data pipelines to AI-assisted automation

The critical role of data quality and governance in agentic AI success

Why the "long tail of real-world scenarios" is your biggest AI challenge

The three innovation vectors reshaping AI right now

How to approach open-source and international AI models with strategic clarity

The governance framework that enables rather than restricts AI experimentation

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts and Spotify. Instructions on how to do this are here. This podcast is

Full transcript

24 min

Transcribed and scored by The B2B Podcast Index.

Hello, we are Bringing Data and AI to Life, a podcast by Informatica. I'm Amy Horowitz, our VP for Solution Sales for Data Integration and Data Governance. And I'm Nick Dobbins, Worldwide VP and Field CTO here in Informatica. If you've ever felt lost in the chaos of data and AI, you've come to the right place. We'll be conversing with industry experts who are here to shed light on the challenges all wrapped up within these arenas. We're here to bring clarity to the chaos, myth-busting the confusing parts and providing insights and guidance on complex data problems by delivering trusted data for analytics and AI.
Welcome to the Bringing Data and AI to Life podcast. Today I have a distinguished guest with me, my very good friend, Gaurav Pathak, who recently has been nominated for DataIQ's 2025 Data and AI Leader of the Year. This is so exciting to talk to you today, I cannot wait. Gaurav's journey in the data and AI domain is nothing short of inspiring. I've been at Informatica for almost three years and I like to consider Gaurav my partner in crime. I'm very excited to talk to you today. He's a Vice President of Product Management for AI and Metadata at Informatica, driving innovation and transformation with data-centric solutions. Let's dive into your experience. I'm so glad to have you here. Welcome, Gaurav. So listen, every person I talk to, AI is top of mind. Whether it's at the dinner table, whether we're at a customer or prospect event, or even our partners. Let's dig into that if we can. So my first question, and one I really want to dig into: with your extensive background in data strategy, how do you see the integration of AI really transforming traditional enterprise data management practices? What have you heard? What do you think the trends are?
So traditional data management, the art, the science, the practice of collecting, organizing, making sure that the data is processed at the right cost structures, all of that is data management. That involves the process of data integration, data quality, data cataloging, data governance, master data management, a lot of different sub-disciplines that feed into it. Traditionally this has been a very manual exercise, mostly code driven, people using code, writing code every time. If you write a pipeline that reads from Salesforce and writes to any cloud data warehouse, you have to write the connector to Salesforce, first making sure that you understand those APIs, and then start creating the data models in Snowflake. In the meantime, making sure that you are taking care of data quality, all of the regulations that you are required to comply with, not sending sensitive data across geos, right, or to users who should not have access to it. Mastering important golden records which are important to the business, whether it is customer, patient, whatever be the line of industry that particular organization is in. Now, what AI is doing in this space is looking at automating the most tedious, the most boring tasks that humans may realize. And we've been at it for a very, very long time now. Whether starting from the way we created connectors in the PowerCenter world, so that you don't have to create connectors every time you provide a data pipeline, you just connect to something and are able to then focus on the transformations part of it. All the way down to the 2010s, where we started looking at these tasks from a predictive AI lens and what are some of the things that predictive AI can automate. So for example, for data stewards when they are doing data cataloging, one of the most tedious tasks is associating business concepts, business glossary terms, to technical data sets that may be coming from all the different sources of information within an organization.
Doing that takes months and months of tedious time. And doing that by hand, it is never good work or party conversation for that matter. Right. So for that we used AI and predictive ML techniques to automate a large portion of these tasks. And then going from those predictive ML tasks to a human-in-the-loop assistant that helps in understanding what the nuance is in this particular task that the user is doing and helping them with it. So that's where we reach the copilot and then conversational AI assistants that are in the market today. And then a lot of that coming out, especially from Informatica. We've led that as a charge from the clarity. But then the next step of all of this is going towards even more automation, giving AI a lot more agency, giving AI capabilities that can help us design 1,000 pipelines in a day and make sure that all of those business outcomes are met. They are designed with high quality. They take into consideration all the governance, compliance, the regulatory environment that this particular organization is in, as well as organization perspectives around it as well. So it's a whole lot. But glad to be talking to you about all of those things today.
I want to talk quickly about challenges that organizations are facing today. If you could summarize in your opinion, Gaurav, what are some of the key challenges organizations are facing when they're trying to adopt AI-driven solutions and how can they overcome these hurdles? I heard you talk about quality. Heard you talk about timing. What are the key hurdles that you're hearing from customers?
So first I think let's understand what are the organizations doing internally? What are some of the new projects that they're launching? And no surprise, a lot of them are about AI, using generative AI, using new AI agents to make sure their business processes are more automated, streamlined. They are able to do customer success support and all different use cases using these new technologies.
Now you and I both know, Amy, that the foundation of all of this is data and good quality data. You can have the best AI technology, but if it does not have information about what this particular customer is looking for when they are coming to you with a customer support request, it will fall on its knees. It will not be able to serve them better. What this new revolution has pointed to is, if anything, you know, getting that data house in order. So some of the challenges that we have seen organizations facing now is figuring out what some of those metrics are, quality metrics for the organization. We hear a lot about AI agents. AI agents are these new automations that perceive the environment. As a classical definition: something that can perceive an environment and can take an action in it based on the goal that is given to it. Now, very simple definition, but if you have to think of like real agents, AI agents that are in the world today, like self-driving cars, being able to understand every single kind of scenario that can develop in a real-life situation. So you're driving down highways or roads, and for that, the best way to do it is to collect data, understand what those scenarios are and make sure that AI can behave well in those scenarios. That's easier said than done. You have to first train that agent in the lab. You have to give it those scenarios. Environments can be modeled as data sets and these data sets create those scenarios. Now what has been the biggest challenge in the AI world is the long tail of real life. So you can teach it the 50% of when everything is all right. But God forbid, if AI has to drive on Indian roads, for example, it will see something that is not in its training data for sure. So the goal is to give it data and training safely without it harming in real-life scenarios. So this long tail of data, this long tail of scenarios, how do we give an AI that particular thing?
That's where data management comes in, being able to create simulations for this AI agent, synthetic data which is based on real-life scenarios that can be given to an AI agent. And then there are now techniques that you don't even need to get the AI agent on Indian roads for it to learn driving on roads there. So all of that is done through data and data management practices. It's very, very important to make sure that the data quality of such data sets is properly done. And I don't think for this kind of scale that we are talking about, only manual processes will be enough. For example, Waymo uses a lot of training data to train its self-driving car. I read somewhere that they were using like 20 million miles of real life data to train their self driving agent. But for every single mile of real life data they had 20 million more miles of synthetic data for every single mile that they generated using a synthetic data generation techniques and AI techniques, so that the Waymo models are safe, governed, are for the task, you know, can work for the task that they are good for. Now take that example and multiply across all the different use cases that we have. Whether it's customer success, HR, whether it is marketing, sales, product development, coding, all these require the modeling of that long tail. All these require the modeling of the data that goes into that long tail, the scenarios or that data quality governance of data. You don't want data of user faces, for example, when you're creating self-driving data, because that's regulated and there's privacy concerns around it. So doing that at scale, data governance, data quality, all of the data management foundation, is number one. Number two: skills in an organization. We talk about how the world is changing so fast and people are fearful about what happens to their jobs and the automation that it's bringing in. But I see it from a different perspective.
I see it from a perspective of all the new things that we have to learn to keep ourselves competitive in this environment. And I think AI tools definitely are one of those things that we all need to master before they master us.
Hey there listeners, it's Nick. I hope you're enjoying this episode. I wanted to let you know about a special upcoming event Informatica is having: the incredible Informatica World, our annual premier AI-ready data management conference. So this year we're coming live from Mandalay Bay in Las Vegas on the 19th to the 21st of May. We'll have keynotes from industry leaders, we have a bunch of technical tracks, AI data initiatives and of course the Informatica Innovation Awards. It's not to be missed and we hope to see you there. Thanks for tuning in. Enjoy the rest of the episode.
So I would be remiss if I didn't bring up agentic AI. We hear that everywhere, it's all over the Internet. Tell me your thoughts about it. What does it mean for people that have never heard about it?
My definition was a simpler definition of an AI agent. And even from that simple definition, let's look at what AI has done for us in the past few years. It was there recommending things. So if you are on a social media platform, it would create a feed for you, it would recommend based on what are the things that you've liked in the past. These are the things that you might be interested in. Sometimes it was a hit, most of the time it was a miss. But you know, it kept learning and then getting better. So most of it was recommendations or buying this. You may like to buy this based on reviews and so on. And since then we are now in a world with self-driving cars, Waymos, Teslas and soon a lot more. We are in the world of agents or AI systems that can take decisions on behalf of the user within an environment and are able to get the goal done for whatever that end user wanted. So this is going to only accelerate in the next few.
We are seeing innovations in three different vectors here. So one is the models themselves are becoming a lot more intelligent. And then we had models like o3 from OpenAI, or today Grok 3 was released from xAI, or Twitter. And then of course we had the open source versions like DeepSeek that came out, that actually brought down the cost of including these very, very intelligent models within your products or within your services as well. So models are on an exponential development track. I think at this development rate, by end of the year we'll have intelligent and human models too cheap to meter, which is great news. So we can get a lot of our things automated using the intelligence. This is raw intelligence. Then we have to think about the second vector, which is data. Models themselves are no longer... "commodities" is an interesting word, but models will be available to everybody. It's the data available in the right context, with the right quality, or what's called memory in AI. How do we give these models the right memory and update it as time goes on? And that's where the second vector of innovation will happen. Informatica plays a big part making sure that this memory is 100% accurate, real time, has all the latest data, whether it's from support systems or emails, or your Salesforce systems or any systems, basically all integrated very well together. And third is of course the human element that we talked about, making sure that the humans are ready for all of these innovations.
So tell me one thing or two things that you're so excited about that you see from a futurist perspective. Where are we going with AI? And more importantly, how can our prospects and customers get prepared for this? In the forefront you're learning all this, you're teaching all this. Some of us that are not there, what are you excited about? What does the next year look like for you and how can we prepare?
That's a great question.
So for me, I think one of the things that, like we were discussing earlier on the use cases of AI, when we were working on these AI systems, we were thinking that the kind of things that AI will democratize or AI will automate first will be things like laundry and will be things like dishes. I wish we had robots for that already. But it's not only going into robotics, et cetera. It's also looking at creative tasks, whether it is generation of videos, generation of stories, all of those interesting things, including education. So this democratization and automation of creativity, I think, surprised me and I think is the one that will be the most interesting to watch in the next few. We already have systems on the coding side, for example, that you could give a website to, and then they can start creating a replica of that. And then they had videos of giving it YouTube and then trying to create YouTube with it. And then YouTube took of course 2006 and all of the infrastructure side, of course, to build that kind of site today, 15, 16, 20 years of work. But the question is not to replicate YouTube. I think we will get into a world where people will be able to creatively visualize what they want to see and ask the AI to generate a story based on what they want to see and then just tell the system that, and then the videos or animated series that can be created on that. So creativity at scale. And I think that's number one. Number two, all the things that we do with data. I think, you know, I've been in this industry for such a long time. I've seen how we have looked at making business processes easier, automating through multiple different layers of foundation, whether it's creating 800 connectors that users can use right away, to automations in classifications and then governance and other places that make the next steps easier as well. I think we are going towards a world where we can give these AIs larger goals.
We can tell it to create a pipeline with certain characteristics, certain runtime and optimization characteristics, and it will be able to do so. And of course humans then review that code, make sure that that is all right, and only then have it go ahead. I mean, all in all, if you have to ask me what I'm most excited about, I think it's a lot of free time, with AI doing a lot of the work that we did not like.
What is AI governance, what are you hearing about it in the industry and what can our customers and prospects do to prepare for this?
AI governance is one of those hot topics in the industry right now. There are so many different sides of it and you are yourself expert in all things data and AI governance. So it'd be great to have this topic to discuss on. What I'm seeing with the customers are things about the wild, wild west of AI. They are seeing how AI models are being downloaded left and right from sites like Hugging Face, which is a good thing. Being able to bring those open source models and lots of experimentations going on across the enterprise. Now, while experimentation is great for innovation, that makes our AI, IT teams, legal teams, governance teams a little bit more nervous. And I think that is the function of governance, right? The function of governance is not to thwart innovation, it's not to stop it, it's to establish clear, transparent controls so that the innovation is even faster. Right? Nobody is thinking and wondering while downloading the model from Hugging Face whether the data that they're trying to use is right for this AI model, should they be doing it or not? If there are clear and transparent policies around that, I think everybody moves faster. So the key things that people are seeing: the governance around these AI models, what models are good for the enterprise or approved by the enterprise to use for experimentation and in products versus what are not. So, clear policies around that as well.
So lots of governance teams are starting to write policies, right? Or are in the early stages of thinking about policies for those. Second is of course, data. What kind of data goes into AI models? And there are two different kinds. One is the grounding data, the data which tells the reality about the enterprise. There is nothing, there's no new process here that we are training the AI model on. For example, you could train an AI model on converting natural language to SQL or to drive a car on the highway, et cetera. But here we are not teaching it a new skill, but here we are giving it the realities of: these are my 10,000 customers. And out of these 10,000 customers, each of these customers have this customer success profile and this is what their users have asked in the past on the support channels, et cetera. So giving that grounding data to AI so that it can then process it and establish the right decision frameworks, the right analytics frameworks for those as well. So the grounding data and then the training data for it as well. The training data is all about how, if we are giving it new skills, if we're trying to teach an AI, like we are trying to teach AI all things data management, whether it's creating a new master data management model or creating the right data quality rules, or creating the new data pipelines for that, these are new skill-building initiatives. And then there we give it examples of: this is a good pipeline, this is a bad pipeline. These are the kind of tests you have to write for a pipeline. So the data for that, so data management and data governance of all of that data that feeds into these pipelines. Very, very similar, Amy, to all the work that we did in data governance. What are your thoughts?
Yeah, same. I was actually going to just comment on what you said about governance. People think that it's like no, stop, slow down. When in fact it's about putting the guardrails around it to be successful.
I think what you said, you know, legal to me is a big piece of this. In legal organizations, they're very skittish about wanting to leverage AI because of what you just talked about: leveraging bad data to make decisions, giving people almost the wild wild west to do whatever they want. And I think what I'm hearing, and when I talk to other customers and prospects in the industry, it's exactly what you said. It's: we have to do it. We are not going to slow down production and we're not going to slow down innovation, but we've got to put some guardrails around it. So I feel like we're on the same page there. Let me ask you another question. We hear on the news, we hear about other countries coming up with models and going to market faster without requiring all the storage on the back end. I feel like we're almost in an arms race, like the space race that we were in when we were younger, that other countries want to get there first. Are you seeing the same thing? You mentioned other models that were being developed and coming out. Are you seeing that still go forward? Are you seeing other organizations and cultures and countries develop these models?
That's a great question. And I wonder what's going to be the moon landing moment for AI. But I think one of the things we have seen recently is the Linux moment for AI, which was the release of this DeepSeek model that came out of China, the open-sourced reasoning model which was claimed to be a lot cheaper to train compared to the models that we are training here in the United States. It gave a completely different perspective on how this intelligence is getting built and getting deployed in the real world. I think it was close to being the Linux moment because something like that DeepSeek model, even though it originated from China, was open weights, and was Apache license as well, which means that any organization could download it, deploy it on their cloud. Right?
Lots of our partners like Azure, AWS already did so, made that model available for everyone to use. But more than that, fine-tune it to your requirements. If you require reasoning for creating a better customer success chatbot, give it your customer success transcripts and the model learns and then reasons a lot better. So reasoning, which was thought to be like a $10 million, $100 million model, is now much, much cheaper as well. So that is all thanks to the open source community that's coming around AI. I think Meta did a great job with open source Llama, and I've seen that in the community in sites like Hugging Face, where you have now more models than you can count, more AI models than you can count. In fact they have an AI marketplace sporting. So my take is, for organizations adopting them, adopt them with open eyes. Try to understand whether you are adopting the model itself or you're trying to look at the entire application. It is when you're looking at the entire application that you're sending your data to sovereignties that you cannot govern. And that I think is definitely something that everybody should take very seriously. Try to understand what the requirements around that are and make sure you avoid those. But at the same time, when you're looking at open source models, things that you can deploy in your own clouds and your own firewalls, I think there is definitely a lot of scope to experiment and innovate.
Listen, I am so excited to have you on today. Thank you so much Gaurav for being here today, and to our listeners, thank you again for listening to our podcast. Please remember to share with your colleagues. You can download it where you get your podcasts today. And let's Bring Data and AI to Life. Thanks.
Thank you Amy. Thanks for having me.
Thank you.
Bringing Data and AI to Life is brought to you by Informatica.
To find out more about Informatica and how our Intelligent Data Management Cloud can help you achieve better business outcomes, head to informatica.com. Stay tuned for more illuminating discussions. Until we meet next time, keep harnessing the power of data and AI to bring transformative outcomes to your life and business. Make sure to click subscribe so you don't miss any future episodes. And tell your friends about us too. On behalf of the team at Informatica, thank you so much for listening.
