AWS re:Invent 2022 - Why operationalizing data mesh is critical for operating in the cloud (PRT222)
As companies look to scale in the cloud, they face new and unique challenges related to data management. Data mesh offers a framework and a set of principles that companies can adopt to help them scale a well-managed cloud data ecosystem. In this session, learn how Capital One approached scaling its data ecosystem by federating data governance responsibility to data product owners within their lines of business. Also hear how companies can operate more efficiently by combining centralized tooling and policy with federated data management responsibility. This presentation is brought to you by Capital One, an AWS Partner.
Content
0.33 -> - I'm Patrick Barch.
1.5 -> I am a senior director of product
management at Capital One.
5.52 -> I currently lead product management
7.44 -> for Capital One Slingshot,
9.87 -> which is a new product
11.55 -> to come out of a new line
of business from Capital One
14.16 -> called Capital One Software.
16.425 -> We announced this business in June,
18.48 -> and it's dedicated
19.313 -> to bringing our cloud and
data management products
21.54 -> that we've built internally to market.
25.32 -> I am here to talk about data mesh
27.84 -> and how we operationalized
29.37 -> some of the core principles
of data mesh at Capital One.
33.12 -> The story has roughly three parts.
35.82 -> I'll talk about our journey,
37.2 -> I'll talk about how it applies
to the data mesh principles,
39.84 -> and then I'm gonna walk
through four sample use cases
42.6 -> to try to ground what we did.
46.56 -> But first, some background
info you may not know
49.2 -> about the company.
50.67 -> From our first credit card in 1994,
53.04 -> Capital One has recognized
55.02 -> that data and technology can enable
57.66 -> even large companies to be
innovative and personalized.
61.02 -> And about a decade ago,
62.64 -> we set out on a journey
to completely reinvent
65.82 -> the way we use technology
67.59 -> to deliver value to our customers.
70.17 -> We shut down our data centers.
72.06 -> We went all in on the cloud.
74.1 -> We re-architected our data ecosystem
75.84 -> in the cloud from the ground up,
79.26 -> and along the way,
80.31 -> we had to build a number
of products and platforms
83.79 -> that the market wasn't offering yet
85.83 -> that enabled us to operate at scale.
92.04 -> Let me take a step back
93.33 -> and walk you through
some of the key learnings
95.94 -> from our journey,
97.65 -> but first, the macro context
99.63 -> in which we're all operating these days.
102.15 -> Moving to the cloud creates an environment
104.43 -> with way more data coming
from way more sources
107.91 -> being stored in way more places,
110.43 -> and your analysts and scientists
112.35 -> are demanding instant
access to all of that data
115.14 -> via self-service
116.55 -> in the tool and consumption
pattern of their choice.
119.46 -> That's all happening against a backdrop
121.59 -> of patchwork privacy legislation
123.39 -> that's popping up all over the world.
125.43 -> So, like, how do you
manage something like that?
128.43 -> And, oh by the way,
130.08 -> you have to get this right because,
132.54 -> pick your phrase,
133.56 -> data is the new oil,
134.64 -> data is the new gold,
136.38 -> at Capital One, we say data's
the air that we breathe.
139.02 -> You know, companies are recognizing
140.67 -> that the key to success in
today's tech-driven landscape
144.36 -> is creating value out of your data.
147.18 -> So no pressure.
151.71 -> Early in our journey,
153.33 -> we knew we were gonna
have to think differently
155.04 -> about some of the challenges
in our data ecosystem,
158.43 -> so we deliberately invested
in product management
161.4 -> and design thinking.
162.69 -> And so we pretty classically started with
165.06 -> who are our customers,
166.29 -> who are our users,
167.94 -> what are their jobs to be done,
169.53 -> and what are their challenges.
171.72 -> And what we found is that you've
got your teams responsible
174.72 -> for publishing high-quality
data to a shared location,
177.99 -> downstream, you've got your
analysts and scientists
180.63 -> looking to use that high-quality data
182.49 -> to make business decisions,
184.5 -> you've got your teams responsible
186.09 -> for defining and enforcing
data governance policy
189.24 -> across the enterprise,
190.86 -> and lastly, you've got your
infrastructure management teams
194.13 -> that have to manage the platforms
196.68 -> that power all of these use cases.
199.32 -> Now, this is oversimplified.
202.38 -> These aren't necessarily unique people.
204.48 -> These are more modes of operating
206.64 -> that a single person may adopt
208.5 -> in the course of doing their job.
210.75 -> Think of an analyst who creates a new insight
212.82 -> in something like a Databricks notebook
214.86 -> and now wants to publish it back
216.45 -> to the shared environment.
218.55 -> So when you have all these people
219.78 -> operating across all
these different modes,
222.15 -> there is a lot of room
for miscommunication
224.1 -> and there is a lot of room for error.
229.11 -> And so you're seeing the market respond
231.45 -> by offering lots of
different types of tools
234 -> for these user groups,
235.56 -> and your company may go out
and get a bunch of these tools.
238.71 -> And so it'll be common for a single person
240.9 -> to have to hop between six
or seven different tools
243.24 -> and processes
244.41 -> just to complete a task like
publishing a new dataset.
249.12 -> And, by the way, this list isn't complete
251.94 -> or as neatly aligned as it's shown here.
254.88 -> Data catalogs these
days are being marketed
257.1 -> both as discovery tools
and governance tools.
260.34 -> Data protection actually
requires a suite of tools
263.82 -> to scan for sensitive data in the clear
266.31 -> and make sure it's protected
267.39 -> with something like
tokenization or encryption.
270.36 -> So your company takes
a bunch of these tools,
273.45 -> they piece them together,
274.77 -> and at the end of the day,
276.09 -> maybe you get something
that looks like this.
279.87 -> Now, before you take a
picture of this slide,
282.27 -> I just wanna warn you, people in the back,
284.58 -> this doesn't work.
286.241 -> (attendees laugh)
288.21 -> Why doesn't this work?
289.83 -> Well, let's look at our
data publishing friend
291.9 -> here on the left.
293.55 -> This person first needs to go
to their ETL and pipeline tool
296.49 -> to configure some jobs,
298.11 -> then they have to go to their catalog
299.64 -> to get data registered.
301.11 -> They have to make sure their
data quality is being checked.
303.6 -> They have to make sure
their data is protected
305.67 -> with the right entitlement.
306.99 -> They have to make sure
lineage is being captured.
309.18 -> And then they probably
have to go tap somebody
311.19 -> on the infrastructure team on the shoulder
313.44 -> to get an S3 location
315.69 -> or a Snowflake table created,
318.39 -> you know, and then what happens
if changes are required?
321.93 -> What happens if the schema changes,
323.52 -> what happens if there's
a data quality issue?
325.68 -> You know, how does this data publisher
327.6 -> find and contact all
of the downstream users
330.69 -> to let them know a change is coming?
333.36 -> If you're a consumer of this data,
336.12 -> how do you know whether
you're using the right data
338.91 -> across all of these
different touch points?
341.04 -> And how is anybody on
the data governance team
343.44 -> supposed to enforce policy
345 -> when they have to go to
so many different places?
348.12 -> You know, scaling this ecosystem
becomes really complicated
351.72 -> both for your engineering teams
353.67 -> that have to build and
maintain the integrations,
356.22 -> but also for your users
that have to navigate this placemat.
363.03 -> So this brings us to data mesh.
367.08 -> I'm assuming you've all
heard of this thing,
369 -> otherwise you wouldn't be here.
372.51 -> Data mesh is a set of principles.
374.76 -> It's an architectural framework.
376.26 -> It's an operating model
that companies can adopt
379.5 -> to help them scale a
well-managed data ecosystem.
382.92 -> And, for me, the heart of this thing
385.53 -> is treating data like a product,
388.11 -> because once your company
makes that mindset shift,
391.38 -> and it really is a mindset shift
393.09 -> that requires full and
total buy-in from everybody,
396.03 -> the rest of these principles
kind of naturally follow.
398.22 -> You have to decide how you're
gonna organize those data products
402.48 -> into domains,
403.74 -> and then you have to enable
a whole bunch of activities
406.65 -> via self-service for
those data product owners.
410.97 -> Now, data mesh was coined
or invented in 2019
415.71 -> but it really started to
gain traction in 2020,
418.32 -> which, if you'll recall,
419.37 -> was right around the time
420.3 -> we were shutting down
our last data center,
422.31 -> and so this concept came
out too late for us.
425.76 -> But when you see how we
approached our data ecosystem,
429.45 -> the similarities are pretty striking.
434.52 -> So we approached scaling
our data ecosystem
437.37 -> really through two prongs,
439.35 -> centralized policy tooled
into a central platform
443.64 -> that then enables
federated data management.
448.83 -> I'm gonna walk through each
one of these pillars now.
453.12 -> So the first thing that we did
455.88 -> was break our lines of business
458.43 -> into discrete organizations and
units of data responsibility
462.66 -> with hierarchy,
464.31 -> but we didn't enforce the same hierarchy
466.8 -> on all of our lines of business.
468.93 -> Our big lines of business
had three or four levels,
471.48 -> our smaller lines of
business really only had one.
474.24 -> But each line of business
had the same set of roles
476.7 -> supporting it.
477.99 -> Performing data stewards
479.37 -> are responsible for the
risk of one or more datasets
482.1 -> in their business unit.
483.84 -> Managing data stewards
485.58 -> are responsible for the
risk of all of the datasets
488.13 -> within the business unit,
489.6 -> and each business organization
or line of business
492.48 -> also has a data risk officer
494.67 -> that's responsible for the entire thing.
497.97 -> Now, these weren't new roles.
499.62 -> We didn't go out and
hire a bunch of people.
501.87 -> These are all side of desk activities
503.76 -> and each one of these
people also has a day job.
508.14 -> Next thing we did was define
common enterprise standards
512.43 -> for metadata management
across the company,
515.13 -> and our big learning here
516.54 -> is that not all data is created equal
519.84 -> and you need to slope your
governance based on risk.
523.44 -> You know, we're a bank,
524.46 -> so, of course, we always need to know
526.95 -> where is all of our data,
528.6 -> which of that data is sensitive
530.31 -> and who's responsible for it,
531.99 -> but temporary user data or staging tables
535.89 -> requires a different
standard of governance
537.93 -> than data used in regulatory reports,
540.87 -> and so we needed to make sure
541.98 -> that our policies reflected that reality.
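To make the idea of risk-sloped governance concrete, here is a minimal sketch in Python of how those requirements might be expressed as configuration keyed by risk tier; the tier names, controls, and retention values are illustrative assumptions, not Capital One's actual policy.

    # Illustrative only: tier names, controls, and retention values are assumptions.
    GOVERNANCE_TIERS = {
        "temporary": {   # staging tables, scratch data
            "catalog_registration": True,
            "sensitivity_classification": True,
            "data_quality_checks": [],
            "retention_days": 30,
        },
        "shared": {      # data shared across lines of business
            "catalog_registration": True,
            "sensitivity_classification": True,
            "data_quality_checks": ["schema_match", "completeness"],
            "retention_days": 365,
        },
        "regulatory": {  # data feeding regulatory reports
            "catalog_registration": True,
            "sensitivity_classification": True,
            "data_quality_checks": ["schema_match", "completeness", "business_rules"],
            "retention_days": 2555,
        },
    }

    def required_controls(tier: str) -> dict:
        """Look up the controls a dataset must satisfy for its risk tier."""
        return GOVERNANCE_TIERS[tier]

Every tier still answers the baseline questions (where is the data, is it sensitive, who owns it); only the extra checks and retention scale with risk.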
546.12 -> Next thing we did
548.46 -> was define different
standards for data quality
552.9 -> depending on the importance of the data.
555.18 -> And so, you know, if you never intend
557.64 -> to share data beyond a single application,
560.43 -> we really only enforce a bare minimum
562.26 -> of data quality standards,
563.79 -> but if you do plan to share your data
565.47 -> with others at the company,
566.76 -> now we enforce more rigorous checks
569.28 -> like ensuring that the schema
you're trying to publish
572.85 -> matches the schema the consumers expect
575.28 -> and making sure that data is complete
577.29 -> from point A to point B.
579.93 -> Our most valuable data
582.48 -> also has to pass business
data quality checks
585.78 -> like making sure that FICO fields
589.11 -> fall within the allowable range.
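As a rough illustration of what those tiered checks could look like in code, here is a small sketch; the field name "fico", the thresholds, and the record shapes are hypothetical.

    def check_schema(records, expected_columns):
        """Structural check: each record has exactly the columns consumers expect."""
        return all(set(r) == set(expected_columns) for r in records)

    def check_completeness(source_count, target_count):
        """Movement check: nothing was dropped between point A and point B."""
        return source_count == target_count

    def check_fico_range(records, low=300, high=850):
        """Business check: FICO values fall within the allowable range."""
        return all(low <= r["fico"] <= high for r in records)

    records = [{"fico": 712, "balance": 1500.0}, {"fico": 654, "balance": 320.5}]
    assert check_schema(records, ["fico", "balance"])
    assert check_completeness(source_count=2, target_count=len(records))
    assert check_fico_range(records)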
593.01 -> Entitlements.
595.29 -> Early in our journey,
596.73 -> every dataset was protected
with its own entitlement,
600.27 -> and so it could take
you as an analyst weeks
603.24 -> to figure out which role
you need to get access to,
605.94 -> and then when you did get access,
607.5 -> the majority of the time,
609.12 -> the data was either bad quality
610.68 -> or it just wasn't what you wanted,
612.48 -> and so the process started again
615.21 -> and rarely would that mistaken
entitlement be revoked.
619.41 -> So now you got all these
people running around
621.6 -> requesting access to data they don't need.
623.88 -> It's a time suck, it creates risk.
626.97 -> And so what we did was we created mappings
631.95 -> between lines of business
and data sensitivity,
635.58 -> and so now you as a
user can request access
637.77 -> to all non-sensitive data
in commercial, for example,
642.18 -> and you only need to re-request access
644.04 -> when you need to step up permissions.
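A minimal sketch of that mapping, assuming entitlement roles are keyed by line of business and sensitivity rather than by individual dataset; the role and line-of-business names below are made up.

    ENTITLEMENTS = {
        ("commercial", "non-sensitive"): "role_commercial_nonsensitive_read",
        ("commercial", "sensitive"):     "role_commercial_sensitive_read",
        ("retail", "non-sensitive"):     "role_retail_nonsensitive_read",
        ("retail", "sensitive"):         "role_retail_sensitive_read",
    }

    def role_for(line_of_business: str, sensitivity: str) -> str:
        """One request covers every dataset in the line of business at that
        sensitivity level; users re-request only to step up permissions."""
        return ENTITLEMENTS[(line_of_business, sensitivity)]

    print(role_for("commercial", "non-sensitive"))

A handful of roles replaces one entitlement per dataset, which is what shrinks both the request backlog and the risk of stale access.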
653.09 -> Okay.
655.14 -> I just rattled off a whole bunch of
657.03 -> if your scenario is this, do this,
659.01 -> if your scenario is that, do that
type situations.
662.01 -> How is anybody supposed
to keep that straight
664.29 -> and how is any data governance team
666.72 -> supposed to enforce all of that?
669.15 -> The answer is deceptively simple,
671.82 -> you make it easy.
673.68 -> We surveyed our data teams
676.08 -> and, by and large, they all
wanted to do the right thing.
679.44 -> They wanted to be good corporate stewards.
680.82 -> They didn't want to create risk
682.2 -> but they didn't know how.
684.39 -> Our policies were confusing
685.92 -> and our policies were opaque.
688.11 -> So how do you make it easy?
692.16 -> Well, the way that we made it easy
694.2 -> was giving our teams a usability layer
697.56 -> for them to do their work,
699.78 -> and this usability layer is aligned
702.51 -> not in terms of a technology,
704.52 -> like catalog or data quality,
706.89 -> but in terms of a job to be done,
709.14 -> publishing a new dataset,
710.76 -> finding and getting access to a dataset,
713.04 -> protecting sensitive data,
714.78 -> reconciling my infrastructure bill.
717.93 -> This usability layer talks
to an orchestration layer
721.26 -> that handles keeping all of
those different systems in sync,
724.44 -> and it also goes all the way down
726 -> to the infrastructure layer
727.53 -> to automatically provision resources,
729.6 -> whether it's a table, an
S3 bucket, Kafka topic,
733.74 -> on the user's behalf.
737.22 -> So the key to federating data
management responsibility
740.7 -> is giving your teams an experience
743.13 -> that aligns to the job
they're trying to do.
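Here is a rough sketch of what a job-to-be-done endpoint on top of such an orchestration layer might look like; every function and system name below is a hypothetical placeholder, not the actual platform API.

    def register_in_catalog(name, meaning):
        print(f"catalog: registered {name}: {meaning}")

    def schedule_quality_checks(name, thresholds):
        print(f"dq: scheduled checks for {name} with {thresholds}")

    def apply_entitlement(name, sensitivity):
        print(f"iam: protected {name} as {sensitivity}")

    def provision_storage(name, kind):
        print(f"infra: provisioned {kind} for {name}")

    def publish_dataset(request: dict) -> None:
        """One 'publish a dataset' call fans out to catalog, quality, entitlement,
        and infrastructure so the user never touches those systems directly."""
        register_in_catalog(request["name"], request["business_meaning"])
        schedule_quality_checks(request["name"], request["quality_thresholds"])
        apply_entitlement(request["name"], request["sensitivity"])
        provision_storage(request["name"], kind=request["target"])  # table, S3 bucket, Kafka topic

    publish_dataset({
        "name": "card_transactions",
        "business_meaning": "Cleared card transactions, daily",
        "quality_thresholds": {"completeness": 1.0},
        "sensitivity": "sensitive",
        "target": "snowflake_table",
    })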
747.81 -> But, like, that's kind of theoretical
749.88 -> and so what I'm gonna do next
752.88 -> is try to ground that statement
757.53 -> in four different use cases
760.05 -> that we've enabled for
our teams at Capital One.
766.65 -> The first use case is our
data producing experience,
771.66 -> and you may wonder, like,
773.857 -> "Why do you need a data
producing experience
775.92 -> in the first place?
777.42 -> I can just go talk to somebody
on the infrastructure team,
780.06 -> they can provision me an S3 bucket,
781.92 -> and then we can use a native AWS service
784.14 -> to move data from point A to point B."
786.96 -> And, you know, that may
work in smaller companies
789.84 -> where issues of scale and data governance
792.57 -> haven't cropped up yet,
794.1 -> but in large companies,
796.08 -> publishing data is like a
one to two month project.
799.59 -> You have to coordinate across
five or six different teams,
802.62 -> you have to have lots of
meetings to make small decisions,
805.56 -> and so we needed to simplify this process
808.68 -> so our teams could move faster.
812.13 -> Now put yourself in the shoes
of this data producer here.
815.97 -> All this person cares about
817.41 -> is getting their data
from point A to point B
819.87 -> so it can be consumed by
others in their company.
822.3 -> Any additional step or task
824.31 -> related to compliance or governance
827.34 -> is really just a roadblock
829.11 -> on the way to them doing their job.
832.59 -> So the first thing this person does
835.77 -> is register their metadata.
837.78 -> This is where they
provide business meaning,
840.6 -> this is where they define
data quality thresholds,
843.18 -> and this is where they
define retention policies.
846.12 -> Then, in the background, we
will register it in the catalog,
850.29 -> we will provision a location,
853.02 -> and we will configure and schedule jobs
856.44 -> that check for data quality
and enforce data retention.
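A sketch of the kind of registration a producer might submit and what the platform could derive from it; the fields and job names are assumptions, not the actual Slingshot schema.

    registration = {
        "dataset": "marketing_campaign_results",
        "business_meaning": "Response outcomes for direct-mail campaigns",
        "owner": "card-marketing-analytics",
        "quality_thresholds": {"completeness": 0.99, "schema_match": True},
        "retention": {"policy": "delete_after_days", "days": 730},
    }

    # Derived automatically in the background: a catalog entry, a provisioned
    # location, and recurring jobs that check quality and enforce retention.
    catalog_entry = {"name": registration["dataset"], "owner": registration["owner"]}
    storage_location = f"s3://data-lake/{registration['dataset']}/"
    scheduled_jobs = [
        {"job": "data_quality_check", "thresholds": registration["quality_thresholds"]},
        {"job": "retention_enforcement", "after_days": registration["retention"]["days"]},
    ]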
861.42 -> The next thing this person
does is classify their data,
866.13 -> and they do this by either
approving or overriding
871.65 -> sensitivity values that have
been pre-populated for them,
875.28 -> and once they complete this step,
877.41 -> we update that registration,
878.88 -> we update that physical layer
880.68 -> to protect it with the
appropriate entitlement.
884.94 -> Next, the user configures
their data pipeline,
887.61 -> they point the system at a source,
889.17 -> they configure their transformation logic,
892.08 -> and when they're done,
893.01 -> we'll automatically build them a pipeline
895.11 -> without any assistance from
the data engineering team.
900.09 -> Once that pipeline is turned on,
902.52 -> all of those governance
steps that I configured
905.01 -> are executed automatically.
906.99 -> We'll check the data quality
for each new instance.
909.66 -> We will track the lineage
for each new instance.
912.78 -> We will scan each new instance
914.85 -> for sensitive data that's
being inappropriately loaded
918.09 -> to the target system automatically.
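As a simplified sketch of that per-load automation, the hook below runs a quality check, records lineage, and scans for sensitive values in the clear each time a new instance lands; the pattern, dataset names, and checks are stand-ins.

    import re
    from datetime import datetime, timezone

    SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # example sensitive-data pattern

    def on_new_instance(dataset, rows, source):
        """Governance hook executed for every new instance of a dataset."""
        quality_ok = all(None not in row.values() for row in rows)       # quality check
        lineage = {"dataset": dataset, "source": source,                 # lineage record
                   "loaded_at": datetime.now(timezone.utc).isoformat()}
        leaked = [row for row in rows                                    # sensitive-data scan
                  if any(isinstance(v, str) and SSN_PATTERN.search(v) for v in row.values())]
        return {"quality_ok": quality_ok, "lineage": lineage,
                "sensitive_in_clear": bool(leaked)}

    print(on_new_instance("card_applications",
                          [{"id": 1, "note": "approved"}],
                          source="s3://landing/card_applications/"))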
923.58 -> Now, lastly, and sort of most
controversially, probably,
930.93 -> for this to work,
932.76 -> you need to have one way to
ingest data into your ecosystem.
937.47 -> You need one way in.
938.97 -> Otherwise, you cannot be 100% certain
942.24 -> that data governance is
being applied consistently
944.85 -> across the enterprise.
947.67 -> But that one way in, it can't be rigid.
950.58 -> It has to be flexible.
952.23 -> It has to support the individual use cases
955.29 -> of your lines of business,
957.45 -> and only then will you get the buy-in
967.41 -> you need to drive adoption.
1051.14 -> Okay, data producer experience.
1055.55 -> It was really this
automation of governance
1058.49 -> that was the key driver
of our business teams
1063.41 -> adopting this workflow.
1064.73 -> You know, people ask us
after these presentations,
1066.387 -> "How do you get buy-in at your company?"
1068.6 -> and, you know, you need one
way in to your data ecosystem
1072.83 -> like I talked about,
1073.91 -> but it's the automation of governance
1076.88 -> that's really gonna be that carrot
1078.68 -> that drives your teams to
adopt whatever you build.
1084.17 -> The next experience is the
data consumer experience,
1089.45 -> and, again, you might be thinking like,
1092.547 -> "My company is small.
1094.16 -> I only have a couple dozen tables,
1096.14 -> like I'm fine relying on tribal knowledge
1098.81 -> to figure out which data I need
1102.26 -> and which role I need
to request access to,"
1105.2 -> and that may work when you've
got a couple dozen tables.
1107.78 -> But in a large company
with hundreds or thousands
1111.98 -> or hundreds of thousands of tables,
1115.76 -> it gets really difficult for your analysts
1118.46 -> to find, evaluate, and use the right data.
1124.13 -> So imagine you're an analyst
1128.87 -> and you wanna understand
1130.07 -> the results of a recent
marketing campaign.
1132.65 -> You come to this experience
and you search for Acxiom,
1137 -> which is one of our marketing vendors.
1140.48 -> Not only do we give you
1141.98 -> a list of all of the
data produced by Acxiom,
1144.8 -> but we also give you
1145.73 -> a series of recommendations and insights,
1148.31 -> we show you what data
is frequently used with
1152.48 -> the Acxiom data that you're looking at,
1154.58 -> because we know that very few analyses
1156.74 -> are done with a single dataset,
1158.9 -> and we also show you
1163.52 -> information about common queries
that are run on that data
1169.34 -> so that we can maybe save you a step,
1171.32 -> and we also show you popular reports
1173.78 -> that are using that data to
maybe save you two steps.
1178.07 -> But when you're searching
for data as an analyst,
1180.23 -> you don't just want any data,
1181.61 -> like there's a lot of data out there,
1182.72 -> you don't want anything.
1183.86 -> You want the right data.
1185.69 -> But how do you identify the right data?
1187.97 -> We give our teams signals of relevance
1190.91 -> to help them understand whether
the data is high quality.
1194.72 -> They can check the status of
the data quality rules,
1198.29 -> they can check the lineage,
1199.73 -> they can check a profile,
1201.29 -> they can check how fresh the data is
1203.3 -> and how often it's updated,
1205.16 -> they can see who else is using that data
1207.29 -> and whether anybody on their
team is using that data.
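For illustration, those signals might boil down to a record like the one below, plus a cheap heuristic for deciding whether the data is worth trusting; every field and value here is hypothetical.

    relevance_signals = {
        "dataset": "acxiom_campaign_responses",
        "quality_rules_passing": "14 of 14",
        "lineage": ["s3://landing/acxiom/", "marketing.responses"],
        "last_updated": "2022-11-28",
        "update_frequency": "daily",
        "distinct_users_last_30_days": 37,
        "used_by_your_team": True,
    }

    def looks_trustworthy(signals: dict) -> bool:
        """Rough heuristic: all quality rules passing and the data is actively used."""
        passing, total = (int(x) for x in signals["quality_rules_passing"].split(" of "))
        return passing == total and signals["distinct_users_last_30_days"] > 0

    print(looks_trustworthy(relevance_signals))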
1212.69 -> Once they understand
1213.8 -> kind of if this is the
right data for them,
1216.68 -> the next step is requesting access.
1219.14 -> And because this experience
1221.3 -> is integrated into our
identity management system
1225.47 -> and our LDAP groups,
1227 -> we know whether the
user has access already,
1230.18 -> and so we can let them know
directly in the experience.
1234.2 -> If they don't have access,
1236.84 -> through the same workflow,
1238.73 -> they can submit a request
1240.71 -> that's then routed to the
appropriate stewardship group
1243.65 -> to either approve or reject.
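A minimal sketch of that flow, assuming group membership comes from the identity system and pending requests land in a stewardship queue; the names are placeholders.

    def request_access(user_groups, required_role, steward_queue):
        """Short-circuit if the user already has the role; otherwise route an
        approval request to the owning stewardship group."""
        if required_role in user_groups:
            return "already has access"
        steward_queue.append({"role": required_role, "status": "pending approval"})
        return "request routed to data stewards"

    queue = []
    print(request_access({"role_retail_nonsensitive_read"},
                         "role_commercial_nonsensitive_read", queue))
    print(queue)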
1251.81 -> The next experience I'm gonna talk about
1253.91 -> is our self-service data
governance experience,
1258.68 -> and what I wanna highlight here
1260.63 -> is how what we've built enables
two different persona groups
1264.95 -> to work together seamlessly.
1267.56 -> So on one hand, on the
left hand side here,
1270.2 -> you've got your risk management teams
1272.87 -> that are responsible for
defining data governance policy
1277.28 -> that is then automatically incorporated
1279.71 -> into all of our data workflows.
1282.62 -> And then on the right hand side,
1284.36 -> those same teams proactively
receive compliance reports
1289.4 -> that let them know things
1290.39 -> like what percentage of
our data is registered,
1294.05 -> how are we doing addressing
our data quality incidents,
1297.5 -> have we discovered any
sensitive data in the clear
1299.81 -> that we need to remediate.
1302.57 -> And what's cool about this
1303.68 -> is it truly does enable
seamless automated integration
1309.38 -> between these two groups.
1311.06 -> And so, you know, if one of
our automated processes detects
1315.74 -> that there's sensitive data
in the clear somewhere,
1318.53 -> it'll automatically trigger an alert
1320.81 -> to this data product owner.
1323.33 -> This data product owner
can jump into a workflow
1326.51 -> and initiate a remediation plan,
1328.82 -> whether that's tokenizing the data,
1331.37 -> whether that's purging the data,
1333.23 -> whether it's something else,
1335.63 -> and the action that they take
1337.58 -> is automatically added to one
of these compliance reports
1342.62 -> that's then regularly reviewed
by our risk management team.
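A small sketch of that alert-to-remediation loop, with the remediation options and report structure as assumptions:

    compliance_report = []

    def handle_sensitive_data_alert(dataset, owner, action):
        """The data product owner picks a remediation; whatever they choose is
        appended to the compliance report the risk team reviews."""
        assert action in {"tokenize", "purge", "mask"}
        compliance_report.append({
            "dataset": dataset,
            "owner": owner,
            "finding": "sensitive data in the clear",
            "remediation": action,
        })

    handle_sensitive_data_alert("card_applications", "card-data-team", "tokenize")
    print(compliance_report)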
1353.63 -> CCPA is another use case
1356.72 -> where this experience works really well,
1359.6 -> customer calls up,
1361.07 -> they say, "Delete all of my data."
1363.74 -> We use almost this exact same workflow
1367.19 -> to ensure a fully auditable
and complete purge process.
1374 -> But anytime you mess
with data in production,
1378.47 -> you know, you can never take that lightly.
1380.78 -> The decisions need to be auditable.
1383.18 -> You have to maintain separation of duties,
1386.18 -> your actions need to be
approved and confirmed
1389.72 -> before they're executed,
1391.61 -> and all of that's possible
through this workflow.
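Here is a minimal sketch of how those guardrails could look in code: the requester cannot approve their own purge, nothing executes before approval, and every step lands in an audit log. Class and field names are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class PurgeRequest:
        customer_id: str
        requested_by: str
        approved_by: str = ""
        audit_log: list = field(default_factory=list)

        def approve(self, approver: str) -> None:
            if approver == self.requested_by:
                raise PermissionError("separation of duties: requester cannot approve")
            self.approved_by = approver
            self.audit_log.append(f"approved by {approver}")

        def execute(self) -> None:
            if not self.approved_by:
                raise RuntimeError("purge must be approved before it runs")
            self.audit_log.append(f"purged all data for customer {self.customer_id}")

    req = PurgeRequest(customer_id="c-123", requested_by="alice")
    req.approve("bob")
    req.execute()
    print(req.audit_log)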
1399.23 -> Now, this is the last
experience I'm gonna talk about.
1404.72 -> Data mesh calls for not just
federating data management
1410.21 -> but also data infrastructure,
1412.82 -> and I'm gonna tell this story
1414.38 -> through the lens of how we
manage our Snowflake costs
1417.32 -> at Capital One,
1418.43 -> because we're actually
showcasing this product
1420.65 -> at our booth on the Expo floor.
1423.68 -> We've built a self-service tool
1426.65 -> that lets you as a business team
1429.02 -> manage your own infrastructure
1431.42 -> while trusting that DBA best
practices are being followed
1435.44 -> and good cost controls are being enforced.
1439.37 -> Now, let's say you're a team
lead for a group of analysts.
1443.75 -> You're a line of business tech lead,
1446.33 -> you have a new project that
requires some dedicated compute.
1449.9 -> You can come to this experience
1452 -> and you can request the provisioning
1454.28 -> of a new Snowflake warehouse
1456.02 -> and you can manage who has access to it.
1459.23 -> That request goes through
an approval workflow
1461.81 -> and at the end,
1462.643 -> the resource is automatically
provisioned for you
1465.08 -> without any data engineering help.
1468.56 -> On the back end,
1469.88 -> we're capturing business metadata
1471.86 -> like the line of business,
1473.84 -> the project, the owner, the approver
1476.42 -> so that it makes it really
easy for your central team
1480.05 -> to charge back resources to
the appropriate cost center
1483.32 -> at the end of the month.
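A rough sketch of that flow, assuming the request carries the chargeback metadata and a helper rolls usage up to cost centers; the field names and numbers are invented.

    warehouse_request = {
        "warehouse_name": "card_marketing_wh",
        "size": "MEDIUM",
        "line_of_business": "card",
        "project": "holiday-campaign-analysis",
        "owner": "jdoe",
        "approver": "lead-analyst-team",
        "cost_center": "CC-4471",
        "allowed_groups": ["card-marketing-analysts"],
    }

    def chargeback(usage_by_warehouse, requests):
        """Roll monthly credit usage up to cost centers using the captured metadata."""
        cost_center_of = {r["warehouse_name"]: r["cost_center"] for r in requests}
        totals = {}
        for wh, credits in usage_by_warehouse.items():
            cc = cost_center_of.get(wh, "UNMAPPED")
            totals[cc] = totals.get(cc, 0) + credits
        return totals

    print(chargeback({"card_marketing_wh": 182.5}, [warehouse_request]))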
1486.8 -> But we realized that
1488.51 -> just provisioning
infrastructure wasn't enough.
1491.66 -> We also needed to give our
teams a way to self-manage
1496.25 -> and make sure that they were
using that infrastructure
1499.1 -> as efficiently as possible.
1501.44 -> And so we built, you
know, several dashboards
1504.35 -> that track cost predictions,
cost trends, cost spikes.
1510.02 -> We've built several alerts
that detect cost anomalies
1514.7 -> and let you know when there's a problem.
1517.04 -> And some of those alerts also
come with recommendations
1520.79 -> on how you can troubleshoot the issue.
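As an illustration, a cost-spike alert can be as simple as comparing the latest day against the trailing average and attaching a hint; the threshold and recommendation text below are assumptions.

    def detect_cost_spike(daily_costs, threshold=1.5):
        """Compare the latest day's spend against the trailing average of prior days."""
        *history, latest = daily_costs
        baseline = sum(history) / len(history)
        if latest > threshold * baseline:
            return {
                "baseline": round(baseline, 2),
                "latest": latest,
                "recommendation": "Check for runaway queries or an auto-resume loop "
                                  "on this warehouse before resizing.",
            }
        return None

    print(detect_cost_spike([100.0, 110.0, 95.0, 105.0, 260.0]))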
1524.63 -> Now, this product
1529.04 -> really drove a ton of
value at Capital One.
1532.34 -> We saved ourselves about 27%
1534.98 -> on our projected Snowflake costs.
1536.72 -> We saved our teams about
55,000 hours of manual activity
1540.62 -> through the elimination of change orders.
1542.69 -> We reduced our cost
per query by about 43%,
1546.53 -> and our business teams
were able to onboard
1551.69 -> like 450 new use cases on their own
1555.05 -> since our Teradata migration.
1558.23 -> And so, you know, if you're interested
1559.82 -> in seeing more about how we did this,
1561.56 -> like I said, the product
is at our booth here,
1564.89 -> you can also find more on
capitalone.com/software.
1570.95 -> So the key takeaway on this slide though
1573.83 -> is, you know, once your costs
1576.59 -> become predictable and manageable,
1578.96 -> particularly in cloud environments
1580.55 -> where you're now paying as you go,
1583.28 -> your central team stops
being a bottleneck,
1586.4 -> and that's really what
enables your business teams
1589.13 -> to move at their own pace.
1595.1 -> All right, closing thoughts here
1598.37 -> before I take questions.
1599.78 -> At the end of the day,
1601.16 -> you know, data mesh is
just, it's a concept,
1604.01 -> it's a set of principles,
1605.18 -> it's an operating model.
1607.67 -> If you really want to
operationalize this thing
1610.73 -> at your organization,
1612.59 -> not only do you need to
build these four experiences
1615.83 -> and then some,
1617.54 -> you have to make traditional
data engineering activity
1620.75 -> completely transparent to your users,
1623.09 -> and you do this through
easy to use tooling
1627.83 -> and self-service.
1630.02 -> If you can do these
things, you know, remember,
1635.69 -> central policy built
into a central platform
1639.59 -> that then enables
federated data management.
1642.44 -> That's how you unlock your technology
1644.54 -> and enable it to move at
the speed of business.
1648.38 -> This is where we end.
1650.54 -> I'm happy to stick around
for some questions.