AWS re:Invent 2022 - Why operationalizing data mesh is critical for operating in the cloud (PRT222)

As companies look to scale in the cloud, they face new and unique challenges related to data management. Data mesh offers a framework and a set of principles that companies can adopt to help them scale a well-managed cloud data ecosystem. In this session, learn how Capital One approached scaling its data ecosystem by federating data governance responsibility to data product owners within their lines of business. Also hear how companies can operate more efficiently by combining centralized tooling and policy with federated data management responsibility. This presentation is brought to you by Capital One, an AWS Partner.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.

Subscribe:
More AWS videos http://bit.ly/2O3zS75
More AWS events videos http://bit.ly/316g9t4

ABOUT AWS
Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.

AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#reInvent2022 #AWSreInvent2022 #AWSEvents


Content

0.33 -> - I'm Patrick Barch.
1.5 -> I am a senior director of product management at Capital One.
5.52 -> I currently lead product management
7.44 -> for Capital One Slingshot,
9.87 -> which is a new product
11.55 -> to come out of a new line of business from Capital One
14.16 -> called Capital One Software.
16.425 -> We announced this business in June,
18.48 -> and it's dedicated
19.313 -> to bringing our cloud and data management products
21.54 -> that we've built internally to market.
25.32 -> I am here to talk about data mesh
27.84 -> and how we operationalized
29.37 -> some of the core principles of data mesh at Capital One.
33.12 -> The story has roughly three parts.
35.82 -> I'll talk about our journey,
37.2 -> I'll talk about how it applies to the data mesh principles,
39.84 -> and then I'm gonna walk through four sample use cases
42.6 -> to try to ground what we did.
46.56 -> But first, some background info you may not know
49.2 -> about the company.
50.67 -> From our first credit card in 1994,
53.04 -> Capital One has recognized
55.02 -> that data and technology can enable
57.66 -> even large companies to be innovative and personalized.
61.02 -> And about a decade ago,
62.64 -> we set out on a journey to completely reinvent
65.82 -> the way we use technology
67.59 -> to deliver value to our customers.
70.17 -> We shut down our data centers.
72.06 -> We went all in on the cloud.
74.1 -> We re-architected our data ecosystem
75.84 -> in the cloud from the ground up,
79.26 -> and along the way,
80.31 -> we had to build a number of products and platforms
83.79 -> that the market wasn't offering yet
85.83 -> that enabled us to operate at scale.
92.04 -> Let me take a step back
93.33 -> and walk you through some of the key learnings
95.94 -> from our journey
97.65 -> and the macro context
99.63 -> in which we're all operating these days.
102.15 -> Moving to the cloud creates an environment
104.43 -> with way more data coming from way more sources
107.91 -> being stored in way more places,
110.43 -> and your analysts and scientists
112.35 -> are demanding instant access to all of that data
115.14 -> via self-service
116.55 -> in the tool and consumption pattern of their choice.
119.46 -> That's all happening against a backdrop
121.59 -> of patchwork privacy legislation
123.39 -> that's popping up all over the world.
125.43 -> So, like, how do you manage something like that?
128.43 -> And, oh by the way,
130.08 -> you have to get this right because,
132.54 -> pick your phrase,
133.56 -> data is the new oil,
134.64 -> data is the new gold,
136.38 -> at Capital One, we say data's the air that we breathe.
139.02 -> You know, companies are recognizing
140.67 -> that the key to success in today's tech-driven landscape
144.36 -> is creating value out of your data.
147.18 -> So no pressure.
151.71 -> Early in our journey,
153.33 -> we knew we were gonna have to think differently
155.04 -> about some of the challenges in our data ecosystem,
158.43 -> so we deliberately invested in product management
161.4 -> and design thinking.
162.69 -> And so we pretty classically started with
165.06 -> who are our customers,
166.29 -> who are our users,
167.94 -> what are their jobs to be done,
169.53 -> and what are their challenges.
171.72 -> And what we found is that you've got your teams responsible
174.72 -> for publishing high-quality data to a shared location,
177.99 -> downstream, you've got your analysts and scientists
180.63 -> looking to use that high-quality data
182.49 -> to make business decisions,
184.5 -> you've got your teams responsible
186.09 -> for defining and enforcing data governance policy
189.24 -> across the enterprise,
190.86 -> and lastly, you've got your infrastructure management teams
194.13 -> that have to manage the platforms
196.68 -> that power all of these use cases.
199.32 -> Now, this is oversimplified.
202.38 -> These aren't necessarily unique people.
204.48 -> These are more modes of operating
206.64 -> that a single person may adopt
208.5 -> in the course of doing their job.
210.75 -> Think of an analyst who creates a new insight
212.82 -> in something like a Databricks notebook
214.86 -> and now wants to publish it back
216.45 -> to the shared environment.
218.55 -> So when you have all these people
219.78 -> operating across all these different modes,
222.15 -> there is a lot of room for miscommunication
224.1 -> and there is a lot of room for error.
229.11 -> And so you're seeing the market respond
231.45 -> by offering lots of different types of tools
234 -> for these user groups,
235.56 -> and your company may go out and get a bunch of these tools.
238.71 -> And so it'll be common for a single person
240.9 -> to have to hop between six or seven different tools
243.24 -> and processes
244.41 -> just to complete a task like publishing a new dataset.
249.12 -> And, by the way, this list isn't complete
251.94 -> or as neatly aligned as it's shown here.
254.88 -> Data catalogs these days are being marketed
257.1 -> both as discovery tools and governance tools.
260.34 -> Data protection actually requires a suite of tools
263.82 -> to scan for sensitive data in the clear
266.31 -> and make sure it's protected
267.39 -> with something like tokenization or encryption.
270.36 -> So your company takes a bunch of these tools,
273.45 -> they piece them together,
274.77 -> and at the end of the day,
276.09 -> maybe you get something that looks like this.
279.87 -> Now, before you take a picture of this slide,
282.27 -> I just wanna warn you, people in the back,
284.58 -> this doesn't work.
286.241 -> (attendees laugh)
288.21 -> Why doesn't this work?
289.83 -> Well, let's look at our data publishing friend
291.9 -> here on the left.
293.55 -> This person first needs to go to their ETL and pipeline tool
296.49 -> to configure some jobs,
298.11 -> then they have to go to their catalog
299.64 -> to get data registered.
301.11 -> They have to make sure their data quality is being checked.
303.6 -> They have to make sure their data is protected
305.67 -> with the right entitlement.
306.99 -> They have to make sure lineage is being captured.
309.18 -> And then they probably have to go tap somebody
311.19 -> on the infrastructure team on the shoulder
313.44 -> to get an S3 location
315.69 -> or a Snowflake table created,
318.39 -> you know, and then what happens if changes are required?
321.93 -> What happens if the schema changes,
323.52 -> what happens if there's a data quality issue?
325.68 -> You know, how does this data publisher
327.6 -> find and contact all of the downstream users
330.69 -> to let them know a change is coming?
333.36 -> If you're a consumer of this data,
336.12 -> how do you know whether you're using the right data
338.91 -> across all of these different touch points?
341.04 -> And how is anybody on the data governance team
343.44 -> supposed to enforce policy
345 -> when they have to go to so many different places?
348.12 -> You know, scaling this ecosystem becomes really complicated
351.72 -> both for your engineering teams
353.67 -> that have to build and maintain the integrations,
356.22 -> but also for your users
357.66 -> that have to navigate this placemat.
363.03 -> So this brings us to data mesh.
367.08 -> I'm assuming you've all heard of this thing,
369 -> otherwise you wouldn't be here.
372.51 -> Data mesh is a set of principles.
374.76 -> It's an architectural framework.
376.26 -> It's an operating model that companies can adopt
379.5 -> to help them scale a well-managed data ecosystem.
382.92 -> And, for me, the heart of this thing
385.53 -> is treating data like a product,
388.11 -> because once your company makes that mindset shift,
391.38 -> and it really is a mindset shift
393.09 -> that requires full and total buy-in from everybody,
396.03 -> the rest of these principles kind of naturally follow.
398.22 -> You have to decide how you're gonna organize, in domains,
402.48 -> those data products,
403.74 -> and then you have to enable a whole bunch of activities
406.65 -> via self-service for those data product owners.
410.97 -> Now, data mesh was coined or invented in 2019
415.71 -> but it really started to gain traction in 2020,
418.32 -> which, if you'll recall,
419.37 -> was right around the time
420.3 -> we were shutting down our last data center,
422.31 -> and so this concept came out too late for us.
425.76 -> But when you see how we approached our data ecosystem,
429.45 -> the similarities are pretty striking.
434.52 -> So we approached scaling our data ecosystem
437.37 -> really through two prongs,
439.35 -> centralized policy tooled into a central platform
443.64 -> that then enables federated data management.
448.83 -> I'm gonna walk through each one of these pillars now.
453.12 -> So the first thing that we did
455.88 -> was break our lines of business
458.43 -> into discrete organizations and units of data responsibility
462.66 -> with hierarchy,
464.31 -> but we didn't enforce the same hierarchy
466.8 -> on all of our lines of business.
468.93 -> Our big lines of business had three or four levels,
471.48 -> our smaller lines of business really only had one.
474.24 -> But each line of business had the same set of roles
476.7 -> supporting it.
477.99 -> Performing data stewards
479.37 -> are responsible for the risk of one or more datasets
482.1 -> in their business unit.
483.84 -> Managing data stewards
485.58 -> are responsible for the risk of all of the datasets
488.13 -> within the business unit,
489.6 -> and each business organization or line of business
492.48 -> also has a data risk officer
494.67 -> that's responsible for the entire thing.
497.97 -> Now, these weren't new roles.
499.62 -> We didn't go out and hire a bunch of people.
501.87 -> These are all side of desk activities
503.76 -> and each one of these people also has a day job.
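To make that structure concrete, here is a minimal sketch of how the variable hierarchy and the three stewardship roles might be modeled. The class names, field names, and example domains (DataDomain, Steward, "Card", "Treasury") are hypothetical illustrations, not Capital One's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Steward:
    name: str
    role: str  # "performing", "managing", or "data_risk_officer"
    datasets: List[str] = field(default_factory=list)  # datasets this person is accountable for

@dataclass
class DataDomain:
    """A unit of data responsibility within a line of business."""
    name: str
    stewards: List[Steward] = field(default_factory=list)
    children: List["DataDomain"] = field(default_factory=list)  # hierarchy depth varies by line of business

# A large line of business might have several nested levels...
card = DataDomain(
    name="Card",
    stewards=[Steward("A. Officer", "data_risk_officer")],
    children=[
        DataDomain("Card Marketing", stewards=[
            Steward("B. Manager", "managing"),
            Steward("C. Steward", "performing", datasets=["campaign_responses"]),
        ]),
    ],
)

# ...while a smaller one may only need a single level.
treasury = DataDomain("Treasury", stewards=[Steward("D. Officer", "data_risk_officer")])
```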
508.14 -> Next thing we did was define common enterprise standards
512.43 -> for metadata management across the company,
515.13 -> and our big learning here
516.54 -> is that not all data is created equal
519.84 -> and you need to slope your governance based on risk.
523.44 -> You know, we're a bank,
524.46 -> so, of course, we always need to know
526.95 -> where is all of our data,
528.6 -> which of that data is sensitive
530.31 -> and who's responsible for it,
531.99 -> but temporary user data or staging tables
535.89 -> requires a different standard of governance
537.93 -> than data used in regulatory reports,
540.87 -> and so we needed to make sure
541.98 -> that our policies reflected that reality.
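One way to picture governance that is sloped to risk is a tier-to-controls lookup that tooling consults before enforcing anything. The tier names, control fields, and retention values below are assumptions for illustration only, not the actual policy.

```python
# Hypothetical governance tiers: controls tighten as the data's importance grows.
GOVERNANCE_TIERS = {
    "temporary": {             # scratch user data, staging tables
        "register_in_catalog": True,
        "sensitivity_classification": True,   # we always need to know what's sensitive and who owns it
        "data_quality_checks": "none",
        "retention_days": 30,
    },
    "shared": {                # data shared beyond a single application
        "register_in_catalog": True,
        "sensitivity_classification": True,
        "data_quality_checks": "schema_and_completeness",
        "retention_days": 365,
    },
    "regulatory": {            # data feeding regulatory reports
        "register_in_catalog": True,
        "sensitivity_classification": True,
        "data_quality_checks": "schema_completeness_and_business_rules",
        "retention_days": 2555,  # roughly seven years, illustrative only
    },
}

def required_controls(tier: str) -> dict:
    """Look up the governance controls a dataset must satisfy for its tier."""
    return GOVERNANCE_TIERS[tier]
```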
546.12 -> Next thing we did
548.46 -> was define different standards for data quality
552.9 -> depending on the importance of the data.
555.18 -> And so, you know, if you never intend
557.64 -> to share data beyond a single application,
560.43 -> we really only enforce a bare minimum
562.26 -> of data quality standards,
563.79 -> but if you do plan to share your data
565.47 -> with others at the company,
566.76 -> now we enforce more rigorous checks
569.28 -> like ensuring that the schema you're trying to publish
572.85 -> matches the schema the consumers expect
575.28 -> and making sure that data is complete
577.29 -> from point A to point B.
579.93 -> Our most valuable data
582.48 -> also has to pass business data quality checks
585.78 -> like making sure that FICO fields
589.11 -> fall within the allowable range.
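A rough sketch of those three levels of checks: schema conformance, completeness from point A to point B, and a business rule such as a FICO range. The function names and the 300-850 bounds are assumptions, not the actual rules.

```python
def schema_matches(rows: list[dict], expected_columns: set[str]) -> bool:
    """Shared data: the published schema must match what consumers expect."""
    return all(set(row.keys()) == expected_columns for row in rows)

def is_complete(source_row_count: int, target_row_count: int) -> bool:
    """Shared data: nothing should be lost between point A and point B."""
    return source_row_count == target_row_count

def fico_in_range(rows: list[dict], column: str = "fico_score") -> bool:
    """Most valuable data: a business rule, e.g. FICO scores fall in an allowable range."""
    return all(300 <= row[column] <= 850 for row in rows if row.get(column) is not None)

# Example: a batch destined for a regulatory report must pass all three.
batch = [{"account_id": 1, "fico_score": 712}, {"account_id": 2, "fico_score": 689}]
assert schema_matches(batch, {"account_id", "fico_score"})
assert is_complete(source_row_count=2, target_row_count=len(batch))
assert fico_in_range(batch)
```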
593.01 -> Entitlements.
595.29 -> Early in our journey,
596.73 -> every dataset was protected with its own entitlement,
600.27 -> and so it could take you as an analyst weeks
603.24 -> to figure out which role you need to get access to,
605.94 -> and then when you did get access,
607.5 -> the majority of the time,
609.12 -> the data was either bad quality
610.68 -> or it just wasn't what you wanted,
612.48 -> and so the process started again
615.21 -> and rarely would that mistaken entitlement be revoked.
619.41 -> So now you got all these people running around
621.6 -> requesting access to data they don't need.
623.88 -> It's a time suck, it creates risk.
626.97 -> And so what we did was we created mappings
631.95 -> between lines of business and data sensitivity,
635.58 -> and so now you as a user can request access
637.77 -> to all non-sensitive data in commercial, for example,
642.18 -> and you only need to re-request access
644.04 -> when you need to step up permissions.
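A minimal sketch of what that mapping might look like, assuming entitlements keyed on line of business plus sensitivity rather than one entitlement per dataset; the role and line-of-business names are invented for illustration.

```python
# Illustrative only: coarse-grained entitlements keyed on line of business + sensitivity.
ENTITLEMENTS = {
    ("commercial", "non_sensitive"): "role_commercial_ns",
    ("commercial", "sensitive"):     "role_commercial_s",
    ("card",       "non_sensitive"): "role_card_ns",
    ("card",       "sensitive"):     "role_card_s",
}

def role_for(line_of_business: str, sensitivity: str) -> str:
    """Which single role grants access to this slice of data."""
    return ENTITLEMENTS[(line_of_business, sensitivity)]

def needs_new_request(user_roles: set[str], line_of_business: str, sensitivity: str) -> bool:
    """A user only re-requests access when stepping up permissions."""
    return role_for(line_of_business, sensitivity) not in user_roles

# An analyst with non-sensitive commercial access...
analyst_roles = {"role_commercial_ns"}
assert not needs_new_request(analyst_roles, "commercial", "non_sensitive")
# ...only files a new request to step up to sensitive data.
assert needs_new_request(analyst_roles, "commercial", "sensitive")
```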
653.09 -> Okay.
655.14 -> I just rattled off a whole bunch of
657.03 -> "if your scenario was that, do this,
659.01 -> if your scenario was this, do that" type situations.
662.01 -> How is anybody supposed to keep that straight
664.29 -> and how is any data governance team
666.72 -> supposed to enforce all of that?
669.15 -> The answer is deceptively simple,
671.82 -> you make it easy.
673.68 -> We surveyed our data teams
676.08 -> and, by and large, they all wanted to do the right thing.
679.44 -> They wanted to be good corporate stewards.
680.82 -> They didn't want to create risk
682.2 -> but they didn't know how.
684.39 -> Our policies were confusing
685.92 -> and our policies were opaque.
688.11 -> So how do you make it easy?
692.16 -> Well, the way that we made it easy
694.2 -> was giving our teams a usability layer
697.56 -> for them to do their work,
699.78 -> and this usability layer is aligned
702.51 -> not in terms of a technology,
704.52 -> like catalog or data quality,
706.89 -> but in terms of a job to be done,
709.14 -> publishing a new dataset,
710.76 -> finding and getting access to a dataset,
713.04 -> protecting sensitive data,
714.78 -> reconciling my infrastructure bill.
717.93 -> This usability layer talks to an orchestration layer
721.26 -> that handles keeping all of those different systems in sync,
724.44 -> and it also goes all the way down
726 -> to the infrastructure layer
727.53 -> to automatically provision resources,
729.6 -> whether it's a table, an S3 bucket, Kafka topic,
733.74 -> on the user's behalf.
737.22 -> So the key to federating data management responsibility
740.7 -> is giving your teams an experience
743.13 -> that aligns to the job they're trying to do.
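As a sketch of what "aligned to the job to be done" could mean in code, here is a hypothetical usability-layer class whose public methods are jobs (publish a dataset, request access) and whose internals fan out to the catalog, quality, entitlement, and infrastructure systems. None of these class or method names come from the talk; they are illustrative stand-ins.

```python
# A minimal, duck-typed skeleton of a job-oriented usability layer. Each public
# method maps to a "job to be done"; the orchestration fans out to the underlying
# systems so the user never has to touch them directly.
class DataPlatform:
    def __init__(self, catalog, quality, entitlements, infrastructure):
        self.catalog = catalog
        self.quality = quality
        self.entitlements = entitlements
        self.infrastructure = infrastructure

    def publish_dataset(self, spec: dict) -> None:
        """One job: publish a new dataset."""
        self.catalog.register(spec["name"], spec["metadata"])
        self.infrastructure.provision(spec["storage"])          # S3 location, table, Kafka topic...
        self.quality.schedule_checks(spec["name"], spec["dq_rules"])
        self.entitlements.protect(spec["name"], spec["sensitivity"])

    def request_access(self, user: str, dataset: str) -> str:
        """Another job: find and get access to a dataset."""
        return self.entitlements.submit_request(user, dataset)
```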
747.81 -> But, like, that's kind of theoretical
749.88 -> and so what I'm gonna do next
752.88 -> is try to ground that statement
757.53 -> in four different use cases
760.05 -> that we've enabled for our teams at Capital One.
766.65 -> The first use case is our data producing experience,
771.66 -> and you may wonder, like,
773.857 -> "Why do you need a data producing experience
775.92 -> in the first place?
777.42 -> I can just go talk to somebody on the infrastructure team,
780.06 -> they can provision me an S3 bucket,
781.92 -> and then we can use a native AWS service
784.14 -> to move data from point A to point B."
786.96 -> And, you know, that may work in smaller companies
789.84 -> where issues of scale and data governance
792.57 -> haven't cropped up yet,
794.1 -> but in large companies,
796.08 -> publishing data is like a one to two month project.
799.59 -> You have to coordinate across five or six different teams,
802.62 -> you have to have lots of meetings to make small decisions,
805.56 -> and so we needed to simplify this process
808.68 -> so our teams could move faster.
812.13 -> Now put yourself in the shoes of this data producer here.
815.97 -> All this person cares about
817.41 -> is getting their data from point A to point B
819.87 -> so it can be consumed by others in their company.
822.3 -> Any additional step or task
824.31 -> related to compliance or governance
827.34 -> is really just a roadblock
829.11 -> on the way to them doing their job.
832.59 -> So the first thing this person does
835.77 -> is register their metadata.
837.78 -> This is where they provide business meaning,
840.6 -> this is where they define data quality thresholds,
843.18 -> and this is where they define retention policies.
846.12 -> Then, in the background, we will register the catalog,
850.29 -> we will provision a location,
853.02 -> and we will configure and schedule jobs
856.44 -> that check for data quality and enforce data retention.
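A hedged illustration of that single registration step and the work the platform derives from it. The payload fields and the generated actions are assumptions meant to show the shape of the automation, not the real system.

```python
# Hypothetical registration payload a data producer fills in through the UI.
registration = {
    "dataset": "card_transactions_daily",
    "business_description": "Daily settled card transactions for the Card line of business",
    "owner": "card-data-products",
    "dq_thresholds": {"completeness_pct": 99.5, "max_null_pct": {"account_id": 0.0}},
    "retention_days": 365,
}

def on_register(reg: dict) -> list[str]:
    """Sketch of what the platform does behind the scenes with that one form."""
    return [
        f"catalog: register {reg['dataset']} with business metadata",
        f"infrastructure: provision a storage location for {reg['dataset']}",
        f"scheduler: create a data quality job (thresholds={reg['dq_thresholds']})",
        f"scheduler: create a retention job (purge after {reg['retention_days']} days)",
    ]

for action in on_register(registration):
    print(action)
```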
861.42 -> The next thing this person does is classify their data,
866.13 -> and they do this by either approving or overriding
871.65 -> sensitivity values that have been pre-populated for them,
875.28 -> and once they complete this step,
877.41 -> we update that registration,
878.88 -> we update that physical layer
880.68 -> to protect it with the appropriate entitlement.
884.94 -> Next, the user configures their data pipeline,
887.61 -> they point the system at a source,
889.17 -> they configure their transformation logic,
892.08 -> and when they're done,
893.01 -> we'll automatically build them a pipeline
895.11 -> without any assistance from the data engineering team.
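That pipeline step might be expressed as a declarative spec along the lines below, which the platform then turns into a running job. The field names, the tokenize step, the bucket and table names, and the schedule syntax are all hypothetical.

```python
# Hypothetical declarative pipeline spec: the producer points at a source and
# declares transformations; the platform turns this into a running pipeline.
pipeline_spec = {
    "name": "card_transactions_to_shared_zone",
    "source": {"type": "s3", "uri": "s3://example-raw-bucket/card/transactions/"},
    "target": {"type": "snowflake", "table": "SHARED.CARD.TRANSACTIONS_DAILY"},
    "transformations": [
        {"op": "rename", "from": "txn_amt", "to": "transaction_amount"},
        {"op": "filter", "expr": "transaction_amount >= 0"},
        {"op": "tokenize", "column": "card_number"},   # protect sensitive fields in flight
    ],
    "schedule": "daily@06:00",
}

def build_pipeline(spec: dict) -> str:
    """Stand-in for the code generation / orchestration the platform performs."""
    steps = " -> ".join(t["op"] for t in spec["transformations"])
    return f"{spec['source']['uri']} -[{steps}]-> {spec['target']['table']} ({spec['schedule']})"

print(build_pipeline(pipeline_spec))
```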
900.09 -> Once that pipeline is turned on,
902.52 -> all of those governance steps that I configured
905.01 -> are executed automatically.
906.99 -> We'll check the data quality for each new instance.
909.66 -> We will track the lineage for each new instance.
912.78 -> We will scan each new instance
914.85 -> for sensitive data that's being inappropriately loaded
918.09 -> to the target system automatically.
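Here is a minimal sketch of those per-instance governance hooks: a quality gate, lineage capture, and a naive scan for card numbers in the clear. Real scanners are far more sophisticated, and every name in this snippet is illustrative.

```python
import re

def scan_for_clear_text_pii(rows: list[dict]) -> list[str]:
    """Very rough illustration: flag columns that look like unprotected card numbers."""
    pan_like = re.compile(r"^\d{16}$")
    return sorted({col for row in rows for col, val in row.items()
                   if isinstance(val, str) and pan_like.match(val)})

def on_new_instance(dataset: str, rows: list[dict], source: str) -> dict:
    """Governance steps executed automatically for every new load."""
    return {
        "data_quality": "passed" if rows else "failed: empty load",
        "lineage": {"dataset": dataset, "derived_from": source},
        "sensitive_columns_in_clear": scan_for_clear_text_pii(rows),
    }

result = on_new_instance(
    "card_transactions_daily",
    [{"account_id": "A1", "card_number": "4111111111111111"}],
    source="s3://example-raw-bucket/card/transactions/2022-11-29/",
)
print(result)  # would flag card_number as sensitive data loaded in the clear
```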
923.58 -> Now, lastly, and sort of most controversially, probably,
930.93 -> for this to work,
932.76 -> you need to have one way to ingest data into your ecosystem.
937.47 -> You need one way in.
938.97 -> Otherwise, you cannot be 100% certain
942.24 -> that data governance is being applied consistently
944.85 -> across the enterprise.
947.67 -> But that one way in, it can't be rigid.
950.58 -> It has to be flexible.
952.23 -> It has to support the individual use cases
955.29 -> of your lines of business,
957.45 -> and only then will you...
963.3 -> Did I do that?
964.429 -> (staff speaks faintly)
965.73 -> Only then will you get the buy-in you need
967.41 -> to drive adoption.
972.3 -> I'm gonna take this opportunity and grab some water.
995.37 -> Can you fix me in the back here,
996.99 -> or do I have to go and do it up here?
1019.04 -> Hold, I think I can do this from my end.
1021.742 -> (Patrick mumbling)
1023.96 -> - [Staff] I'm gonna help you, sorry.
1025.217 -> - Oh, yeah. (mumbling)
1032.63 -> We're back.
1035.03 -> Thank you.
1035.863 -> - [Staff] Glad I could help.
1036.716 -> - (laughs) Awesome.
1039.2 -> All right.
1051.14 -> Okay, data producer experience.
1055.55 -> It was really this automation of governance
1058.49 -> that was the key driver of our business teams
1063.41 -> adopting this workflow.
1064.73 -> You know, people ask us after these presentations,
1066.387 -> "How do you get buy-in at your company?"
1068.6 -> and, you know, you need one way in to your data ecosystem
1072.83 -> like I talked about,
1073.91 -> but it's the automation of governance
1076.88 -> that's really gonna be that carrot
1078.68 -> that drives your teams to adopt whatever you build.
1084.17 -> The next experience is the data consumer experience,
1089.45 -> and, again, you might be thinking like,
1092.547 -> "My company is small.
1094.16 -> I only have a couple dozen tables,
1096.14 -> like I'm fine relying on tribal knowledge
1098.81 -> to figure out which data I need
1102.26 -> and which role I need to request access to,"
1105.2 -> and that may work when you've got a couple dozen tables.
1107.78 -> But in a large company with hundreds or thousands
1111.98 -> or hundreds of thousands of tables,
1115.76 -> it gets really difficult for your analysts
1118.46 -> to find, evaluate, and use the right data.
1124.13 -> So imagine you're an analyst
1128.87 -> and you wanna understand
1130.07 -> the results of a recent marketing campaign.
1132.65 -> You come to this experience and you search for Acxiom,
1137 -> which is one of our marketing vendors.
1140.48 -> Not only do we give you
1141.98 -> a list of all of the data produced by Acxiom,
1144.8 -> but we also give you
1145.73 -> a series of recommendations and insights,
1148.31 -> we show you what data is frequently used
1152.48 -> with the Acxiom data that you're looking at,
1154.58 -> because we know that very few analyses
1156.74 -> are done with a single dataset,
1158.9 -> and we also show you
1163.52 -> information about common queries that are run on that data
1169.34 -> so that we can maybe save you a step,
1171.32 -> and we also show you popular reports
1173.78 -> that are using that data to maybe save you two steps.
1178.07 -> But when you're searching for data as an analyst,
1180.23 -> you don't just want any data,
1181.61 -> like there's a lot of data out there,
1182.72 -> you don't want just anything.
1183.86 -> You want the right data.
1185.69 -> But how do you identify the right data?
1187.97 -> We give our teams signals of relevance
1190.91 -> to help them understand whether the data is high quality.
1194.72 -> They can check
1195.71 -> the status of the data quality rules,
1198.29 -> they can check the lineage,
1199.73 -> they can check a profile,
1201.29 -> they can check how fresh the data is
1203.3 -> and how often it's updated,
1205.16 -> they can see who else is using that data
1207.29 -> and whether anybody on their team is using that data.
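Those signals of relevance could be bundled into a summary along these lines; the specific fields and values shown are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical "signals of relevance" surfaced next to a search result.
def relevance_summary(dq_rules_passing: int, dq_rules_total: int,
                      last_updated: datetime, update_frequency: str,
                      distinct_users_90d: int, teammates_using: int) -> dict:
    freshness_days = (datetime.utcnow() - last_updated).days
    return {
        "data_quality": f"{dq_rules_passing}/{dq_rules_total} rules passing",
        "freshness": f"updated {freshness_days} day(s) ago ({update_frequency})",
        "popularity": f"{distinct_users_90d} users in the last 90 days",
        "on_your_team": f"{teammates_using} teammate(s) already use this",
    }

print(relevance_summary(
    dq_rules_passing=18, dq_rules_total=20,
    last_updated=datetime.utcnow() - timedelta(days=1),
    update_frequency="daily",
    distinct_users_90d=142, teammates_using=3,
))
```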
1212.69 -> Once they understand
1213.8 -> whether this is the right data for them,
1216.68 -> the next step is requesting access.
1219.14 -> And because this experience
1221.3 -> is integrated into our identity management system
1225.47 -> and our LDAP groups,
1227 -> we know whether the user has access already,
1230.18 -> and so we can let them know directly in the experience.
1234.2 -> If they don't have access,
1236.84 -> through the same workflow,
1238.73 -> they can submit a request
1240.71 -> that's then routed to the appropriate stewardship group
1243.65 -> to either approve or reject.
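A small sketch of that request flow, assuming the experience can see the user's LDAP groups: it short-circuits when access already exists and otherwise routes the request to a stewardship group. The group and role names are invented.

```python
# Hypothetical access-request flow wired to identity management / LDAP groups.
def request_access(user_groups: set[str], required_role: str, stewardship_group: str) -> dict:
    if required_role in user_groups:
        return {"status": "already_have_access", "action": "none"}
    return {
        "status": "request_submitted",
        "action": f"routed to {stewardship_group} for approval or rejection",
        "requested_role": required_role,
    }

# The analyst already holds the non-sensitive commercial role...
print(request_access({"role_commercial_ns"}, "role_commercial_ns", "commercial-stewards"))
# ...but a step-up to sensitive data goes to the stewardship group.
print(request_access({"role_commercial_ns"}, "role_commercial_s", "commercial-stewards"))
```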
1251.81 -> The next experience I'm gonna talk about
1253.91 -> is our self-service data governance experience,
1258.68 -> and what I wanna highlight here
1260.63 -> is how what we've built enables two different persona groups
1264.95 -> to work together seamlessly.
1267.56 -> So on one hand, on the left hand side here,
1270.2 -> you've got your risk management teams
1272.87 -> that are responsible for defining data governance policy
1277.28 -> that is then automatically incorporated
1279.71 -> into all of our data workflows.
1282.62 -> And then on the right hand side,
1284.36 -> those same teams proactively receive compliance reports
1289.4 -> that let them know things
1290.39 -> like what percentage of our data is registered,
1294.05 -> how are we doing addressing our data quality incidents,
1297.5 -> have we discovered any sensitive data in the clear
1299.81 -> that we need to remediate.
1302.57 -> And what's cool about this
1303.68 -> is it truly does enable seamless automated integration
1309.38 -> between these two groups.
1311.06 -> And so, you know, if one of our automated processes detects
1315.74 -> that there's sensitive data in the clear somewhere,
1318.53 -> it'll automatically trigger an alert
1320.81 -> to this data product owner.
1323.33 -> This data product owner can jump into a workflow
1326.51 -> and initiate a remediation plan,
1328.82 -> whether that's tokenizing the data,
1331.37 -> whether that's purging the data,
1333.23 -> whether it's something else,
1335.63 -> and the action that they take
1337.58 -> is automatically added to one of these compliance reports
1342.62 -> that's then regularly reviewed by our risk management team.
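The detection-to-remediation-to-report loop might look roughly like this; the alert fields and remediation actions are illustrative, not the actual workflow.

```python
# Illustrative glue between automated detection, the product owner's remediation,
# and the compliance report the risk team reviews. All names are hypothetical.
compliance_report: list[dict] = []

def on_sensitive_data_detected(dataset: str, column: str, owner: str) -> dict:
    alert = {"dataset": dataset, "column": column, "owner": owner, "status": "open"}
    print(f"ALERT to {owner}: {column} in {dataset} appears to be in the clear")
    return alert

def remediate(alert: dict, action: str) -> None:
    """The data product owner picks a remediation: tokenize, purge, or something else."""
    alert.update(status="remediated", action=action)
    compliance_report.append(alert)   # automatically lands in the next regular review

alert = on_sensitive_data_detected("card_transactions_daily", "card_number", "card-data-products")
remediate(alert, action="tokenize")
print(compliance_report)
```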
1353.63 -> CCPA is another use case
1356.72 -> where this experience works really well,
1359.6 -> a customer calls up,
1361.07 -> they say, "Delete all of my data."
1363.74 -> We use almost this exact same workflow
1367.19 -> to ensure a fully auditable and complete purge process.
1374 -> But anytime you mess with data in production,
1378.47 -> you know, you can never take that lightly.
1380.78 -> The decisions need to be auditable.
1383.18 -> You have to maintain separation of duties,
1386.18 -> your actions need to be approved and confirmed
1389.72 -> before they're executed,
1391.61 -> and all of that's possible through this workflow.
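A minimal sketch of an auditable purge along those lines, with separation of duties and an approval gate before anything executes; the function names and ticket structure are assumptions.

```python
from datetime import datetime, timezone

audit_log: list[dict] = []

def record(event: str, **details) -> None:
    audit_log.append({"at": datetime.now(timezone.utc).isoformat(), "event": event, **details})

def request_purge(customer_id: str, requested_by: str) -> dict:
    record("purge_requested", customer_id=customer_id, requested_by=requested_by)
    return {"customer_id": customer_id, "requested_by": requested_by, "approved_by": None}

def approve_purge(ticket: dict, approver: str) -> None:
    # Separation of duties: the requester can never approve their own purge.
    if approver == ticket["requested_by"]:
        raise PermissionError("separation of duties: requester cannot approve their own purge")
    ticket["approved_by"] = approver
    record("purge_approved", customer_id=ticket["customer_id"], approver=approver)

def execute_purge(ticket: dict) -> None:
    # Nothing touches production data before an explicit approval.
    if not ticket["approved_by"]:
        raise RuntimeError("purge not approved; refusing to touch production data")
    record("purge_executed", customer_id=ticket["customer_id"])

ticket = request_purge("cust-123", requested_by="steward_a")
approve_purge(ticket, approver="risk_officer_b")
execute_purge(ticket)
print(audit_log)
```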
1399.23 -> Now, this is the last experience I'm gonna talk about.
1404.72 -> Data mesh calls for not just federating data management
1410.21 -> but also data infrastructure,
1412.82 -> and I'm gonna tell this story
1414.38 -> through the lens of how we manage our Snowflake costs
1417.32 -> at Capital One,
1418.43 -> because we're actually showcasing this product
1420.65 -> at our booth on the Expo floor.
1423.68 -> We've built a self-service tool
1426.65 -> that lets you as a business team
1429.02 -> manage your own infrastructure
1431.42 -> while trusting that DBA best practices are being followed
1435.44 -> and good cost controls are being enforced.
1439.37 -> Now, let's say you're a team lead for a group of analysts.
1443.75 -> You're a line of business tech lead,
1446.33 -> you have a new project that requires some dedicated compute.
1449.9 -> You can come to this experience
1452 -> and you can request the provisioning
1454.28 -> of a new Snowflake warehouse
1456.02 -> and you can manage who has access to it.
1459.23 -> That request goes through an approval workflow
1461.81 -> and at the end,
1462.643 -> the resource is automatically provisioned for you
1465.08 -> without any data engineering help.
1468.56 -> On the back end,
1469.88 -> we're capturing business metadata
1471.86 -> like the line of business,
1473.84 -> the project, the owner, the approver
1476.42 -> so that it's really easy for your central team
1480.05 -> to charge back resources to the appropriate cost center
1483.32 -> at the end of the month.
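To show why that chargeback metadata matters, here is a hypothetical provisioning helper that bakes line of business, project, owner, and approver into the warehouse it creates, along with auto-suspend defaults as a cost control. The tagging-via-comment approach and all the names are illustrative choices, not the product's actual mechanism.

```python
# Hypothetical self-service warehouse request; the generated statement is standard Snowflake DDL.
def provision_warehouse(name: str, size: str, line_of_business: str,
                        project: str, owner: str, approver: str) -> str:
    chargeback = f"lob={line_of_business};project={project};owner={owner};approver={approver}"
    return (
        f"CREATE WAREHOUSE {name} "
        f"WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = 60 AUTO_RESUME = TRUE "            # cost-control defaults baked in
        f"COMMENT = '{chargeback}';"                        # metadata for month-end chargeback
    )

print(provision_warehouse(
    name="COMMERCIAL_ANALYTICS_WH", size="SMALL",
    line_of_business="commercial", project="q4-campaign-analysis",
    owner="team-lead-x", approver="dro-y",
))
```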
1486.8 -> But we realized
1488.51 -> that just provisioning infrastructure wasn't enough.
1491.66 -> We also needed to give our teams a way to self-manage
1496.25 -> and make sure that they were using that infrastructure
1499.1 -> as efficiently as possible.
1501.44 -> And so we built, you know, several dashboards
1504.35 -> that track cost predictions, cost trends, cost spikes.
1510.02 -> We've built several alerts that detect cost anomalies
1514.7 -> and let you know when there's a problem.
1517.04 -> And some of those alerts also come with recommendations
1520.79 -> on how you can troubleshoot the issue.
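A cost-spike alert of the kind described could be as simple as comparing each day's spend to a trailing average; the window and threshold below are arbitrary assumptions.

```python
# A minimal cost-anomaly check: flag any day whose spend is far above the trailing average.
def detect_cost_spikes(daily_costs: list[float], window: int = 7, factor: float = 2.0) -> list[int]:
    """Return indexes of days whose cost exceeds `factor` x the trailing-window average."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = sum(daily_costs[i - window:i]) / window
        if daily_costs[i] > factor * baseline:
            spikes.append(i)
    return spikes

costs = [100, 105, 98, 110, 102, 99, 104, 101, 260, 103]
print(detect_cost_spikes(costs))  # flags index 8, the 260 spike
```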
1524.63 -> Now, this product
1529.04 -> really drove a ton of value at Capital One.
1532.34 -> We saved ourselves about 27%
1534.98 -> on our projected Snowflake costs.
1536.72 -> We saved our teams about 55,000 hours of manual activity
1540.62 -> through the elimination of change orders.
1542.69 -> We reduced our cost per query by about 43%,
1546.53 -> and our business teams were able to onboard
1551.69 -> like 450 new use cases on their own
1555.05 -> since our Teradata migration.
1558.23 -> And so, you know, if you're interested
1559.82 -> in seeing more about how we did this,
1561.56 -> like I said, the product is at our booth here,
1564.89 -> you can also find more on capitalone.com/software.
1570.95 -> So the key takeaway on this slide though
1573.83 -> is, you know, once your costs
1576.59 -> become predictable and manageable,
1578.96 -> particularly in cloud environments
1580.55 -> where you're now paying as you go,
1583.28 -> your central team stops being a bottleneck,
1586.4 -> and that's really what enables your business teams
1589.13 -> to move at their own pace.
1595.1 -> All right, closing thoughts here
1598.37 -> before I take questions.
1599.78 -> At the end of the day,
1601.16 -> you know, data mesh is just, it's a concept,
1604.01 -> it's a set of principles,
1605.18 -> it's an operating model.
1607.67 -> If you really want to operationalize this thing
1610.73 -> at your organization,
1612.59 -> not only do you need to build these four experiences
1615.83 -> and then some,
1617.54 -> you have to make traditional data engineering activity
1620.75 -> completely transparent to your users,
1623.09 -> and you do this through easy to use tooling
1627.83 -> and self-service.
1630.02 -> If you can do these things, you know, remember,
1635.69 -> central policy built into a central platform
1639.59 -> that then enables federated data management.
1642.44 -> That's how you unlock your technology
1644.54 -> and enable it to move at the speed of business.
1648.38 -> This is where we end.
1650.54 -> I'm happy to stick around for some questions.
1654.11 -> Go team USA.
1657.085 -> (attendees applauding)

Source: https://www.youtube.com/watch?v=DyB2iueJa6I