AWS re:Invent 2022 - Enabling agility with data governance on AWS (ANT204)

Data governance is the process of managing data throughout an end-to-end process, ensuring its accuracy and completeness and making sure it is accessible to those who need it. Join this session to learn how AWS is delivering comprehensive data governance— from data preparation and integration to data access, data quality, and metadata management—across analytics services.



Content

0.023 -> - Welcome to Enabling Agility with Data Governance on AWS.
4.17 -> Very excited to talk to you about data governance.
6.27 -> I don't think I've said that before,
8.34 -> but today it's gonna be a good talk
9.75 -> between myself and Shihas from Prudential, so thank you.
13.35 -> I run our Lake Formation product
14.97 -> as well as the Glue data catalog here at AWS.
17.742 -> I've been with Amazon for about eight years.
19.71 -> I was running our analytics practice
21.18 -> in professional services for most of that time.
24.27 -> We were helping customers build data lakes,
26.79 -> helping them migrate their data platforms to AWS
29.37 -> and become data driven.
31.32 -> Let's go see what we're gonna talk about today.
34.08 -> The first thing we're gonna talk about
36 -> is how does data governance help you become data driven?
40.17 -> The second thing we're gonna talk about
41.37 -> is data governance patterns within AWS analytic services.
45.6 -> And then Prudential Financial will come up and talk about
47.91 -> their data journey and how they actually did it for real
50.34 -> versus me just talking about some PowerPoint examples.
54.99 -> So let's start first with data driven themes
57.99 -> we hear from customers and I like to start
60.96 -> with this about data governance
63.3 -> because it's so critical hand in hand.
66.36 -> The first thing you wanna learn is that the top part of it
69.777 -> and the bottom part of it are coupled in a way.
72.39 -> But the top part is more about business context
75.84 -> and the bottom part is more about people, process, and technology.
79.23 -> So customers tell us they want to be data driven,
81.54 -> but they struggle with a few areas, and these are consistent
84.06 -> themes across many customers we talk to.
86.85 -> The first thing is understanding what great looks like.
89.37 -> And this isn't about just building a roadmap
91.68 -> or understanding what I'm gonna do next.
93.15 -> It's actually what is great
94.29 -> and what will delight your customer.
96.63 -> At Amazon, we talk about working backwards
98.73 -> or writing PR/FAQs to go through that process
101.25 -> and really articulate what is great.
103.23 -> And that's your first start in your journey
105.27 -> with data governance as well,
106.32 -> and we'll tie these together in a moment.
108.3 -> Once you know what great looks like, you prioritize
111 -> those use cases and you create sponsorship
112.86 -> just like you would with any other initiative.
115.71 -> The bottom row is where it fits into data governance.
119.73 -> As you start building these use cases
121.56 -> to solve business problems, you integrate things
124.89 -> like data driven culture, data literacy programs,
127.8 -> understanding data definitions, you focus on gaps of skills.
131.37 -> Maybe you need to introduce a new tool
132.99 -> that users need to understand.
134.85 -> And the last thing is security's number one,
137.19 -> if we're gonna drive business agility with data governance,
140.79 -> we need to make sure your data is protected.
142.71 -> Security and compliance controls are built in
145.59 -> because your customers and users of data
148.35 -> need to make decisions at the speed of their business.
150.99 -> And without protecting data at that same speed,
154.08 -> you could create additional business risk.
158.01 -> So some other areas that we've seen
162.36 -> other publications talk about is, for example,
165.15 -> Forbes gave some statistics,
167.01 -> 85% of businesses want to be data driven,
170.37 -> yet 37% have really struggled with it.
173.19 -> And why have they struggled?
174.48 -> And you see some quotes around from IDC.
177.21 -> Data governance is a core struggling area
179.43 -> for becoming data driven.
180.93 -> So it's no longer optional is what IDC says.
184.23 -> For enterprise organizations to get the most value
186.93 -> out of their data, to make decisions,
188.91 -> they really need to treat data as an asset.
191.85 -> The second part is organizations lack knowledge
195.45 -> of efficient and effective data governance activities.
198.21 -> And 30% of the time is wasted doing data governance things,
201.87 -> data definitions, data management, data security.
205.26 -> And these are areas you can definitely automate
207.24 -> that we're gonna talk to in a few minutes.
210.24 -> Let's put some definition around this.
212.49 -> This is how we define data governance here at Amazon.
215.16 -> There are thousands of definitions of data governance,
217.65 -> so I'd love to hear yours as well,
219.39 -> but this is how we define it,
220.807 -> "Data governance is the collection of policies,
225.24 -> processes and systems that organizations use
228.57 -> to ensure the quality and appropriate handling
231.06 -> of their data throughout its lifecycle
234.54 -> for the purpose of generating business value."
237.96 -> And to me what really sticks out the most is a few things,
241.08 -> one of which is the last line, generating business value.
244.62 -> We do not want to do any work in data governance
247.5 -> without a business value associated with it.
249.9 -> And let's go a little bit deeper into that.
253.44 -> So data governance starts with business.
258 -> The first thing you want to do
259.29 -> is tie it to your business strategy.
261.72 -> And I'll tell a bit of a story.
263.64 -> So I'd say like 15 years ago I got very excited,
266.19 -> I was managing a customer data system,
269.07 -> it was probably pre-Salesforce at the time,
271.29 -> and I'm like, our data's really bad,
273.36 -> I need a data quality tool.
275.61 -> And I went up to my management, I'm like
276.96 -> we're gonna go buy a data quality tool,
278.34 -> it's gonna be X hundreds of thousands of dollars
280.05 -> and we're gonna solve our customer data problem.
282.69 -> It wasn't funded and I didn't know what to do.
285.72 -> I'm like, I can't understand why this wasn't funded.
287.64 -> I wasn't really thinking about it.
289.62 -> We didn't have a customer analytics problem to solve
292.47 -> at the time, you'd think that most customers
294.48 -> had that problem, but we didn't have that.
297.36 -> We knew our customers; the business
299.4 -> I was dealing with wasn't the biggest company at the time,
301.83 -> and our business people knew our customers really, really well
304.89 -> and they weren't going to invest in a data quality tool.
307.5 -> So what's important here is to understand
309.63 -> your business challenges and start your data governance
312.72 -> journey with that.
314.52 -> Once you know those business challenges,
316.92 -> you have data consumers.
318.96 -> Data consumers are those that need to use data
321.42 -> to make decisions.
322.74 -> So those are the folks that will understand what data
325.17 -> they're gonna ask for, what data they're gonna look for,
326.91 -> and what metrics they need to solve business problems.
330 -> They're then gonna inform data producers.
332.91 -> So data producers could be data owners, system owners,
337.17 -> data lake admins: here is the data I need
340.47 -> to solve my problem that ties to the business strategy.
342.6 -> How can you help me?
344.16 -> And that's where data governance really kicks in.
347.85 -> So this is our data governance, I guess bracelet,
352.41 -> charm slide that we talk through.
355.53 -> And there's two areas, there's the top portion of the slide
358.41 -> and the bottom portion of the slide.
360.66 -> So we've been talking about schema-on-read for a while
365.16 -> here in this industry.
366.51 -> Schema-on-read's an interesting concept
367.95 -> 'cause you definitely wanna just put the data in
369.45 -> and people can get value really quickly.
371.4 -> But does that really help a data user make decisions faster?
376.05 -> What you wanna do here is the top part as a data consumer
379.59 -> is asking a data producer for more information.
383.07 -> Those producers should ingest that data quickly.
386.4 -> They need to automate classification,
388.32 -> automate data profiling, automate data quality
391.11 -> and secure and encrypt that data before they give access.
394.35 -> And that brings that real understanding
396.09 -> of what that data is, so that users can quickly be provided
400.02 -> access to that information.
402.03 -> For example, data classification.
403.53 -> What if your data had PII and the producer of data
406.65 -> didn't really know what their role is in the organization
409.59 -> to protect compliance controls or GDPR?
413.4 -> Automate data classification on the way in.
416.34 -> The last thing is catalog, so before giving access to data,
419.94 -> you want to catalog your information,
421.77 -> bring that technical and business context to your data
425.1 -> and then give users access to that to make decisions.
428.79 -> I was at a talk at Amazon a couple years ago now
431.28 -> and they talked a bit about the value of data
433.86 -> and they talked a lot about schema-on-read
436.26 -> and how this fits in.
438.51 -> But what you want to do is maximize and get data
441.06 -> in the hands of users as fast as possible
443.49 -> in a secure and compliant way so that they can make
447.09 -> decisions at the speed of their business.
449.49 -> So we'll talk a bit about automating
451.14 -> the top part of the slide.
453.54 -> The bottom part of the slide is once data is in the hands
456.36 -> of consumers, they can manage that data,
458.22 -> they can create new data sets,
459.36 -> they can then become producers on their own.
462.12 -> So if you think about a consumer in that point in time,
464.73 -> they might be using data, accessing,
466.77 -> querying, making decisions.
468.57 -> They don't just make a decision and it goes away.
470.88 -> They want to create something new to share
472.53 -> with their executives, share with their business partners.
475.23 -> That's where they actually become a producer
476.97 -> and they're gonna be doing data curation, data integration,
480 -> building lineage, securing that data, and starting
482.85 -> and continuing that life cycle of data throughout.
486.54 -> So how do AWS services help?
491.37 -> I'm gonna talk about three main challenges here.
494.19 -> The first one is how do we automate data integration,
497.19 -> classification and data quality
499.68 -> on the way into your data lake?
501.99 -> The second area is how do we catalog
504.51 -> and make that data more usable?
506.67 -> And the third area is how do we automate sharing
509.97 -> so that users can get access to the data they need
512.55 -> at the right time and as fast as possible?
515.1 -> So let's go a bit deeper into automating ingestion.
519.06 -> And lastly, you wanna make sure that it can publish
521.37 -> those data sets back through that same ingestion pipeline.
528.549 -> So automating data governance on ingestion
532.23 -> is a hard problem, let's talk about why it's a hard problem.
535.44 -> The first thing is data comes in various forms,
538.95 -> it comes streaming, it comes in batch, it comes in CDC,
541.59 -> it comes from SaaS applications,
543.6 -> it comes from different file formats
545.76 -> and it's variable over time.
548.07 -> So automating that process is hard but can be done over time
552.9 -> by building a set of patterns.
554.1 -> And we'll talk about an example pattern in a moment.
556.65 -> The second thing is automation with CICD pipelines.
559.44 -> So to make sure that your enterprise standards are built
562.5 -> into that automation of ingestion.
564.84 -> So when a new pattern comes in,
566.58 -> your CICD pipelines can adapt.
569.31 -> And you want to handle it for many data sources: RDBMS,
572.61 -> files, streams, SaaS applications.
576.09 -> The second challenge you want to handle
577.89 -> when you're automating data or data ingestion
581.04 -> is handling inconsistent performance,
583.23 -> reliability and quality.
585.15 -> So as part of these reusable ingestion frameworks
588.33 -> that you might build at your sites,
590.91 -> you want to understand how data is stored.
592.71 -> You want to handle inconsistent data formats.
595.56 -> Over time as you see more patterns across
598.35 -> all of your systems, you'll actually understand
600.93 -> that those patterns happen over and over again
604.14 -> across variable systems.
605.28 -> You can build a library of data quality rules,
607.29 -> a library of data profiling statistics.
611.49 -> And the last thing is you wanna standardize
612.99 -> your data quality so that users in one system
615.93 -> understand the data quality rules that were applied.
619.32 -> So let's say you have multiple SAP instances;
621.54 -> the same data quality rule can apply
623.91 -> to different instances,
625.38 -> so you get consistent data quality on the way in.
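As a rough illustration of that idea (a shared library of data quality rules applied the same way across source instances), here is a minimal Python sketch; the rule names and sample records are hypothetical, and a managed option such as AWS Glue Data Quality could express the same checks in practice.

```python
# Minimal sketch of a reusable data-quality rule library, so every instance
# of a source (e.g. multiple SAP systems) is measured by the same rules and
# reports the same metrics. Rule names and sample records are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when a record passes

RULE_LIBRARY = {
    "customer": [
        Rule("customer_id_present", lambda r: bool(r.get("customer_id"))),
        Rule("email_has_at_sign", lambda r: "@" in r.get("email", "")),
    ],
}

def evaluate(records: list[dict], entity: str) -> dict[str, float]:
    """Return the pass rate per rule so each instance reports identical metrics."""
    results = {}
    for rule in RULE_LIBRARY[entity]:
        passed = sum(1 for r in records if rule.check(r))
        results[rule.name] = passed / len(records) if records else 0.0
    return results

sample = [{"customer_id": "42", "email": "a@example.com"},
          {"customer_id": "", "email": "no-at-sign"}]
print(evaluate(sample, "customer"))  # {'customer_id_present': 0.5, 'email_has_at_sign': 0.5}
```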
629.16 -> And the last thing is bringing in compliance and regulation.
633.33 -> So as you're monitoring your data quality,
635.64 -> you're running data profiling, you look at the data,
638.49 -> you inspect the data and use things like machine learning,
640.98 -> such as Glue's PII detection.
642.93 -> Then there's Macie, for example, and Comprehend can do
646.38 -> some PHI detection as well.
648.03 -> How do you automate that as part of a pipeline
650.4 -> and then classify that data on the way in
653.34 -> and carry it through your environment.
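One way to automate that kind of check in a pipeline is to call a detection service on incoming records and carry the resulting classification forward. Here is a hedged sketch using Amazon Comprehend's PII detection through boto3; the field name and the 0.8 confidence threshold are hypothetical, and Glue's PII detection or Macie could play the same role depending on the data.

```python
# Sketch: flag PII in an incoming text field with Amazon Comprehend so the
# record can be classified on the way into the lake. Field name and the 0.8
# threshold are hypothetical; Glue's PII transform or Macie are alternatives.
import boto3

comprehend = boto3.client("comprehend")

def pii_types(text: str, min_score: float = 0.8) -> set[str]:
    """Return the PII entity types detected above a confidence threshold."""
    resp = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    return {e["Type"] for e in resp["Entities"] if e["Score"] >= min_score}

record = {"notes": "Reach Jane Doe at jane.doe@example.com"}
found = pii_types(record["notes"])
if found:
    # Carry the classification with the record, e.g. as catalog or object tags.
    record["classification"] = sorted(found)  # e.g. ['EMAIL', 'NAME']
print(record)
```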
656.7 -> Let's talk through a real example
658.62 -> that covers the ingestion piece of it.
661.08 -> I'm gonna cover data classification, data profiling,
663.48 -> data quality, securing, and cataloging,
665.607 -> and this can all be done with CICD.
667.38 -> I picked a different icon here
668.88 -> for the AWS DataOps Development Kit,
671.07 -> which is an open source project built on top
673.56 -> of the CDK for building data pipelines faster.
679.56 -> And that can help you automate a pattern such as this.
682.05 -> So you see here, the pattern on the left is Salesforce
685.8 -> and on the right you have an ingestion pattern
687.9 -> for a SaaS application.
689.58 -> So you can use a product like AppFlow.
691.95 -> AppFlow can push that data to S3
694.02 -> and that's where you can run your data quality rules
696.03 -> as it lands in S3 and as you move it to curated.
699.6 -> As it lands, we're gonna talk a bit about Glue crawlers
703.35 -> and how they remove the heavy lifting.
704.82 -> So Glue crawlers will inspect your data,
706.95 -> you can build custom classification rules
709.35 -> within your Glue crawlers to manage your schema.
711.66 -> You populate your data catalog and you propagate
714.15 -> that to Lake Formation to manage access controls.
716.97 -> This is just a standard pipeline
718.53 -> and Shihas will talk in more detail
719.85 -> of what happens in real world.
722.25 -> Obviously it's not as simple as this,
723.81 -> but this is just an example of what you build.
725.55 -> So you build this once, you apply it for Salesforce,
728.22 -> and as other databases come in, rinse and repeat:
730.98 -> you end up just replacing that first line
732.96 -> that goes into the box over time.
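As a hedged sketch of that Salesforce pattern (not the exact pipeline on the slide), the core calls might look like this with boto3: run an existing AppFlow flow that lands the data in S3, then start the Glue crawler over the landing prefix so the catalog and Lake Formation see it. The flow and crawler names are hypothetical; a real build would wrap this in CI/CD and orchestration, for example with the AWS DataOps Development Kit mentioned above.

```python
# Hedged sketch of the SaaS ingestion pattern: trigger an existing AppFlow
# flow (Salesforce -> S3), then crawl the landing prefix so the Glue Data
# Catalog is updated and Lake Formation permissions apply before anyone
# queries the data. Flow and crawler names are hypothetical.
import boto3

appflow = boto3.client("appflow")
glue = boto3.client("glue")

def ingest_salesforce_accounts():
    # 1. Kick off the on-demand AppFlow flow that lands Salesforce data in S3.
    run = appflow.start_flow(flowName="salesforce-accounts-to-s3")
    print("AppFlow execution:", run.get("executionId"))

    # 2. Crawl the landing prefix so schema changes flow into the catalog.
    glue.start_crawler(Name="raw-salesforce-accounts-crawler")

if __name__ == "__main__":
    ingest_salesforce_accounts()
```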
738.15 -> So let's talk about the second area.
740.1 -> The second area is cataloging data
742.26 -> and making that data more usable.
743.82 -> You wanna make sure that data is findable, searchable,
746.94 -> accessible so people can look at it, understand it,
750.21 -> request access to it, all from a single place.
754.2 -> So how do our services help with that?
756.96 -> I'm gonna talk a bit about crawlers, for example.
759.09 -> There's Glue crawlers that eliminate
761.25 -> the heavy lifting on managing schema.
763.47 -> So now we've ingested that data,
765.51 -> we have data quality built into it,
767.58 -> we have technical catalog.
769.8 -> This is really where we want to extract that information
772.62 -> out of your data and start populating
775.86 -> and enriching your catalog over time.
777.357 -> And you can start with your technical catalog
779.25 -> and we'll talk about the business catalog in a moment.
781.89 -> The first thing is crawlers: they discover new data sets
785.04 -> and extract schema definitions.
787.98 -> I know what you're saying, yes, I can do this in code,
789.84 -> yes, I can do this in APIs sure.
791.4 -> There are many ways to handle schema definitions,
795.39 -> but what you're doing here is you're reducing
797.43 -> that heavy lifting so programmers don't have to worry
799.71 -> about catalog management and you just put it on the crawler
803.1 -> and you can let us know if our crawlers
804.54 -> aren't working well enough for you
805.83 -> and we can fix it and improve it.
809.43 -> Crawlers cover a wide array of sources from S3, for example,
813.478 -> DynamoDB, MongoDB, all of our RDS, Aurora for example.
818.73 -> We announced Snowflake support just last week,
821.16 -> so we can now crawl Snowflake and make that schema
823.86 -> available for you so that you can start understanding
826.98 -> what data's coming into Snowflake over time
829.08 -> and manage that incrementally.
832.95 -> Then there's additional catalogs, so Glue catalog,
835.59 -> you can add catalogs that aren't covered by crawlers.
838.23 -> So there is a bit of a difference there.
840.9 -> Things like CloudTrail, Kafka,
842.43 -> for example with schema registry.
847.05 -> The next thing Glue crawlers do is use
849.81 -> built-in classifiers, so you can do popular things
853.17 -> such as PII identification, or you can write your own
856.11 -> custom classifiers with grok, and that allows you to enrich
859.68 -> your catalog with the technical metadata about what
863.34 -> that information is so you can better
865.14 -> protect that in the long run.
867.18 -> And by adding those patterns over time,
869.4 -> you start building up a library of them.
871.47 -> So going back to our business case,
873.03 -> let's say our business case is customer analytics
875.61 -> and I need to identify PII,
877.59 -> you can then apply that first PII rule
879.84 -> to other data sets that might have customer data in it
883.11 -> and leverage that over time.
885.3 -> The last thing is crawlers run on demand,
887.4 -> they can run on S3 events now, and incrementally,
890.55 -> so you can just have them run on S3 events,
892.65 -> run 'em incrementally, and all you have to do is watch
894.75 -> monitoring alarms if they fail for some odd reason.
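For reference, a crawler like that could be defined with a few lines of boto3; everything here (names, role, SQS queue for the S3 events) is hypothetical, and the event-mode settings should be checked against the Glue documentation for your account.

```python
# Hedged sketch: define a Glue crawler over an S3 landing prefix so schema
# management stays off the engineers' plates. Names, IAM role, and the SQS
# queue ARN are hypothetical; the event-mode recrawl policy is the
# incremental, S3-event-driven behavior described above.
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="raw-salesforce-accounts-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",          # hypothetical
    DatabaseName="raw_salesforce",
    Targets={"S3Targets": [{
        "Path": "s3://example-raw-zone/salesforce/accounts/",       # hypothetical
        "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:raw-zone-events",
    }]},
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},          # only crawl what changed
    SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE",
                        "DeleteBehavior": "LOG"},
)

# Run it once on demand; afterwards the S3 events drive incremental crawls,
# and monitoring alarms catch any failures.
glue.start_crawler(Name="raw-salesforce-accounts-crawler")
```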
899.13 -> And the last thing is a couple of new releases: AppFlow,
902.4 -> for example. This is not a crawler feature
904.56 -> but a catalog feature.
905.94 -> You can now populate your catalog directly from AppFlow
908.79 -> as you're pulling data
912.59 -> in from its connectors.
917.43 -> The next area I'm super excited to talk about is what Adam
920.04 -> talked about this morning, which is Amazon DataZone.
923.67 -> In the catalog space we talked about technical catalog,
925.8 -> but what about business catalog here?
928.08 -> So Amazon DataZone is a new service
930.66 -> that enables customers to discover and share data
933.75 -> across your organizational boundaries, lines of businesses
937.29 -> with built in governance access controls.
940.137 -> And that removes all of the heavy lifting
942.3 -> when it comes to making data available
945.84 -> to everyone in your organization.
948.45 -> It improves the operational efficiency of managing data,
951.48 -> managing access controls, so that data teams focus
954.93 -> on working with data and not worrying about
957.9 -> who has access to what data when.
960.51 -> It's built on top of Lake Formation.
962.34 -> So you can actually extract and present that information
964.86 -> on who has access to what data.
966.81 -> You have centralized logging monitoring and it provides
970.59 -> you that single point without having
972.96 -> to worry about AWS Analytic services.
976.2 -> So I'm super excited about this launch today.
977.85 -> It comes with four key features. The first is an organization-wide
981 -> business data catalog, and you make that available
984.45 -> with context for all your users
986.19 -> to find what data you have.
988.41 -> The second area is it has governance and access controls
991.29 -> with built in workflows for that example
993.54 -> we talked about earlier.
994.59 -> So data consumers find their data
996.54 -> within a single user interface.
997.86 -> They request access right from that interface
1000.32 -> and then that system has that workflow for giving
1004.52 -> those consumers access to that data, right?
1006.98 -> So they can use the tools within the simplified analytics
1010.19 -> portal that they've created with you.
1012.41 -> Out-of-the-box, it works directly with Redshift and Athena,
1015.59 -> for example, and other services that you'll learn about
1018.38 -> more over the upcoming week.
1020.48 -> The last thing is there's a data portal
1022.4 -> for an integrated data experience for users
1024.26 -> to promote their data exploration and drive innovation
1027.68 -> throughout your organization.
1032.63 -> Let's go to the last area which is automating
1034.82 -> sharing of data.
1035.93 -> We've done a couple things now we ingested our data,
1039.32 -> we know the quality, we know the technical metadata,
1041.84 -> we know the classification,
1043.7 -> we've added the business context.
1045.71 -> Now we need to actually share that information
1047.99 -> and share it at scale.
1050.72 -> Customers are adopting many patterns.
1052.94 -> Some customers that I talk to all the time might start
1055.28 -> with a single account central data lake that might work
1058.04 -> for them for the short term or even the long term.
1060.89 -> Over time they might move to a hub and spoke architecture
1063.92 -> where you have a central data lake, for example,
1065.87 -> and then you start adding extensible consumers
1068.6 -> to that information.
1070.49 -> Then there's customers obviously adopting data mesh.
1072.83 -> There are a couple talks on data mesh this week,
1074.72 -> I recommend you see them.
1076.85 -> Super exciting cutting edge about how customers are building
1079.58 -> interoperable data sets and leveraging AWS infrastructure
1083.6 -> to both provide that flexibility,
1086.21 -> but also protect that data at the same time.
1089.12 -> And the last pattern we're seeing
1090.5 -> is business to business data sharing.
1092.6 -> Obviously we've got AWS Data Exchange,
1094.4 -> but also customers are learning different ways to share data
1097.52 -> and we'll talk about that in a moment.
1102.2 -> So how do we do this?
1103.34 -> The first thing we recommend is looking at
1105.41 -> the Lake Formation permissions model.
1107.45 -> Lake Formation provides database-style,
1110.54 -> fine-grained permissions on your resources.
1113.15 -> Your resources are tables and columns, for example.
1119.06 -> You scale your permissions model through Lake Formation
1122.81 -> tag-based access controls.
1124.4 -> We'll talk about tag-based access controls
1125.9 -> in the next slide.
1127.1 -> But what you're doing is moving from what we call
1130.13 -> resource-based access controls, managing your tables based
1133.16 -> on your user community, to tags, which focus more
1137.21 -> of your energy on what the data sets
1138.95 -> are and how to scale giving people access to those
1141.92 -> data sets based on what they are and what classification
1145.22 -> rules apply.
1147.17 -> It provides unified Amazon S3 permissions,
1149.51 -> it's integrated with services and tools
1152.66 -> and also third parties are integrating with Lake Formation
1156.56 -> as well, and easy to audit permissions for access.
1161.51 -> Let's go a bit deeper into tag-based access controls.
1167.12 -> Imagine you have hundreds of databases and thousands
1170.84 -> of tables and tens of thousands of users.
1173.9 -> That becomes an awfully terrible matrix of access control
1177.35 -> permissioning that you need to manage
1178.79 -> for role-based access controls.
1181.01 -> So this is why we introduced tag-based access controls.
1183.527 -> If you adopted Lake Formation in one
1186.56 -> of the original ways when we launched it, it didn't launch
1188.3 -> with tag-based access controls.
1189.62 -> If you are not using it today and you're running into
1192.11 -> scaling challenges, we definitely recommend
1194.54 -> talking to your SAs, talking to us about how you can think
1197.93 -> about moving your access control permissions to tags.
1201.35 -> The first thing you want to do is define your tag ontology.
1204.14 -> So your tag ontology could be things like
1206.48 -> organizational structure or classification rules.
1209.99 -> And the next thing you do is you apply those tags
1213.68 -> to catalog resources, databases, tables.
1218.42 -> The higher you apply the tag,
1219.92 -> the broader the access that's given, right?
1222.53 -> So you could provide it at the database level, for example;
1226.67 -> they're hierarchical in nature and if there are conflicts,
1230.24 -> the system resolves those conflicts.
1233.45 -> The last thing is you create those policies on those
1235.55 -> Lake Formation tags for IAM users and roles
1239.09 -> and Active Directory users and groups using SAML assertions.
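Those three steps map fairly directly onto the Lake Formation APIs. Here is a hedged boto3 sketch; the tag keys and values, database name, and role ARN are hypothetical.

```python
# Hedged sketch of tag-based access control with boto3's Lake Formation client:
# define a tag ontology, attach tags to catalog resources, and grant against
# the tag expression instead of per table. All names and ARNs are hypothetical.
import boto3

lf = boto3.client("lakeformation")

# 1. Define the tag ontology, e.g. a classification dimension.
lf.create_lf_tag(TagKey="classification", TagValues=["public", "confidential"])

# 2. Apply tags to catalog resources; tagging a database cascades to its tables.
lf.add_lf_tags_to_resource(
    Resource={"Database": {"Name": "raw_salesforce"}},
    LFTags=[{"TagKey": "classification", "TagValues": ["confidential"]}],
)

# 3. Grant access on the tag expression, so new tables carrying the tag are
#    covered automatically without touching the grant again.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier":
               "arn:aws:iam::123456789012:role/AnalystRole"},  # hypothetical
    Resource={"LFTagPolicy": {
        "ResourceType": "TABLE",
        "Expression": [{"TagKey": "classification", "TagValues": ["confidential"]}],
    }},
    Permissions=["SELECT"],
)
```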
1243.2 -> That's how you scale your access permissions.
1246.08 -> Let's put this in the context of your architecture, right?
1250.49 -> So we're seeing customers adopt
1252.41 -> a few different patterns here.
1254.27 -> On the top left you have your business data catalog section.
1257.12 -> I'll simplify it 'cause these systems do a lot more
1258.92 -> than business data catalogs,
1260.12 -> so please don't take this too seriously.
1262.55 -> There's Amazon DataZone which does way more
1264.32 -> than business data catalog, but I just put it in a box.
1266.72 -> So the entry point is: I find my business data
1269.597 -> and I need to request access.
1271.94 -> The next area is third party tools.
1274.19 -> You could be using third party tools to find data,
1276.59 -> request access and understand those data sets.
1279.23 -> Or there's open source solutions like data.all for example,
1282.59 -> that you could leverage.
1283.423 -> So there's a few different options
1284.42 -> we're seeing customers do today.
1287.57 -> The next thing you want to do is regardless of the option
1289.94 -> you're doing, so DataZone works out-of-the-box,
1292.4 -> but if you're working with a third party, work with them,
1294.86 -> I've seen many examples, I've actually blogged about one
1297.29 -> with Informatica about how to automate workflows
1300.95 -> within Informatica to drive your access control permissions
1303.83 -> down to your data catalog.
1308.24 -> Most of the ISVs can do those workflow processes
1310.97 -> and push those access permissions down.
1314.03 -> A lot of them can also read up your Glue data catalog
1316.37 -> to understand your classification rules.
1319.1 -> When you do that, you then push down and make sure
1322.79 -> that you're using Lake Formation for your catalog,
1325.73 -> your Glue data catalog for your catalog information,
1328.19 -> for your permission model, your policy control
1331.55 -> and access down to your data domains.
1333.68 -> It could be your S3 data lake or Redshift for that example.
1336.89 -> So that way, when your users are coming in
1338.81 -> through various engines, whether it's Amazon Athena
1342.32 -> or Spark on EMR, it's enforcing those policies
1346.64 -> consistently based on how they requested access
1348.86 -> from their business context.
1350.09 -> It provides that end-to-end experience.
1353.84 -> With that, I'm actually gonna, oh, I have one more slide.
1356.42 -> Key takeaways and I'll hand it over to Shihas.
1359.66 -> Key takeaways on enabling agility with data governance
1363.02 -> on AWS: the first is automate ingestion.
1366.05 -> Build up those patterns as part of your business strategy
1369.23 -> and understand how you could automate compliance controls,
1372.95 -> standard data quality rules so that data engineers can move
1376.61 -> at the speed of your business.
1378.35 -> The second thing is automate classifying, cataloging
1381.56 -> and profiling your data on the way in,
1384.98 -> so that your catalog understands the technical information
1387.89 -> and your business catalog starts to understand
1389.81 -> what classification rules you need to apply
1392.42 -> when you're giving people access.
1394.46 -> The third area is automate the management
1396.86 -> through tag-based access controls.
1398.87 -> And make sure you can scale through your enterprise
1401.24 -> so that your data is protected throughout its life cycle
1404.84 -> and you're not constantly managing permissions every time.
1408.29 -> And the last thing is automate data sharing
1410.39 -> through one interface,
1411.29 -> whether or not it's through Amazon DataZone or ISV partners
1414.98 -> or open source solutions you build on your own.
1417.44 -> Automate that experience so that as a user comes in,
1419.9 -> finds the data, and requests access from a single user interface,
1423.44 -> they know what they're getting and they know
1425.6 -> their data's protected on the way in.
1427.55 -> So with that, I'm gonna hand it over to Shihas
1430.19 -> to tell his story, thank you.
1432.138 -> (audience applauds)
1441.692 -> - Can you hear me well?
1444.47 -> Right. I'm Shihas Vamanjoor from Prudential Financial.
1448.16 -> My team made a request for me to open this 30 minute
1452.45 -> presentation in their words.
1454.16 -> Here goes, over the next 30 minutes you will be introduced
1460.19 -> to the next generation of data platforms,
1463.67 -> a fully self-creating, self-organizing
1467.21 -> and self-managing platform
1469.37 -> we call the autogenic data platform.
1472.52 -> Folks in today's audience are among the first
1475.43 -> outside of Prudential to learn of this innovation.
1479.06 -> This is the future of data platforms
1481.457 -> and the future has already arrived at Prudential.
1485.54 -> Harpreet and Zane from the data platform team,
1488.18 -> are witnesses to me having said this at re:Invent.
1492.307 -> (audience murmurs)
1499.16 -> A little bit of background,
1502.64 -> I'm from Prudential Financial Services.
1504.95 -> Prudential is a global leader in financial services.
1507.83 -> We serve both institutional as well as individual customers.
1511.22 -> We are in over 50 countries, 50 million customers,
1515.47 -> 40,000 employees.
1517.16 -> Let's focus on the 40,000 employees for the rest
1519.41 -> of the conversation.
1522.56 -> Me, I'm situated within the Prudential chief data office.
1526.46 -> I'm the Product Owner for Enterprise and Data Platforms.
1530.06 -> The mission of the data platform is a big and bold one.
1534.29 -> Number one on our mission statement is to democratize data
1539.15 -> to increase the value creator base within (indistinct)
1544.4 -> that's the number one thing.
1545.99 -> Other pieces, increase velocity of data innovation,
1550.07 -> reduce cost, reduce risk, so on so forth.
1555.23 -> Now, who does this data platform serve?
1560.87 -> At Prudential in the multiple lines of business,
1563.63 -> both domestic and overseas, we have a large community
1568.25 -> of technical and non-technical users.
1571.55 -> Technical users are data scientists, data engineers,
1576.41 -> data analysts, data stewards,
1578.48 -> business intelligence professionals,
1580.04 -> machine learning engineers, so on, you get the drift.
1582.95 -> Non-technical users, line of business managers,
1586.97 -> business initiative owners, executives.
1590.51 -> What do they do with data?
1592.64 -> They want to exploit data to create business value.
1596.42 -> What are the common challenges?
1598.31 -> On the slide is some of the common challenges.
1600.62 -> These challenges all lead to long time-to-value
1604.34 -> First of them, hard to locate data.
1606.47 -> Why is this hard? Why is this such a challenging problem?
1611 -> Our companies are growing continually.
1614.24 -> Data is also growing continually, exponentially.
1617.24 -> So we have a large number of systems and knowledge
1620.9 -> about these systems is usually tribal.
1623.24 -> That basically means the owner of the system is the one
1625.88 -> that you tap on the shoulder to dig out information
1628.46 -> about this data, time consuming process.
1630.74 -> Now when you start to multiply the number of systems
1633.68 -> that you have to deal with.
1635.03 -> Second, long time-to-access.
1637.55 -> Since there are a large number of systems,
1639.32 -> you got to go through governance hoops.
1641.6 -> You got to go from system one, two, three, four, five,
1644.66 -> bring it to another system,
1646.25 -> another set of approvals to deal with.
1648.8 -> Third, lots and lots of human engineering.
1652.28 -> Since we have so many systems in the source site
1656.75 -> and the technology space is constantly, rapidly changing.
1661.04 -> You have a lot of technology requirements
1662.99 -> in terms of understanding this.
1664.67 -> You gotta be a superman engineer to try and extract data
1668.84 -> from all of these systems, bring it to another system.
1671.72 -> Complex governance.
1672.71 -> I talked about this, the number of systems
1674.87 -> and how governance becomes complex.
1676.55 -> And finally there's a lot of tedious and repetitive work
1679.91 -> before you actually start to do
1681.38 -> what you really want to do, right?
1685.43 -> So when I started this journey of building out
1689.03 -> this enterprise data platform, I went on a listening tour,
1692.78 -> two eyes, two ears, one mouth,
1695.33 -> keep the mouth shut kind of tour.
1697.37 -> So I was listening to what they really wanted.
1700.04 -> Here are the desired experiences.
1703.04 -> One, they said, can you make it simple for me to discover
1708.29 -> data within this entire enterprise architecture
1711.17 -> wherever the data lies, I really don't care?
1713.99 -> Is it possible for you to deliver
1716.6 -> to me a e-commerce like experience?
1719.63 -> I just wanna shop for data products.
1722.72 -> No, this is not a shoe or a watch, it's a data table.
1726.38 -> It's a data view, a file, I wanna shop for it.
1729.98 -> Second, I really don't care what these systems
1735.05 -> that house the data are in.
1736.43 -> I don't wanna be bothered about pulling data
1739.04 -> from system X to system Y,
1741.38 -> data shipping is not a heroic job.
1745.31 -> There's no glamour in it anymore.
1747.26 -> It's very hard to convince people to do that.
1750.86 -> So this no-ETL thing is a no-brainer; that's the second.
1754.94 -> Third, why can't you build me a system
1758.27 -> where you optimize my time as a data value creator
1762.38 -> so I can focus on actual value creation
1765.2 -> rather than spending 90% of my time bringing data
1768.14 -> over from arcane systems, cleaning it, massaging it,
1772.04 -> and getting it ready so that I can create value?
1775.67 -> Finally, if this data is in a system
1778.19 -> that I can get access to, how do you make it accessible
1781.1 -> to me through a few mouse clicks?
1783.2 -> Those are the challenges.
1785.81 -> This is from a very initial draft of our concept.
1790.52 -> I said, I hear you.
1792.17 -> So what you're really asking for is the genesis
1795.32 -> of an autogenic data platform,
1797.84 -> a platform that creates itself,
1800.12 -> and you, the value creator are the master,
1805.97 -> not the platform engineering team,
1808.28 -> not a data engineering team,
1809.93 -> but you have the keys to this entire ecosystem.
1814.52 -> How would it work?
1816.11 -> The autogenic data platform has to catalog and discover
1820.25 -> at scale, all the data assets that you have
1822.32 -> within the enterprise, right?
1824.03 -> It's gotta make it available in a marketplace,
1825.92 -> that's number one.
1827.39 -> Once it is in a marketplace, what do you do next?
1831.02 -> You use an order fulfillment request system.
1834.05 -> So you press those buttons,
1835.7 -> make sure the data goes over
1837.11 -> to a data and analytics platform where you can exploit it.
1840.23 -> Number three, you do not want to do manual data governance,
1845.27 -> manual data engineering, none of those things.
1848.12 -> You expect the system to take care of it for you.
1851.72 -> So you bring the data through automated data governance
1854.39 -> and data profiling, through a highly automated
1857.84 -> data processing engine, into a zone.
1860.66 -> Finally gimme another set of order fulfillment requests
1864.02 -> in the same marketplace so I can get access to other data.
1867.65 -> This was the ask.
1869.3 -> At this point, I had a few important questions
1873.74 -> that I needed to answer.
1873.74 -> Looking at this end state vision, I had to decide,
1878.99 -> number one, which partner would I choose?
1883.85 -> There's a series of services that you need,
1886.67 -> intricate engineering that you need to put together
1889.58 -> and then hide it all away, that's the goal.
1893.21 -> Number two, what style of architecture would I use?
1900.56 -> What method of development would I use? For the first,
1905.69 -> it became quite clear to me that
1907.46 -> of all the cloud service providers,
1909.35 -> the most mature one with these services was AWS.
1914.87 -> So we made the selection to go with AWS.
1918.89 -> Second, for the partner, we chose AWS professional services.
1922.91 -> They're closest to the technology and this was a design
1925.88 -> that was not trivial.
1928.31 -> Finally, we chose Agile as the method of development.
1931.49 -> I told my team that we would deploy the initial version
1936.32 -> of the autogenic data platform into production
1938.63 -> within three months from which time onward,
1941.24 -> every release would be a sprint, which would be two weeks.
1945.05 -> And the reason we did this is we always found
1947.81 -> that building with the expectation of someone coming
1950.93 -> to use it, never works in practice.
1954.29 -> So that's the method that we chose.
1957.44 -> Fast forward nine months, what do we have?
1960.83 -> We have a completely automated, hyper automated system.
1966.98 -> We have a marketplace.
1968.84 -> We use a front-end technology for the marketplace,
1971.54 -> and I'm glad to see from Jason's presentation that Amazon's
1975.68 -> done some wonderful work, which we will reverse engineer.
1979.67 -> We have some front-end technologies for now,
1982.79 -> which take the place of the marketplace,
1985.04 -> which does the order fulfillment.
1987.32 -> We have a Lake House construct,
1991.22 -> a construct of bringing data from whatever pattern,
1995.54 -> whether it is files, whether it's JDBC,
1998.15 -> whether it is batch or streaming,
1999.86 -> all the fancy bells and whistles that people usually want
2003.19 -> in a data platform, we have all of that,
2004.81 -> but incrementally we built that out.
2007.27 -> Now what we then went forward and said,
2009.73 -> the true value is in shifting engineering right.
2014.35 -> So we then said, okay, let's look at the key aspects
2017.2 -> of any data and analytics platform.
2020.02 -> Let's do things like data quality.
2023.5 -> Let's do things like data standardization, PII detection,
2028.18 -> change data capture and management.
2029.8 -> All within an automated environment
2032.62 -> that no human being has to write code for.
2035.86 -> That was the goal, so over a period of 12 months,
2039.85 -> we've achieved this goal.
2041.53 -> We are not at the completion of the journey,
2044.14 -> the journey is still a long one,
2046.12 -> but the usage has been significant, right?
2048.01 -> From month three, we've been able to onboard different
2051.67 -> data journey teams to the platform and now we use a process
2055.24 -> of user generated demand to build any new features.
2059.83 -> Now, one would think that this requires an army of engineers
2063.37 -> and Prudential is a global giant.
2065.11 -> You would have access to an army of engineers to build this,
2067.66 -> not true, small core team.
2070.66 -> They're sitting in the first row.
2071.8 -> We got Zane, Harpreet, Samir,
2073.81 -> these are the cloud and data engineers along with ProServe.
2077.62 -> Jason was part of the original ProServe team
2079.78 -> that helped build this out.
2081.88 -> Dozen or so people, under a dozen,
2084.01 -> including management was all it took,
2086.65 -> great effort from the collective at Prudential.
2089.35 -> They provided all the collective muscle
2091.69 -> to make this happen, because this was a common shared goal.
2097.15 -> In terms of architecture, let's get down into a level
2099.97 -> of detail that is necessary for this conversation
2102.7 -> and it ties closely to what AWS and Jason
2105.46 -> have been talking to in the previous presentation.
2109.96 -> Number one, here's a representation of sources,
2114.37 -> logical representation.
2115.54 -> And this is what you are likely to see
2117.79 -> in your own enterprises, it has everything.
2120.22 -> Data center based databases, file systems,
2125.26 -> SaaS applications like Salesforce and ServiceNow, APIs.
2129.73 -> Basically source of first party, second party,
2132.46 -> third party data, wherever it may be.
2135.46 -> The second part of the standard architecture
2138.46 -> is ingestion service layer.
2141.43 -> This is where incrementally, the autogenic data platform
2146.38 -> has features that allow for different types of sources
2150.49 -> to be handled differently but intelligently.
2153.94 -> So you start with flat file data transfers,
2157.69 -> okay, here's AWS data sync.
2160.45 -> But here's the difference, nobody needs to know
2163.72 -> how AWS data sync works.
2166.54 -> That's coded into the automation.
2169.18 -> You provide the order that says, I am a file source.
2175.69 -> The autogenic data platform detects that file sources are
2179.77 -> best suited for a DataSync transfer program.
2184.75 -> You are a JDBC source, so I'm going to use Glue JDBC,
2189.28 -> oh you are Salesforce, I'm going to use AppFlow.
2195.25 -> So that's the detection mechanism
2197.56 -> that's built into the programming.
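To make that detection idea concrete (this is only an illustration, not Prudential's actual code), the routing could be as simple as a lookup from the order's declared source type to the ingestion mechanism the platform will generate; the field names and mapping below are hypothetical.

```python
# Illustration only: route an order to an ingestion mechanism based on the
# declared source type, so users never touch DataSync, Glue, or AppFlow
# directly. The Order shape and the mapping are hypothetical.
from dataclasses import dataclass

@dataclass
class Order:
    source_type: str      # e.g. "file", "jdbc", "salesforce"
    source_location: str  # path, connection name, or connector object

INGESTION_PATTERNS = {
    "file": "aws-datasync-transfer",
    "jdbc": "glue-jdbc-job",
    "salesforce": "appflow-flow",
}

def plan_ingestion(order: Order) -> str:
    """Pick the pipeline pattern for an order; the platform then generates it."""
    try:
        return INGESTION_PATTERNS[order.source_type]
    except KeyError:
        raise ValueError(f"No ingestion pattern registered for {order.source_type!r}")

print(plan_ingestion(Order("salesforce", "accounts")))  # appflow-flow
```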
2199.72 -> Now as you go rightward, you'll start to see persistent
2204.07 -> layers in the architecture come through.
2206.62 -> Persistent layers are both a lake as well as a house.
2210.82 -> Now up to this point, what you have to imagine is
2214.48 -> there is no human being writing any of this code.
2216.94 -> Not for infrastructure, not for the services,
2219.88 -> not for the pipelines, not for the orchestration.
2222.22 -> All happening by the application.
2224.5 -> The application has taken control.
2227.29 -> So it is self-creating.
2228.64 -> So what happens next is data is now moved
2231.82 -> through this persistent layer from raw to standardized,
2238.87 -> applying the data quality rules, the governance rules.
2244 -> Change data management and change data capture
2246.07 -> is embedded within this pipeline.
2249.85 -> And then the data is now available to the curator.
2252.82 -> But at this time, from a governance standpoint,
2255.04 -> one would think, hey, is this data all accessible to people?
2258.25 -> No, the process has decoupled access provisioning
2262.33 -> from data movement.
2263.77 -> None of this data is actually accessible.
2266.74 -> Now comes the central governance,
2268.96 -> which is what Lake Formation is for.
2271.39 -> We centralize governance to all the assets
2274.24 -> through Lake Formation, one single point of access
2278.02 -> provisioning to this entire ecosystem.
2280.87 -> Finally, we have all kinds of consumption patterns
2285.76 -> that get access through this Lake Formation
2288.85 -> central governance layer to this entire lake house.
2292 -> Now what I didn't talk about is just as important
2295.09 -> as what I just said.
2297.28 -> What we have done thus far
2299.92 -> is hyper automate the journey to a standardized layer.
2303.94 -> The journey of data exploitation does not end there.
2307.54 -> It begins there, but now is the value creation exercise.
2311.83 -> Now you have these highly skilled resources,
2314.14 -> the talented folks who can use this ecosystem
2317.98 -> to refine the data.
2320.32 -> Now I have another innovation project in the works,
2323.14 -> which is trying to machine this curation as well.
2325.93 -> So that regular refinement to create a data product
2330.37 -> is the focus of my data journey teams today
2333.91 -> and this architecture has allowed them to do so.
2337.15 -> Let's unpack this a little bit more.
2341.89 -> Here's a sample user onboarding experience, right?
2347.47 -> This is how users onboard data to our platform today.
2352.69 -> Here's a data athlete, technical user, non-technical user,
2356.23 -> as long as you can use the browser, you're welcome.
2360.37 -> You go scan this marketplace, it looks like Amazon.com
2364.69 -> or whatever your favorite e-commerce site is, right?
2367.6 -> You get a collective that is a bunch of objects
2371.56 -> or a single object, it's your choice.
2374.2 -> You then say,
2375.25 -> I want this object to be refreshed at a given frequency.
2382.48 -> You then said, submit order.
2386.26 -> Internally, the metadata generated by your order
2390.91 -> is the heart of this intelligent platform.
2393.61 -> This uses all the metadata that's been collected so far
2397.75 -> to provide instructions to this auto generating system.
2404.11 -> Auto generation actually creates the infrastructure code,
2407.83 -> that's the first order of business for it.
2410.44 -> If that is an order that is coming in for the first time,
2414.55 -> all the infrastructure necessary is created dynamically
2418.66 -> because the metadata in the manifest contains the information
2422.38 -> that is sufficient for you to have a targeted creation
2425.92 -> of both infrastructure and pipeline and orchestration.
2430.21 -> Post that, it then identifies the necessary elements
2434.98 -> to actually execute on this.
2437.86 -> So as an example, when I process an order for a database
2443.98 -> or a file system to be transported,
2447.25 -> the first time creation of the infrastructure takes place,
2450.85 -> subsequent runs of that infrastructure pipeline
2454.39 -> are now available as an order fulfillment request
2457.69 -> that this entire automation takes care of.
2461.29 -> That's how this whole thing works.
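As a purely illustrative sketch of that flow (the manifest fields and logic below are hypothetical, not Prudential's schema), an order's metadata can decide whether infrastructure needs to be generated for the first time or an existing pipeline simply needs another fulfillment run:

```python
# Illustration only: a hypothetical order manifest and the first-time vs.
# repeat-run decision it drives. Field names and logic are made up for clarity.
order_manifest = {
    "order_id": "ord-0001",
    "source_type": "jdbc",
    "source": {"connection": "finance-oracle", "tables": ["POLICY", "CLAIM"]},
    "refresh_frequency": "daily",
    "data_owner": "finance-data-owner",   # ownership embedded at the object level
    "target_zone": "standardized",
}

def fulfill(manifest: dict, existing_pipelines: set) -> str:
    """Generate infrastructure on the first order for a source; reuse it afterwards."""
    key = f'{manifest["source_type"]}:{manifest["source"]["connection"]}'
    if key not in existing_pipelines:
        existing_pipelines.add(key)
        # First order: generate infrastructure, pipeline, and orchestration code.
        return f"created pipeline {key}, scheduled {manifest['refresh_frequency']}"
    # Subsequent orders: another fulfillment run on the existing pipeline.
    return f"scheduled run of existing pipeline {key}"

pipelines = set()
print(fulfill(order_manifest, pipelines))  # created pipeline jdbc:finance-oracle, ...
print(fulfill(order_manifest, pipelines))  # scheduled run of existing pipeline ...
```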
2464.29 -> So in a sense, what we've now been able to do
2467.777 -> is to say, for the foreseeable future,
2472.06 -> run this kind of pipeline moving this kind of data
2475.54 -> from these systems to a secondary system
2478.78 -> for data and analytics.
2482.47 -> And the system is responsible for the orchestration,
2486.43 -> the constant redelivery, the remediation, the notification.
2490.54 -> All of that is now built into the system
2492.67 -> with embedded governance, embedded data quality,
2495.76 -> embedded everything.
2498.16 -> That's what hyper automation is all about.
2501.04 -> Now, if you take an example of a user experience for access,
2508.9 -> similar, nothing much different, the toys are different.
2513.85 -> What we use as AWS services are different,
2518.68 -> but the experience for the user is the same.
2521.71 -> He doesn't need to see this.
2523.9 -> Like I said,
2524.733 -> the heroics belong to the data engineering team.
2529.33 -> Vendors will provide all the necessary services,
2533.2 -> it's up to us, as users of those services
2535.84 -> to determine the experiences that we deliver.
2538.66 -> The experiences that we chose to deliver as a platform team
2541.75 -> is an abstracted experience.
2543.52 -> We try to figure out why is it necessary
2545.77 -> that you have to do something.
2547.42 -> The answer usually is you don't need to.
2549.7 -> So if the answer is you don't need to,
2551.32 -> we hide it behind a browser.
2554.41 -> So you can focus on the actual value add.
2557.74 -> Similar experiences you will find in data warehouses.
2560.74 -> The reason why I have this extra slide,
2562.63 -> is to tell you that no matter what kind
2565.21 -> of infrastructure aspect that you have underneath today,
2568.273 -> I have a data lake or data warehouses,
2570.73 -> tomorrow I have a graph database.
2572.89 -> Tomorrow I have another kind of repository,
2575.47 -> doesn't really matter.
2577.45 -> The method is the same. The ideas are the same.
2580.81 -> The way you do it is exactly the same.
2585.67 -> Now, key lessons learned.
2589.04 -> This is from our journey over the last nine months.
2593.02 -> You know it's going to differ from journey to journey.
2595.84 -> Super important to have a clear vision of end-state
2599.8 -> and this is important because there's too many shiny tools
2602.74 -> in the market, too many distractions.
2605.41 -> You're going to hear about
2607.15 -> do this with domain driven design.
2609.91 -> Do this with application oriented design.
2614.14 -> Yeah, those are important, relevant topics,
2617.59 -> but they take away from the real focus,
2619.27 -> which is you stay focused on your users.
2621.94 -> Who are they? In this case, they're internal.
2624.61 -> According to our CEO, there are two classifications of employees:
2628.21 -> colleagues who help customers directly,
2630.58 -> colleagues who help colleagues who help customers.
2634.57 -> My customer is internal.
2636.31 -> The product that they build
2637.63 -> will be used by external customers.
2640.21 -> I've got to stay focused on them.
2642.46 -> So I gotta do everything that helps them do things
2645.49 -> at a faster clip, at a lower energy expenditure level.
2651.1 -> That's my focus, so stay focused on your user.
2653.98 -> It's super easy to get lost in this jungle of innovation
2658.09 -> that every vendor is out there peddling.
2661.24 -> Second, automate governance at every step.
2665.53 -> You will hear that governance is a beast, hard to automate.
2672.13 -> Tackle that head on. Ask what exactly is hard to automate?
2677.23 -> Data quality routines? No.
2679.81 -> Configurable, you apply a rule set.
2684.16 -> What else is hard to do, data ownership? Nope.
2687.46 -> Embed it at an object level, like we have:
2689.65 -> every asset within our ecosystem, every table,
2693.73 -> has a data owner embedded in it.
2696.1 -> What does that do? Helps you approve things quickly.
2699.94 -> Helps you understand ownership models quickly.
2702.55 -> So question anything that takes away from hyper automation.
2707.17 -> Third, this is very important as well.
2709.63 -> You need to build a small core talented team.
2714.25 -> They need to do two things.
2716.32 -> Number one, they're responsible for feature development.
2720.46 -> This feature development is driven by user demand.
2724.18 -> You don't try to build for the future
2725.83 -> without knowing your actual user demand,
2727.9 -> so you gotta be super close to the user.
2729.58 -> At the same time, while they build these shiny new features,
2732.67 -> it's also super important that they support users
2734.86 -> who are currently using the platform.
2736.33 -> So you got to have two parts to the team.
2739.12 -> One that keeps the lights on, keeps remediating,
2741.64 -> keeps fixing; the other builds out the new features.
2747.58 -> The other tried and tested lesson:
2751.45 -> there's value in trying, value in failing.
2754.72 -> There's a lot of lip service in enterprises
2758.14 -> about fail fast, be brave.
2761.59 -> But when it comes to impacting your timelines,
2764.32 -> your deliveries, it's not looked upon kindly.
2766.96 -> In our experience, there is no escaping that fact,
2770.59 -> you simply accept it, this is the cost of doing business.
2774.43 -> You're going to try with numerous AWS services
2777.55 -> or other cloud services for that matter.
2779.59 -> You're going to try combination of things.
2781.81 -> There are certain things that you don't want to invent.
2784.51 -> You just don't want to invent authentication,
2786.79 -> you don't want to invent authorization,
2789.04 -> you don't want to invent encryption, lots of things.
2792.46 -> You want to take services that have already been created.
2797.35 -> But how you glue them together,
2799.57 -> you'll have to test it yourself.
2802.39 -> Now the reason why I bring this forward is, I posit,
2806.26 -> that you have to create an engineering team for yourself.
2810.79 -> You have to treat this as an application.
2813.43 -> Historically, data platforms
2815.38 -> have not been treated as applications.
2817.9 -> They usually have been approached as a data product
2821.98 -> or a data platform product by itself with some integrations.
2827.14 -> The change here is, we talk about an autogenic data platform
2830.98 -> being an application and here's a product owner
2833.74 -> and here's a development team for that application alone.
2836.5 -> No different than any software product.
2839.56 -> So this is a software engineering team
2842.2 -> internally servicing it for the enterprise customers.
2846.79 -> That is a super important construct to remember.
2852.52 -> Now in terms of outcomes, everybody loves outcomes,
2855.727 -> and everybody has these OKRs that measure
2862.03 -> the value of what you have done.
2864.52 -> For us, it's quite clear that this is the winner.
2869.71 -> The increased talent pool, what does that talk to?
2873.25 -> That talks to us now not having this class system.
2878.92 -> Oh, you don't know AWS Glue, do you?
2882.04 -> You can't participate in our ecosystem.
2884.11 -> No, we don't have that conversation anymore.
2887.53 -> You can use a browser? You're welcome.
2889.99 -> What more do you need in order to be successful?
2892.6 -> You need that browser to have SQL embedded, a query engine?
2895.99 -> Okay, the platform teams hear that, to support you.
2899.71 -> You need Excel embedded within the same data portal?
2902.47 -> So be it, we deliver to you.
2905.419 -> We don't levy a tax, a learning tax.
2909.49 -> Historically, platforms which have not been created
2912.85 -> through this vision apply these taxes.
2915.52 -> There's a regime of taxes everywhere.
2917.8 -> You pay repeat taxes every day in your work life.
2921.85 -> The mission of this platform is to eliminate those taxes.
2927.22 -> Second, time savings.
2930.1 -> Our original goal was to shift right
2932.92 -> as much of human engineering as possible.
2936.43 -> Current state of the platform has shifted.
2939.4 -> The first point where human engineering
2941.35 -> starts to be involved is when you have to curate
2944.47 -> or refine the data.
2945.97 -> So you've cut the taxes all the way to that standardized layer
2949.18 -> of your lake house architecture
2950.86 -> through this autogenic data platform.
2954.61 -> Third, data access.
2957.46 -> That's been cut from days to minutes now,
2959.8 -> it's within the application,
2960.88 -> it's within the same data marketplace.
2962.53 -> It's an application that is governed, it is auditable,
2966.49 -> it tells you who granted access to who and for what data.
2970.15 -> All of that information is present within the tool itself.
2974.56 -> Cost savings.
2976.39 -> When you release such an application
2978.7 -> to the enterprise, you decrease the appetite
2981.88 -> for building bespoke data solutions
2984.61 -> because now you have to compare it against something
2987.88 -> that has so many features and works so well.
2992.601 -> Success is super important for that reason that it decreases
2996.1 -> the appetite for folks to try this on their own.
2998.35 -> There's no harm in trying things on their own.
3000.03 -> But then you would see the engineering that is necessary
3003.27 -> to create a holistic integrated system
3005.85 -> that's super important to realize.
3011.16 -> Now, in terms of governance, this was a governance topic,
3015.63 -> so I want to talk to a certain aspect of governance as well.
3019.8 -> Trust in data gets better when human beings
3024.15 -> don't finger the data.
3026.76 -> How far right can you move that? That's the question.
3031.62 -> How can you report on the hops the data
3035.22 -> takes through the automation?
3038.61 -> How much of this quality stuff can you move
3041.4 -> towards the left of the equation
3042.81 -> which is the source of data?
3043.977 -> And how much of it can be subject to the automation?
3046.71 -> So it improves trust in data
3049.2 -> because you have not touched the data
3052.47 -> up to a certain point.
3052.47 -> Now you can also focus your energies
3055.29 -> as a data governance organization on the actual place
3059.37 -> where data is changed, which is the refinement zones.
3063.87 -> If our innovation is successful,
3065.49 -> we'll move it even more to the right.
3067.89 -> But for now there is a narrow focused area
3070.83 -> which is either on the left side of the entire
3074.01 -> architecture within the source
3075.75 -> or, on the right side within the curation zones
3078.6 -> where you can now focus; your focus is not spread.
3083.01 -> And lastly, because of the popularity of the platform,
3087.18 -> we are seeing reduced data sprawl.
3090.15 -> We are seeing more and more multi-tenant teams,
3093.6 -> data journey teams, come and use the platform,
3097.11 -> it's also integrating our data.
3098.79 -> So it's subjective whether that was one of the primary views
3102.45 -> because we still live in a very federated business model.
3105 -> But the goal for us is to reduce the data silos.
3109.77 -> I know I've said a lot; for additional thinking
3115.92 -> around this topic, I've also published a Medium
3119.28 -> and LinkedIn article
3120.39 -> under the header Autogenic Data Platforms.
3123.3 -> Feel free to hit me at my social media handle at LinkedIn,
3127.92 -> happy to collaborate.
3129.36 -> At this point, I think Jason and I are happy
3132.21 -> to take questions.
3134.444 -> - [Jason] Two more slides. Yeah, two more slides.
3137.464 -> - Sure. - Sure.
3138.87 -> - So just two more slides. How do I get started?
3144.27 -> There's three programs that AWS offers
3147.24 -> to help you get started.
3149.07 -> The first one is if you want to build a data strategy,
3152.31 -> we have our Data-Driven Everything program.
3154.56 -> Feel free to reach out to your SAs or your account team
3157.95 -> to understand more about that program.
3160.14 -> The second one is Data Labs,
3161.64 -> if you have that strategy and need help executing it.
3165.15 -> That would be our data lab team.
3166.41 -> And the third one is if you need help with implementation,
3169.41 -> there's obviously ProServe
3170.45 -> and our great partner community as well.
3172.65 -> So those are our first ways to help you get started
3174.87 -> with data governance.
3176.34 -> We do have a new workshop that both our D2E,
3180.45 -> our Data-Driven Everything team and our ProServe team
3183.42 -> can help you execute, it helps you understand
3185.43 -> where you're at with data governance.
3187.11 -> So please reach out to your account team to understand
3190.2 -> where you're at in your journey with data governance
3192.03 -> and they're happy to help you with that.
3193.83 -> And the last thing is getting started.
3196.2 -> Think big, use that discovery workshop on data governance.
3199.77 -> Use things like Data-Driven Everything.
3202.05 -> You want to think big, but you wanna start small.
3204.72 -> How does this apply to your business strategy?
3206.97 -> How do you use tools such as Data Labs
3209.43 -> and ProServe through POCs?
3211.71 -> And then how do you scale fast through partners and ProServe?
3214.83 -> So thank you, we're happy to take a couple questions.
3218.73 -> We have a few minutes if people have questions.

Source: https://www.youtube.com/watch?v=vznDgJkoH7k