AWS re:Invent 2022 - Enabling agility with data governance on AWS (ANT204)
Data governance is the process of managing data throughout an end-to-end process, ensuring its accuracy and completeness and making sure it is accessible to those who need it. Join this session to learn how AWS is delivering comprehensive data governance— from data preparation and integration to data access, data quality, and metadata management—across analytics services.
ABOUT AWS Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.
AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
#reInvent2022 #AWSreInvent2022 #AWSEvents
Content
0.023 -> - Welcome to Enabling Agility
with Data Governance on AWS.
4.17 -> Very excited to talk to
you about data governance.
6.27 -> I don't think I've said that before,
8.34 -> but today it's gonna be a good talk
9.75 -> between myself and Shihas
from Prudential, so thank you.
13.35 -> I run our Lake Formation product
14.97 -> as well as the Glue data
catalog here at AWS.
17.742 -> I've been with Amazon
for about eight years.
19.71 -> I was running our analytics practice
21.18 -> in professional services
for most of that time.
24.27 -> We're helping customers build data lakes,
26.79 -> helping them migrate their
data platforms to AWS
29.37 -> and become data driven.
31.32 -> Let's go see what we're
gonna talk about today.
34.08 -> The first thing we're gonna talk about
36 -> is how does data governance
help you become data driven?
40.17 -> The second thing we're gonna talk about
41.37 -> is data governance patterns
within AWS analytic services.
45.6 -> And then Prudential Financial
will come up and talk about
47.91 -> their data journey and how
they actually did it for real
50.34 -> versus me just talking about
some PowerPoint examples.
54.99 -> So let's start first
with data driven themes
57.99 -> we hear from customers and I like to start
60.96 -> with this about data governance
63.3 -> because it's so critical hand in hand.
66.36 -> The first thing you wanna
learn is that the top part of it
69.777 -> and the bottom part of
it are coupled in a way.
72.39 -> But the top part is more
about business context
75.84 -> and the bottom part is more
about people, process, and technology.
79.23 -> So customers tell us
they want to be data driven,
81.54 -> but they struggle with a few
areas, and these are consistent
84.06 -> themes across many customers we talked to.
86.85 -> The first thing is understanding
what great looks like.
89.37 -> And this isn't about
just building a roadmap
91.68 -> or understanding what I'm gonna do next.
93.15 -> It's actually what is great
94.29 -> and what will delight your customer.
96.63 -> At Amazon, we talk about working backwards
98.73 -> or writing PR, FAQs to
go through that process
101.25 -> and really articulate what is great.
103.23 -> And that's your first
start in your journey
105.27 -> with data governance as well,
106.32 -> and we'll tie these together in a moment.
108.3 -> Once you know what great
looks like, you prioritize
111 -> those use cases and you create sponsorship
112.86 -> just like you would with
any other initiative.
115.71 -> The bottom row is where
it fits into data governance.
119.73 -> As you start building these use cases
121.56 -> to solve business problems,
you integrate things
124.89 -> like data driven culture,
data literacy programs,
127.8 -> understanding data definitions,
you focus on gaps of skills.
131.37 -> Maybe you need to introduce a new tool
132.99 -> that users need to understand.
134.85 -> And the last thing is
security's number one,
137.19 -> if we're gonna drive business
agility with data governance,
140.79 -> we need to make sure
your data is protected.
142.71 -> Security and compliance controls are built in
145.59 -> because your customers and users of data
148.35 -> need to make decisions at
the speed of their business.
150.99 -> And without protecting
data at that same speed,
154.08 -> you could create additional business risk.
158.01 -> So some other areas that we've seen
162.36 -> other publications talk
about is, for example,
165.15 -> Forbes gave some statistics,
167.01 -> 85% of businesses want to be data driven,
170.37 -> yet 37% have really struggled with it.
173.19 -> And why have they struggled?
174.48 -> And you see some quotes around from IDC.
177.21 -> Data governance is a core struggling area
179.43 -> for becoming data driven.
180.93 -> So it's no longer
optional is what IDC says.
184.23 -> For enterprise organizations
to get the most value
186.93 -> out of their data, to make decisions,
188.91 -> they really need to
treat data as an asset.
191.85 -> The second part is
organizations lack knowledge
195.45 -> of efficient and effective
data governance activities.
198.21 -> And 30% of the time is wasted
doing data governance things,
201.87 -> data definitions, data
management, data security.
205.26 -> And these are areas you
can definitely automate
207.24 -> that we're gonna talk about in a few minutes.
210.24 -> Let's put some definition around this.
212.49 -> This is how we define data
governance here at Amazon.
215.16 -> There are thousands of
definitions of data governance,
217.65 -> so I'd love to hear yours as well,
219.39 -> but this is how we define it,
220.807 -> "Data governance is the
collection of policies,
225.24 -> processes and systems
that organizations use
228.57 -> to ensure the quality
and appropriate handling
231.06 -> of their data throughout its lifecycle
234.54 -> for the purpose of
generating business value."
237.96 -> And to me what really sticks
out the most is a few things,
241.08 -> one of which is the last line,
generating business value.
244.62 -> We do not want to do any
work in data governance
247.5 -> without a business value associated with it.
249.9 -> And let's go a little
bit deeper into that.
253.44 -> So data governance starts with business.
258 -> The first thing you want to do
259.29 -> is tie it to your business strategy.
261.72 -> And I'll tell a bit of a story.
263.64 -> So I'd say like 15 years
ago I got very excited,
266.19 -> I was managing a customer data system,
269.07 -> it was probably
pre-Salesforce at the time,
271.29 -> and I'm like, our data's really bad,
273.36 -> I need a data quality tool.
275.61 -> And I went up to my management, I'm like
276.96 -> we're gonna go buy a data quality tool,
278.34 -> it's gonna be X hundreds
of thousands of dollars
280.05 -> and we're gonna solve our
customer data problem.
282.69 -> It wasn't funded and I
didn't know what to do.
285.72 -> I'm like, I can't understand
why this wasn't funded.
287.64 -> I wasn't really thinking about it.
289.62 -> We didn't have a customer
analytics problem to solve
292.47 -> at the time, you'd think
that most customers
294.48 -> had that problem, but we didn't have that.
297.36 -> We knew our customers, our business
299.4 -> wasn't the biggest company at
the time I was dealing with,
301.83 -> our business people knew our
customers really, really well
304.89 -> and they weren't going to
invest in a data quality tool.
307.5 -> So what's important here is understand
309.63 -> your business challenges and
start your data governance
312.72 -> journey with that.
314.52 -> Once you know those business challenges,
316.92 -> you have data consumers.
318.96 -> Data consumers are those
that need to use data
321.42 -> to make decisions.
322.74 -> So those are the folks that
will understand what data
325.17 -> they're gonna ask for, what
data they're gonna look for,
326.91 -> and what metrics they need
to solve business problems.
330 -> They're gonna then
inform data producers.
332.91 -> So data producers could be
data owners, system owners,
337.17 -> data lake admins: here
is the data I need
340.47 -> to solve my problem that ties
to the business strategy.
342.6 -> How can you help me?
344.16 -> And that's where data
governance really kicks in.
347.85 -> So this is our data
governance, I guess bracelet,
352.41 -> charm slide that we talk through.
355.53 -> And there's two areas, there's
the top portion of the slide
358.41 -> and the bottom portion of the slide.
360.66 -> So we've been talking about
schema-on-read for a while
365.16 -> here in this industry.
366.51 -> Schema-on-read's an interesting concept
367.95 -> 'cause you definitely
wanna just put the data in
369.45 -> and people can get value really quickly.
371.4 -> But does that really help a
data user make decisions faster?
376.05 -> What you wanna do here is the
top part as a data consumer
379.59 -> is asking a data producer
for more information.
383.07 -> Those producers should
ingest that data quickly.
386.4 -> They need to automate classification,
388.32 -> automate data profiling,
automate data quality
391.11 -> and secure and encrypt that
data before they give access.
394.35 -> And that brings that real understanding
396.09 -> of what that data is, so that
users can quickly be provided
400.02 -> access to that information.
402.03 -> For example, data classification.
403.53 -> What if your data had PII
and the producer of data
406.65 -> didn't really know what their
role is in the organization
409.59 -> to protect compliance controls or GDPR?
413.4 -> Automate data classification
on the way in.
416.34 -> The last thing is catalog, so
before giving access to data,
419.94 -> you want to catalog your information,
421.77 -> bring that technical and
business context to your data
425.1 -> and then give users access
to that to make decisions.
428.79 -> I was at a talk at
Amazon a couple years ago
431.28 -> and they talked a bit
about the value of data
433.86 -> and they talked a lot about schema-on-read
436.26 -> and how this fits in.
438.51 -> But what you want to do
is maximize and get data
441.06 -> in the hands of users as fast as possible
443.49 -> in a secure and compliant
way so that they can make
447.09 -> decisions at the speed of their business.
449.49 -> So we'll talk a bit about automating
451.14 -> the top part of the slide.
453.54 -> The bottom part of the slide
is once data is in the hand
456.36 -> of consumers, they can manage that data,
458.22 -> they can create new data sets,
459.36 -> they can then become
producers on their own.
462.12 -> So if you think about a
consumer in that point in time,
464.73 -> they might be using data, accessing,
466.77 -> querying, making decisions.
468.57 -> They don't just make a
decision and it goes away.
470.88 -> They want to create something new to share
472.53 -> with their executives, share
with their business partners.
475.23 -> That's where they
actually become a producer
476.97 -> and they're gonna be data
curating, data integration,
480 -> building lineage, securing
that data, and starting
482.85 -> and continuing that life
cycle of data throughout.
486.54 -> So how do AWS services help?
491.37 -> I'm gonna talk about three
main challenges here.
494.19 -> The first one is how do we
automate data integration,
497.19 -> classification and data quality
499.68 -> on the way into your data lake?
501.99 -> The second area is how do we catalog
504.51 -> and make that data more usable?
506.67 -> And the third area is how
do we automate sharing
509.97 -> so that users can get
access to the data they need
512.55 -> at the right time and as fast as possible?
515.1 -> So let's go a bit deeper
into automating ingestion,
519.06 -> and lastly, you wanna make
sure that it can publish
521.37 -> those data sets back through
that same ingestion pipeline.
528.549 -> So automating data governance on ingestion
532.23 -> is a hard problem, let's talk
about why it's a hard problem.
535.44 -> The first thing is data
comes in various forms,
538.95 -> it comes streaming, it comes
in batch, it comes in CDC,
541.59 -> it comes from SaaS applications,
543.6 -> it comes from different file formats
545.76 -> and it's variable over time.
548.07 -> So automating that process is
hard but can be done over time
552.9 -> by building a set of patterns.
554.1 -> And we'll talk about an
example pattern in a moment.
556.65 -> The second thing is automation
with CICD pipelines.
559.44 -> So to make sure that your
enterprise standards are built
562.5 -> into that automation of ingestion.
564.84 -> So when a new pattern comes in,
566.58 -> your CICD pipelines can adapt.
569.31 -> And you want to handle it
for many data sources: RDBMS,
572.61 -> files, streams, SaaS applications.
576.09 -> The second challenge you want to handle
577.89 -> when you're automating
data or data ingestion
581.04 -> is handling inconsistent performance,
583.23 -> reliability and quality.
585.15 -> So as part of these reusable
ingestion frameworks
588.33 -> that you might build
at your customer sites,
590.91 -> you want to understand
how data is stored.
592.71 -> You want to handle
inconsistent data formats.
595.56 -> Over time as you see more patterns across
598.35 -> all of your systems,
you'll actually understand
600.93 -> that those patterns
happen over and over again
604.14 -> across variable systems.
605.28 -> You can build a library
of data quality rules,
607.29 -> a library of data profiling statistics.
611.49 -> And the last thing is
you wanna standardize
612.99 -> your data quality so
that users in one system
615.93 -> understand the data quality
rules that were applied.
619.32 -> So let's say you have
multiple SAP instances,
621.54 -> the same data
quality rule might apply
623.91 -> to different instances.
625.38 -> So you get consistent
data quality on the way in.
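As a rough sketch of the standardized, reusable data quality library idea described above, here is what registering one shared rule set against multiple catalog tables might look like with AWS Glue Data Quality's DQDL rule language (in preview around the time of this talk), via boto3. The rule set, database, and table names are hypothetical.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical DQDL ruleset reused across every instance that carries customer data.
CUSTOMER_DQ_RULES = """
Rules = [
    IsComplete "customer_id",
    IsUnique "customer_id",
    ColumnValues "country_code" in ["US", "CA", "GB", "JP"],
    Completeness "email" > 0.95
]
"""

def register_customer_ruleset(database: str, table: str) -> None:
    # One standardized ruleset applied to many tables (e.g. multiple SAP instances)
    # so consumers see the same quality checks everywhere.
    glue.create_data_quality_ruleset(
        Name=f"customer-dq-{database}-{table}",
        Description="Standard customer data quality rules",
        Ruleset=CUSTOMER_DQ_RULES,
        TargetTable={"DatabaseName": database, "TableName": table},
    )

for db, tbl in [("sap_emea_raw", "customers"), ("sap_apac_raw", "customers")]:
    register_customer_ruleset(db, tbl)
```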
629.16 -> And the last thing is bringing
in compliance and regulation.
633.33 -> So as you're monitoring your data quality,
635.64 -> you're running data profiling,
you look at the data,
638.49 -> you inspect the data and use
things like machine learning
640.98 -> such as Glue's PII detection.
642.93 -> Then there's Macie for
example, Comprehend can do PHI,
646.38 -> some PHI detection as well.
648.03 -> How do you automate that
as part of a pipeline
650.4 -> and then classify that data on the way in
653.34 -> and carry it through your environment.
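A minimal sketch of the kind of automated PII classification step described here, using Amazon Comprehend's detect_pii_entities API on a sample of incoming records (Macie would be the option for scanning whole S3 buckets). The sample text and confidence threshold are illustrative.

```python
import boto3

comprehend = boto3.client("comprehend")

def pii_types_in_sample(sample_text: str) -> set[str]:
    # Detect PII entity types (EMAIL, ADDRESS, SSN, ...) in a sample of the
    # incoming data so the pipeline can classify the data set on the way in.
    response = comprehend.detect_pii_entities(Text=sample_text, LanguageCode="en")
    return {entity["Type"] for entity in response["Entities"] if entity["Score"] > 0.8}

sample = "Jane Doe, jane.doe@example.com, 123 Main St, Springfield"
found = pii_types_in_sample(sample)
if found:
    print(f"Tag this data set as containing PII: {sorted(found)}")
```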
656.7 -> Let's talk through a real example
658.62 -> that covers the ingestion piece of it.
661.08 -> I'm gonna cover data
classification, data profiling,
663.48 -> data quality securing catalog
665.607 -> and this can all be done with CICD.
667.38 -> I picked a different icon here
668.88 -> for the AWS DataOps Development Kit (DDK),
671.07 -> which is an open source
project built on top
673.56 -> of the CDK for building
data pipelines faster.
679.56 -> And that can help you automate
a pattern such as this.
682.05 -> So the pattern you see here
is Salesforce on the left,
685.8 -> and on the right you
have an ingestion pattern
687.9 -> for a SaaS application.
689.58 -> So you can use a product like AppFlow.
691.95 -> AppFlow can push that data to S3
694.02 -> and that's where you can
run your data quality rules
696.03 -> as it lands in S3 and as
you move it to curated.
699.6 -> As it lands, we're gonna talk
a bit about Glue crawlers
703.35 -> and how they remove the heavy lifting.
704.82 -> So Glue crawlers will inspect your data,
706.95 -> you can build custom classification rules
709.35 -> within your Glue crawlers
to manage your schema.
711.66 -> You populate your data
catalog and you propagate
714.15 -> that to Lake Formation to
manage access controls.
716.97 -> This is just a standard pipeline
718.53 -> and Shihas will talk in more detail
719.85 -> of what happens in real world.
722.25 -> Obviously it's not as simple as this,
723.81 -> but this is just an
example of what you build.
725.55 -> So you build this, once
you apply it for Salesforce
728.22 -> and as other databases
come in, rinse and repeat,
730.98 -> then you end up just
replacing that first line
732.96 -> that goes into the box over time.
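That pattern can be wired together in different ways (the AWS DataOps Development Kit mentioned above is one option). As a plain boto3 sketch of a single ingestion run, assuming an AppFlow flow and a Glue crawler that already exist, it might look like this; the flow and crawler names are hypothetical.

```python
import boto3

appflow = boto3.client("appflow")
glue = boto3.client("glue")

# Hypothetical resource names for a Salesforce -> S3 -> Glue catalog pattern.
FLOW_NAME = "salesforce-accounts-to-raw"     # AppFlow flow writing to s3://my-lake/raw/salesforce/
CRAWLER_NAME = "raw-salesforce-crawler"      # Glue crawler over the same landing prefix

def run_ingestion_once() -> None:
    # 1. Pull the latest Salesforce data into the raw zone of the data lake.
    execution = appflow.start_flow(flowName=FLOW_NAME)
    print("Started AppFlow run:", execution["executionId"])

    # 2. Crawl the landing prefix so new partitions/columns show up in the
    #    Glue Data Catalog; Lake Formation permissions then govern access.
    glue.start_crawler(Name=CRAWLER_NAME)

run_ingestion_once()
```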
738.15 -> So let's talk about the second area.
740.1 -> The second area is cataloging data
742.26 -> and making that data more usable.
743.82 -> You wanna make sure that
data is findable, searchable,
746.94 -> accessible so people can
look at it, understand it,
750.21 -> request access to it,
all from a single place.
754.2 -> So how do our services help with that?
756.96 -> I'm gonna talk a bit about
crawlers, for example.
759.09 -> There's Glue crawlers that eliminate
761.25 -> the heavy lifting on managing schema.
763.47 -> So now we've ingested that data,
765.51 -> we have data quality built into it,
767.58 -> we have technical catalog.
769.8 -> This is really where we want
to extract that information
772.62 -> out of your data and start populating
775.86 -> and enriching your catalog over time.
777.357 -> And you can start with
your technical catalog
779.25 -> and we'll talk about the
business catalog in a moment.
781.89 -> The first thing is crawlers,
discover new data sets
785.04 -> and extract schema definitions.
787.98 -> I know what you're saying,
yes, I can do this in code,
789.84 -> yes, I can do this in APIs sure.
791.4 -> There are many ways to
handle schema definitions,
795.39 -> but what you're doing
here is you're reducing
797.43 -> that heavy lifting so
programmers don't have to worry
799.71 -> about catalog management and
you just put it on the crawler
803.1 -> and you can let us know if our crawlers
804.54 -> aren't working well enough for you
805.83 -> and we can fix it and improve it.
809.43 -> Crawlers cover a wide array of
sources from S3, for example,
813.478 -> DynamoDB, MongoDB, all of
our RDS, Aurora for example.
818.73 -> Recently we announced Snowflake last week,
821.16 -> so we can now crawl Snowflake
and provide that schema
823.86 -> available for you so that
you can start understanding
826.98 -> what data's coming into
Snowflake over time
829.08 -> and manage that incrementally.
832.95 -> Then there are additional
sources: in the Glue catalog,
835.59 -> you can add sources that
aren't covered by crawlers.
838.23 -> So there is a bit of a difference there.
840.9 -> Things like CloudTrail, Kafka,
842.43 -> for example with schema registry.
847.05 -> The next thing Glue crawlers do is they use
849.81 -> built-in classifiers, so you
can do popular data things
853.17 -> such as PII identification,
you can write your own
856.11 -> custom classifiers with grok
and that allows you to enrich
859.68 -> your catalog with the
technical metadata about what
863.34 -> that information is so you can better
865.14 -> protect that in the long run.
867.18 -> And by adding those patterns over time,
869.4 -> you start building up a library of them.
871.47 -> So going back to our business case,
873.03 -> let's say our business
case is customer analytics
875.61 -> and I need to identify PII,
877.59 -> you can then apply that first PII rule
879.84 -> to other data sets that might
have customer data in it
883.11 -> and leverage that over time.
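A minimal sketch of that custom-classifier idea, assuming a hypothetical grok pattern and the same hypothetical crawler name as above; the classifier teaches the crawler to parse a proprietary format and record its columns in the catalog.

```python
import boto3

glue = boto3.client("glue")

# A custom grok classifier so the crawler can parse a proprietary log format
# and record its columns in the catalog (name and pattern are illustrative).
glue.create_classifier(
    GrokClassifier={
        "Name": "customer-events-classifier",
        "Classification": "customer_events",
        "GrokPattern": "%{TIMESTAMP_ISO8601:event_time} %{WORD:event_type} %{GREEDYDATA:payload}",
    }
)

# Attach it to a crawler so the custom classifier is tried before the built-in ones.
glue.update_crawler(
    Name="raw-salesforce-crawler",
    Classifiers=["customer-events-classifier"],
)
```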
885.3 -> The last thing is
crawlers run on demand,
887.4 -> they can run on S3 events
now and incrementally,
890.55 -> so you can just have
them run on S3 events,
892.65 -> run 'em incrementally and
all you have to do is watch
894.75 -> monitoring alarms if they
fail for some odd reason.
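Here is what an event-driven, incremental crawler of the kind just described might look like in boto3, reusing the hypothetical crawler name from the sketches above; the role ARN, bucket path, and SQS queue ARN are placeholders.

```python
import boto3

glue = boto3.client("glue")

# The SQS queue receives S3 event notifications from the landing prefix, so the
# crawler only visits objects that actually changed (all identifiers illustrative).
glue.create_crawler(
    Name="raw-salesforce-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="raw_salesforce",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://my-lake/raw/salesforce/",
                "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:raw-salesforce-events",
            }
        ]
    },
    # Incremental, event-driven crawls instead of re-crawling everything.
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "LOG"},
)
```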
899.13 -> And the last thing is a couple
of new releases, AppFlow for example;
902.4 -> AppFlow is
not a crawler feature
904.56 -> but a catalog feature.
905.94 -> You can now populate your
data catalog directly from AppFlow
908.79 -> as you're pulling data
912.59 -> in from its connectors.
917.43 -> The next area I'm super excited
to talk about is what Adam
920.04 -> talked about this morning,
which is Amazon DataZone.
923.67 -> In the catalog space we talked
about technical catalog,
925.8 -> but what about business catalog here?
928.08 -> So Amazon DataZone is a new service
930.66 -> that enables customers to
discover and share data
933.75 -> across your organizational
boundaries, lines of businesses
937.29 -> with built-in governance and access controls.
940.137 -> And that removes all of the heavy lifting
942.3 -> when it comes to making data available
945.84 -> to everyone in your organization.
948.45 -> It improves the operational
efficiency of managing data,
951.48 -> managing access controls,
so that data teams focus
954.93 -> on working with data
and not worrying about
957.9 -> who has access to what data when.
960.51 -> It's built on top of Lake Formation.
962.34 -> So you can actually extract
and present that information
964.86 -> on who has access to what data.
966.81 -> You have centralized logging
and monitoring, and it provides
970.59 -> you that single point without having
972.96 -> to worry about each AWS analytics service.
976.2 -> So super excited about this launch today.
977.85 -> It comes with four key features.
The first is an organization-wide
981 -> business data catalog, and
you make that available
984.45 -> with context for all your users
986.19 -> to find what data you have.
988.41 -> The second area is it has
governance and access controls
991.29 -> with built in workflows for that example
993.54 -> we talked about earlier.
994.59 -> So data consumers find their data
996.54 -> within a single user interface.
997.86 -> They request access
right from that interface
1000.32 -> and then that system has
that workflow for giving
1004.52 -> those consumers access
to that data, right?
1006.98 -> So they can use the tools
within the simplified analytics
1010.19 -> portal that they've created with you.
1012.41 -> Out-of-the-box, it works
directly with Redshift and Athena
1015.59 -> for example, and other services
that you'll learn about
1018.38 -> more over the upcoming week.
1020.48 -> The last thing is there's a data portal
1022.4 -> for an integrated data
experience for users
1024.26 -> to promote their data
exploration and drive innovation
1027.68 -> throughout your organization.
1032.63 -> Let's go to the last
area which is automating
1034.82 -> sharing of data.
1035.93 -> We've done a couple things
now we ingested our data,
1039.32 -> we know the quality, we
know the technical metadata,
1041.84 -> we know the classification,
1043.7 -> we've added the business context.
1045.71 -> Now we need to actually
share that information
1047.99 -> and share it at scale.
1050.72 -> Customers are adopting many patterns.
1052.94 -> Some customers that I talk
to all the time might start
1055.28 -> with a single account central
data lake that might work
1058.04 -> for them for the short
term or even the long term.
1060.89 -> Over time they might move to
a hub and spoke architecture
1063.92 -> where you have a central
data lake, for example,
1065.87 -> and then you start adding
extensible consumers
1068.6 -> to that information.
1070.49 -> Then there's customers
obviously adopting data mesh.
1072.83 -> There are a couple talks
on data mesh this week,
1074.72 -> I recommend you see them.
1076.85 -> Super exciting cutting edge
about how customers are building
1079.58 -> interoperable data sets and
leveraging AWS infrastructure
1083.6 -> to both provide that flexibility,
1086.21 -> but also protect that
data at the same time.
1089.12 -> And the last pattern we're seeing
1090.5 -> is business to business data sharing.
1092.6 -> Obviously we've got AWS Data Exchange,
1094.4 -> but also customers are learning
different ways to share data
1097.52 -> and we'll talk about that in a moment.
1102.2 -> So how do we do this?
1103.34 -> The first thing we recommend is looking at
1105.41 -> the Lake Formation permissions model.
1107.45 -> Lake Formation provides database-style,
1110.54 -> fine-grained permissions on your resources.
1113.15 -> Your resources are tables
and columns, for example.
1119.06 -> You scale your permissions
model through Lake Formation
1122.81 -> tag-based access controls.
1124.4 -> We'll talk about tag-based access controls
1125.9 -> in the next slide.
1127.1 -> But what you're doing is
moving from what we call
1130.13 -> resource-based access controls,
managing your tables based
1133.16 -> on your user community, to
tags, which focus more
1137.21 -> of your energy on what the data sets
1138.95 -> are and how to scale
giving people access to those
1141.92 -> data sets based on what they
are and what classification
1145.22 -> rules apply.
1147.17 -> It provides unified Amazon S3 permissions,
1149.51 -> it's integrated with services and tools
1152.66 -> and also third parties are
integrating with Lake Formation
1156.56 -> as well, and it's easy to audit
permissions for access.
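As a sketch of that fine-grained permissions model, a column-level grant might look like this in boto3; the principal, database, table, and column names are hypothetical.

```python
import boto3

lf = boto3.client("lakeformation")

# Grant column-level SELECT so analysts can query customer records
# without seeing the PII columns (all names illustrative).
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/marketing-analyst"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "curated_customers",
            "Name": "customers",
            "ColumnNames": ["customer_id", "segment", "country_code"],
        }
    },
    Permissions=["SELECT"],
)
```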
1161.51 -> Let's go a bit deeper into
tag-based access controls.
1167.12 -> Imagine you have hundreds
of databases and thousands
1170.84 -> of tables and tens of thousands of users.
1173.9 -> That becomes an awfully terrible
matrix of access control
1177.35 -> permissioning that you need to manage
1178.79 -> for role-based access controls.
1181.01 -> So this is why we introduce
tag-based access controls.
1183.527 -> And if you are using
one of our original ways:
1186.56 -> when we launched Lake
Formation, it didn't launch
1188.3 -> with tag-based access controls.
1189.62 -> If you are not using it
today and you're running into
1192.11 -> scaling challenges, we
definitely recommend
1194.54 -> talking to your SAs, talking
to us about how you can think
1197.93 -> about moving your access
control permissions to tags.
1201.35 -> The first thing you want to do
is define your tag ontology.
1204.14 -> So your tag ontology could be things like
1206.48 -> organizational structure
or classification rules.
1209.99 -> And the next thing you do
is you apply those tags
1213.68 -> to catalog resources, databases, tables.
1218.42 -> The higher up you apply the tag,
1219.92 -> the more broadly the
access is given, right?
1222.53 -> So you could apply it at the
database level, for example;
1226.67 -> tags are hierarchical in nature
and if there are conflicts,
1230.24 -> the system resolves those conflicts.
1233.45 -> The last thing is you create
those policies on those
1235.55 -> Lake Formation tags
for IAM users and roles
1239.09 -> and active directory users and
groups using SAML assertions.
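Putting those three steps together, a minimal boto3 sketch of tag-based access control might look like this, with an illustrative tag ontology, database, and principal.

```python
import boto3

lf = boto3.client("lakeformation")

# 1. Define a small tag ontology (illustrative key/values).
lf.create_lf_tag(TagKey="classification", TagValues=["public", "internal", "pii"])

# 2. Apply a tag at the database level; tables underneath inherit it.
lf.add_lf_tags_to_resource(
    Resource={"Database": {"Name": "curated_customers"}},
    LFTags=[{"TagKey": "classification", "TagValues": ["internal"]}],
)

# 3. Grant access by tag expression instead of per table, so the policy
#    scales as new tables arrive carrying the same tag.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/marketing-analyst"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "classification", "TagValues": ["internal"]}],
        }
    },
    Permissions=["SELECT"],
)
```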
1243.2 -> That's how you scale
your access permissions.
1246.08 -> Let's put this in the context
of your architecture, right?
1250.49 -> So we're seeing customers adopt
1252.41 -> a few different patterns here.
1254.27 -> On the top left you have your
business data catalog section.
1257.12 -> I'll simplify it 'cause
these systems do a lot more
1258.92 -> than business data catalogs,
1260.12 -> so please don't take this too seriously.
1262.55 -> There's Amazon DataZone
which does way more
1264.32 -> than business data catalog,
but I just put it in a box.
1266.72 -> So entry point I find my business data
1269.597 -> and I need to request access.
1271.94 -> The next area is third party tools.
1274.19 -> You could be using third
party tools to find data,
1276.59 -> request access and
understand those data sets.
1279.23 -> Or there's open source solutions
like data.all for example,
1282.59 -> that you could leverage.
1283.423 -> So there's a few different options
1284.42 -> we're seeing customers do today.
1287.57 -> The next thing you want to do
is regardless of the option
1289.94 -> you're doing, so DataZone
works out-of-the-box,
1292.4 -> but if you're working with a
third party, work with them,
1294.86 -> I've seen many examples, I've
actually blogged about one
1297.29 -> with Informatica about
how to automate workflows
1300.95 -> within Informatica to drive
your access control permissions
1303.83 -> down to your data catalog.
1308.24 -> Most of the ISVs can do
those workflow processes
1310.97 -> and push those access permissions down.
1314.03 -> A lot of them can also read
your Glue data catalog
1316.37 -> to understand your classification rules.
1319.1 -> When you do that, you then
push down and make sure
1322.79 -> that you're using Lake
Formation, with your
1325.73 -> Glue data catalog for
your catalog information,
1328.19 -> for your permission
model, your policy control
1331.55 -> and access down to your data domains.
1333.68 -> It could be your S3 data lake
or Redshift for that example.
1336.89 -> So that way when your users are coming in
1338.81 -> through various engines, whether
it's Amazon Athena
1342.32 -> or Spark on EMR, it's
enforcing those policies
1346.64 -> consistently based on how they requested access
1348.86 -> from their business context.
1350.09 -> It provides that end-to-end experience.
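For instance, a consumer running a query through Athena would have those Lake Formation permissions enforced transparently. A minimal boto3 sketch, with a hypothetical results bucket and the illustrative table from earlier:

```python
import boto3

athena = boto3.client("athena")

# When this query runs, Athena checks the caller's effective Lake Formation
# permissions, so only the granted tables/columns are readable.
response = athena.start_query_execution(
    QueryString="SELECT customer_id, segment FROM curated_customers.customers LIMIT 10",
    QueryExecutionContext={"Database": "curated_customers"},
    ResultConfiguration={"OutputLocation": "s3://my-lake/athena-results/"},
)
print("Query execution id:", response["QueryExecutionId"])
```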
1353.84 -> With that, I'm actually gonna,
oh, I have one more slide.
1356.42 -> Key takeaways and I'll
hand it over to Shihas.
1359.66 -> Key takeaways on enabling
agility with data governance
1363.02 -> on AWS: automate ingestion.
1366.05 -> Build up those patterns as
part of your business strategy
1369.23 -> and understand how you could
automate compliance controls,
1372.95 -> standard data quality rules so
that data engineers can move
1376.61 -> at the speed of your business.
1378.35 -> The second thing is automate
classifying, cataloging
1381.56 -> and profiling your data on the way in,
1384.98 -> so that your catalog understands
the technical information
1387.89 -> and your business catalog
starts to understand
1389.81 -> what is the classification
rules you need to apply
1392.42 -> when you're giving people access.
1394.46 -> The third area is automate the management
1396.86 -> through tag-based access controls.
1398.87 -> And make sure you can scale
through your enterprise
1401.24 -> so that your data is protected
throughout its life cycle
1404.84 -> and you're not consistently
managing permissions every time.
1408.29 -> And the last thing is
automate data sharing
1410.39 -> through one interface,
1411.29 -> whether or not it's through
Amazon DataZone or ISV partners
1414.98 -> or open source solutions
you build on your own.
1417.44 -> Automate that experience
so a user comes in,
1419.9 -> finds the data, requests access
from a single user interface,
1423.44 -> and they know what
they're getting and they know
1425.6 -> their data's protected on the way in.
1427.55 -> So with that, I'm gonna
hand it over to Shihas
1430.19 -> to tell his story, thank you.
1432.138 -> (audience applauds)
1441.692 -> - Can you hear me well?
1444.47 -> Right. I'm Shihas Vamanjoor
from Prudential Financial.
1448.16 -> My team made a request for
me to open this 30 minute
1452.45 -> presentation in their words.
1454.16 -> Here goes, over the next 30
minutes you will be introduced
1460.19 -> to the next generation of data platforms,
1463.67 -> a fully self-creating, self-organizing
1467.21 -> and self-managing platform
1469.37 -> we call the autogenic data platform.
1472.52 -> Folks in today's audience
are among the first
1475.43 -> outside of Prudential to
learn of this innovation.
1479.06 -> This is the future of data platforms
1481.457 -> and the future has already
arrived at Prudential.
1485.54 -> Harpreet and Zane from
the data platform team,
1488.18 -> are witnesses to me having
said this at re:Invent.
1492.307 -> (audience murmurs)
1499.16 -> A little bit of background,
1502.64 -> I'm from Prudential Financial Services.
1504.95 -> Prudential is a global
leader in financial services.
1507.83 -> We serve both institutional as
well as individual customers.
1511.22 -> We are in over 50 countries,
50 million customers,
1515.47 -> 40,000 employees.
1517.16 -> Let's focus on the 40,000
employees for the rest
1519.41 -> of the conversation.
1522.56 -> Me, I'm situated within the
Prudential chief data office.
1526.46 -> I'm the Product Owner for
Enterprise and Data Platforms.
1530.06 -> The mission of the data
platform is a big and bold one.
1534.29 -> Number one on our mission
statement is to democratize data
1539.15 -> to increase the value creator
base within (indistinct)
1544.4 -> that's the number one thing.
1545.99 -> Other pieces, increase
velocity of data innovation,
1550.07 -> reduce cost, reduce risk, so on so forth.
1555.23 -> Now, who does this data platform serve?
1560.87 -> At Prudential in the
multiple lines of business,
1563.63 -> both domestic and overseas,
we have a large community
1568.25 -> of technical and non-technical users.
1571.55 -> Technical users are data
scientists, data engineers,
1576.41 -> data analysts, data stewards,
1578.48 -> business intelligence professionals,
1580.04 -> machine learning engineers,
so on, you get the drift.
1582.95 -> Non-technical users, line
of business managers,
1586.97 -> business initiative owners, executive.
1590.51 -> What do they do with data?
1592.64 -> They want to exploit data
to create business value.
1596.42 -> What are the common challenges?
1598.31 -> On the slide are some of
the common challenges.
1600.62 -> These challenges all lead
to long time-to-value.
1604.34 -> First of them: hard to locate data.
1606.47 -> Why is this hard? Why is this
such a challenging problem?
1611 -> Our companies are growing continually.
1614.24 -> Data is also growing
continually, exponentially.
1617.24 -> So we have a large number
of systems and knowledge
1620.9 -> about these systems is usually tribal.
1623.24 -> That basically means the
owner of the system is the one
1625.88 -> that you tap on the shoulder
to dig out information
1628.46 -> about this data, time consuming process.
1630.74 -> Now multiply that by
the number of systems
that you have to deal with.
1635.03 -> Second, long time-to-access.
1637.55 -> Since there are a large number of systems,
1639.32 -> you got to go through governance hoops.
1641.6 -> You got to go from system
one, two, three, four, five,
1644.66 -> bring it to another system,
1646.25 -> another set of hoops to deal with.
1648.8 -> Third, lots and lots of human engineering.
1652.28 -> Since we have so many
systems on the source side
1656.75 -> and the technology space is
constantly, rapidly changing.
1661.04 -> You have a lot of technology requirements
1662.99 -> in terms of understanding this.
1664.67 -> You gotta be a superman
engineer to try and extract data
1668.84 -> from all of these systems,
bring it to another system.
1671.72 -> Complex governance.
1672.71 -> I talked about this,
the number of systems
1674.87 -> and how governance becomes complex.
1676.55 -> And finally there's a lot of
tedious and repetitive work
1679.91 -> before you actually start to do
1681.38 -> what you really want to do, right?
1685.43 -> So when I started this
journey of building out
1689.03 -> this enterprise data platform,
I went on a listening tour,
1692.78 -> two eyes, two ears, one mouth,
1695.33 -> keep the mouth shut kind of tour.
1697.37 -> So I was listening to
what they really wanted.
1700.04 -> Here are the desired experiences.
1703.04 -> One, they said, can you make
it simple for me to discover
1708.29 -> data within this entire
enterprise architecture
1711.17 -> wherever data lies really don't care?
1713.99 -> Is it possible for you to deliver
1716.6 -> to me an e-commerce-like experience?
1719.63 -> I just wanna shop for data products.
1722.72 -> No, this is not a shoe or
a watch, it's a data table.
1726.38 -> It's a data view, a file,
I wanna shop for it.
1729.98 -> Second, I really don't
care what these systems
1735.05 -> that house the data are in.
1736.43 -> I don't wanna be bothered
about pulling data
1739.04 -> from system X to system Y,
1741.38 -> data shipping is not a heroic job.
1745.31 -> There's no glamour in it anymore.
1747.26 -> It's very hard to convince
people to do that.
1750.86 -> So this no ETL is a
no-brainer, that's the second.
1754.94 -> Third, why can't you build me a system
1758.27 -> where you optimize my time
as a data value creator
1762.38 -> so I can focus on actual value creation
1765.2 -> rather than spending 90%
of my time bringing data
1768.14 -> over from arcane systems,
cleaning it, massaging it,
1772.04 -> and getting it ready so
that I can create value?
1775.67 -> Finally, if this data is in a system
1778.19 -> that I can get access to,
how do you make it accessible
1781.1 -> to me through a few mouse clicks?
1783.2 -> Those are the challenges.
1785.81 -> This is from a very initial
draft of our concept.
1790.52 -> I said, I hear you.
1792.17 -> So what you're really
asking for is the genesis
1795.32 -> of an autogenic data platform,
1797.84 -> a platform that creates itself,
1800.12 -> and you, the value creator are the master,
1805.97 -> not the platform engineering team,
1808.28 -> not a data engineering team,
1809.93 -> but you have the keys to
this entire ecosystem.
1814.52 -> How would it work?
1816.11 -> The autogenic data platform
has to catalog and discover
1820.25 -> at scale, all the data
assets that you have
1822.32 -> within the enterprise, right?
1824.03 -> It's gotta make it
available in a marketplace,
1825.92 -> that's number one.
1827.39 -> Once it is in a marketplace,
what do you do next?
1831.02 -> You use an order
fulfillment request system.
1834.05 -> So you press those buttons,
1835.7 -> make sure the data goes over
1837.11 -> to a data and analytics platform
where you can exploit it.
1840.23 -> Number three, you do not want
to do manual data governance,
1845.27 -> manual data engineering,
none of those things.
1848.12 -> You expect the system to
take care of it for you.
1851.72 -> So you bring the data in through
automated data governance,
1854.39 -> through a highly automated
1857.84 -> data processing engine, into a zone.
1860.66 -> Finally gimme another set of
order fulfillment requests
1864.02 -> in the same marketplace so I
can get access to other data.
1867.65 -> This was the ask.
1869.3 -> At this point, I had a
few important questions
1872.27 -> that I needed to answer.
1873.74 -> Looking at this end state
vision, I had to decide,
1878.99 -> number one, which partner would I choose?
1883.85 -> There's a series of
services that you need,
1886.67 -> intricate engineering that
you need to put together
1889.58 -> and then hide it all
away, that's the goal.
1893.21 -> Number two, what style of
architecture would I use?
1900.56 -> What method of development
would I use? For the first,
1905.69 -> it became quite clear to me
1907.46 -> that of all the cloud service providers,
1909.35 -> the most mature one with
these services was AWS.
1914.87 -> So we made the selection to go with AWS.
1918.89 -> Second, for the partner, we
chose AWS Professional Services.
1922.91 -> They're closest to the
technology and this was a design
1925.88 -> that was not trivial.
1928.31 -> Finally, we chose Agile as
the method of development.
1931.49 -> I told my team that we would
deploy the initial version
1936.32 -> of the autogenic data
platform into production
1938.63 -> within three months
from which time onward,
1941.24 -> every release would be a sprint,
which would be two weeks.
1945.05 -> And the reason we did
this is we always found
1947.81 -> that building with the
expectation of someone coming
1950.93 -> to use it, never works in practice.
1954.29 -> So that's the method that we chose.
1957.44 -> Fast forward nine months, what do we have?
1960.83 -> We have a completely automated,
hyper automated system.
1966.98 -> We have a marketplace.
1968.84 -> We use a front-end technology
for the marketplace,
1971.54 -> and I'm glad to see from Jason's
presentation that Amazon's
1975.68 -> done some wonderful work,
which we will reverse engineer.
1979.67 -> We have some front-end
technologies for now,
1982.79 -> which take the place of the marketplace,
1985.04 -> which does the order fulfillment.
1987.32 -> We have a Lake House construct,
1991.22 -> a construct of bringing
data from whatever pattern,
1995.54 -> whether it is files, whether it's JDBC,
1998.15 -> whether it is batch or streaming,
1999.86 -> all the fancy bells and whistles
that people usually want
2003.19 -> in a data platform, we have all of that,
2004.81 -> but incrementally we built that out.
2007.27 -> Now what we then went forward and said,
2009.73 -> the true value is in
shifting engineering right.
2014.35 -> So we then said, okay, let's
look at the key aspects
2017.2 -> of any data and analytics platform.
2020.02 -> Let's do things like data quality.
2023.5 -> Let's do things like data
standardization, PII detection,
2028.18 -> change data capture and management.
2029.8 -> All within an automated environment
2032.62 -> that no human being has to write code for.
2035.86 -> That was the goal, so over
a period of 12 months,
2039.85 -> we've achieved this goal.
2041.53 -> We are not at the
completion of the journey,
2044.14 -> the journey is still a long one,
2046.12 -> but the usage has been significant, right?
2048.01 -> From month three, we've been
able to onboard different
2051.67 -> data journey teams to the
platform and now we use a process
2055.24 -> of user generated demand
to build any new features.
2059.83 -> Now, one would think that this
requires an army of engineers
2063.37 -> and Prudential is a global giant.
2065.11 -> You would have access to an
army of engineers to build this,
2067.66 -> not true, small core team.
2070.66 -> They're sitting in the first row.
2071.8 -> We got Zane, Harpreet, Samir,
2073.81 -> these are the cloud and data
engineers along with ProServe.
2077.62 -> Jason was part of the
original ProServe team
2079.78 -> that helped build this out.
2081.88 -> Dozen or so people, under a dozen,
2084.01 -> including management was all it took,
2086.65 -> great effort from the
collective at Prudential.
2089.35 -> They provided all the collective muscle
2091.69 -> to make this happen, because
this was a common shared goal.
2097.15 -> In terms of architecture,
let's get down into a level
2099.97 -> of detail that is necessary
for this conversation
2102.7 -> and it ties closely to what AWS and Jason
2105.46 -> have been talking about in
the previous presentation.
2109.96 -> Number one, here's a
representation of sources,
2114.37 -> logical representation.
2115.54 -> And this is what you are likely to see
2117.79 -> in your own enterprises,
it has everything.
2120.22 -> Data center based databases, file systems,
2125.26 -> SaaS applications like
Salesforce and ServiceNow, APIs.
2129.73 -> Basically sources of first
party, second party,
2132.46 -> third party data, wherever it may be.
2135.46 -> The second part of the
standard architecture
2138.46 -> is ingestion service layer.
2141.43 -> This is where incrementally,
the autogenic data platform
2146.38 -> has features that allow for
different types of sources
2150.49 -> to be handled differently
but intelligently.
2153.94 -> So you start with flat
file data transfers,
2157.69 -> okay, here's AWS data sync.
2160.45 -> But here's the difference,
nobody needs to know
2163.72 -> how AWS data sync works.
2166.54 -> That's coded into the automation.
2169.18 -> You provide the order that
says, I am a file source.
2175.69 -> The autogenic data platform
detects that file sources are
2179.77 -> best suited for a DataSync
transfer program.
2184.75 -> You are a JDBC source, so
I'm going to use Glue JDBC,
2189.28 -> oh you are Salesforce,
I'm going to use AppFlow.
2195.25 -> So that's the detection mechanism
2197.56 -> that's built into the programming.
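The detection mechanism described here is internal to Prudential's platform, but the idea can be sketched as a simple dispatch over the declared source type. Everything below is hypothetical and simplified, not the actual implementation.

```python
# Toy sketch of the detection idea: the platform, not the user, decides which
# AWS ingestion service fits the declared source type.
def choose_ingestion_service(source: dict) -> str:
    source_type = source["type"]
    if source_type == "file":
        return "aws-datasync"        # flat file transfers
    if source_type == "jdbc":
        return "glue-jdbc"           # relational databases over JDBC
    if source_type in ("salesforce", "servicenow"):
        return "appflow"             # SaaS sources via AppFlow connectors
    raise ValueError(f"No ingestion pattern registered for {source_type!r}")

order = {"name": "policy_admin_db", "type": "jdbc", "refresh": "daily"}
print(choose_ingestion_service(order))  # -> "glue-jdbc"
```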
2199.72 -> Now as you go rightward,
you'll start to see persistent
2204.07 -> layers in the architecture come through.
2206.62 -> Persistent layers are both
a lake as well as a house.
2210.82 -> Now up to this point, what
you have to imagine is
2214.48 -> there is no human being
writing any of this code.
2216.94 -> Not for infrastructure,
not for the services,
2219.88 -> not for the pipelines,
not for the orchestration.
2222.22 -> All happening by the application.
2224.5 -> The application has taken control.
2227.29 -> So it is self-creating.
2228.64 -> So what happens next is data is now moved
2231.82 -> through this persistent layer
from raw to standardized,
2238.87 -> applying the data quality
rules, the governance rules.
2244 -> Change data management
and change data capture
2246.07 -> is embedded within this pipeline.
2249.85 -> And then the data is now
available to the curator.
2252.82 -> But at this time, from
a governance standpoint,
2255.04 -> one would think, hey, is this
data all accessible to people?
2258.25 -> No, the process has
decoupled access provisioning
2262.33 -> from data movement.
2263.77 -> None of this data is actually accessible.
2266.74 -> Now comes the central governance,
2268.96 -> which is what Lake Formation is for.
2271.39 -> We centralize governance to all the assets
2274.24 -> through Lake Formation,
one single point of access
2278.02 -> provisioning to this entire ecosystem.
2280.87 -> Finally, we have all kinds
of consumption patterns
2285.76 -> that get access through
this Lake Formation
2288.85 -> central governance layer
to this entire lake house.
2292 -> Now what I didn't talk about
is just as important
2295.09 -> as what I just said.
2297.28 -> What we have done thus far
2299.92 -> is hyper automate the journey
to a standardized layer.
2303.94 -> The journey of data
exploitation does not end there.
2307.54 -> It begins there, but now is
the value creation exercise.
2311.83 -> Now you have these
highly skilled resources,
2314.14 -> the talented folks who
can use this ecosystem
2317.98 -> to refine the data.
2320.32 -> Now I have another innovation
project in the works,
2323.14 -> which is trying to machine
this curation as well.
2325.93 -> So that regular refinement
to create a data product
2330.37 -> is the focus of my data
journey teams today
2333.91 -> and this architecture has
allowed them to do so.
2337.15 -> Let's unpack this a little bit more.
2341.89 -> Here's a sample user
onboarding experience, right?
2347.47 -> This is how users onboard
data to our platform today.
2352.69 -> Here's a data athlete, technical
user, non-technical user,
2356.23 -> as long as you can use the
browser, you're welcome.
2360.37 -> You go scan this marketplace,
it looks like Amazon.com
2364.69 -> or whatever your favorite
e-commerce site is, right?
2367.6 -> You get a collective that
is a bunch of objects
2371.56 -> or a single object, it's your choice.
2374.2 -> You then say,
2375.25 -> I want this object to be
refreshed at a given frequency.
2382.48 -> You then say, submit order.
2386.26 -> Internally, the metadata
generated by your order
2390.91 -> is the heart of this intelligent platform.
2393.61 -> This uses all the metadata
that's been collected so far
2397.75 -> to provide instructions to
this auto generating system.
2404.11 -> Auto generation actually
creates the infrastructure code,
2407.83 -> that's the first order of business for it.
2410.44 -> If that is an order that is
coming in for the first time,
2414.55 -> all the infrastructure
necessary is created dynamically
2418.66 -> because the metadata, the
manifest, contains the information
2422.38 -> that is sufficient for you
to have a targeted creation
2425.92 -> of both infrastructure and
pipeline and orchestration.
2430.21 -> Post that, it then identifies
the necessary elements
2434.98 -> to actually execute on this.
2437.86 -> So as an example, when I
process an order for a database
2443.98 -> or for a file system to be transported,
2447.25 -> the first time creation of the
infrastructure takes place,
2450.85 -> subsequent runs of that
infrastructure pipeline
2454.39 -> are now available as an
order fulfillment request
2457.69 -> that this entire automation takes care of.
2461.29 -> That's how this whole thing works.
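As a purely hypothetical sketch of the manifest-driven flow described above (the real order, IaC, and pipeline generators are internal to Prudential and not public), the order metadata might drive fulfillment roughly like this:

```python
# Stand-in helpers for the real infrastructure/pipeline generators; only the
# shape of the idea is shown, not the actual implementation.
def create_infrastructure(manifest): print("creating infra for", manifest["order_id"])
def create_pipeline(manifest):       print("creating pipeline for", manifest["order_id"])
def schedule_pipeline(refresh):      print("scheduling refresh", refresh)

_seen_orders: set[str] = set()

def fulfill(manifest: dict) -> None:
    order_id = manifest["order_id"]
    if order_id not in _seen_orders:          # first-time order: generate everything
        create_infrastructure(manifest)
        create_pipeline(manifest)
        _seen_orders.add(order_id)
    schedule_pipeline(manifest["refresh"])    # subsequent runs are just fulfillment

fulfill({
    "order_id": "ord-0042",
    "source": {"type": "jdbc", "connection": "policy_admin_db", "objects": ["policies"]},
    "destination": {"zone": "standardized", "database": "ins_policies"},
    "refresh": {"frequency": "daily", "time_utc": "02:00"},
    "governance": {"data_owner": "policy-ops", "pii_scan": True, "dq_ruleset": "standard-v1"},
})
```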
2464.29 -> So in a sense, what
we've now been able to do
2467.777 -> is to say, for the foreseeable future,
2472.06 -> run this kind of pipeline
moving this kind of data
2475.54 -> from these systems to a secondary system
2478.78 -> for data and analytics.
2482.47 -> And the system is responsible
for the orchestration,
2486.43 -> the constant redelivery, the
remediation, the notification.
2490.54 -> All of that is now built into the system
2492.67 -> with embedded governance,
embedded data quality,
2495.76 -> embedded everything.
2498.16 -> That's what hyper automation is all about.
2501.04 -> Now, if you take an example of
a user experience for access,
2508.9 -> similar, nothing much different,
the toys are different.
2513.85 -> What we use as AWS services are different,
2518.68 -> but the experience for
the user is the same.
2521.71 -> He doesn't need to see this.
2523.9 -> Like I said,
2524.733 -> the heroics belong to the
data engineering team.
2529.33 -> Vendors will provide all
the necessary services,
2533.2 -> it's up to us, as users of those services
2535.84 -> to determine the
experiences that we deliver.
2538.66 -> The experiences that we chose
to deliver as a platform team
2541.75 -> is an abstracted experience.
2543.52 -> We try to figure out why is it necessary
2545.77 -> that you have to do something.
2547.42 -> The answer usually is you don't need to.
2549.7 -> So if the answer is you don't need to,
2551.32 -> we hide it behind a browser.
2554.41 -> So you can focus on the actual value add.
2557.74 -> Similar experiences you will
find in data warehouses.
2560.74 -> The reason why I have this extra slide,
2562.63 -> is to tell you that no matter what kind
2565.21 -> of infrastructure aspect that
you have underneath today,
2568.273 -> I have a data lake or data warehouses,
2570.73 -> tomorrow I have a graph database.
2572.89 -> Tomorrow I have another
kind of repository,
2575.47 -> doesn't really matter.
2577.45 -> The method is the same.
The ideas are the same.
2580.81 -> The way you do it is exactly the same.
2585.67 -> Now, key lessons learned.
2589.04 -> This is from our journey
over the last nine months.
2593.02 -> You know it's going to differ
from journey to journey.
2595.84 -> Super important to have a
clear vision of end-state
2599.8 -> and this is important because
there's too many shiny tools
2602.74 -> in the market, too many distractions.
2605.41 -> You're going to hear about
2607.15 -> do this with domain driven design.
2609.91 -> Do this with application oriented design.
2614.14 -> Yeah, those are important,
relevant topics,
2617.59 -> but they take away from the real focus,
2619.27 -> which is you stay focused on your users.
2621.94 -> Who are they? In this
case, they're internal.
2624.61 -> According to our CEO, two
classifications of employees,
2628.21 -> colleagues who help customers directly,
2630.58 -> colleagues who help
colleagues who help customers.
2634.57 -> My customer is internal.
2636.31 -> The product that they build
2637.63 -> will be used by external customers.
2640.21 -> I've got to stay focused on them.
2642.46 -> So I gotta do everything
that helps them do things
2645.49 -> at a faster clip, at a lower
energy expenditure level.
2651.1 -> That's my focus, so stay
focused on your user.
2653.98 -> It's super easy to get lost
in this jungle of innovation
2658.09 -> that every vendor is out there peddling.
2661.24 -> Second, automate governance at every step.
2665.53 -> You will hear that governance
is a beast, hard to automate.
2672.13 -> Tackle that head on. Ask what
exactly is hard to automate?
2677.23 -> Data quality routines? No.
2679.81 -> Configurable, you apply a rule set.
2684.16 -> What else is hard to do,
data ownership? Nope.
2687.46 -> Embedded at an object level like we have
2689.65 -> every asset within our
ecosystem, every table,
2693.73 -> has a data owner embedded in it.
2696.1 -> What does that do? Helps
you approve things quickly.
2699.94 -> Helps you understand
ownership models quickly.
2702.55 -> So question anything that takes
away from hyper automation.
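One way such object-level ownership could be embedded on AWS (an illustration only, not necessarily Prudential's implementation) is a Lake Formation LF-Tag whose value is the owning team, attached to each table so approvals can be routed quickly:

```python
import boto3

lf = boto3.client("lakeformation")

# Illustrative ownership tag and values.
lf.create_lf_tag(TagKey="data_owner", TagValues=["policy-ops", "claims-analytics", "marketing"])

# Attach the owner at the table level so every object carries its ownership.
lf.add_lf_tags_to_resource(
    Resource={"Table": {"DatabaseName": "ins_policies", "Name": "policies"}},
    LFTags=[{"TagKey": "data_owner", "TagValues": ["policy-ops"]}],
)
```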
2707.17 -> Third, this is very important as well.
2709.63 -> You need to build a
small core talented team.
2714.25 -> They need to do two things.
2716.32 -> Number one, they're responsible
for feature development.
2720.46 -> This feature development
is driven by user demand.
2724.18 -> You don't try to build for the future
2725.83 -> without knowing your actual user demand,
2727.9 -> so you gotta be super close to the user.
2729.58 -> At the same time, while they
build these shiny new features,
2732.67 -> it's also super important
that they support users
2734.86 -> who are currently using the platform.
2736.33 -> So you got to have two parts to the team.
2739.12 -> One that keeps the lights
on, keeps remediating,
2741.64 -> keeps fixing, the other is
to build out the new feature.
2747.58 -> The other tried and tested lesson:
2751.45 -> value in trying, value in failing.
2754.72 -> There's a lot of lip
service in enterprises
2758.14 -> about fail fast, be brave.
2761.59 -> But when it comes to
impacting your timelines,
2764.32 -> your deliveries, it's
not looked upon kindly.
2766.96 -> In our experiences, there
is no escaping that fact,
2770.59 -> you simply accept it, this is
the cost of doing business.
2774.43 -> You're going to try with
numerous AWS services
2777.55 -> or other cloud services for that matter.
2779.59 -> You're going to try combination of things.
2781.81 -> There are certain things that
you don't want to invent.
2784.51 -> You just don't want to
invent authentication,
2786.79 -> you don't want to invent authorization,
2789.04 -> you don't want to invent
encryption, lots of things.
2792.46 -> You want to take services that
have already been created.
2797.35 -> But how you glue them together,
2799.57 -> you'll have to test it yourself.
2802.39 -> Now the reason why I bring
this forward is, I posit,
2806.26 -> that you have to create an
engineering team for yourself.
2810.79 -> You have to treat this as an application.
2813.43 -> Historically, data platforms
2815.38 -> have not been treated as applications.
2817.9 -> They usually have been
approached as a data product
2821.98 -> or a data platform product by
itself with some integrations.
2827.14 -> The change here is, we talk
about an autogenic data platform
2830.98 -> being an application and
here's a product owner
2833.74 -> and here's a development team
for that application alone.
2836.5 -> No different than any software product.
2839.56 -> So this is a software engineering team
2842.2 -> internally servicing it for
the enterprise customers.
2846.79 -> There is a super important
construct to remember.
2852.52 -> Now in terms of outcomes,
everybody loves outcomes,
2855.727 -> and everybody has these OKRs that measure
2862.03 -> the value of what you have done.
2864.52 -> For us, it's quite clear
that this is the winner.
2869.71 -> The increased talent pool,
what does that talk to?
2873.25 -> That talks to now, us not
having this class system.
2878.92 -> Oh, you don't know AWS Glue, do you?
2882.04 -> You can't participate in our ecosystem.
2884.11 -> No, we don't have that
conversation anymore.
2887.53 -> You can use a browser? You're welcome.
2889.99 -> What more do you need in
order to be successful?
2892.6 -> You need that browser to have
SQL embedded, a query engine?
2895.99 -> Okay, the platform teams
hear that, to support you.
2899.71 -> You need Excel embedded
within the same data portal?
2902.47 -> So be it, we deliver to you.
2905.419 -> We don't levy a tax, a learning tax.
2909.49 -> Historically, platforms
which have not been created
2912.85 -> through this vision apply these taxes.
2915.52 -> There's a regime of taxes everywhere.
2917.8 -> You pay repeat taxes every
day in your work life.
2921.85 -> The mission of this platform
is to eliminate those taxes.
2927.22 -> Second, time savings.
2930.1 -> Our original goal was to shift right
2932.92 -> as much of human engineering as possible.
2936.43 -> Current state of the platform has shifted.
2939.4 -> The first point where human engineering
2941.35 -> starts to be involved is
when you have to curate
2944.47 -> or refine the data.
2945.97 -> So you've cut the taxes all
the way to that standardized layer
2949.18 -> of your lake house architecture
2950.86 -> through this autogenic data platform.
2954.61 -> Third, data access.
2957.46 -> That's been cut from days to minutes now,
2959.8 -> it's within the application,
2960.88 -> it's within the same data marketplace.
2962.53 -> It's an application that is
governed, it is auditable,
2966.49 -> it tells you who granted access
to who and for what data.
2970.15 -> All of that information is
present within the tool itself.
2974.56 -> Cost savings.
2976.39 -> When you release such an application
2978.7 -> to the enterprise, you
decrease the appetite
2981.88 -> for building bespoke data solutions
2984.61 -> because now you have to
compare it against something
2987.88 -> that has so many features
and works so well.
2992.601 -> Success is super important for
that reason that it decreases
2996.1 -> the appetite for folks
to try this on their own.
2998.35 -> There's no harm in trying
things on their own.
3000.03 -> But then you would see the
engineering that is necessary
3003.27 -> to create a holistic integrated system
3005.85 -> that's super important to realize.
3011.16 -> Now, in terms of governance,
this was a governance topic,
3015.63 -> so I want to talk to a certain
aspect of governance as well.
3019.8 -> Trust in data gets
better when human beings
3024.15 -> don't finger the data.
3026.76 -> How far right can you move
that? That's the question.
3031.62 -> How can you report on the hops
3035.22 -> that data is taking through an automation?
3038.61 -> How much of this quality
stuff can you move
3041.4 -> towards the left of the equation
3042.81 -> which is the source of data?
3043.977 -> And how much of it can be
subject to the automation?
3046.71 -> So it improves trust in data
3049.2 -> because you have not touched the data
3051.12 -> up to a certain aspect.
3052.47 -> Now you can also focus your energies
3055.29 -> as a data governance
organization on the actual place
3059.37 -> where data is changed, which
is the refinement zones.
3063.87 -> If our innovation is successful,
3065.49 -> we'll move it even more to the right.
3067.89 -> But for now there is a narrow focused area
3070.83 -> which is either on the
left side of the entire
3074.01 -> architecture within the source
3075.75 -> or, on the right side
within the curation zones
3078.6 -> where you can now focus on,
your focus is not spread.
3083.01 -> And lastly, because of the
popularity of the platform,
3087.18 -> we are seeing reduced data sprawl.
3090.15 -> We are seeing more and
more multi-tenant teams,
3093.6 -> data journey teams, come
and use the platform,
3097.11 -> it's also integrating our data.
3098.79 -> So it's subjective whether that
was one of the primary goals
3102.45 -> because we still live in a
very federated business model.
3105 -> But the goal for us is
to reduce the data silos.
3109.77 -> I know I've said a lot.
For additional thinking
3115.92 -> around this topic, I've also published a Medium
3119.28 -> and LinkedIn article,
3120.39 -> under the header Autogenic Data Platforms.
3123.3 -> Feel free to hit me up at my
social media handle on LinkedIn,
3127.92 -> happy to collaborate.
3129.36 -> At this point, I think
Jason and I are happy
3132.21 -> to take questions.
3134.444 -> - [Jason] Two more slides.
Yeah, two more slides.
3137.464 -> - Sure.
- Sure.
3138.87 -> - So just two more slides.
How do I get started?
3144.27 -> There's three programs that AWS offers
3147.24 -> to help you get started.
3149.07 -> The first one is if you want
to build a data strategy,
3152.31 -> we have our Data-Driven
Everything program.
3154.56 -> Feel free to reach out to
your SAs or your account team
3157.95 -> to understand more about that program.
3160.14 -> The second one is Data Labs,
3161.64 -> if you have that strategy
and need help executing it.
3165.15 -> That would be our Data Lab team.
3166.41 -> And the third one is if you
need help with implementation,
3169.41 -> there's obviously ProServe
3170.45 -> and our great partner community as well.
3172.65 -> So those are three ways
to help you get started
3174.87 -> with data governance.
3176.34 -> We do have a new workshop
that both our D2E,
3180.45 -> our Data-Driven Everything
team and our ProServe team
3183.42 -> can help you execute,
it helps you understand
3185.43 -> where you're at with data governance.
3187.11 -> So please reach out to your
account team to understand
3190.2 -> where you're at in your
journey with data governance
3192.03 -> and they're happy to help you with that.
3193.83 -> And the last thing is getting started.
3196.2 -> Think big, use that Discovery
Workshop on data governance.
3199.77 -> Use things like Data-Driven Everything.
3202.05 -> You want to think big,
but you wanna start small.
3204.72 -> How does this apply to
your business strategy?
3206.97 -> How do you use tools such as Data Labs
3209.43 -> and ProServe through POCs?
3211.71 -> And then how do you scale fast
through partners and ProServe?
3214.83 -> So thank you, we're happy
to take a couple questions.
3218.73 -> We have a few minutes if
people have questions.