AWS re:Invent 2022 - Enabling agility with data governance on AWS (ANT204)
Data governance is the process of managing data throughout an end-to-end process, ensuring its accuracy and completeness and making sure it is accessible to those who need it. Join this session to learn how AWS is delivering comprehensive data governance— from data preparation and integration to data access, data quality, and metadata management—across analytics services.
ABOUT AWS Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.
AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
#reInvent2022 #AWSreInvent2022 #AWSEvents
Content
0.023 -> - Welcome to Enabling Agility
with Data Governance on AWS.
4.17 -> Very excited to talk to
you about data governance.
6.27 -> I don't think I've said that before,
8.34 -> but today it's gonna be a good talk
9.75 -> between myself and Shihas
from Prudential, so thank you.
13.35 -> I run our Lake Formation product
14.97 -> as well as the Glue data
catalog here at AWS.
17.742 -> I've been with Amazon
for about eight years.
19.71 -> I was running our analytics practice
21.18 -> in professional services
for most of that time.
24.27 -> We're helping customers build data lakes,
26.79 -> helping them migrate their
data platforms to AWS
29.37 -> and become data driven.
31.32 -> Let's go see what we're
gonna talk about today.
34.08 -> The first thing we're gonna talk about
36 -> is how does data governance
help you become data driven?
40.17 -> The second thing we're gonna talk about
41.37 -> is data governance patterns
within AWS analytic services.
45.6 -> And then Prudential Financial
will come up and talk about
47.91 -> their data journey and how
they actually did it for real
50.34 -> versus me just talking about
some PowerPoint examples.
54.99 -> So let's start first
with data driven themes
57.99 -> we hear from customers and I like to start
60.96 -> with this about data governance
63.3 -> because it's so critical hand in hand.
66.36 -> The first thing you wanna
learn is that the top part of it
69.777 -> and the bottom part of
it are coupled in a way.
72.39 -> But the top part is more
about business context
75.84 -> and the bottom part is more
about people, process, and technology.
79.23 -> So customers tell us
they want to be data driven,
81.54 -> but they struggle with a few
areas, and these are consistent
84.06 -> themes across many customers we talked to.
86.85 -> The first thing is understanding
what great looks like.
89.37 -> And this isn't about
just building a roadmap
91.68 -> or understanding what I'm gonna do next.
93.15 -> It's actually what is great
94.29 -> and what will delight your customer.
96.63 -> At Amazon, we talk about working backwards
98.73 -> or writing PR, FAQs to
go through that process
101.25 -> and really articulate what is great.
103.23 -> And that's your first
start in your journey
105.27 -> with data governance as well,
106.32 -> and we'll tie these together in a moment.
108.3 -> Once you know what great
looks like, you prioritize
111 -> those use cases and you create sponsorship
112.86 -> just like you would with
any other initiative.
115.71 -> The bottom row is where
it fits into data governance.
119.73 -> As you start building these use cases
121.56 -> to solve business problems,
you integrate things
124.89 -> like data driven culture,
data literacy programs,
127.8 -> understanding data definitions,
you focus on gaps of skills.
131.37 -> Maybe you need to introduce a new tool
132.99 -> that users need to understand.
134.85 -> And the last thing is
security's number one,
137.19 -> if we're gonna drive business
agility with data governance,
140.79 -> we need to make sure
your data is protected.
142.71 -> Security and compliance controls are built in
145.59 -> because your customers and users of data
148.35 -> need to make decisions at
the speed of their business.
150.99 -> And without protecting
data at that same speed,
154.08 -> you could create additional business risk.
158.01 -> So some other areas that we've seen
162.36 -> other publications talk
about is, for example,
165.15 -> Forbes gave some statistics,
167.01 -> 85% of businesses want to be data driven,
170.37 -> yet 37% have really struggled with it.
173.19 -> And why have they struggled?
174.48 -> And you see some quotes around from IDC.
177.21 -> Data governance is a core struggling area
179.43 -> for becoming data driven.
180.93 -> So it's no longer
optional is what IDC says.
184.23 -> For enterprise organizations
to get the most value
186.93 -> out of their data, to make decisions,
188.91 -> they really need to
treat data as an asset.
191.85 -> The second part is
organizations lack knowledge
195.45 -> of efficient and effective
data governance activities.
198.21 -> And 30% of the time is wasted
doing data governance things,
201.87 -> data definitions, data
management, data security.
205.26 -> And these are areas you
can definitely automate
207.24 -> that we're gonna talk about in a few minutes.
210.24 -> Let's put some definition around this.
212.49 -> This is how we define data
governance here at Amazon.
215.16 -> There are thousands of
definitions of data governance,
217.65 -> so I'd love to hear yours as well,
219.39 -> but this is how we define it,
220.807 -> "Data governance is the
collection of policies,
225.24 -> processes and systems
that organizations use
228.57 -> to ensure the quality
and appropriate handling
231.06 -> of their data throughout its lifecycle
234.54 -> for the purpose of
generating business value."
237.96 -> And to me what really sticks
out the most is a few things,
241.08 -> one of which is the last line,
generating business value.
244.62 -> We do not want to do any
work in data governance
247.5 -> without a business value associated with it.
249.9 -> And let's go a little
bit deeper into that.
253.44 -> So data governance starts with business.
258 -> The first thing you want to do
259.29 -> is tie it to your business strategy.
261.72 -> And I'll tell a bit of a story.
263.64 -> So I'd say like 15 years
ago I got very excited,
266.19 -> I was managing a customer data system,
269.07 -> it was probably
pre-Salesforce at the time,
271.29 -> and I'm like, our data's really bad,
273.36 -> I need a data quality tool.
275.61 -> And I went up to my management, I'm like
276.96 -> we're gonna go buy a data quality tool,
278.34 -> it's gonna be X hundreds
of thousands of dollars
280.05 -> and we're gonna solve our
customer data problem.
282.69 -> It wasn't funded and I
didn't know what to do.
285.72 -> I'm like, I can't understand
why this wasn't funded.
287.64 -> I wasn't really thinking about it.
289.62 -> We didn't have a customer
analytics problem to solve
292.47 -> at the time, you'd think
that most customers
294.48 -> had that problem, but we didn't have that.
297.36 -> We knew our customers, our business
299.4 -> wasn't the biggest company at
the time I was dealing with,
301.83 -> our business people knew our
customers really, really well
304.89 -> and they weren't going to
invest in a data quality tool.
307.5 -> So what's important here is understand
309.63 -> your business challenges and
start your data governance
312.72 -> journey with that.
314.52 -> Once you know those business challenges,
316.92 -> you have data consumers.
318.96 -> Data consumers are those
that need to use data
321.42 -> to make decisions.
322.74 -> So those are the folks that
will understand what data
325.17 -> they're gonna ask for, what
data they're gonna look for,
326.91 -> and what metrics they need
to solve business problems.
330 -> They're gonna then
inform data producers.
332.91 -> So data producers could be
data owners, system owners,
337.17 -> data lake admins: here
is the data I need
340.47 -> to solve my problem that ties
to the business strategy.
342.6 -> How can you help me?
344.16 -> And that's where data
governance really kicks in.
347.85 -> So this is our data
governance, I guess bracelet,
352.41 -> charm slide that we talk through.
355.53 -> And there's two areas, there's
the top portion of the slide
358.41 -> and the bottom portion of the slide.
360.66 -> So we've been talking about
schema-on-read for a while
365.16 -> here in this industry.
366.51 -> Schema-on-read's an interesting concept
367.95 -> 'cause you definitely
wanna just put the data in
369.45 -> and people can get value really quickly.
371.4 -> But does that really help a
data user make decisions faster?
376.05 -> What you wanna do here is the
top part as a data consumer
379.59 -> is asking a data producer
for more information.
383.07 -> Those producers should
ingest that data quickly.
386.4 -> They need to automate classification,
388.32 -> automate data profiling,
automate data quality
391.11 -> and secure and encrypt that
data before they give access.
394.35 -> And that brings that real understanding
396.09 -> of what that data is, so that
users can quickly be provided
400.02 -> access to that information.
402.03 -> For example, data classification.
403.53 -> What if your data had PII
and the producer of data
406.65 -> didn't really know what their
role is in the organization
409.59 -> to protect compliance controls or GDPR?
413.4 -> Automate data classification
on the way in.
416.34 -> The last thing is catalog, so
before giving access to data,
419.94 -> you want to catalog your information,
421.77 -> bring that technical and
business context to your data
425.1 -> and then give users access
to that to make decisions.
428.79 -> I was at a talk at
Amazon a couple years ago
431.28 -> and they talked a bit
about the value of data
433.86 -> and they talked a lot about schema-on-read
436.26 -> and how this fits in.
438.51 -> But what you want to do
is maximize and get data
441.06 -> in the hands of users as fast as possible
443.49 -> in a secure and compliant
way so that they can make
447.09 -> decisions at the speed of their business.
449.49 -> So we'll talk a bit about automating
451.14 -> the top part of the slide.
453.54 -> The bottom part of the slide
is once data is in the hand
456.36 -> of consumers, they can manage that data,
458.22 -> they can create new data sets,
459.36 -> they can then become
producers on their own.
462.12 -> So if you think about a
consumer in that point in time,
464.73 -> they might be using data, accessing,
466.77 -> querying, making decisions.
468.57 -> They don't just make a
decision and it goes away.
470.88 -> They want to create something new to share
472.53 -> with their executives, share
with their business partners.
475.23 -> That's where they
actually become a producer
476.97 -> and they're gonna be data
curating, data integration,
480 -> building lineage, securing
that data, and starting
482.85 -> and continuing that life
cycle of data throughout.
486.54 -> So how do AWS services help?
491.37 -> I'm gonna talk about three
main challenges here.
494.19 -> The first one is how do we
automate data integration,
497.19 -> classification and data quality
499.68 -> on the way into your data lake?
501.99 -> The second area is how do we catalog
504.51 -> and make that data more usable?
506.67 -> And the third area is how
do we automate sharing
509.97 -> so that users can get
access to the data they need
512.55 -> at the right time and as fast as possible?
515.1 -> So let's go a bit deeper
into automating ingestion,
519.06 -> and lastly, you wanna make
sure that it can publish
521.37 -> those data sets back through
that same ingestion pipeline.
528.549 -> So automating data governance on ingestion
532.23 -> is a hard problem, let's talk
about why it's a hard problem.
535.44 -> The first thing is data
comes in various forms,
538.95 -> it comes streaming, it comes
in batch, it comes in CDC,
541.59 -> it comes from SaaS applications,
543.6 -> it comes from different file formats
545.76 -> and it's variable over time.
548.07 -> So automating that process is
hard but can be done over time
552.9 -> by building a set of patterns.
554.1 -> And we'll talk about an
example pattern in a moment.
556.65 -> The second thing is automation
with CICD pipelines.
559.44 -> So to make sure that your
enterprise standards are built
562.5 -> into that automation of ingestion.
564.84 -> So when a new pattern comes in,
566.58 -> your CICD pipelines can adapt.
569.31 -> And you want to handle it
for many data sources: RDBMS,
572.61 -> files, streams, SaaS applications.
576.09 -> The second challenge you want to handle
577.89 -> when you're automating
data or data ingestion
581.04 -> is handling inconsistent performance,
583.23 -> reliability and quality.
585.15 -> So as part of these reusable
ingestion frameworks
588.33 -> that you might build
at your customer sites,
590.91 -> you want to understand
how data is stored.
592.71 -> You want to handle
inconsistent data formats.
595.56 -> Over time as you see more patterns across
598.35 -> all of your systems,
you'll actually understand
600.93 -> that those patterns
happen over and over again
604.14 -> across variable systems.
605.28 -> You can build a library
of data quality rules,
607.29 -> a library of data profiling statistics.
611.49 -> And the last thing is
you wanna standardize
612.99 -> your data quality so
that users in one system
615.93 -> understand the data quality
rules that were applied.
619.32 -> So let's say you have
multiple SAP instances,
621.54 -> the same data
quality rule might apply
623.91 -> to different instances.
625.38 -> So you get consistent
data quality on the way in.
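As a rough sketch of the standardized, reusable data quality library idea described above, here is what registering one shared rule set against multiple catalog tables might look like with AWS Glue Data Quality's DQDL rule language (in preview around the time of this talk), via boto3. The rule set, database, and table names are hypothetical.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical DQDL ruleset reused across every instance that carries customer data.
CUSTOMER_DQ_RULES = """
Rules = [
    IsComplete "customer_id",
    IsUnique "customer_id",
    ColumnValues "country_code" in ["US", "CA", "GB", "JP"],
    Completeness "email" > 0.95
]
"""

def register_customer_ruleset(database: str, table: str) -> None:
    # One standardized ruleset applied to many tables (e.g. multiple SAP instances)
    # so consumers see the same quality checks everywhere.
    glue.create_data_quality_ruleset(
        Name=f"customer-dq-{database}-{table}",
        Description="Standard customer data quality rules",
        Ruleset=CUSTOMER_DQ_RULES,
        TargetTable={"DatabaseName": database, "TableName": table},
    )

for db, tbl in [("sap_emea_raw", "customers"), ("sap_apac_raw", "customers")]:
    register_customer_ruleset(db, tbl)
```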
629.16 -> And the last thing is bringing
in compliance and regulation.
633.33 -> So as you're monitoring your data quality,
635.64 -> you're running data profiling,
you look at the data,
638.49 -> you inspect the data and use
things like machine learning
640.98 -> such as Glue's PII detection.
642.93 -> Then there's Macie for
example, Comprehend can do PHI,
646.38 -> some PHI detection as well.
648.03 -> How do you automate that
as part of a pipeline
650.4 -> and then classify that data on the way in
653.34 -> and carry it through your environment.
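A minimal sketch of the kind of automated PII classification step described here, using Amazon Comprehend's detect_pii_entities API on a sample of incoming records (Macie would be the option for scanning whole S3 buckets). The sample text and confidence threshold are illustrative.

```python
import boto3

comprehend = boto3.client("comprehend")

def pii_types_in_sample(sample_text: str) -> set[str]:
    # Detect PII entity types (EMAIL, ADDRESS, SSN, ...) in a sample of the
    # incoming data so the pipeline can classify the data set on the way in.
    response = comprehend.detect_pii_entities(Text=sample_text, LanguageCode="en")
    return {entity["Type"] for entity in response["Entities"] if entity["Score"] > 0.8}

sample = "Jane Doe, jane.doe@example.com, 123 Main St, Springfield"
found = pii_types_in_sample(sample)
if found:
    print(f"Tag this data set as containing PII: {sorted(found)}")
```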
656.7 -> Let's talk through a real example
658.62 -> that covers the ingestion piece of it.
661.08 -> I'm gonna cover data
classification, data profiling,
663.48 -> data quality securing catalog
665.607 -> and this can all be done with CICD.
667.38 -> I picked a different icon here
668.88 -> for the AWS DataOps Development Kit (DDK),
671.07 -> which is an open source
project built on top
673.56 -> of the CDK for building
data pipelines faster.
679.56 -> And that can help you automate
a pattern such as this.
682.05 -> So the pattern you see here
is Salesforce on the left,
685.8 -> and on the right you
have an ingestion pattern
687.9 -> for a SaaS application.
689.58 -> So you can use a product like AppFlow.
691.95 -> AppFlow can push that data to S3
694.02 -> and that's where you can
run your data quality rules
696.03 -> as it lands in S3 and as
you move it to curated.
699.6 -> As it lands, we're gonna talk
a bit about Glue crawlers
703.35 -> and how they remove the heavy lifting.
704.82 -> So Glue crawlers will inspect your data,
706.95 -> you can build custom classification rules
709.35 -> within your Glue crawlers
to manage your schema.
711.66 -> You populate your data
catalog and you propagate
714.15 -> that to Lake Formation to
manage access controls.
716.97 -> This is just a standard pipeline
718.53 -> and Shihas will talk in more detail
719.85 -> of what happens in real world.
722.25 -> Obviously it's not as simple as this,
723.81 -> but this is just an
example of what you build.
725.55 -> So you build this, once
you apply it for Salesforce
728.22 -> and as other databases
come in, rinse and repeat,
730.98 -> then you end up just
replacing that first line
732.96 -> that goes into the box over time.
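That pattern can be wired together in different ways (the AWS DataOps Development Kit mentioned above is one option). As a plain boto3 sketch of a single ingestion run, assuming an AppFlow flow and a Glue crawler that already exist, it might look like this; the flow and crawler names are hypothetical.

```python
import boto3

appflow = boto3.client("appflow")
glue = boto3.client("glue")

# Hypothetical resource names for a Salesforce -> S3 -> Glue catalog pattern.
FLOW_NAME = "salesforce-accounts-to-raw"     # AppFlow flow writing to s3://my-lake/raw/salesforce/
CRAWLER_NAME = "raw-salesforce-crawler"      # Glue crawler over the same landing prefix

def run_ingestion_once() -> None:
    # 1. Pull the latest Salesforce data into the raw zone of the data lake.
    execution = appflow.start_flow(flowName=FLOW_NAME)
    print("Started AppFlow run:", execution["executionId"])

    # 2. Crawl the landing prefix so new partitions/columns show up in the
    #    Glue Data Catalog; Lake Formation permissions then govern access.
    glue.start_crawler(Name=CRAWLER_NAME)

run_ingestion_once()
```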
738.15 -> So let's talk about the second area.
740.1 -> The second area is cataloging data
742.26 -> and making that data more usable.
743.82 -> You wanna make sure that
data is findable, searchable,
746.94 -> accessible so people can
look at it, understand it,
750.21 -> request access to it,
all from a single place.
754.2 -> So how do our services help with that?
756.96 -> I'm gonna talk a bit about
crawlers, for example.
759.09 -> There's Glue crawlers that eliminate
761.25 -> the heavy lifting on managing schema.
763.47 -> So now we've ingested that data,
765.51 -> we have data quality built into it,
767.58 -> we have technical catalog.
769.8 -> This is really where we want
to extract that information
772.62 -> out of your data and start populating
775.86 -> and enriching your catalog over time.
777.357 -> And you can start with
your technical catalog
779.25 -> and we'll talk about the
business catalog in a moment.
781.89 -> The first thing is crawlers,
discover new data sets
785.04 -> and extract schema definitions.
787.98 -> I know what you're saying,
yes, I can do this in code,
789.84 -> yes, I can do this in APIs sure.
791.4 -> There are many ways to
handle schema definitions,
795.39 -> but what you're doing
here is you're reducing
797.43 -> that heavy lifting so
programmers don't have to worry
799.71 -> about catalog management and
you just put it on the crawler
803.1 -> and you can let us know if our crawlers
804.54 -> aren't working well enough for you
805.83 -> and we can fix it and improve it.
809.43 -> Crawlers cover a wide array of
sources from S3, for example,
813.478 -> DynamoDB, MongoDB, all of
our RDS, Aurora for example.
818.73 -> Recently we announced Snowflake last week,
821.16 -> so we can now crawl Snowflake
and provide that schema
823.86 -> available for you so that
you can start understanding
826.98 -> what data's coming into
Snowflake over time
829.08 -> and manage that incrementally.
832.95 -> Then there are additional
sources: in the Glue catalog,
835.59 -> you can add sources that
aren't covered by crawlers.
838.23 -> So there is a bit of a difference there.
840.9 -> Things like CloudTrail, Kafka,
842.43 -> for example with schema registry.
847.05 -> The next thing Glue crawlers do is they use
849.81 -> built-in classifiers, so you
can do popular data things
853.17 -> such as PII identification,
you can write your own
856.11 -> custom classifiers with grok
and that allows you to enrich
859.68 -> your catalog with the
technical metadata about what
863.34 -> that information is so you can better
865.14 -> protect that in the long run.
867.18 -> And by adding those patterns over time,
869.4 -> you start building up a library of them.
871.47 -> So going back to our business case,
873.03 -> let's say our business
case is customer analytics
875.61 -> and I need to identify PII,
877.59 -> you can then apply that first PII rule
879.84 -> to other data sets that might
have customer data in it
883.11 -> and leverage that over time.
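A minimal sketch of that custom-classifier idea, assuming a hypothetical grok pattern and the same hypothetical crawler name as above; the classifier teaches the crawler to parse a proprietary format and record its columns in the catalog.

```python
import boto3

glue = boto3.client("glue")

# A custom grok classifier so the crawler can parse a proprietary log format
# and record its columns in the catalog (name and pattern are illustrative).
glue.create_classifier(
    GrokClassifier={
        "Name": "customer-events-classifier",
        "Classification": "customer_events",
        "GrokPattern": "%{TIMESTAMP_ISO8601:event_time} %{WORD:event_type} %{GREEDYDATA:payload}",
    }
)

# Attach it to a crawler so the custom classifier is tried before the built-in ones.
glue.update_crawler(
    Name="raw-salesforce-crawler",
    Classifiers=["customer-events-classifier"],
)
```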
885.3 -> The last thing is
crawlers run on demand,
887.4 -> they can run on S3 events
now and incrementally,
890.55 -> so you can just have
them run on S3 events,
892.65 -> run 'em incrementally and
all you have to do is watch
894.75 -> monitoring alarms if they
fail for some odd reason.
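Here is what an event-driven, incremental crawler of the kind just described might look like in boto3, reusing the hypothetical crawler name from the sketches above; the role ARN, bucket path, and SQS queue ARN are placeholders.

```python
import boto3

glue = boto3.client("glue")

# The SQS queue receives S3 event notifications from the landing prefix, so the
# crawler only visits objects that actually changed (all identifiers illustrative).
glue.create_crawler(
    Name="raw-salesforce-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="raw_salesforce",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://my-lake/raw/salesforce/",
                "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:raw-salesforce-events",
            }
        ]
    },
    # Incremental, event-driven crawls instead of re-crawling everything.
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "LOG"},
)
```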
899.13 -> And the last thing is a couple
of new releases, AppFlow for example;
902.4 -> AppFlow is
not a crawler feature
904.56 -> but a catalog feature.
905.94 -> You can now populate your
data catalog directly from AppFlow
908.79 -> as you're pulling data
912.59 -> in from its connectors.
917.43 -> The next area I'm super excited
to talk about is what Adam
920.04 -> talked about this morning,
which is Amazon DataZone.
923.67 -> In the catalog space we talked
about technical catalog,
925.8 -> but what about business catalog here?
928.08 -> So Amazon DataZone is a new service
930.66 -> that enables customers to
discover and share data
933.75 -> across your organizational
boundaries, lines of businesses
937.29 -> with built-in governance and access controls.
940.137 -> And that removes all of the heavy lifting
942.3 -> when it comes to making data available
945.84 -> to everyone in your organization.
948.45 -> It improves the operational
efficiency of managing data,
951.48 -> managing access controls,
so that data teams focus
954.93 -> on working with data
and not worrying about
957.9 -> who has access to what data when.
960.51 -> It's built on top of Lake Formation.
962.34 -> So you can actually extract
and present that information
964.86 -> on who has access to what data.
966.81 -> You have centralized logging
and monitoring, and it provides
970.59 -> you that single point without having
972.96 -> to worry about each AWS analytics service.
976.2 -> So super excited about this launch today.
977.85 -> It comes with four key features.
The first is an organization-wide
981 -> business data catalog, and
you make that available
984.45 -> with context for all your users
986.19 -> to find what data you have.
988.41 -> The second area is it has
governance and access controls
991.29 -> with built in workflows for that example
993.54 -> we talked about earlier.
994.59 -> So data consumers find their data
996.54 -> within a single user interface.
997.86 -> They request access
right from that interface
1000.32 -> and then that system has
that workflow for giving
1004.52 -> those consumers access
to that data, right?
1006.98 -> So they can use the tools
within the simplified analytics
1010.19 -> portal that they've created with you.
1012.41 -> Out-of-the-box, it works
directly with Redshift and Athena
1015.59 -> for example, and other services
that you'll learn about
1018.38 -> more over the upcoming week.
1020.48 -> The last thing is there's a data portal
1022.4 -> for an integrated data
experience for users
1024.26 -> to promote their data
exploration and drive innovation
1027.68 -> throughout your organization.
1032.63 -> Let's go to the last
area which is automating
1034.82 -> sharing of data.
1035.93 -> We've done a couple things
now we ingested our data,
1039.32 -> we know the quality, we
know the technical metadata,
1041.84 -> we know the classification,
1043.7 -> we've added the business context.
1045.71 -> Now we need to actually
share that information
1047.99 -> and share it at scale.
1050.72 -> Customers are adopting many patterns.
1052.94 -> Some customers that I talk
to all the time might start
1055.28 -> with a single account central
data lake that might work
1058.04 -> for them for the short
term or even the long term.
1060.89 -> Over time they might move to
a hub and spoke architecture
1063.92 -> where you have a central
data lake, for example,
1065.87 -> and then you start adding
extensible consumers
1068.6 -> to that information.
1070.49 -> Then there's customers
obviously adopting data mesh.
1072.83 -> There are a couple talks
on data mesh this week,
1074.72 -> I recommend you see them.
1076.85 -> Super exciting cutting edge
about how customers are building
1079.58 -> interoperable data sets and
leveraging AWS infrastructure
1083.6 -> to both provide that flexibility,
1086.21 -> but also protect that
data at the same time.
1089.12 -> And the last pattern we're seeing
1090.5 -> is business to business data sharing.
1092.6 -> Obviously we've got AWS Data Exchange,
1094.4 -> but also customers are learning
different ways to share data
1097.52 -> and we'll talk about that in a moment.
1102.2 -> So how do we do this?
1103.34 -> The first thing we recommend is looking at
1105.41 -> the Lake Formation permissions model.
1107.45 -> Lake Formation provides database-style,
1110.54 -> fine-grained permissions on your resources.
1113.15 -> Your resources are tables
and columns, for example.
1119.06 -> You scale your permissions
model through Lake Formation
1122.81 -> tag-based access controls.
1124.4 -> We'll talk about tag-based access controls
1125.9 -> in the next slide.
1127.1 -> But what you're doing is
moving from what we call
1130.13 -> resource-based access controls,
managing your tables based
1133.16 -> on your user community, to
tags, which focus more
1137.21 -> of your energy on what the data sets
1138.95 -> are and how to scale
giving people access to those
1141.92 -> data sets based on what they
are and what classification
1145.22 -> rules apply.
1147.17 -> It provides unified Amazon S3 permissions,
1149.51 -> it's integrated with services and tools
1152.66 -> and also third parties are
integrating with Lake Formation
1156.56 -> as well, and it's easy to audit
permissions for access.
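As a sketch of that fine-grained permissions model, a column-level grant might look like this in boto3; the principal, database, table, and column names are hypothetical.

```python
import boto3

lf = boto3.client("lakeformation")

# Grant column-level SELECT so analysts can query customer records
# without seeing the PII columns (all names illustrative).
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/marketing-analyst"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "curated_customers",
            "Name": "customers",
            "ColumnNames": ["customer_id", "segment", "country_code"],
        }
    },
    Permissions=["SELECT"],
)
```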
1161.51 -> Let's go a bit deeper into
tag-based access controls.
1167.12 -> Imagine you have hundreds
of databases and thousands
1170.84 -> of tables and tens of thousands of users.
1173.9 -> That becomes an awfully terrible
matrix of access control
1177.35 -> permissioning that you need to manage
1178.79 -> for role-based access controls.
1181.01 -> So this is why we introduce
tag-based access controls.
1183.527 -> And if you are using
one of our original ways:
1186.56 -> when we launched Lake
Formation, it didn't launch
1188.3 -> with tag-based access controls.
1189.62 -> If you are not using it
today and you're running into
1192.11 -> scaling challenges, we
definitely recommend
1194.54 -> talking to your SAs, talking
to us about how you can think
1197.93 -> about moving your access
control permissions to tags.
1201.35 -> The first thing you want to do
is define your tag ontology.
1204.14 -> So your tag ontology could be things like
1206.48 -> organizational structure
or classification rules.
1209.99 -> And the next thing you do
is you apply those tags
1213.68 -> to catalog resources, databases, tables.
1218.42 -> The higher up you apply the tag,
1219.92 -> the more broadly the
access is given, right?
1222.53 -> So you could apply it at the
database level, for example;
1226.67 -> tags are hierarchical in nature
and if there are conflicts,
1230.24 -> the system resolves those conflicts.
1233.45 -> The last thing is you create
those policies on those
1235.55 -> Lake Formation tags
for IAM users and roles
1239.09 -> and active directory users and
groups using SAML assertions.
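Putting those three steps together, a minimal boto3 sketch of tag-based access control might look like this, with an illustrative tag ontology, database, and principal.

```python
import boto3

lf = boto3.client("lakeformation")

# 1. Define a small tag ontology (illustrative key/values).
lf.create_lf_tag(TagKey="classification", TagValues=["public", "internal", "pii"])

# 2. Apply a tag at the database level; tables underneath inherit it.
lf.add_lf_tags_to_resource(
    Resource={"Database": {"Name": "curated_customers"}},
    LFTags=[{"TagKey": "classification", "TagValues": ["internal"]}],
)

# 3. Grant access by tag expression instead of per table, so the policy
#    scales as new tables arrive carrying the same tag.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/marketing-analyst"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "classification", "TagValues": ["internal"]}],
        }
    },
    Permissions=["SELECT"],
)
```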
1243.2 -> That's how you scale
your access permissions.
1246.08 -> Let's put this in the context
of your architecture, right?
1250.49 -> So we're seeing customers adopt
1252.41 -> a few different patterns here.
1254.27 -> On the top left you have your
business data catalog section.
1257.12 -> I'll simplify it 'cause
these systems do a lot more
1258.92 -> than business data catalogs,
1260.12 -> so please don't take this too seriously.
1262.55 -> There's Amazon DataZone
which does way more
1264.32 -> than business data catalog,
but I just put it in a box.
1266.72 -> So entry point I find my business data
1269.597 -> and I need to request access.
1271.94 -> The next area is third party tools.
1274.19 -> You could be using third
party tools to find data,
1276.59 -> request access and
understand those data sets.
1279.23 -> Or there's open source solutions
like data.all for example,
1282.59 -> that you could leverage.
1283.423 -> So there's a few different options
1284.42 -> we're seeing customers do today.
1287.57 -> The next thing you want to do
is regardless of the option
1289.94 -> you're doing, so DataZone
works out-of-the-box,
1292.4 -> but if you're working with a
third party, work with them,
1294.86 -> I've seen many examples, I've
actually blogged about one
1297.29 -> with Informatica about
how to automate workflows
1300.95 -> within Informatica to drive
your access control permissions
1303.83 -> down to your data catalog.
1308.24 -> Most of the ISVs can do
those workflow processes
1310.97 -> and push those access permissions down.
1314.03 -> A lot of them can also read
your Glue data catalog
1316.37 -> to understand your classification rules.
1319.1 -> When you do that, you then
push down and make sure
1322.79 -> that you're using Lake
Formation, with your
1325.73 -> Glue data catalog for
your catalog information,
1328.19 -> for your permission
model, your policy control
1331.55 -> and access down to your data domains.
1333.68 -> It could be your S3 data lake
or Redshift for that example.
1336.89 -> So that way when your users are coming in
1338.81 -> through various engines, whether
it's Amazon Athena
1342.32 -> or Spark on EMR, it's
enforcing those policies
1346.64 -> consistently based on how they requested access
1348.86 -> from their business context.
1350.09 -> It provides that end-to-end experience.
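For instance, a consumer running a query through Athena would have those Lake Formation permissions enforced transparently. A minimal boto3 sketch, with a hypothetical results bucket and the illustrative table from earlier:

```python
import boto3

athena = boto3.client("athena")

# When this query runs, Athena checks the caller's effective Lake Formation
# permissions, so only the granted tables/columns are readable.
response = athena.start_query_execution(
    QueryString="SELECT customer_id, segment FROM curated_customers.customers LIMIT 10",
    QueryExecutionContext={"Database": "curated_customers"},
    ResultConfiguration={"OutputLocation": "s3://my-lake/athena-results/"},
)
print("Query execution id:", response["QueryExecutionId"])
```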
1353.84 -> With that, I'm actually gonna,
oh, I have one more slide.
1356.42 -> Key takeaways and I'll
hand it over to Shihas.
1359.66 -> Key takeaways on enabling
agility with data governance
1363.02 -> on AWS: automate ingestion.
1366.05 -> Build up those patterns as
part of your business strategy
1369.23 -> and understand how you could
automate compliance controls,
1372.95 -> standard data quality rules so
that data engineers can move
1376.61 -> at the speed of your business.
1378.35 -> The second thing is automate
classifying, cataloging
1381.56 -> and profiling your data on the way in,
1384.98 -> so that your catalog understands
the technical information
1387.89 -> and your business catalog
starts to understand
1389.81 -> what is the classification
rules you need to apply
1392.42 -> when you're giving people access.
1394.46 -> The third area is automate the management
1396.86 -> through tag-based access controls.
1398.87 -> And make sure you can scale
through your enterprise
1401.24 -> so that your data is protected
throughout its life cycle
1404.84 -> and you're not consistently
managing permissions every time.
1408.29 -> And the last thing is
automate data sharing
1410.39 -> through one interface,
1411.29 -> whether or not it's through
Amazon DataZone or ISV partners
1414.98 -> or open source solutions
you build on your own.
1417.44 -> Automate that experience
so a user comes in,
1419.9 -> finds the data, requests access
from a single user interface,
1423.44 -> and they know what
they're getting and they know
1425.6 -> their data's protected on the way in.
1427.55 -> So with that, I'm gonna
hand it over to Shihas
1430.19 -> to tell his story, thank you.
1432.138 -> (audience applauds)
1441.692 -> - Can you hear me well?
1444.47 -> Right. I'm Shihas Vamanjoor
from Prudential Financial.
1448.16 -> My team made a request for
me to open this 30 minute
1452.45 -> presentation in their words.
1454.16 -> Here goes, over the next 30
minutes you will be introduced
1460.19 -> to the next generation of data platforms,
1463.67 -> a fully self-creating, self-organizing
1467.21 -> and self-managing platform
1469.37 -> we call the autogenic data platform.
1472.52 -> Folks in today's audience
are among the first
1475.43 -> outside of Prudential to
learn of this innovation.
1479.06 -> This is the future of data platforms
1481.457 -> and the future has already
arrived at Prudential.
1485.54 -> Harpreet and Zane from
the data platform team,
1488.18 -> are witnesses to me having
said this at re:Invent.
1492.307 -> (audience murmurs)
1499.16 -> A little bit of background,
1502.64 -> I'm from Prudential Financial Services.
1504.95 -> Prudential is a global
leader in financial services.
1507.83 -> We serve both institutional as
well as individual customers.
1511.22 -> We are in over 50 countries,
50 million customers,
1515.47 -> 40,000 employees.
1517.16 -> Let's focus on the 40,000
employees for the rest
1519.41 -> of the conversation.
1522.56 -> Me, I'm situated within the
Prudential chief data office.
1526.46 -> I'm the Product Owner for
Enterprise and Data Platforms.
1530.06 -> The mission of the data
platform is a big and bold one.
1534.29 -> Number one on our mission
statement is to democratize data
1539.15 -> to increase the value creator
base within (indistinct)
1544.4 -> that's the number one thing.
1545.99 -> Other pieces, increase
velocity of data innovation,
1550.07 -> reduce cost, reduce risk, so on so forth.
1555.23 -> Now, who does this data platform serve?
1560.87 -> At Prudential in the
multiple lines of business,
1563.63 -> both domestic and overseas,
we have a large community
1568.25 -> of technical and non-technical users.
1571.55 -> Technical users are data
scientists, data engineers,
1576.41 -> data analysts, data stewards,
1578.48 -> business intelligence professionals,
1580.04 -> machine learning engineers,
so on, you get the drift.
1582.95 -> Non-technical users, line
of business managers,
1586.97 -> business initiative owners, executive.
1590.51 -> What do they do with data?
1592.64 -> They want to exploit data
to create business value.
1596.42 -> What are the common challenges?
1598.31 -> On the slide are some of
the common challenges.
1600.62 -> These challenges all lead
to long time-to-value.
1604.34 -> First of them: hard to locate data.
1606.47 -> Why is this hard? Why is this
such a challenging problem?
1611 -> Our companies are growing continually.
1614.24 -> Data is also growing
continually, exponentially.
1617.24 -> So we have a large number
of systems and knowledge
1620.9 -> about these systems is usually tribal.
1623.24 -> That basically means the
owner of the system is the one
1625.88 -> that you tap on the shoulder
to dig out information
1628.46 -> about this data, time consuming process.
1630.74 -> Now multiply that by
the number of systems
that you have to deal with.
1635.03 -> Second, long time-to-access.
1637.55 -> Since there are a large number of systems,
1639.32 -> you got to go through governance hoops.
1641.6 -> You got to go from system
one, two, three, four, five,
1644.66 -> bring it to another system,
1646.25 -> another set of hoops to deal with.
1648.8 -> Third, lots and lots of human engineering.
1652.28 -> Since we have so many
systems on the source side
1656.75 -> and the technology space is
constantly, rapidly changing.
1661.04 -> You have a lot of technology requirements
1662.99 -> in terms of understanding this.
1664.67 -> You gotta be a superman
engineer to try and extract data
1668.84 -> from all of these systems,
bring it to another system.
1671.72 -> Complex governance.
1672.71 -> I talked about this,
the number of systems
1674.87 -> and how governance becomes complex.
1676.55 -> And finally there's a lot of
tedious and repetitive work
1679.91 -> before you actually start to do
1681.38 -> what you really want to do, right?
1685.43 -> So when I started this
journey of building out
1689.03 -> this enterprise data platform,
I went on a listening tour,
1692.78 -> two eyes, two ears, one mouth,
1695.33 -> keep the mouth shut kind of tour.
1697.37 -> So I was listening to
what they really wanted.
1700.04 -> Here are the desired experiences.
1703.04 -> One, they said, can you make
it simple for me to discover
1708.29 -> data within this entire
enterprise architecture
1711.17 -> wherever data lies really don't care?
1713.99 -> Is it possible for you to deliver
1716.6 -> to me an e-commerce-like experience?
1719.63 -> I just wanna shop for data products.
1722.72 -> No, this is not a shoe or
a watch, it's a data table.
1726.38 -> It's a data view, a file,
I wanna shop for it.
1729.98 -> Second, I really don't
care what these systems
1735.05 -> that house the data are in.
1736.43 -> I don't wanna be bothered
about pulling data
1739.04 -> from system X to system Y,
1741.38 -> data shipping is not a heroic job.
1745.31 -> There's no glamour in it anymore.
1747.26 -> It's very hard to convince
people to do that.
1750.86 -> So this no ETL is a
no-brainer, that's the second.
1754.94 -> Third, why can't you build me a system
1758.27 -> where you optimize my time
as a data value creator
1762.38 -> so I can focus on actual value creation
1765.2 -> rather than spending 90%
of my time bringing data
1768.14 -> over from arcane systems,
cleaning it, massaging it,
1772.04 -> and getting it ready so
that I can create value?
1775.67 -> Finally, if this data is in a system
1778.19 -> that I can get access to,
how do you make it accessible
1781.1 -> to me through a few mouse clicks?
1783.2 -> Those are the challenges.
1785.81 -> This is from a very initial
draft of our concept.
1790.52 -> I said, I hear you.
1792.17 -> So what you're really
asking for is the genesis
1795.32 -> of an autogenic data platform,
1797.84 -> a platform that creates itself,
1800.12 -> and you, the value creator are the master,
1805.97 -> not the platform engineering team,
1808.28 -> not a data engineering team,
1809.93 -> but you have the keys to
this entire ecosystem.
1814.52 -> How would it work?
1816.11 -> The autogenic data platform
has to catalog and discover
1820.25 -> at scale, all the data
assets that you have
1822.32 -> within the enterprise, right?
1824.03 -> It's gotta make it
available in a marketplace,
1825.92 -> that's number one.
1827.39 -> Once it is in a marketplace,
what do you do next?
1831.02 -> You use an order
fulfillment request system.
1834.05 -> So you press those buttons,
1835.7 -> make sure the data goes over
1837.11 -> to a data and analytics platform
where you can exploit it.
1840.23 -> Number three, you do not want
to do manual data governance,
1845.27 -> manual data engineering,
none of those things.
1848.12 -> You expect the system to
take care of it for you.
1851.72 -> So you bring the data in through
automated data governance,
1854.39 -> through a highly automated
1857.84 -> data processing engine, into a zone.
1860.66 -> Finally gimme another set of
order fulfillment requests
1864.02 -> in the same marketplace so I
can get access to other data.
1867.65 -> This was the ask.
1869.3 -> At this point, I had a
few important questions
1872.27 -> that I needed to answer.
1873.74 -> Looking at this end state
vision, I had to decide,
1878.99 -> number one, which partner would I choose?
1883.85 -> There's a series of
services that you need,
1886.67 -> intricate engineering that
you need to put together
1889.58 -> and then hide it all
away, that's the goal.
1893.21 -> Number two, what style of
architecture would I use?
1900.56 -> What method of development
would I use? For the first,
1905.69 -> it became quite clear to me
1907.46 -> that of all the cloud service providers,
1909.35 -> the most mature one with
these services was AWS.
1914.87 -> So we made the selection to go with AWS.
1918.89 -> Second, for the partner, we
chose AWS Professional Services.
1922.91 -> They're closest to the
technology and this was a design
1925.88 -> that was not trivial.
1928.31 -> Finally, we chose Agile as
the method of development.
1931.49 -> I told my team that we would
deploy the initial version
1936.32 -> of the autogenic data
platform into production
1938.63 -> within three months
from which time onward,
1941.24 -> every release would be a sprint,
which would be two weeks.
1945.05 -> And the reason we did
this is we always found
1947.81 -> that building with the
expectation of someone coming
1950.93 -> to use it, never works in practice.
1954.29 -> So that's the method that we chose.
1957.44 -> Fast forward nine months, what do we have?
1960.83 -> We have a completely automated,
hyper automated system.
1966.98 -> We have a marketplace.
1968.84 -> We use a front-end technology
for the marketplace,
1971.54 -> and I'm glad to see from Jason's
presentation that Amazon's
1975.68 -> done some wonderful work,
which we will reverse engineer.
1979.67 -> We have some front-end
technologies for now,
1982.79 -> which take the place of the marketplace,
1985.04 -> which does the order fulfillment.
1987.32 -> We have a Lake House construct,
1991.22 -> a construct of bringing
data from whatever pattern,
1995.54 -> whether it is files, whether it's JDBC,
1998.15 -> whether it is batch or streaming,
1999.86 -> all the fancy bells and whistles
that people usually want
2003.19 -> in a data platform, we have all of that,
2004.81 -> but incrementally we built that out.
2007.27 -> Now what we then went forward and said,
2009.73 -> the true value is in
shifting engineering right.
2014.35 -> So we then said, okay, let's
look at the key aspects
2017.2 -> of any data and analytics platform.
2020.02 -> Let's do things like data quality.
2023.5 -> Let's do things like data
standardization, PII detection,
2028.18 -> change data capture and management.
2029.8 -> All within an automated environment
2032.62 -> that no human being has to write code for.
2035.86 -> That was the goal, so over
a period of 12 months,
2039.85 -> we've achieved this goal.
2041.53 -> We are not at the
completion of the journey,
2044.14 -> the journey is still a long one,
2046.12 -> but the usage has been significant, right?
2048.01 -> From month three, we've been
able to onboard different
2051.67 -> data journey teams to the
platform and now we use a process
2055.24 -> of user generated demand
to build any new features.
2059.83 -> Now, one would think that this
requires an army of engineers
2063.37 -> and Prudential is a global giant.
2065.11 -> You would have access to an
army of engineers to build this,
2067.66 -> not true, small core team.
2070.66 -> They're sitting in the first row.
2071.8 -> We got Zane, Harpreet, Samir,
2073.81 -> these are the cloud and data
engineers along with ProServe.
2077.62 -> Jason was part of the
original ProServe team
2079.78 -> that helped build this out.
2081.88 -> Dozen or so people, under a dozen,
2084.01 -> including management was all it took,
2086.65 -> great effort from the
collective at Prudential.
2089.35 -> They provided all the collective muscle
2091.69 -> to make this happen, because
this was a common shared goal.
2097.15 -> In terms of architecture,
let's get down into a level
2099.97 -> of detail that is necessary
for this conversation
2102.7 -> and it ties closely to what AWS and Jason
2105.46 -> have been talking about in
the previous presentation.
2109.96 -> Number one, here's a
representation of sources,
2114.37 -> logical representation.
2115.54 -> And this is what you are likely to see
2117.79 -> in your own enterprises,
it has everything.
2120.22 -> Data center based databases, file systems,
2125.26 -> SaaS applications like
Salesforce and ServiceNow, APIs.
2129.73 -> Basically sources of first
party, second party,
2132.46 -> third party data, wherever it may be.
2135.46 -> The second part of the
standard architecture
2138.46 -> is ingestion service layer.
2141.43 -> This is where incrementally,
the autogenic data platform
2146.38 -> has features that allow for
different types of sources
2150.49 -> to be handled differently
but intelligently.
2153.94 -> So you start with flat
file data transfers,
2157.69 -> okay, here's AWS data sync.
2160.45 -> But here's the difference,
nobody needs to know
2163.72 -> how AWS data sync works.
2166.54 -> That's coded into the automation.
2169.18 -> You provide the order that
says, I am a file source.
2175.69 -> The autogenic data platform
detects that file sources are
2179.77 -> best suited for a DataSync
transfer program.
2184.75 -> You are a JDBC source, so
I'm going to use Glue JDBC,
2189.28 -> oh you are Salesforce,
I'm going to use AppFlow.
2195.25 -> So that's the detection mechanism
2197.56 -> that's built into the programming.
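The detection mechanism described here is internal to Prudential's platform, but the idea can be sketched as a simple dispatch over the declared source type. Everything below is hypothetical and simplified, not the actual implementation.

```python
# Toy sketch of the detection idea: the platform, not the user, decides which
# AWS ingestion service fits the declared source type.
def choose_ingestion_service(source: dict) -> str:
    source_type = source["type"]
    if source_type == "file":
        return "aws-datasync"        # flat file transfers
    if source_type == "jdbc":
        return "glue-jdbc"           # relational databases over JDBC
    if source_type in ("salesforce", "servicenow"):
        return "appflow"             # SaaS sources via AppFlow connectors
    raise ValueError(f"No ingestion pattern registered for {source_type!r}")

order = {"name": "policy_admin_db", "type": "jdbc", "refresh": "daily"}
print(choose_ingestion_service(order))  # -> "glue-jdbc"
```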
2199.72 -> Now as you go rightward,
you'll start to see persistent
2204.07 -> layers in the architecture come through.
2206.62 -> Persistent layers are both
a lake as well as a house.
2210.82 -> Now up to this point, what
you have to imagine is
2214.48 -> there is no human being
writing any of this code.
2216.94 -> Not for infrastructure,
not for the services,
2219.88 -> not for the pipelines,
not for the orchestration.
2222.22 -> All happening by the application.
2224.5 -> The application has taken control.
2227.29 -> So it is self-creating.
2228.64 -> So what happens next is data is now moved
2231.82 -> through this persistent layer
from raw to standardized,
2238.87 -> applying the data quality
rules, the governance rules.
2244 -> Change data management
and change data capture
2246.07 -> is embedded within this pipeline.
2249.85 -> And then the data is now
available to the curator.
2252.82 -> But at this time, from
a governance standpoint,
2255.04 -> one would think, hey, is this
data all accessible to people?
2258.25 -> No, the process has
decoupled access provisioning
2262.33 -> from data movement.
2263.77 -> None of this data is actually accessible.
2266.74 -> Now comes the central governance,
2268.96 -> which is what Lake Formation is for.
2271.39 -> We centralize governance to all the assets
2274.24 -> through Lake Formation,
one single point of access
2278.02 -> provisioning to this entire ecosystem.
2280.87 -> Finally, we have all kinds
of consumption patterns
2285.76 -> that get access through
this Lake Formation
2288.85 -> central governance layer
to this entire lake house.
2292 -> Now what I didn't talk about
is just as important
2295.09 -> as what I just said.
2297.28 -> What we have done thus far
2299.92 -> is hyper automate the journey
to a standardized layer.
2303.94 -> The journey of data
exploitation does not end there.
2307.54 -> It begins there, but now is
the value creation exercise.
2311.83 -> Now you have these
highly skilled resources,
2314.14 -> the talented folks who
can use this ecosystem
2317.98 -> to refine the data.
2320.32 -> Now I have another innovation
project in the works,
2323.14 -> which is trying to machine
this curation as well.
2325.93 -> So that regular refinement
to create a data product
2330.37 -> is the focus of my data
journey teams today
2333.91 -> and this architecture has
allowed them to do so.
2337.15 -> Let's unpack this a little bit more.
2341.89 -> Here's a sample user
onboarding experience, right?
2347.47 -> This is how users onboard
data to our platform today.
2352.69 -> Here's a data athlete, technical
user, non-technical user,
2356.23 -> as long as you can use the
browser, you're welcome.
2360.37 -> You go scan this marketplace,
it looks like Amazon.com
2364.69 -> or whatever your favorite
e-commerce site is, right?
2367.6 -> You get a collective that
is a bunch of objects
2371.56 -> or a single object, it's your choice.
2374.2 -> You then say,
2375.25 -> I want this object to be
refreshed at a given frequency.
2382.48 -> You then say, submit order.
2386.26 -> Internally, the metadata
generated by your order
2390.91 -> is the heart of this intelligent platform.
2393.61 -> This uses all the metadata
that's been collected so far
2397.75 -> to provide instructions to
this auto generating system.
2404.11 -> Auto generation actually
creates the infrastructure code,
2407.83 -> that's the first order of business for it.
2410.44 -> If that is an order that is
coming in for the first time,
2414.55 -> all the infrastructure
necessary is created dynamically
2418.66 -> because the metadata, the
manifest, contains the information
2422.38 -> that is sufficient for you
to have a targeted creation
2425.92 -> of both infrastructure and
pipeline and orchestration.
2430.21 -> Post that, it then identifies
the necessary elements
2434.98 -> to actually execute on this.
2437.86 -> So as an example, when I
process an order for a database
2443.98 -> or for a file system to be transported,
2447.25 -> the first time creation of the
infrastructure takes place,
2450.85 -> subsequent runs of that
infrastructure pipeline
2454.39 -> are now available as an
order fulfillment request
2457.69 -> that this entire automation takes care of.
2461.29 -> That's how this whole thing works.
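As a purely hypothetical sketch of the manifest-driven flow described above (the real order, IaC, and pipeline generators are internal to Prudential and not public), the order metadata might drive fulfillment roughly like this:

```python
# Stand-in helpers for the real infrastructure/pipeline generators; only the
# shape of the idea is shown, not the actual implementation.
def create_infrastructure(manifest): print("creating infra for", manifest["order_id"])
def create_pipeline(manifest):       print("creating pipeline for", manifest["order_id"])
def schedule_pipeline(refresh):      print("scheduling refresh", refresh)

_seen_orders: set[str] = set()

def fulfill(manifest: dict) -> None:
    order_id = manifest["order_id"]
    if order_id not in _seen_orders:          # first-time order: generate everything
        create_infrastructure(manifest)
        create_pipeline(manifest)
        _seen_orders.add(order_id)
    schedule_pipeline(manifest["refresh"])    # subsequent runs are just fulfillment

fulfill({
    "order_id": "ord-0042",
    "source": {"type": "jdbc", "connection": "policy_admin_db", "objects": ["policies"]},
    "destination": {"zone": "standardized", "database": "ins_policies"},
    "refresh": {"frequency": "daily", "time_utc": "02:00"},
    "governance": {"data_owner": "policy-ops", "pii_scan": True, "dq_ruleset": "standard-v1"},
})
```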
2464.29 -> So in a sense, what
we've now been able to do
2467.777 -> is to say, for the foreseeable future,
2472.06 -> run this kind of pipeline
moving this kind of data
2475.54 -> from these systems to a secondary system
2478.78 -> for data and analytics.
2482.47 -> And the system is responsible
for the orchestration,
2486.43 -> the constant redelivery, the
remediation, the notification.
2490.54 -> All of that is now built into the system
2492.67 -> with embedded governance,
embedded data quality,
2495.76 -> embedded everything.
2498.16 -> That's what hyper automation is all about.
2501.04 -> Now, if you take an example of
a user experience for access,
2508.9 -> similar, nothing much different,
the toys are different.
2513.85 -> What we use as AWS services are different,
2518.68 -> but the experience for
the user is the same.
2521.71 -> He doesn't need to see this.
2523.9 -> Like I said,
2524.733 -> the heroics belong to the
data engineering team.
2529.33 -> Vendors will provide all
the necessary services,
2533.2 -> it's up to us, as users of those services
2535.84 -> to determine the
experiences that we deliver.
2538.66 -> The experiences that we chose
to deliver as a platform team
2541.75 -> is an abstracted experience.
2543.52 -> We try to figure out why is it necessary
2545.77 -> that you have to do something.
2547.42 -> The answer usually is you don't need to.
2549.7 -> So if the answer is you don't need to,
2551.32 -> we hide it behind a browser.
2554.41 -> So you can focus on the actual value add.
2557.74 -> Similar experiences you will
find in data warehouses.
2560.74 -> The reason why I have this extra slide,
2562.63 -> is to tell you that no matter what kind
2565.21 -> of infrastructure aspect that
you have underneath today,
2568.273 -> I have a data lake or data warehouses,
2570.73 -> tomorrow I have a graph database.
2572.89 -> Tomorrow I have another
kind of repository,
2575.47 -> doesn't really matter.
2577.45 -> The method is the same.
The ideas are the same.
2580.81 -> The way you do it is exactly the same.
2585.67 -> Now, key lessons learned.
2589.04 -> This is from our journey
over the last nine months.
2593.02 -> You know it's going to differ
from journey to journey.
2595.84 -> Super important to have a
clear vision of end-state
2599.8 -> and this is important because
there's too many shiny tools
2602.74 -> in the market, too many distractions.
2605.41 -> You're going to hear about
2607.15 -> do this with domain driven design.
2609.91 -> Do this with application oriented design.
2614.14 -> Yeah, those are important,
relevant topics,
2617.59 -> but they take away from the real focus,
2619.27 -> which is you stay focused on your users.
2621.94 -> Who are they? In this
case, they're internal.
2624.61 -> According to our CEO, two
classifications of employees,
2628.21 -> colleagues who help customers directly,
2630.58 -> colleagues who help
colleagues who help customers.
2634.57 -> My customer is internal.
2636.31 -> The product that they build
2637.63 -> will be used by external customers.
2640.21 -> I've got to stay focused on them.
2642.46 -> So I gotta do everything
that helps them do things
2645.49 -> at a faster clip, at a lower
energy expenditure level.
2651.1 -> That's my focus, so stay
focused on your user.
2653.98 -> It's super easy to get lost
in this jungle of innovation
2658.09 -> that every vendor is out there peddling.
2661.24 -> Second, automate governance at every step.
2665.53 -> You will hear that governance
is a beast, hard to automate.
2672.13 -> Tackle that head on. Ask what
exactly is hard to automate?
2677.23 -> Data quality routines? No.
2679.81 -> Configurable, you apply a rule set.
2684.16 -> What else is hard to do,
data ownership? Nope.
2687.46 -> Embedded at an object level like we have
2689.65 -> every asset within our
ecosystem, every table,
2693.73 -> has a data owner embedded in it.
2696.1 -> What does that do? Helps
you approve things quickly.
2699.94 -> Helps you understand
ownership models quickly.
2702.55 -> So question anything that takes
away from hyper automation.
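One way such object-level ownership could be embedded on AWS (an illustration only, not necessarily Prudential's implementation) is a Lake Formation LF-Tag whose value is the owning team, attached to each table so approvals can be routed quickly:

```python
import boto3

lf = boto3.client("lakeformation")

# Illustrative ownership tag and values.
lf.create_lf_tag(TagKey="data_owner", TagValues=["policy-ops", "claims-analytics", "marketing"])

# Attach the owner at the table level so every object carries its ownership.
lf.add_lf_tags_to_resource(
    Resource={"Table": {"DatabaseName": "ins_policies", "Name": "policies"}},
    LFTags=[{"TagKey": "data_owner", "TagValues": ["policy-ops"]}],
)
```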
2707.17 -> Third, this is very important as well.
2709.63 -> You need to build a
small core talented team.
2714.25 -> They need to do two things.
2716.32 -> Number one, they're responsible
for feature development.
2720.46 -> This feature development
is driven by user demand.
2724.18 -> You don't try to build for the future
2725.83 -> without knowing your actual user demand,
2727.9 -> so you gotta be super close to the user.
2729.58 -> At the same time, while they
build these shiny new features,
2732.67 -> it's also super important
that they support users
2734.86 -> who are currently using the platform.
2736.33 -> So you got to have two parts to the team.
2739.12 -> One that keeps the lights
on, keeps remediating,
2741.64 -> keeps fixing, the other is
to build out the new feature.
2747.58 -> The other tried and tested lesson:
2751.45 -> value in trying, value in failing.
2754.72 -> There's a lot of lip
service in enterprises
2758.14 -> about fail fast, be brave.
2761.59 -> But when it comes to
impacting your timelines,
2764.32 -> your deliveries, it's
not looked upon kindly.
2766.96 -> In our experiences, there
is no escaping that fact,
2770.59 -> you simply accept it, this is
the cost of doing business.
2774.43 -> You're going to try with
numerous AWS services
2777.55 -> or other cloud services for that matter.
2779.59 -> You're going to try combination of things.
2781.81 -> There are certain things that
you don't want to invent.
2784.51 -> You just don't want to
invent authentication,
2786.79 -> you don't want to invent authorization,
2789.04 -> you don't want to invent
encryption, lots of things.
2792.46 -> You want to take services that
have already been created.
2797.35 -> But how you glue them together,
2799.57 -> you'll have to test it yourself.
2802.39 -> Now the reason why I bring
this forward is, I posit,
2806.26 -> that you have to create an
engineering team for yourself.
2810.79 -> You have to treat this as an application.
2813.43 -> Historically, data platforms
2815.38 -> have not been treated as applications.
2817.9 -> They usually have been
approached as a data product
2821.98 -> or a data platform product by
itself with some integrations.
2827.14 -> The change here is, we talk
about an autogenic data platform
2830.98 -> being an application and
here's a product owner
2833.74 -> and here's a development team
for that application alone.
2836.5 -> No different than any software product.
2839.56 -> So this is a software engineering team
2842.2 -> internally servicing it for
the enterprise customers.
2846.79 -> There is a super important
construct to remember.
2852.52 -> Now in terms of outcomes,
everybody loves outcomes,
2855.727 -> and everybody has these OKRs that measure
2862.03 -> the value of what you have done.
2864.52 -> For us, it's quite clear
that this is the winner.
2869.71 -> The increased talent pool,
what does that talk to?
2873.25 -> That talks to now, us not
having this class system.
2878.92 -> Oh, you don't know AWS Glue, do you?
2882.04 -> You can't participate in our ecosystem.
2884.11 -> No, we don't have that
conversation anymore.
2887.53 -> You can use a browser? You're welcome.
2889.99 -> What more do you need in
order to be successful?
2892.6 -> You need that browser to have
SQL embedded, a query engine?
2895.99 -> Okay, the platform teams
hear that, to support you.
2899.71 -> You need Excel embedded
within the same data portal?
2902.47 -> So be it, we deliver to you.
2905.419 -> We don't levy a tax, a learning tax.
2909.49 -> Historically, platforms
which have not been created
2912.85 -> through this vision apply these taxes.
2915.52 -> There's a regime of taxes everywhere.
2917.8 -> You pay repeat taxes every
day in your work life.
2921.85 -> The mission of this platform
is to eliminate those taxes.
2927.22 -> Second, time savings.
2930.1 -> Our original goal was to shift right
2932.92 -> as much of human engineering as possible.
2936.43 -> Current state of the platform has shifted.
2939.4 -> The first point where human engineering
2941.35 -> starts to be involved is
when you have to curate
2944.47 -> or refine the data.
2945.97 -> So you've cut the taxes all
the way to that standardized layer
2949.18 -> of your lake house architecture
2950.86 -> through this autogenic data platform.
2954.61 -> Third, data access.
2957.46 -> That's been cut from days to minutes now,
2959.8 -> it's within the application,
2960.88 -> it's within the same data marketplace.
2962.53 -> It's an application that is
governed, it is auditable,
2966.49 -> it tells you who granted access
to who and for what data.
2970.15 -> All of that information is
present within the tool itself.
2974.56 -> Cost savings.
2976.39 -> When you release such an application
2978.7 -> to the enterprise, you
decrease the appetite
2981.88 -> for building bespoke data solutions
2984.61 -> because now you have to
compare it against something
2987.88 -> that has so many features
and works so well.
2992.601 -> Success is super important for
that reason that it decreases
2996.1 -> the appetite for folks
to try this on their own.
2998.35 -> There's no harm in trying
things on their own.
3000.03 -> But then you would see the
engineering that is necessary
3003.27 -> to create a holistic integrated system
3005.85 -> that's super important to realize.
3011.16 -> Now, in terms of governance,
this was a governance topic,
3015.63 -> so I want to talk to a certain
aspect of governance as well.
3019.8 -> Trust in data gets
better when human beings
3024.15 -> don't finger the data.
3026.76 -> How far right can you move
that? That's the question.
3031.62 -> How can you report on the hops
3035.22 -> that data is taking through an automation?
3038.61 -> How much of this quality
stuff can you move
3041.4 -> towards the left of the equation
3042.81 -> which is the source of data?
3043.977 -> And how much of it can be
subject to the automation?
3046.71 -> So it improves trust in data
3049.2 -> because you have not touched the data
3051.12 -> up to a certain aspect.
3052.47 -> Now you can also focus your energies
3055.29 -> as a data governance
organization on the actual place
3059.37 -> where data is changed, which
is the refinement zones.
3063.87 -> If our innovation is successful,
3065.49 -> we'll move it even more to the right.
3067.89 -> But for now there is a narrow focused area
3070.83 -> which is either on the
left side of the entire
3074.01 -> architecture within the source
3075.75 -> or, on the right side
within the curation zones
3078.6 -> where you can now focus on,
your focus is not spread.
3083.01 -> And lastly, because of the
popularity of the platform,
3087.18 -> we are seeing reduced data sprawl.
3090.15 -> We are seeing more and
more multi-tenant teams,
3093.6 -> data journey teams, come
and use the platform,
3097.11 -> it's also integrating our data.
3098.79 -> So it's subjective whether that
was one of the primary goals
3102.45 -> because we still live in a
very federated business model.
3105 -> But the goal for us is
to reduce the data silos.
3109.77 -> I know I've said a lot.
For additional thinking
3115.92 -> around this topic, I've also published a Medium
3119.28 -> and LinkedIn article,
3120.39 -> under the header Autogenic Data Platforms.
3123.3 -> Feel free to hit me up at my
social media handle on LinkedIn,
3127.92 -> happy to collaborate.
3129.36 -> At this point, I think
Jason and I are happy
3132.21 -> to take questions.
3134.444 -> - [Jason] Two more slides.
Yeah, two more slides.
3137.464 -> - Sure.
- Sure.
3138.87 -> - So just two more slides.
How do I get started?
3144.27 -> There's three programs that AWS offers
3147.24 -> to help you get started.
3149.07 -> The first one is if you want
to build a data strategy,
3152.31 -> we have our Data-Driven
Everything program.
3154.56 -> Feel free to reach out to
your SAs or your account team
3157.95 -> to understand more about that program.
3160.14 -> The second one is Data Labs,
3161.64 -> if you have that strategy
and need help executing it.
3165.15 -> That would be our Data Lab team.
3166.41 -> And the third one is if you
need help with implementation,
3169.41 -> there's obviously ProServe
3170.45 -> and our great partner community as well.
3172.65 -> So those are three ways
to help you get started
3174.87 -> with data governance.
3176.34 -> We do have a new workshop
that both our D2E,
3180.45 -> our Data-Driven Everything
team and our ProServe team
3183.42 -> can help you execute,
it helps you understand
3185.43 -> where you're at with data governance.
3187.11 -> So please reach out to your
account team to understand
3190.2 -> where you're at in your
journey with data governance
3192.03 -> and they're happy to help you with that.
3193.83 -> And the last thing is getting started.
3196.2 -> Think big, use that Discovery
Workshop on data governance.
3199.77 -> Use things like Data-Driven Everything.
3202.05 -> You want to think big,
but you wanna start small.
3204.72 -> How does this apply to
your business strategy?
3206.97 -> How do you use tools such as Data Labs
3209.43 -> and ProServe through POCs?
3211.71 -> And then how do you scale fast
through partners and ProServe?
3214.83 -> So thank you, we're happy
to take a couple questions.
3218.73 -> We have a few minutes if
people have questions.