AWS re:Invent 2022 - 3 innovations that redefine data protection for Amazon S3 (PRT315)
Aug 16, 2023
You rely on Amazon S3 to power your cloud-native applications, data lakes, analytics, and AI. While Amazon S3 is extremely durable, the resilience of the data itself is your responsibility. So how do you secure billions of objects from an ever-expanding list of potential threats? And how do you recover when your data is compromised? In this session, technologists from Amazon, Cox Automotive, and Clumio dive deep into Amazon S3 data protection and demonstrate how to swiftly recover petabytes of data in the event of an incident. Learn how to implement continuous immutable backup that is air-gapped and instantly recoverable. This presentation is brought to you by Clumio, an AWS Partner. Learn more about AWS re:Invent at https://go.aws/3ikK4dD
Content
1.53 -> - Welcome to "Three Innovations
3.03 -> that Redefine Data
Protection for Amazon S3."
6.42 -> My name is Woon Jung,
7.998 -> I'm the Co-Founder and CTO of Clumio.
9.75 -> Today with me, I have Peter Imming.
12 -> - Hello.
12.833 -> - Principal Product
Manager from Amazon S3.
15.42 -> And Mark Huber, Senior
Director of Engineering
17.73 -> at Cox Automotive.
19.4 -> (attendees applauding)
20.25 -> First, I'll have Peter
join me to talk about
22.38 -> how S3 is being used and why
you should protect the data
25.38 -> in your S3 bucket.
26.67 -> And then after that I'll have Mark join me
28.86 -> to talk about their journey in AWS
31.38 -> and the partnership with Clumio.
33.51 -> Last, I'll come in and talk
about kinda the details
36.03 -> about how things are
implemented in the backend.
38.73 -> And I'll give you a live demo
40.08 -> that I'm pretty sure you'll all like.
42.18 -> Peter.
43.77 -> - Thanks, Woon.
44.79 -> So if you thought this was
data protection for security
50.37 -> or IAM, or encryption, this
is not the session for you.
54.54 -> We're gonna be talking a
lot about data protection
56.94 -> in terms of backup.
59.82 -> In terms of replication
and high availability,
65.13 -> durability, availability.
67.26 -> So if you're in the wrong session,
69.18 -> definitely now's the time to head on out.
72.33 -> But again,
73.163 -> I think we wanna say thank
you for coming to the session.
74.88 -> We know you had a lot of choices out there
76.38 -> for different sessions to attend,
77.55 -> including the bar sessions
out there right now.
80.79 -> So thank you all.
81.72 -> I think from our
perspective for coming here
84.12 -> and listening to us today.
85.74 -> We'll try and keep it as
interactive as possible.
87.39 -> If you have questions,
88.29 -> I think we're all
comfortable taking questions
90.66 -> as you might have them.
92.43 -> So we'll go ahead and get started here,
94.53 -> and talk about data protection for S3.
99.03 -> Now Amazon S3,
101.01 -> we really have come a long way
in 16 years with Amazon S3.
105.24 -> We're now storing over 280
trillion objects in S3,
111.39 -> over a hundred million
transactions per second.
114.66 -> And really, what we're
seeing S3 kind of transform
117.39 -> into is really now the production,
120.72 -> the primary production
storage for customers
123.45 -> to create new data in.
125.01 -> It's no longer just a place to store data
127.38 -> or back up data to.
129.09 -> In my time at AWS, I've been
there for three years on S3.
133.53 -> That's probably the biggest
transformation I've seen is that
136.38 -> we now have the primary,
139.26 -> the bulk of the data coming into S3
141.03 -> being natively created inside of AWS.
144.27 -> And this is really a change
over the last couple of years
147.78 -> that we've just seen accelerate.
149.61 -> And that comes from applications
that you're running,
152.34 -> such as data lakes, machine learning.
154.92 -> It could be EMR, it could
be our services, it could be
159.966 -> Datadog, Databricks, it
could be Snowflake DB.
163.32 -> You're starting to see now
S3 and object storage really
166.29 -> become the primary
de facto class of storage
169.29 -> that you're creating new content with.
171.9 -> And that includes cloud
native applications
173.97 -> that you may be running in a container,
175.68 -> that could be classic virtual machines
177.48 -> that are now writing out
to object storage directly.
180.54 -> It could be new databases
that are gonna be shipping
182.67 -> from traditional database vendors
184.5 -> that are gonna now run
natively on object storage
187.05 -> for the first time in their long history.
189.6 -> And we're also seeing
obviously a tremendous growth
192 -> in log files, machine generated log files,
195.72 -> machine generated data.
197.07 -> Everything from IP cameras,
everything from factory sensors
201.03 -> that are all generating machine logs
202.92 -> and then rapidly sending those to S3
204.99 -> as the production storage.
207.36 -> And we take that very seriously at Amazon.
210.15 -> So when we've got 280
trillion objects to store,
214.56 -> that takes a lot of
effort to store durably,
218.4 -> to store with availability,
to store with redundancy.
222.51 -> And we take that very seriously.
223.83 -> And if you've ever had to
manage storage at scale,
227.37 -> you understand that
that takes a lot of work
229.5 -> to manage all three of
those for virtually,
233.4 -> essentially unlimited scale.
235.95 -> So we wake up every day
on S3 and go look after
240.57 -> the availability, durability,
243.09 -> and the resiliency of that data,
244.74 -> of those 280 trillion objects
so that you don't have to.
248.22 -> You don't have to worry that
your data is durably stored.
250.83 -> You can go take that time
now back and go innovate
253.92 -> on top of S3 rather than
having to craft durable storage
258.09 -> that's highly available.
259.8 -> When we look at S3, there is, though,
264.66 -> a different type of
data protection question
267.21 -> that we get today, and
that is a delete request.
270.51 -> We don't know customer intent
272.94 -> when you ask us to delete an object.
276.06 -> We have to look at that
request as unambiguous.
278.61 -> We don't know if it's accidental.
280.89 -> We don't know if it's intentional.
282.75 -> We don't know if it's
perhaps even malicious.
285.54 -> So when we have that
type of application data,
287.88 -> whether it's user generated
data, media configuration data,
290.79 -> again, we talked about the data lakes,
292.41 -> sensitive information,
all of this different data
295.38 -> has different value to your organization.
297.9 -> And what we're looking at here is
300 -> different layers of data protection
302.01 -> on top of different types of data.
304.29 -> And what I'll be talking about
before handing it off to Mark
307.23 -> is really kinda looking
at the different layers
309.9 -> of data protection that
are most appropriate
311.79 -> for the different types of data
313.26 -> that you're storing in S3 today.
315.75 -> If that's compliance data,
317.04 -> that's gonna require a
different level of protection
319.26 -> than what you might have for user data
323.25 -> that might be uploaded.
324.27 -> Cat.jpeg, right?
325.41 -> How many copies of
Cat.jpeg do we need to store?
329.82 -> That's a very different
type of data to protect
333.12 -> than data that contains
payment card information,
336.3 -> HIPAA data, personally
identifiable information.
339.66 -> So this is the type of
data that we're storing.
342.09 -> This is the type of data
that you are storing in S3,
344.58 -> I should say.
345.48 -> And so what we're gonna
be talking about today
347.07 -> is kinda crafting the
right layers of protection
349.53 -> on top of S3 and how you can set that up
354.57 -> for the right types of
data that you're storing.
356.43 -> When we look at the types of
risks that you have in S3,
360.18 -> the difference versus traditional storage
363.33 -> that you may have been running
for the last 10, 20 years
366.96 -> is that you no longer have
to worry about, again,
369.51 -> the durability and the
physical access to the storage.
372.27 -> It's now about an API, it's
that request to delete data.
377.22 -> Again, we don't know what the
intent is of that request,
381.39 -> but we do know, and it's important to S3
384.87 -> that we honor that request.
386.88 -> So whether that's a human error,
388.53 -> whether that's an inadvertent
deletion, mistakes happen.
392.01 -> Is it a natural disaster?
Is it a fire, a flood?
394.5 -> Is it Godzilla coming in
396.09 -> and wrecking some undersea
communication cables?
399.12 -> At the end of the day,
400.14 -> these are all things to consider
when we're talking about
403.53 -> your data storage in S3.
405.617 -> Again, the deletions though,
407.55 -> those are really what we're
gonna kind of focus on today
409.68 -> for the most part.
410.55 -> Software errors, whether it's a human
412.5 -> or a piece of software.
413.58 -> If it's a script, maybe
it's a lifecycle policy
416.85 -> that's been misconfigured.
418.38 -> Again, we cannot distinguish
420.33 -> between a correct deletion
and an incorrect deletion.
423.33 -> We have to honor that request
just as we honor the request
427.29 -> to put data into S3.
429.66 -> So when we look at that, we
now have accidental data loss,
432.54 -> we have software errors,
433.95 -> and we also have what we call bad actors.
436.71 -> These could be employees that have, again,
439.56 -> authorization to the data
through our access controls,
443.73 -> our IAM policies,
445.65 -> but their intent as a bad actor
is to do something malicious
449.16 -> with that data.
449.993 -> Again, we can't differentiate
451.2 -> between their delete request
453.21 -> and a properly authorized delete request
456.03 -> that's not coming from a bad actor.
458.22 -> So we have to look at all
of these possibilities
460.95 -> and then offer additional layers
of protection inside of S3.
465.42 -> And that's really where this kinda,
466.95 -> it looks like an everlasting
gobstopper type approach here,
469.62 -> but your data is at the center.
471.21 -> And the first layer of protection
that we always talk about
473.91 -> is, again, your access controls.
475.86 -> S3 by default is secure.
477.96 -> You're not gonna access that data
479.91 -> unless you have been
authorized through IAM.
483.24 -> Now, when we look at that though,
485.73 -> if you are properly authorized,
487.44 -> once you've got access to the
data, what can you do with it?
490.35 -> An accidental deletion, again,
492.33 -> is no different to S3
than a malicious deletion.
495.15 -> We can't distinguish between the two.
497.55 -> So that's really where
something like object versioning
500.01 -> is the beginning of that journey
502.05 -> where you can now set S3 to go ahead
504.69 -> and start preventing accidental deletions
508.32 -> by keeping every version that is created.
511.5 -> Is it an overwrite? We
will create a new version.
513.69 -> Is it a deletion? We will
then put a deletion marker
516.84 -> as the current version in
your S3 version object stack.
522.27 -> We keep track of every single version.
524.82 -> We always essentially
append in object versioning
528.51 -> once that's enabled on your bucket.
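A minimal sketch of the versioning behavior just described: an overwrite appends a new version, and a delete appends a marker rather than removing data. This is a toy in-memory model, not the S3 API. The class names and the integer version IDs are assumptions for illustration (real S3 version IDs are opaque strings).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: int
    data: Optional[bytes]  # None means this entry is a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.data is None


@dataclass
class VersionedBucket:
    """Toy model of a versioning-enabled bucket (hypothetical, not the S3 API)."""
    stacks: Dict[str, List[Version]] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> int:
        # An overwrite never replaces anything: it appends a new version.
        stack = self.stacks.setdefault(key, [])
        stack.append(Version(len(stack) + 1, data))
        return stack[-1].version_id

    def delete(self, key: str) -> int:
        # A plain delete appends a marker that becomes the current version.
        stack = self.stacks.setdefault(key, [])
        stack.append(Version(len(stack) + 1, None))
        return stack[-1].version_id

    def get(self, key: str, version_id: Optional[int] = None) -> Optional[bytes]:
        stack = self.stacks.get(key, [])
        if not stack:
            return None
        if version_id is None:
            return stack[-1].data  # None when the current version is a marker
        for version in stack:
            if version.version_id == version_id:
                return version.data
        return None


bucket = VersionedBucket()
v1 = bucket.put("cat.jpeg", b"first upload")
v2 = bucket.put("cat.jpeg", b"overwritten upload")
marker = bucket.delete("cat.jpeg")

current = bucket.get("cat.jpeg")        # the object now looks deleted
recovered = bucket.get("cat.jpeg", v1)  # but the older version is still there
```

After the delete, a plain read finds nothing, yet every earlier version can still be fetched by ID, which is what makes an accidental deletion recoverable.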
530.37 -> From there, we then need to
talk about malicious deletions.
534.15 -> So we've got accidental deletions covered.
535.92 -> What about malicious deletions?
537.3 -> Malicious deletions,
that is the layer that we
540.99 -> then bring to bear.
542.58 -> This is again, an opt-in
feature called object lock.
545.16 -> And object lock has really become
sort of a de facto standard
547.74 -> for immutable storage in the cloud.
550.41 -> Object lock can be enabled
at the bucket level
552.51 -> or a per object level.
554.4 -> And once you have an object
lock placed on an object,
557.91 -> it's a retain until date.
559.65 -> That object cannot be
deleted by AWS personnel.
563.22 -> It cannot be deleted by your root account.
566.58 -> So we can now prevent
even malicious bad actors
569.79 -> from coming in and performing
an intentional delete,
573.72 -> even though they're properly authorized
575.82 -> through IAM policies.
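The retain-until rule can be sketched as a simple check that ignores who is asking. This is a toy model of the strict behavior the speaker describes, not the S3 API: the function and exception names are hypothetical, and real Object Lock has additional concepts (retention modes, legal holds) that are not modeled here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


class RetentionActiveError(Exception):
    """Raised when a delete is attempted before the retain-until date."""


def delete_version(retention: Dict[str, datetime], version_id: str, now: datetime) -> None:
    """Delete a version only if its retain-until date has passed.

    Deliberately takes no caller identity: in this model, being authorized
    (even as the account root) does not bypass the retention period.
    """
    retain_until = retention.get(version_id)
    if retain_until is not None and now < retain_until:
        raise RetentionActiveError(f"{version_id} is retained until {retain_until.isoformat()}")
    retention.pop(version_id, None)


now = datetime(2022, 11, 30, tzinfo=timezone.utc)
retention = {"v1": now + timedelta(days=30)}  # hypothetical 30-day retention

try:
    delete_version(retention, "v1", now)
    blocked = False
except RetentionActiveError:
    blocked = True  # the delete is refused while retention is active

# Once the retain-until date has passed, the same request succeeds.
delete_version(retention, "v1", now + timedelta(days=31))
```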
578.58 -> We'll also be talking a
little bit about replication.
581.43 -> Now, if you need to prevent,
584.55 -> and essentially if you need
to have that data available
588.36 -> for compliance reasons or
have that data available
591.54 -> in a different region for resiliency,
594.15 -> that's where S3 replication comes in.
596.55 -> Now, replication is there,
it's not going to be a backup.
599.703 -> It's not gonna be essentially managed
601.53 -> or independent from S3.
603.63 -> So when we look at all
three of these here,
607.08 -> object versioning, object
lock and replication,
610.02 -> these are all opt-in features
611.46 -> that you have to turn on in S3.
614.67 -> They're not necessarily centrally managed.
616.89 -> They are enabled essentially
per bucket inside of S3.
621.09 -> Now, to look at all of that together,
622.86 -> Mark's gonna be talking a little bit about
624.54 -> another feature that we launched.
625.89 -> It's one of my favorite
things to talk about
627.39 -> called S3 Storage Lens.
629.46 -> It gives you a bird's
eye view, a central view
631.89 -> of all of those features and
essentially what percentage
634.83 -> of your data is versioned?
637.26 -> What percentage of your
data is object locked?
640.14 -> What percentage of your
data is replicated?
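As a rough illustration of the kind of roll-up described here, the percentages can be computed from per-bucket summaries. This is not the Storage Lens API; the `BucketSummary` type, the bucket names, and the sizes are all made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BucketSummary:
    name: str
    size_bytes: int
    versioned: bool
    object_lock: bool
    replicated: bool


def coverage_percent(buckets: List[BucketSummary], attr: str) -> float:
    """Percentage of total bytes held in buckets where the feature is on."""
    total = sum(b.size_bytes for b in buckets)
    if total == 0:
        return 0.0
    covered = sum(b.size_bytes for b in buckets if getattr(b, attr))
    return 100.0 * covered / total


# Hypothetical inventory across an organization.
inventory = [
    BucketSummary("payments-archive", 600, versioned=True, object_lock=True, replicated=False),
    BucketSummary("app-uploads", 300, versioned=True, object_lock=False, replicated=True),
    BucketSummary("scratch-logs", 100, versioned=False, object_lock=False, replicated=False),
]

pct_versioned = coverage_percent(inventory, "versioned")
pct_locked = coverage_percent(inventory, "object_lock")
pct_replicated = coverage_percent(inventory, "replicated")
```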
643.05 -> And Storage Lens is something
644.25 -> that you can actually get up
and running by dinner tonight.
648.42 -> It has 14 days of
historical data ready to go.
651.09 -> Mark's gonna show you how
Cox Automotive uses it,
654.27 -> but it gives you a bird's eye view
656.94 -> of all three of those data
protection capabilities.
660.99 -> But what we've talked about
so far is all native to S3,
664.62 -> but mistakes again still happen.
667.35 -> None of those are a replacement
668.7 -> for a true traditional
backup that is independent
671.34 -> and centrally managed.
672.69 -> And that's really where
we talk about solutions
with partners such as Clumio.
678.21 -> And that's where we'll be talking about
680.64 -> Cox Automotive's journey
with Mark here today
682.92 -> about why they chose Clumio to supplement
686.22 -> those existing data protection features
688.05 -> that they're already using in S3,
689.88 -> but why Cox wanted to
cover that specifically
693.12 -> and look at a solution like Clumio
695.73 -> to actually create an independent
backup copy of S3. Mark.
701.4 -> - All right, thanks Peter.
703.89 -> All right, my name is Mark Huber.
705.99 -> I am the Director of
Engineering enablement
709.53 -> at Cox Automotive.
711.24 -> So Cox Automotive, who are we?
714.18 -> Not maybe a household
name that you would know,
717 -> but you probably know
some of our public faces
719.7 -> like Autotrader.com and Kelley Blue Book.
723.12 -> You look to see all those
logos across the bottom.
725.1 -> There's 20 more that you
probably have never heard of
727.17 -> unless you're in the automotive industry.
729.39 -> And that's where we are.
730.62 -> We are on a mission to transform
the way the world buys,
734.04 -> sells, owns and uses vehicles.
738.27 -> If you've really kind of paid
attention to what's going on
740.25 -> in the industry in the last five years,
743.07 -> we're seeing a whole new model.
744.6 -> Not the classic model of
going out and buying a car
747.27 -> at your local dealership,
748.95 -> everything you see from
ride share to fleet rental,
753.84 -> it's completely changing the dynamics
755.73 -> of the automotive industry.
757.26 -> And Cox Automotive is
there leading the charge
760.5 -> in how the industry operates,
763.77 -> how people access transportation,
whether it's clean energy,
769.2 -> and green vehicles or how
they use and lease vehicles,
775.41 -> not a month at a time,
but maybe an hour at a time.
779.88 -> We're looking at all the new ways
781.41 -> that the automotive industry
is starting to think about
784.59 -> how we utilize vehicles.
788.43 -> In doing that, we're actually,
you may not know this,
792.63 -> the Cox organization
is over 120 years old,
795.87 -> originally started in
the newspaper industry
799.17 -> and we've evolved through
many different parts.
800.97 -> You see Cox out here,
that's Cox Communications,
804.33 -> one of our sister organizations.
806.31 -> And we're in many other parts
of the industries as well.
809.91 -> Cox Automotive has come
together probably over more
812.73 -> of the last 50 years.
814.62 -> And we are really an
aggregation of many teams
818.43 -> all over the United States,
820.98 -> all over North America, South America,
825.24 -> all over the world, internationally.
828.39 -> And so that has come
together over so many years.
831.09 -> Remember, you know, everyone
knows Kelley Blue Book.
833.22 -> Remember it's the actual book,
835.5 -> how long has that been around? That's us.
837.96 -> And we've evolved that into
KBB.com, instant cash offer,
841.32 -> these things you see today.
843.69 -> Behind the scenes,
845.25 -> that's over 500 software engineering teams
848.64 -> that have come together
849.96 -> to start collaborating more as
one engineering organization.
852.807 -> And that's a journey that we have been on
855.623 -> and we've been on a journey
to the cloud, into AWS,
859.83 -> for approximately the last seven years.
861.9 -> And I've had the pleasure of
being a part of that exercise
865.8 -> of bringing disparate
engineering teams together
869.67 -> to operate more as one
engineering organization
872.49 -> to fulfill that vision and
mission of Cox Automotive.
876.48 -> In doing that, we brought a
lot of software along the way
880.89 -> to operate more as one
engineering ecosystem,
884.73 -> modernizing and making more
cloud native applications,
887.61 -> moving from on-premise
data center environments,
890.82 -> mainframe systems, up into
systems now based in Amazon S3,
895.17 -> Lambda, using DynamoDB. We
have an internal initiative
899.61 -> called Serverless First.
902.04 -> And really driving the
change in the way we build
904.08 -> and architect software,
905.85 -> but not just which technologies we use.
909.72 -> About four years ago we
made a big shift in focus
913.32 -> and adopted the AWS
Well-Architected Framework
916.89 -> as really sort of our guidestone
for how do we think about
919.65 -> what does good software look like?
922.05 -> The pillars of operational excellence,
923.88 -> security, reliability, performance, cost,
927.06 -> and now sustainability,
929.04 -> really shaped the way we
look at what does a good,
932.82 -> well-architected piece
of software look like?
935.55 -> And we've used that as a benchmark
938.82 -> for all of these different teams,
940.23 -> all these different software systems.
942.27 -> And now deploying all of that software
945.12 -> in over 1300 AWS accounts,
948.45 -> that's really allowed us to
come from sort of all walks
952.05 -> of engineering life, all
different engineering cultures,
955.62 -> and start to operate more like
one engineering organization.
959.67 -> That's been our journey.
960.51 -> That's been our story.
962.22 -> And I've had the pleasure
to be a part of that, so.
966.96 -> Data is a huge part of
what makes that mission
970.59 -> and that journey a possible reality.
974.22 -> We got some numbers up on
here and these are pulled
976.5 -> from Storage Lens like
Peter was talking about.
980.76 -> I have the line of sight
over those 1300 AWS accounts
988.02 -> to look at our data real estate at scale.
992.19 -> We store close to 20 petabytes in S3
997.05 -> across those 1300 accounts,
999.51 -> that is across those accounts.
1001.16 -> That's not 20 petabytes in one bucket,
1003.68 -> that's not 20 petabytes in one account.
1006.26 -> That's 20 petabytes spread
across 1300 AWS accounts,
1010.43 -> spread across over
33,000 different buckets.
1016.16 -> That's 153 billion
1019.67 -> objects.
- It is. The G
1021.5 -> in this slide is a B, so that's billions.
1024.74 -> - So we are one tiny little
drop in that bucket of,
1028.04 -> no pun intended, of 280
trillion objects stored
1031.16 -> in S3.
- We appreciate that, Mark.
1032.605 -> - (chuckles) The average
object size is 142 kilobytes.
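A quick arithmetic check ties the figures on the slide together: 153 billion objects at an average of 142 kilobytes each. The unit convention is an assumption, since the slide itself is not visible in the transcript, so the sketch computes the total in both binary and decimal units.

```python
KIB = 1024                      # bytes per kibibyte (binary convention, assumed)
PIB = 1024 ** 5                 # bytes per pebibyte
PB = 10 ** 15                   # bytes per petabyte (decimal convention)

object_count = 153_000_000_000  # "153 billion objects"
avg_object_kib = 142            # "average object size is 142 kilobytes"

implied_total_bytes = object_count * avg_object_kib * KIB
implied_total_pib = implied_total_bytes / PIB
implied_total_pb = implied_total_bytes / PB

print(f"implied total: {implied_total_pib:.2f} PiB ({implied_total_pb:.2f} PB)")
```

Read in binary units, the implied total lands just under 20, which is consistent with the "close to 20 petabytes" quoted earlier.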
1040.76 -> How could I even begin to understand that
1043.37 -> and wrap my head around that
1045.56 -> without the tool sets
that Storage Lens gives me
1047.87 -> to understand how those
500 different Scrum teams
1052.43 -> are building software
and operating on data.
1056.27 -> The ability first to just
understand your state
1059.54 -> is where we started.
1060.8 -> Because 10 years ago,
1062.63 -> I can't say that we did
understand our state,
1065.24 -> moving to the cloud, using
tools like S3 and Storage Lens
1068.48 -> has made it possible to
just begin to even understand
1071.54 -> what that looks like.
1073.4 -> Now, I know, Peter, you just had a launch
1074.98 -> of some new features with Storage Lens.
1077.115 -> I'll confess I haven't even had a chance
1078.92 -> to read the blog post yet.
1080.33 -> Do you wanna talk a little bit
about some of the new metrics
1082.07 -> that come out?
1082.903 -> - Sure, thanks Mark.
1083.736 -> So 34 new metrics in Storage Lens.
1086.72 -> I think I have that right.
1088.01 -> 34 new metrics.
1089.39 -> And again, the beauty of Storage Lens is
1092.51 -> you can get it up and running
in just a matter of minutes.
1095.3 -> All those dashboards are ready to go
1097.28 -> with those metrics with 14
days of historical data.
1100.01 -> So if you're not using Storage Lens today,
1103.52 -> that would be my one ask
is look at Storage Lens
1106.7 -> and then look at enabling
object versioning.
1109.34 -> Now I'm gonna sit down,
I'm done with the commercial,
1112.43 -> but thank you.
- Well,
1114.41 -> that's a good commercial.
1116.18 -> But that is actually a true story for us.
1118.61 -> So getting into talking about,
1121.85 -> how we started talking
about a backup strategy,
1124.19 -> those hundreds of teams
1125.882 -> are doing their local backup strategy.
1128.84 -> They have operational backups,
1130.58 -> they're thinking about
what happens in the event
1132.32 -> of a disaster.
1134.57 -> But we started really
doing some threat modeling
1137.78 -> to understand some of these
more nefarious scenarios
1141.56 -> like ransomware.
1143.03 -> And the starting question was,
1146.87 -> how much data do we even have
1148.16 -> that we have to even be concerned about?
1150.44 -> And it was, someone asked me a question
1152.69 -> and I went scrambling for an answer,
1154.73 -> and a couple hours later
we turned on Storage Lens
1156.89 -> at the organization's level
1158.75 -> and we had starting
dashboards right outta the box
1161.21 -> and helped me start thinking about
1162.86 -> how to organize a backup strategy.
1166.88 -> Very quickly, we got
into that backup strategy
1169.28 -> and we came to a couple of
interesting conclusions.
1172.64 -> Cox Automotive is not
trying to transform the way
1175.28 -> the world backs up data,
that's what Peter does.
1178.16 -> That's what Woon is doing.
1180.74 -> We are trying to transform
the automotive industry.
1182.63 -> It's not our core competency.
1184.67 -> We did actually briefly
have the conversation
1187.4 -> about should we build this ourselves?
1189.74 -> Maybe we could build this,
1190.97 -> but it's not what we're here to do.
1193.37 -> So we like to partner with
organizations like Clumio
1196.97 -> to bring them, waking up every morning
1200.66 -> thinking about that problem,
1201.95 -> about how to maximize the
efficiency of a backup.
1205.52 -> And we partner with them
so that we can focus
1208.49 -> on our core competency.
1210.68 -> The ransomware risks are real.
1212.72 -> You read what's going on in the industry.
1216.38 -> We ran a lot of threat
scenarios internally
1218.48 -> and saw that this was
a real potential risk.
1220.73 -> Our AWS multi-account strategy
running across 1300 accounts
1225.59 -> really helps mitigate that risk.
1228.08 -> But that wasn't enough
1229.13 -> 'cause it only takes one account
1231.05 -> on that critical public facing property
1235.61 -> to put you on the headlines
of the Wall Street Journal.
1242.42 -> It's a very complex problem,
technically speaking.
1245.87 -> We did the simple math.
1246.89 -> We did the simple way
of thinking about it.
1248.42 -> We have 20 petabytes of data
and it costs us this much,
1252.44 -> we need to back it all up.
1254.75 -> So we were gonna spend twice
as much in what we're doing.
1258.26 -> The simple backup
strategies would say yes.
1262.34 -> But what we saw with Clumio
was that they thought
1265.13 -> about that problem a
lot harder than we had.
1268.37 -> And the results we're getting with Clumio
1271.28 -> in the efficiencies, even in the dollar
1273.59 -> in how we are approaching
our backup strategy,
1275.75 -> now change that 2X
dynamic. It's not like that.
1283.37 -> Thinking about when we first met Clumio,
1285.5 -> when I first met Woon,
1288.47 -> what we saw was exceptional competency
1291.62 -> in the space of data and thinking about
1294.11 -> how to handle and manage data.
1298.19 -> The things that Woon's gonna
show in the demo in a bit,
1300.65 -> the way that they manage
Clumio partitions,
1304.13 -> the way they think about reorganizing data
1306.53 -> for optimal storage and
cost efficiency in it,
1310.07 -> is something that we had
never even considered.
1314.3 -> And here's the more important
part, that was simple to use.
1318.77 -> Part of the hardest
thing I have to focus on
1320.96 -> when thinking about a backup strategy
1322.55 -> is bringing those 500 teams together
1325.46 -> and getting them to take action.
1328.22 -> Getting them to put a
story in their backlog.
1331.19 -> Getting them to all
collectively focus on and engage
1334.22 -> in a backup strategy.
1335.48 -> Whatever we brought them,
it had to be simple.
1340.16 -> It had to solve our problems
1343.07 -> around needing a truly air-gapped solution
1347.59 -> for that ransomware scenario.
1350 -> That was what kicked
off our focus on data.
1353.51 -> Now, air gap is a word that's
been around our industry
1357.89 -> probably for 30 plus years.
1360.26 -> It's this very traditional
idea of no wires connecting
1364.13 -> these two environments.
1365.54 -> No way in or out.
1366.86 -> Someone compromises your
operational environment,
1372.74 -> there's no way that they can compromise
1375.41 -> your data vault environment.
1377.9 -> And that's important.
1379.07 -> It's an important
component of our strategy
1382.04 -> to mitigate the risk
around a ransomware event.
1385.94 -> But as we looked at solutions,
there's a tough part of that.
1390.38 -> Many of the ways of doing that
1392.33 -> take it outside the four walls of AWS.
1396.35 -> It takes it outside the technology
1399.44 -> that gives us the durability
and the resiliency guarantees,
1403.31 -> traditionally speaking.
1405.98 -> We didn't wanna lose the
benefits of how we store
1408.74 -> and manage data in S3 in our own accounts
1412.25 -> when working with a backup solution.
1414.71 -> And so we very specifically
went searching for a partner
1417.77 -> who was focused in natively developing
1420.74 -> air gap style technologies
working inside the AWS ecosystem.
1426.23 -> And that's what we found
when we looked at Clumio.
1429.5 -> It had to perform when we're
talking about this much backup
1432.44 -> at this much scale,
1433.7 -> these are running operational systems
1436.19 -> with very tight RTO and RPO concerns.
1439.19 -> We need to make sure that we
can get those levels of return
1443.24 -> to operation and recovery point
1445.34 -> without impacting 24/7
running production workloads.
1451.04 -> And finally we needed good partners.
1454.16 -> And this is probably the
most important part to us.
1459.2 -> When we first met the Clumio team,
1461.72 -> it did not do everything
that we needed it to do.
1465.56 -> It didn't work the way
we needed it to work
1468.74 -> to easily roll it out to 500
teams across 1300 accounts.
1473.75 -> So we partnered and we worked
closely with their team,
1476.66 -> worked personally with Woon,
1478.07 -> worked with much of the
Clumio team here today.
1480.77 -> And very rapidly, we innovated.
1482.66 -> They understood our problem,
they understood our objectives
1486.92 -> and they shaped their product quickly.
1489.98 -> And I can tell you, working
with other partners, "quickly"
1491.99 -> is not always the operative
word in that sentence.
1495.997 -> So that partnership, thank you,
1497.96 -> thank you to the Clumio team.
1499.64 -> It's been excellent.
1502.76 -> So what are the results of all of this?
1504.41 -> We've been working with them
for roughly the past year,
1508.16 -> being able to standardize a
data protection capability
1511.88 -> in over 1300 accounts.
1515 -> I talked about that
Well-Architected Framework
1517.46 -> and the idea of workloads.
1519.26 -> These are all the different
collective software systems
1522.08 -> that we have based on something
we call our CORE program,
1526.34 -> which is Cox Automotive's Observability
1528.71 -> and Resiliency Engineering Program.
1531.59 -> We've been able to rate
those 600 workloads
1534.26 -> to understand which of
them are most critical
1537.86 -> for these resiliency capabilities.
1539.66 -> The first in the line to be backed up
1542.39 -> to have full ransomware protection.
1545.06 -> And that's made us a
real actionable game plan
1547.79 -> to start rolling out the technology
1550.34 -> to these workloads in the accounts.
1553.25 -> We worked with the Clumio team
1554.57 -> to develop a Terraform provider.
1556.49 -> We terraform all 1300 AWS
landing zones in those accounts.
1561.71 -> And so now our provisioning
process that provisions
1564.71 -> a new account, we do that
multiple times a week.
1568.37 -> Every single account
comes fully integrated
1571.64 -> with the Clumio Stack.
1574.28 -> We use a tagging strategy developed with,
1577.73 -> to match the protection
group patterns within Clumio,
1580.7 -> that means it's as simple for a team
1583.43 -> as putting a tag on a
bucket to start backing up.
1587.03 -> No plugging in the tool,
1589.01 -> no integrating, no setup,
1592.34 -> they simply drop a tag
and it starts backing up.
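The tag-driven pattern can be sketched as a rule match: a protection group is defined by a set of tag values, and any bucket carrying matching tags falls into it. This is not Clumio's API. The function name, the tag keys (`backup-tier`, `team`), and the bucket names are hypothetical.

```python
from typing import Dict, List


def buckets_in_protection_group(
    bucket_tags: Dict[str, Dict[str, str]],
    rule: Dict[str, str],
) -> List[str]:
    """Return the buckets whose tags satisfy every key/value pair in the rule."""
    return sorted(
        name
        for name, tags in bucket_tags.items()
        if all(tags.get(key) == value for key, value in rule.items())
    )


# Hypothetical tags as a team might set them on its buckets.
bucket_tags = {
    "listings-prod": {"backup-tier": "critical", "team": "listings"},
    "listings-dev": {"team": "listings"},
    "pricing-prod": {"backup-tier": "critical", "team": "pricing"},
}

# Hypothetical protection-group rule: back up everything tagged critical.
critical_rule = {"backup-tier": "critical"}
protected = buckets_in_protection_group(bucket_tags, critical_rule)

# A team opts a bucket in simply by adding the tag.
bucket_tags["listings-dev"]["backup-tier"] = "critical"
protected_after_tagging = buckets_in_protection_group(bucket_tags, critical_rule)
```

The design point is that teams never touch the backup tool itself: membership is derived from tags they already control.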
1595.58 -> And that was the kind of simplicity
1597.14 -> that we were looking for.
1600.922 -> The way that Clumio looks at
cost in their credit system
1604.88 -> allows us to consume as we go
and clearly track which teams
1610.25 -> are consuming how much
of the backup capacity.
1613.76 -> It makes us effective in that cost pillar
1616.61 -> in the Well-Architected Framework,
1617.87 -> really to understand how
that money's being spent,
1620.48 -> not spend and waste while
the product sits on a shelf,
1624.77 -> but buy as we go and buy as
we need and that's been great.
1630.14 -> And we've developed a lot
of new interesting features
1632.36 -> with them, especially
around that last one.
1634.37 -> B-Y-O-K, Bring Your Own Key,
1636.95 -> talked about that native
part of a solution
1641.15 -> working within the AWS space,
1643.4 -> working directly with AWS KMS.
1647.09 -> We are able to mutually
share a pair of KMS keys,
1651.68 -> one owned by Cox Automotive,
one owned by Clumio,
1655.49 -> which makes sure the
right level of balance.
1658.1 -> The idea that I can take my critical data
1661.1 -> and put it in the hands of a
trusted partner like Clumio,
1665.18 -> but still have the power to have control
1667.73 -> while it's left my four walls,
1670.7 -> to shut down the access to that data
1672.98 -> by revoking that encryption key.
1674.81 -> Patterns like that developed mutually,
1677.66 -> give us the right
balance where I can trust
1680.6 -> what is an outside entity
1682.79 -> to handle my most sensitive
and critical data.
1687.38 -> So I'm gonna stop there,
1689.78 -> I'm gonna hand it over to Woon
1691.37 -> and he's gonna show us
how the magic happens.
1693.02 -> - Thank you.
1694.33 -> So I start with what's Clumio?
1696.5 -> So Clumio is a data protection
service created on AWS
1700.13 -> to support the workloads and
data sources running on AWS.
1703.52 -> To that end, we wanted to create a service
1705.59 -> that is as scalable and elastic
as all the data services
1709.16 -> and applications that you're running
1710.81 -> on the public cloud on AWS.
1715.01 -> Before I jump in,
1716.42 -> I just wanna share a few observations
1718.94 -> that make data protection
for S3 so challenging.
1722.09 -> Like Peter mentioned,
1723.32 -> S3 is being used very widely
1725.75 -> because of the flexibility,
simplicity, scalability,
1728.96 -> performance, and the
various points of integration
1731.75 -> make S3 a natural choice
when it comes to being
1734.75 -> the application primary data store,
1737 -> especially for those modern applications
1739.04 -> that were actually born in the cloud.
1741.92 -> What that means to us is that now,
1744.26 -> we have buckets where
objects are being created
1748.13 -> by hundreds of thousands,
if not by millions.
1750.74 -> They're always created per hour.
1754.01 -> We also see buckets that are huge,
1756.41 -> they contain easily one,
two billion objects.
1759.53 -> Moreover, we have customers coming to us
1761.63 -> with 10, 20, 30 billion
objects per bucket.
1765.14 -> We also see a lot of variety
1767.15 -> because of the different
use cases they have data
1770.12 -> with different characteristics
and different requirements.
1772.97 -> Let me just give you one example.
1774.56 -> Let's say you have a bucket
with 1 billion objects.
1776.99 -> I'm pretty sure that within that bucket
1779.06 -> you will have some objects,
1780.53 -> that you probably don't
care about backing up,
1782.99 -> but you'll have another
subset of the objects
1784.94 -> that you are required
to keep a second copy
1787.43 -> and you wanna retain it for one year.
1789.74 -> Even more, you may have
another subset of objects
1792.47 -> within the same bucket that
you would like to back it up
1796.7 -> and this time, you wanna
retain it for seven years.
1799.88 -> And we wanted to create a
service that is as flexible
1802.46 -> so that you can actually satisfy
1804.35 -> all the compliance requirements
and at the same time,
1807.11 -> optimize the cost by not backing
up what you don't need to.
1812.78 -> While looking at all of these things,
1814.94 -> you see that it is very
hard for a single solution
1818.15 -> to be the magic bullet that
solves all of the problems.
1821.3 -> Another example is
depending on the use cases
1823.88 -> and how you use your buckets,
many times it is not ideal,
1827.42 -> or really practical for enabling
things like object locking.
1831.35 -> And at Clumio, what we try
to do is to create a service,
1834.26 -> another tool in your tool
arsenal, in your toolbox
1837.68 -> along the features that
Peter talked about earlier,
1841.28 -> which you can use and
achieve data protection
1843.53 -> for your data in S3.
1848.362 -> So I'm gonna start with the
high level architecture overview
1851.96 -> and then we'll keep going down the road.
1854.24 -> First of all, on the left
hand side, we have ACME.
1857.48 -> That's a hypothetical customer
1859.67 -> and they're looking to protect
a big bucket in the middle.
1862.76 -> The way that they install Clumio
1864.77 -> is by deploying a CloudFormation
template or Terraform.
1868.34 -> Once that CloudFormation
template is deployed,
1870.68 -> we go ahead and install an
IAM Role and an EventBridge.
1874.88 -> That's all.
1876.08 -> In your account, the only Clumio footprint
1878.36 -> is literally just that IAM
Role and that EventBridge.
1881.99 -> While that is getting
deployed on Clumio side,
1884.78 -> we actually go ahead and
create a dedicated AWS account
1887.72 -> for the customer, ACME.
1889.34 -> The way that we segregate
data across customers
1891.74 -> is by actually creating
dedicated AWS accounts
1894.44 -> for every customer that
onboards with Clumio.
1897.53 -> - And Woon, I'll say,
1900.47 -> there was hesitation with
the idea of moving data
1904.91 -> into your platform because
it could be co-mingled
1907.73 -> with other customers.
1910.19 -> You're using the same pattern that we use
1912.44 -> to create segmentation to create isolation
1915.89 -> for Cox Automotive's data separate
1918.17 -> from your other customers.
1920.69 -> And still get all the perimeter security
1923.12 -> that it gives in isolation.
1924.89 -> So this was a game changer
that made many parts
1928.58 -> of our organization much more comfortable
1931.25 -> with the idea of working with
a third-party for data backup.
1934.85 -> - Yes, a lot of the things that we do
1936.41 -> is really security first.
1939.77 -> And all the access that
happens from the Clumio side
1942.89 -> back to the customer accounts
1944.11 -> is really through that IAM Role
1946.16 -> and we will use that EventBridge
1947.78 -> to capture the changes that
are happening in the bucket
1950.93 -> that we're backing up.
1952.13 -> And we're also integrating
with S3 inventory
1954.5 -> to capture the catalog of the object
1956.84 -> that you have in that bucket.
1959.12 -> And then as you can see,
1960.92 -> and then a lot of the
data processing happens
1963.29 -> through Lambda functions that they all run
1965.36 -> on our side of the account.
1967.34 -> Everything starting from
data movement, verification,
1970.91 -> optimizations and indexing.
1972.89 -> As you can see, a lot of the complexity
1975.02 -> is on the Clumio side
and that's by design.
1977.45 -> We wanna deliver Clumio as a service,
1979.55 -> to that end, we wanna keep
everything that is complicated
1982.16 -> to our end and leave the
customer side of the account.
1985.91 -> Pretty simple.
1986.743 -> Just IAM Role and an EventBridge.
1990.546 -> - And it's worth pointing out.
1991.379 -> That means near zero operational cost.
1993.95 -> Our AWS bill, we see near zero cost
1997.16 -> for you operating within our accounts.
1999.74 -> The compute resides with you
and we buy credits from you.
2004.3 -> - Correct.
2005.133 -> All the patching, troubleshooting,
all the observability
2008.38 -> is actually built right there in our side.
2013.48 -> So let's talk a
little bit about backup
2015.19 -> and a little bit
about the flexibility
2017.08 -> in what to back up.
2020.38 -> We wanna optimize costs for our customers
2022.51 -> and one of the easiest ways to do it
2024.31 -> is to allow the customers to tell us
2026.23 -> what's important to them.
2028.09 -> What we allow you to do is
to specify a set of filters.
2031.69 -> You tell us what prefix, what timestamps,
2034.99 -> and what storage classes
that you wanna back up.
2037.84 -> And this is the way that
you optimize both for cost
2040.75 -> and also for time that it
takes us to do the backup.
2043.45 -> 'Cause again, given a bucket
with a billion objects,
2046.42 -> you may not wanna back up everything.
2049.48 -> Conceptually, it's pretty simple.
2051.22 -> You take a long list of objects
2052.96 -> and you apply a Lambda function
2054.67 -> that goes ahead and applies
those filter functions.
2057.16 -> And then it outputs another
smaller, shorter list
2060.61 -> that contains the list of
objects that passes the filter.
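The filtering step described here can be sketched in a few lines of Python. This is only an illustration of the idea of turning a long object list into a shorter one. The `ObjectEntry` and `BackupFilter` names and their fields are assumptions made for the example, not Clumio's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ObjectEntry:
    # Hypothetical record for one line of an object listing.
    key: str
    last_modified: datetime
    storage_class: str


@dataclass(frozen=True)
class BackupFilter:
    # Hypothetical filter; empty collections mean "no restriction".
    include_prefixes: tuple = ()
    exclude_prefixes: tuple = ()
    storage_classes: frozenset = frozenset()
    modified_after: Optional[datetime] = None

    def matches(self, obj: ObjectEntry) -> bool:
        if self.include_prefixes and not obj.key.startswith(self.include_prefixes):
            return False
        if self.exclude_prefixes and obj.key.startswith(self.exclude_prefixes):
            return False
        if self.storage_classes and obj.storage_class not in self.storage_classes:
            return False
        if self.modified_after and obj.last_modified < self.modified_after:
            return False
        return True


def apply_filter(objects, flt):
    """Lazily yield only the entries that pass the filter (the shorter list)."""
    return (obj for obj in objects if flt.matches(obj))
```

Because `apply_filter` is a generator, the same function could be run independently over separate chunks of a very large listing, which is roughly what running many workers concurrently would need.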
2065.17 -> We also validate the
inventory on a daily basis.
2067.99 -> Remember we're actually
doing continuous backup,
2070.06 -> we're integrating with
EventBridge and that's how we know
2072.79 -> when objects are getting
created or deleted.
2077.62 -> But we wanna make sure that
the backup really captures
2081.49 -> hundred percent of all
the objects that you have.
2084.85 -> If for whatever reason we were
to drop one message, it means that
2089.14 -> that one message or that one
object never gets backed up
2092.71 -> because we are doing continuous
backups based on that event.
2096.88 -> What we wanna do is to
actually go ahead and check,
2100.09 -> crosscheck against the S3
inventory on a daily basis
2103.48 -> and then detect if there's
any objects that are missing,
2106.45 -> we'll go ahead and do what's
called a catch up backup
2109.3 -> or fixed backup to make sure
that everything is captured.
2113.92 -> In other words, a backup
that captures 99.99%
2117.28 -> of your objects, it is
still a failed backup.
2119.53 -> You do want your backup
software to capture 100%
2123.31 -> of the objects.
2126.19 -> The way that we achieve this is that
2128.07 -> we developed a technology
that allows us to compare
2131.35 -> the catalog of the objects
that we have in our backup
2134.11 -> against the S3 inventory.
2135.82 -> We will actually get two lists
2137.59 -> and then we will detect whether
there's a missing object
2140.53 -> and if there is one,
2141.82 -> it will output, again, a shorter list,
2143.92 -> and it will actually do
2146.47 -> the closed backup based
on that shorter list.
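One common way to compare two very large lists without holding either in memory is a single merge pass over sorted inputs. The sketch below shows that technique. It assumes both the inventory and the backup catalog can be read as key streams in ascending order, and it is not a description of Clumio's internal implementation.

```python
def missing_keys(inventory_keys, backed_up_keys):
    """Yield keys present in the inventory but absent from the backup catalog.

    Both arguments must be iterables of keys in ascending sorted order.
    Memory use stays constant regardless of how long the lists are.
    """
    backup_iter = iter(backed_up_keys)
    current = next(backup_iter, None)
    for key in inventory_keys:
        # Advance the catalog cursor until it reaches or passes this key.
        while current is not None and current < key:
            current = next(backup_iter, None)
        if current != key:
            yield key
```

The output is the shorter list that a catch-up backup would then work from.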
2150.16 -> Again, think about it,
2151.18 -> the chances of dropping
that event is very small.
2153.55 -> It will actually be 0.001%,
2156.52 -> but if you have 10 billion objects,
2158.8 -> that actually is a decent number.
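To make the scale concrete, the arithmetic can be checked directly. The drop rate and object count are the figures quoted above; the function name is just for illustration.

```python
from fractions import Fraction


def expected_missed(total_objects: int, drop_rate_percent: str) -> int:
    """Number of objects missed if the given percentage of events is dropped."""
    rate = Fraction(drop_rate_percent) / 100  # exact arithmetic, no float rounding
    return int(total_objects * rate)


missed = expected_missed(10_000_000_000, "0.001")  # 100000 objects
```

So even a 0.001% drop rate on 10 billion objects leaves about 100,000 objects unprotected unless something reconciles them.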
2162.73 -> And again, conceptually
things are very simple,
2165.19 -> but if you're processing
tens of billions of objects,
2167.413 -> it's not so easy
2169.06 -> because even the list of objects,
they're terabytes in size.
2172.51 -> And then we need to run
those filters concurrently
2174.73 -> using multiple Lambda
functions at the same time
2177.7 -> so that we can actually perform
backups in a timely manner.
2182.77 -> Moreover, comparing a very
long list of objects is, again,
2187.06 -> very, very challenging.
2188.35 -> If you have two lists with 30
billion objects on each side
2191.89 -> and finding which one is missing,
2193.6 -> it's actually very,
very challenging to do.
2197.05 -> Some of the technologies that
we've built here also allow us
2200.05 -> to support and do backups in buckets
2203.59 -> that do not have versioning
enabled, for instance.
2206.38 -> Because again,
2207.28 -> it may not be the most practical
thing to do in some cases.
2212.77 -> Let's talk about ingest.
2214.3 -> So now we know that we collected,
2216.1 -> we have the list of objects
that need to be backed up.
2219.43 -> So how hard could it be?
2221.11 -> You know that the objects
are organized in prefix
2223.75 -> and all we need to do
is to actually fire up
2226.33 -> a bunch of Lambda functions
2228.07 -> and get the objects from one side
2230.26 -> and move them to the other side.
2232.84 -> Actually, it turns out to
be that it's not that easy.
2235.42 -> If you fire up all those Lambda functions
2237.46 -> and they all start working on
the same prefix or partition,
2240.88 -> you're gonna get a lot of API throttle.
2243.22 -> You can increase the Lambda functions,
2245.11 -> but you're not gonna
make things any faster.
2246.91 -> In fact, all you're gonna
get is just API throttles.
2250.18 -> So what we ended up doing
is that we introduced
2252.97 -> a notion of Clumio partition.
2255.28 -> We look at the list of objects
that need to be backed up
2258.52 -> and based on some heuristics,
2260.29 -> we determine the Clumio side of partition
2263.2 -> and we actually schedule
the Lambda functions
2265.21 -> across the different partitions.
2266.92 -> What that allows us to do is now,
2269.02 -> we can actually have
these Lambda functions
2271.36 -> working on different
parts of the key space
2274.323 -> without choking a single prefix.
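The talk does not say which heuristics define a partition. As a minimal sketch, assume the simplest one: cut the sorted key list into contiguous ranges of near-equal size, so each worker handles a different part of the key space.

```python
def split_key_space(sorted_keys, partitions):
    """Split a sorted key list into up to `partitions` contiguous ranges.

    Equal-count ranges are an assumed heuristic for illustration only.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    size, extra = divmod(len(sorted_keys), partitions)
    ranges = []
    start = 0
    for i in range(partitions):
        # The first `extra` ranges take one additional key each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            ranges.append(sorted_keys[start:end])
        start = end
    return ranges
```

Each returned range could then be handed to its own worker, so that no single prefix receives all of the requests at once.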
2277.99 -> But again, at the same time, remember,
2280 -> we're not the only one using that bucket.
2282.19 -> We're a backup.
2283.24 -> There's the primary application
that is currently using
2286.3 -> that bucket and we're continuously
monitoring API throttle.
2290.35 -> And if we see that we're
actually getting API throttles,
2293.41 -> it could be because the primary
application is using it,
2296.5 -> we go ahead and reschedule things.
2298.54 -> What that means is that we
can actually give a little bit
2301.21 -> of a break to that blue
partition out there
2303.28 -> and while we actually do
a little bit more work
2305.56 -> on that yellow partition.
2307.12 -> And we continuously do
that as part of the backup
2310.03 -> so that we can actually
satisfy the backup performance
2313.21 -> and at the same time,
allow the primary workload
2316.18 -> to actually have enough request per second
2319.15 -> to carry out the work.
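The rescheduling idea, giving a throttled partition a break and shifting work elsewhere, can be sketched as a simple rebalancing function. The halving policy, the partition names, and the worker counts are assumptions made for the example; the real scheduler is not described in the talk.

```python
def rebalance(workers, throttled, min_workers=1):
    """Return a new worker allocation after throttling is observed.

    workers:   dict mapping partition name -> current worker count
    throttled: set of partition names that returned throttling errors
    Throttled partitions are halved (an assumed policy) and the freed
    workers are spread round-robin over the partitions that are healthy.
    """
    allocation = dict(workers)
    freed = 0
    for name in throttled:
        reduced = max(min_workers, allocation[name] // 2)
        freed += allocation[name] - reduced
        allocation[name] = reduced
    healthy = sorted(name for name in allocation if name not in throttled)
    for i in range(freed):
        if not healthy:
            break  # everything is throttled, so simply run with fewer workers
        allocation[healthy[i % len(healthy)]] += 1
    return allocation
```

For example, `rebalance({"blue": 8, "yellow": 8}, {"blue"})` moves four workers from the blue partition to the yellow one. A customer-specified maximum request rate could be layered on top as a cap on the total worker count.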
2321.46 -> - And that's for, I'll say,
2322.66 -> that was one of our early concerns.
2324.64 -> There would be no worse
outage bridge to be on to say,
2329.38 -> our software is down because
the backup is running.
2332.62 -> And to have it intelligently
understand our workload,
2336.64 -> and adjust accordingly,
2338.53 -> removes the need for us to have that,
2340.69 -> to factor in that concern.
2341.89 -> We no longer think about,
2343.591 -> well, I have to schedule these at 3:00 AM
2345.46 -> when my workload is low
2347.26 -> 'cause we're competing for resources.
2349.3 -> - Yeah, but remember,
2350.413 -> S3's being used as the primary data store.
2353.38 -> So you have the application
that is continuously reading
2356.11 -> and writing to that bucket.
2357.76 -> And if the backup comes in
and it takes up all the APIs,
2361.39 -> all the provisioned APIs,
then yes, we can't do that.
2364.84 -> So we allow you to, you know,
2366.31 -> we monitor the API throttle,
we reschedule things,
2368.86 -> we allow the customers to even specify
2370.93 -> what is the maximum allowed API rate
2373.66 -> that we're allowed to consume.
2376.18 -> So let's talk about restore.
2377.65 -> It's very similar to the
backup side of the house.
2380.32 -> Again, we're super focused
on actually optimizing things
2383.83 -> for our customers.
2384.94 -> And the way that you do
that is to actually restore
2387.34 -> only what you need.
2388.69 -> Like if you have 10 billion
objects backed up in Clumio,
2393.01 -> restoring all that, it does take time
2395.137 -> and it is actually time
consuming and resource consuming.
2399.16 -> The best way to actually avoid that
2400.66 -> is to restore exactly what you need.
2402.97 -> You can restore objects by prefix.
2404.98 -> You can restore objects based
on specific timestamps, tags,
2409.42 -> there's a variety of the
filters that you can specify
2412.33 -> and we will only restore those
objects that are required.
2415.6 -> This allows you to actually
restore the objects
2418.21 -> that you need right now
2419.71 -> and restore some of the other objects
2421.35 -> at a later point in time.
2424.09 -> The way that we achieve
that is basically by
2426.52 -> working through the metadata.
2427.99 -> For every object that we back up,
2429.76 -> we maintain metadata
entry for every object.
2433.3 -> For every metadata blob,
2434.68 -> we actually maintain the
metadata in parquet files
2437.38 -> and we heavily use Amazon Athena
to query that metadata engine.
2442 -> We know that Athena behaves better
2444.1 -> if you have metadata objects
that are somewhat bigger
2446.62 -> in the order of hundreds of megabytes.
2449.53 -> So what we do in the backend
2451.21 -> is that using various Lambda functions,
2453.13 -> we're continuously optimizing
that metadata payload.
2455.89 -> We're combining the parquet files,
2457.87 -> we're actually partitioning
them differently
2459.85 -> so that when it is time to query,
2461.35 -> it is actually readily available
2463.42 -> and we can return that list of objects
2465.37 -> to be restored very, very quickly.
2467.59 -> And that's something that we
continuously do in the backend
2470.62 -> for our customers.
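Combining many small metadata files into fewer files of a few hundred megabytes starts with deciding which files to merge together. Below is a minimal sketch of that planning step only, assuming a greedy grouping and a 256 MB target. Both are illustrative choices, and the actual merge would be done with a Parquet library.

```python
def plan_compaction(file_sizes_mb, target_mb=256):
    """Group small files into batches whose total size stays near target_mb.

    file_sizes_mb: dict mapping file name -> size in MB
    Returns a list of batches, each a list of file names to merge into one file.
    A file larger than the target ends up alone in its own batch.
    """
    batches, current, current_size = [], [], 0
    for name, size in sorted(file_sizes_mb.items()):
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + size > target_mb:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```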
2472.63 -> If you see, these are all
challenges that happen
2475.81 -> when you have a lot of objects.
2477.25 -> And these are things that many times
you don't think about
2480.88 -> when you start thinking
about implementing your own.
2486.76 -> So let's talk about observability.
2488.92 -> I mentioned that we are a service,
2491.74 -> we wanna own all the complexity ourselves
2494.38 -> because we don't wanna give it to you.
2496.87 -> We wanna be the ones
that detect the failure.
2499.21 -> We wanna be the ones that
troubleshoots every failure.
2501.67 -> We wanna be the one that monitors
2503.59 -> and actually patches everything for you
2505.63 -> so that you don't have to do anything,
2507.91 -> you just use it as a service.
2509.89 -> But at the same time, I just mentioned
2512.26 -> that we use thousands of Lambda functions,
2514.39 -> of different sizes and types.
2516.4 -> And at the same time, we also
have hundreds of customers,
2519.55 -> meaning hundreds of AWS accounts
2521.89 -> and all of these Lambda functions,
2523.6 -> they're running across all of
these hundreds of AWS accounts.
2527.68 -> And we have backups continuously happening
2530.59 -> all over the place.
2531.82 -> So how do we control all this?
2533.77 -> 'Cause if something fails, how do we know?
2536.35 -> Because remember, we need to be the one
2538.63 -> that detects the failure
and we need to be the one
2541.06 -> that troubleshoots it and fixes it
2542.47 -> so that you don't have to.
2544.69 -> For us to achieve this,
2545.77 -> we actually implemented
an internal framework
2547.93 -> that we call a Clumio Workflow Engine.
2550.45 -> So for the sake of time,
2551.89 -> I'm not gonna be able to
spend too much time on it,
2553.93 -> but I promise I'll actually
give a live demo of this
2556.36 -> and I'll show you how
it works in real life.
2561.58 -> I wanna talk a little bit
about cost optimizations
2564.58 -> and also some of the confusions
2565.93 -> and questions that I
get from our customers.
2568.51 -> A lot of the time, I get
questions about, this is great,
2571.78 -> but it looks very expensive.
2573.52 -> And I also get the question,
how is this different
2575.47 -> from replication? Is this replication?
2578.95 -> So to answer that question,
let me just use an analogy.
2581.89 -> I'm a big fan of books.
2583.78 -> Let's say you have a
bookshelf full of books
2585.76 -> and the way you organize your books
2587.71 -> is basically based on how
you would actually access
2590.5 -> those books every day.
2592.18 -> Now let's say we need
to create a second copy
2595.06 -> or create a backup of those
books that you have at home.
2598.06 -> One way to do that is to actually buy
2600.7 -> an exact same looking bookshelf
and replicate the format
2604.06 -> to the second bookshelf.
2605.59 -> Sure, you'll end up having
two copies of your books.
2608.65 -> However, if what you're looking for is backup,
2611.08 -> I'll argue that is not the best way
2612.79 -> to actually achieve backup.
2614.41 -> If you wanna back it up and
what you truly want is backup,
2617.38 -> the best way and the most
efficient way to do it
2620.11 -> is to buy a box, stack all of those books
2622.78 -> and put that box away in the storage.
2625.78 -> That is a more efficient way
to actually do backups.
2630.79 -> Just like replication is not, you know,
2633.46 -> just like backup cannot be a replication.
2636.82 -> Replication is not an ideal
backup solution either.
2639.79 -> So it is again, the right
tool for the right problem.
2646.72 -> So what I'm gonna do is
I'm gonna very quickly
2648.73 -> go over three announcements
2650.32 -> and then we'll actually switch
it over to the live demo.
2654.01 -> So the first one is about 15-minute RPO.
2657.94 -> I'm actually super happy
to announce that,
2660.88 -> with the help of the S3 folks,
2662.74 -> we finished fully integrating
with EventBridge,
2666.13 -> which allows us to support 15-minute RPO.
2669.79 -> What we do is that we actually perform
2672.49 -> 15-minute micro backups.
2674.29 -> So every 15 minutes we will actually go
2676.57 -> and capture those objects
and we'll back it up.
2679.24 -> And then just like I told
you before, every day,
2682.42 -> once we receive the daily S3 inventory,
2684.88 -> we will actually cross examine
all of the micro backups
2687.97 -> that we took in the last 24 hours
2690.01 -> and we will actually compare
it with the S3 inventory.
2693.04 -> And if there's anything
missing, we'll at that point,
2695.89 -> do what we call the close
backup and fix everything up
2699.01 -> so that you do have the guarantee
2701.05 -> that we capture all of the objects.
2703.9 -> What this means to you
is that it means that
2706.39 -> in the worst case,
2707.32 -> you're losing up to 15 minutes'
worth of data in your bucket
2710.29 -> once this is enabled.
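The 15-minute micro backup idea amounts to bucketing change events into fixed 15-minute windows. The sketch below shows only that bucketing. The event shape, a `(timestamp, key)` pair with a timezone-aware UTC timestamp, is an assumption for the example and not the EventBridge event format.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts):
    """Round a timezone-aware timestamp down to the start of its 15-minute window."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


def group_into_micro_backups(events):
    """Group (timestamp, key) change events by the window they fall into."""
    windows = defaultdict(list)
    for ts, key in events:
        windows[window_start(ts)].append(key)
    return dict(windows)
```

With fixed windows like these, a change is at most 15 minutes old when its window closes, which is where the worst-case data loss figure comes from.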
2714.01 -> Next, with a lot of the data optimization
2717.04 -> and the partitioning, and the scheduling,
2719.65 -> we can now support up to 30
billion objects per bucket.
2724.57 -> What this means to you is that
if you have a large bucket
2727.6 -> and you're struggling with data
protection, come talk to us.
2732.85 -> Okay, this last one is something
2734.65 -> that is actually my favorite one.
2736.87 -> We're talking about instant access.
2739.15 -> At Clumio, we are actually
very, very motivated
2741.85 -> to actually optimize cost and
performance for our customers.
2745.66 -> Now if you wanna restore a
billion objects, what do you do?
2749.8 -> The first thing that you
could do is apply a filter
2752.23 -> and bring that object count down.
2754.27 -> That's one way to do that.
2755.59 -> But at the end of the filter,
2756.7 -> if you're still left with
hundreds of millions of objects,
2759.61 -> restoring that, it will
actually take time.
2763.72 -> It will take time and resources.
2765.34 -> And what we wanted to do
is, for some specific cases
2768.1 -> such as DR testing, backup
testing, or true emergency,
2772.09 -> we wanted to make the
data readily available
2774.79 -> and also at a fraction of the cost.
2777.82 -> The way we do that is by a
feature called instant access
2780.61 -> where we expose the data
2782.17 -> that we store in the backup directly.
2785.29 -> So we expose an endpoint
that is S3 compatible,
2789.4 -> where you have the backup data.
2792.01 -> So for example, we can take a bucket
2795.07 -> or a specific prefix back to,
let's say, six hours ago,
2799.06 -> and we will return back
to you an S3 endpoint.
2802 -> If you actually connect
to that S3 endpoint,
2804.22 -> it will contain all the objects
in that bucket or prefix
2807.64 -> as of six hours ago.
2810.97 -> If you want it to be
as of three hours ago,
2813.52 -> we can repeat the same thing.
2815.53 -> We can actually take you
back to any point in time
2818.56 -> and we can actually share an S3 endpoint
2821.23 -> that contains all the data at
that specific point in time.
2825.49 -> And we can do all this
at a fraction of the cost
2828.85 -> and nearly instantaneously.
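What "the bucket as of six hours ago" means can be shown with a small function over a version history. This illustrates the point-in-time semantics only and says nothing about how instant access is actually implemented. The tuple layout and the use of `None` as a delete marker are assumptions for the example.

```python
def listing_as_of(versions, as_of):
    """Return {key: etag} for the objects visible at time `as_of`.

    versions: iterable of (key, timestamp, etag) tuples, where etag is None
              for a delete marker. Timestamps only need to be comparable.
    """
    latest = {}
    for key, ts, etag in versions:
        # Keep the newest version of each key that is not newer than as_of.
        if ts <= as_of and (key not in latest or ts > latest[key][0]):
            latest[key] = (ts, etag)
    # Keys whose newest visible version is a delete marker are hidden.
    return {key: etag for key, (ts, etag) in latest.items() if etag is not None}
```

Calling it twice with different `as_of` values over the same history gives two different listings, which mirrors asking for the view as of six hours ago and then as of three hours ago.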
2831.22 -> Now, if you're doing things
like DR testing, backup testing,
2834.79 -> or you're in a true
emergency, this is something
2837.61 -> that could really be a
life changer for you.
2841.66 -> All right, we'll do the demo.
2843.67 -> By the way, it is a live demo.
2848.56 -> All right, so what I'm
gonna do is, first of all,
2851.92 -> I'm gonna log into my test cluster.
2858.16 -> Got all my passwords there.
2865.21 -> So this is kind of the home
screen that you're greeted with
2869.26 -> once you log in.
2870.19 -> We have the ver-
2871.023 -> oops, what happened. (chuckles)
2873.16 -> We have the various dashboards and stuff,
2874.81 -> but again, for the sake
of this discussion,
2876.43 -> since we're talking about data protection
2878.92 -> we'll quickly skip over.
2880.66 -> But this kinda gives you a dashboard,
2882.82 -> a visibility into what's
happening in your environment
2885.4 -> in terms of what's protected,
2887.95 -> how much it's costing
you and so on and so forth.
2891.58 -> One of the first things
that you will have to do
2893.79 -> is to actually register your environment.
2896.173 -> Like I mentioned, you can
actually do it through
2898.133 -> a CloudFormation
template or via Terraform.
2901.66 -> Once you register that environment,
2903.43 -> essentially you specify the
AWS account and a region,
2906.82 -> and an optional name
and you will actually,
2908.92 -> the whole thing will take
you no more than 15 minutes
2911.47 -> to get up and running
2913.33 -> 'cause all we are doing
again is creating that role
2915.4 -> and that EventBridge on your account.
2918.22 -> Once that's done, typically what you do
2920.883 -> is that you create a policy, you name it,
2924.49 -> and essentially, let's first name it.
2928.985 -> And then you can enable
the various data sources
2934.69 -> that we support.
2935.74 -> In the case of S3, we support two tiers,
2938.62 -> standard and frozen with
different pricing points
2941.83 -> and different retention
and access performance.
2946.21 -> And then you can actually
set the retention.
2948.73 -> For the sake of demo we'll skip that
2950.44 -> and we will use some of the
policies that we already have.
2955.09 -> Next, if you wanna
protect your S3 buckets,
2959.65 -> what you'll end up doing,
2961.96 -> the first thing that you'll end up doing
2963.4 -> is to actually create what's
called a protection group.
2966.7 -> A protection group is really
a combination of buckets
2969.7 -> along with the filters.
2971.26 -> This is how you're telling
us what's important to you.
2974.47 -> So the way we do that, let's
just name the protection group,
2978.01 -> let's call it reinvent.
2979.42 -> And then there are multiple ways
2980.44 -> that you can actually add a
bucket, you can add it by tag.
2983.59 -> Let's just do it by tag.
2988.3 -> So then what happens is that
2989.8 -> it automatically searches
your environment.
2991.93 -> It automatically adds all of the buckets
2994.24 -> that contain the tag reinvent demo.
2997.72 -> And then it is not only just one time,
2999.7 -> if after the protection group is created,
3001.68 -> you create another bucket with such tag,
3003.36 -> they get automatically added.
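Membership by tag can be thought of as a query that is re-evaluated whenever the set of buckets changes, rather than a one-time selection. A minimal sketch follows; the bucket names, tag key, and tag value are made up for the example.

```python
def buckets_in_group(bucket_tags, tag_key, tag_value):
    """Return the sorted names of buckets carrying the given tag.

    bucket_tags: dict mapping bucket name -> dict of that bucket's tags
    """
    return sorted(
        name for name, tags in bucket_tags.items()
        if tags.get(tag_key) == tag_value
    )


# Hypothetical environment: two tagged buckets and one untagged bucket.
environment = {
    "bucket-one": {"backup-group": "reinvent-demo"},
    "bucket-two": {"backup-group": "reinvent-demo"},
    "scratch": {},
}
members = buckets_in_group(environment, "backup-group", "reinvent-demo")

# A bucket created later with the same tag is picked up on the next evaluation.
environment["bucket-three"] = {"backup-group": "reinvent-demo"}
members_later = buckets_in_group(environment, "backup-group", "reinvent-demo")
```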
3007.26 -> You can also add these buckets manually.
3009.78 -> Let's say I just picked these two.
3014.13 -> From here, now that I select the buckets,
3016.5 -> I need to apply the filters.
3018.12 -> The way that you apply the filters,
3019.8 -> you can actually tell us
whether you wanna protect
3021.75 -> all the storage classes or
you wanna leave some out.
3025.2 -> Maybe you're not interested in protecting
3027.6 -> some of the One Zone-IA.
3031.5 -> You can actually tell us
whether you wanna back up
3033.54 -> all the versions of the object
or just the latest version.
3037.17 -> And you can also tell us which prefixes
3039.33 -> are interesting for you.
3040.95 -> Let's say protect and then
within protect we can exclude,
3046.26 -> let's say the prefix junk
3049.38 -> and we can also back up
the prefix important.
3057.18 -> Next, now that we specify
the bucket and the filter,
3060.15 -> you have defined the what.
3062.1 -> Now we go ahead and select the policy
3064.143 -> that you wanna apply.
3065.91 -> Whether you wanna back it up every day,
3067.65 -> retain it for seven years.
3069.15 -> Now you're configuring the frequency
3071.1 -> and the retention period.
3073.26 -> So let me cancel that for the sake of demo
3075.15 -> and then we'll just use
some of the protection group
3077.67 -> that I created before the demo.
3079.41 -> I have a few protection
groups that I created,
3081.57 -> some of them medium size,
which is about 6 billion,
3084.48 -> some of them that are somewhat large
3086.22 -> in the order of 30 billion objects.
3088.26 -> And then we're gonna concentrate
3089.88 -> in these two protection groups.
3093.54 -> These two protection groups
contain the same two buckets.
3096.3 -> They contain the
reinvent2022clumio1 and clumio2
3100.17 -> both added by tag.
3103.23 -> Over here, this protection
group, the seven year one,
3106.44 -> again, contains the
exact same two buckets.
3109.26 -> And you may ask the question, why?
3111.93 -> Why do we have two protection groups
3113.82 -> for the same two buckets?
3116.58 -> The reason is that they're
actually protecting
3119.7 -> different prefixes within those buckets.
3123.15 -> This one is actually
protecting everything under
3125.16 -> the prefix year one, except
what's under year one/temp.
3130.68 -> And with that, we're actually
applying a yearly retention
3134.19 -> because that is my compliance,
3135.72 -> that is my retention requirement.
3138.93 -> Now if I actually go back and
I look at the seven year one,
3143.82 -> again, it is the exact same bucket.
3145.98 -> However, they are the
two different prefixes.
3148.53 -> This time I'm actually
protecting the seven year
3151.47 -> except the temp.
3155.22 -> And this time I'm actually
retaining it for seven years,
3159.18 -> unlike the previous one that
I was actually retaining
3161.46 -> just for one year.
3163.62 -> So that's on the backup side of the house.
3165.93 -> Let's just explore the
different options that we have
3169.2 -> for restores.
3173.04 -> The simple one is to
restore a single prefix
3176.52 -> or entire buckets.
3177.75 -> You come in and you pick the
buckets that you're interested
3182.01 -> both of them or just one of them.
3184.41 -> And then you apply filters,
3186.12 -> you tell us exactly the
point in time to the second.
3190.98 -> And then moreover, you
can even apply filters.
3193.98 -> And if you don't, it happens
to be the entire bucket.
3197.04 -> But you can actually apply
filters and we can do things
3200.55 -> like demo and we can even
apply filter based on size
3206.16 -> and then do a preview.
3208.83 -> So what we do, we now
query the metadata engine
3212.1 -> and it will return a
preview of the objects
3214.2 -> that get restored before
we actually execute it.
we actually execute it.
3218.82 -> Once you click on summary,
3220.32 -> then now you're telling
us where to restore it to.
3224.07 -> You're telling us
whether you wanna restore
3225.66 -> all the versions.
3228.24 -> It tells you about the protection group,
3229.8 -> the bucket that we selected, the filters.
3232.38 -> We can actually restore it back
3233.85 -> to the original source account,
3235.35 -> but we can actually restore it
3236.58 -> to any other registered source account.
3239.37 -> It doesn't have to be in the same account.
3242.01 -> You tell us to which
bucket to restore it to.
3244.92 -> And you can also tell us
what is the storage class
3248.04 -> that you wanna use when
we're restoring it back.
3251.01 -> From there, you can also
tell us whether you wanna add
3253.77 -> some prefixes in front of this object
3256.41 -> and we can even add
some of the object tags.
3261.3 -> Another option is to actually
restore a single object.
3264.54 -> It is pretty much the same.
3265.8 -> You specify the time
and different filters,
3268.56 -> but the difference is
that you get to specify
3271.92 -> the specific version in
the chain of the versions
3274.68 -> for the object.
3276.87 -> Lastly, let me actually
quickly show a demo,
3280.02 -> the instant access demo.
3283.17 -> So let's just do an instant access.
3285.63 -> Let's do reinventLive.
3290.25 -> So what I'm doing here is
that I'm actually creating
3293.43 -> an endpoint that contains
the list of objects
3296.22 -> as of that backup on
the 30th that I clicked
3299.31 -> as of 3:00 PM or 5.
3301.71 -> And if I click the endpoint,
3303.78 -> we will actually go to the instant access
3306 -> and we will see the reinventLive.
3308.52 -> That being prepared as we speak.
3312.42 -> Let me just quickly switch it over, 0%.
3316.2 -> Oh, it's actually done.
3318.24 -> Okay, that was pretty quick.
3319.59 -> So it's done.
3320.82 -> So this is the live mount point,
3324.84 -> the access point that we have created.
3326.94 -> And then the way that we do it,
3328.08 -> we just copy that URL
3329.237 -> 'cause again, that URL is the S3 point,
3332 -> S3 endpoint that contains the
objects at that point in time.
3335.97 -> So let me just quickly switch it over.
3339.36 -> I have a quick cheat sheet out here
3342.45 -> to help myself doing the demo.
3344.52 -> I'm just exporting a couple
of environment variables.
3348.57 -> So this will be the
endpoint and then the URL
3354.3 -> that I need to copy is reinventLive.
3357.36 -> Copy, copy, paste that.
3359.79 -> And that's it.
3361.92 -> Now good to go.
3363.66 -> And this is all the AWS CLI
tool that we all love and know.
3371.19 -> So now I'm able to go ahead and
do things like list objects.
3378.39 -> So as you can see, we can
actually list the objects.
3381.75 -> And this is basically no
different than you listing
3384.51 -> the original bucket,
3385.38 -> except that we are
actually listing the bucket
3387.699 -> as of 30th at 3:00 PM.
3393.3 -> We can do things like head object.
3407.16 -> Let's do that...
3411.63 -> That's the head object.
3413.46 -> We can actually do head object
with the different objects.
3421.62 -> That will work.
3423.158 -> As you can see, the ETag,
3424.5 -> will actually go ahead
and match this 134c.
3429.33 -> We can actually even do get objects.
3437.215 -> And let me just do this one out here.
3441.69 -> And just to prove it to you, I have nothing.
3450.09 -> So that will be the get object,
3453 -> if you do ls, we do see
that object out here
3456.243 -> that we downloaded and...
3466.119 -> And yeah, that's kinda the picture
3467.76 -> that I took out of our slides.
3469.2 -> But that's the object that
I put in the S3 bucket
3471.12 -> and it showed up.
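The list, head, and get calls from the demo can be sketched as AWS CLI invocations pointed at the instant access endpoint. The endpoint URL and bucket name below are placeholders, and passing the endpoint through the CLI's global `--endpoint-url` option is an assumption about how the demo's environment variables were used; the exact commands were not shown in the talk.

```python
import shlex


def s3api_command(operation, endpoint_url, bucket, key=None, outfile=None):
    """Build an `aws s3api` command line that targets a custom S3 endpoint."""
    cmd = ["aws", "--endpoint-url", endpoint_url, "s3api", operation,
           "--bucket", bucket]
    if key is not None:
        cmd += ["--key", key]
    if outfile is not None:
        cmd.append(outfile)  # get-object takes the output file positionally
    return cmd


if __name__ == "__main__":
    endpoint = "https://instant-access.example.com"  # placeholder URL
    bucket = "reinventLive"                          # placeholder name
    for cmd in (
        s3api_command("list-objects-v2", endpoint, bucket),
        s3api_command("head-object", endpoint, bucket, key="photo.png"),
        s3api_command("get-object", endpoint, bucket, key="photo.png",
                      outfile="photo.png"),
    ):
        print(shlex.join(cmd))
```

The commands are the same ones you would run against the original bucket; only the endpoint changes, which is why the listing reflects the bucket as of the selected backup time.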
3473.64 -> That's about the instant access.
3475.83 -> It gives you point in
time, immediate access
3478.74 -> to the data in the bucket.
3480.06 -> You can take it to 3:00 PM, 4:00 PM,
3482.43 -> we will actually give you
that S3 compatible endpoint,
3485.04 -> nearly instantaneously and
at a fraction of the cost.
3489.36 -> Lastly, like I promised,
3490.71 -> let me just do a quick demo
about the whole observability
3493.86 -> that I talked about a little earlier.
3497.49 -> So this is truly live.
3499.8 -> So what I'm gonna do is what
you guys see as a customer
3503.34 -> is this UI, you don't see the backend,
3506.4 -> but what we see in the backend is again,
3508.8 -> the whole workflow engine that
we manage day in, day out.
3512.85 -> So let me copy out this task ID here,
3515.67 -> and then it is an internal
debugger that we implemented.
3521.46 -> I have to go to task.
3524.1 -> And then we'll have the different task IDs
3527.25 -> and let me just paste this one out.
3537.6 -> So this one allows us to
see exactly what's going on
3543.03 -> in every single backup.
3544.86 -> We know what's happening.
3546.39 -> We're getting what we call
the container information.
3549.48 -> We're setting the progresses,
3550.89 -> we're waiting for bucket configurations.
3553.35 -> We're actually doing
some of the CDC queries.
3555.75 -> I mean some of the data capture,
3557.58 -> change data captures that we talked about.
3559.74 -> This is a if step.
3561.03 -> So it's actually a little
bit of a programming language
3563.16 -> that I actually put together.
3564.48 -> If this step is successful, it
will take the black arrow out
3567.42 -> and you execute the next step.
3569.04 -> If this step fails, this will turn red
3571.44 -> and it will actually take the red arrow.
3573.51 -> And if the user cancels at that point,
3575.79 -> we will actually take
the purple arrow out.
3579.3 -> This is an if statement
where you have then and else,
3584.28 -> and this would actually be
something like a subroutine call.
3588.48 -> I can actually click on
it and I can go deeper
3591.45 -> as to see what's happening
in that subroutine.
3593.97 -> But if anything fails,
this is exactly how we know
3596.97 -> where it failed, why it
failed, what was the input
3600 -> and what was the step that was executed
3601.71 -> before and what did it output.
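The step-and-arrow model described above can be sketched as a tiny state machine: each step names where to go on success (black arrow), failure (red arrow), or user cancellation (purple arrow), and every execution records its input and output. `Step` and `execute` are hypothetical names for illustration, not the internal workflow engine shown in the demo.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    on_success: Optional[str] = None  # black arrow
    on_failure: Optional[str] = None  # red arrow
    on_cancel: Optional[str] = None   # purple arrow


def execute(steps, start, payload, is_cancelled=lambda: False):
    """Walk the step graph and return a trace of (name, status, input, output)."""
    trace = []
    current = start
    cancellable = True  # handler steps after a cancel are not cancelled again
    while current is not None:
        step = steps[current]
        if cancellable and is_cancelled():
            trace.append((step.name, "cancelled", payload, None))
            current, cancellable = step.on_cancel, False
            continue
        try:
            output = step.run(payload)
        except Exception as exc:
            # Keep the input and the error so the failure can be diagnosed.
            trace.append((step.name, "failed", payload, repr(exc)))
            current = step.on_failure
            continue
        trace.append((step.name, "ok", payload, output))
        payload = output
        current = step.on_success
    return trace
```

Because the trace keeps the input, the outcome, and the output of every step, a failed run shows directly which step broke and what it was given, which is the property the talk credits for fast troubleshooting.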
3604.14 -> And with every failure, this
is how we can troubleshoot it
3607.29 -> within minutes and not within hours.
3609.66 -> And when something fails, we
get to update that one step.
3614.04 -> That's much easier.
3615.21 -> That's why we can troubleshoot it
3617.04 -> in a matter of hours and not days.
3619.41 -> And we can actually do all this for you
3621.66 -> and you don't have to do it.
3623.19 -> This is a live example.
3624.51 -> So let me actually give
you a bad example
3628.86 -> that I had collected.
3630.42 -> In this case, I actually
introduced an error
3634.38 -> intentionally and I'll
show you what it does.
3638.88 -> So as you can see, yellow...
3643.98 -> As you can see, this is red.
3645.81 -> We know that it has failed.
3647.43 -> Where did it fail?
3648.45 -> We know that all of these steps
3649.8 -> are actually executed correctly,
3651.75 -> but it ended up failing right here.
3654.12 -> You'll see the red border out there.
3656.67 -> So then because this has failed,
3658.44 -> we're not executing the steps below it,
3660.45 -> but instead, we're actually
taking the red arrow out.
3663.78 -> And then what we are doing is such as,
3666.27 -> we're generating an event
that something has failed.
3668.61 -> That's what the X is for.
3670.02 -> We're terminating the task as a failure.
3672.51 -> We're actually sending a
notification to our support team
3675.15 -> that they know that something has failed
3677.1 -> and we have all of the
steps that have failed.
3679.92 -> And then now, because
this is the subroutine
3683.58 -> that has failed with the Lambda functions,
3685.62 -> we can actually click on it
and look at it even deeper.
3689.64 -> This is the function that what it does,
3691.98 -> remember we're processing
a large number of manifest files.
3696.03 -> So we apply the filter
3697.35 -> using multiple Lambda
functions concurrently.
3700.35 -> So we have what we call
the fork and join step.
3707.1 -> So at this point,
3708.09 -> we compute how many
Lambda functions to use.
3710.76 -> In this particular example,
3712.23 -> we're using 32 Lambdas concurrently
3714.99 -> to actually filter different
areas of the manifest file
3718.56 -> to apply the filter and
know what to back up
3720.96 -> and not to back up.
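The fork and join step described above can be sketched locally with a thread pool standing in for the Lambda functions. This is an illustration under assumptions: `split`, `filter_chunk`, and `fork_join_filter` are hypothetical names, manifest entries are assumed to be plain dictionaries, and threads replace the actual Lambda invocations.

```python
from concurrent.futures import ThreadPoolExecutor


def split(entries, workers):
    """Fork: cut the manifest into at most `workers` contiguous chunks."""
    size = max(1, -(-len(entries) // workers))  # ceiling division
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def filter_chunk(chunk, keep):
    """Work done by one worker: apply the filter to its area of the manifest."""
    return [entry for entry in chunk if keep(entry)]


def fork_join_filter(entries, keep, workers=32):
    """Filter manifest entries concurrently and merge the results in order."""
    chunks = split(entries, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(filter_chunk, chunk, keep) for chunk in chunks]
        # Join: collect every worker's result; result() re-raises a worker's
        # exception, so a failure propagates to the caller.
        results = [future.result() for future in futures]
    return [entry for part in results for entry in part]
```

A failure in any worker surfaces at the join, so the calling workflow can take its failure path instead of continuing with a partial result.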
3722.73 -> So what happens is that I
introduce an error on this step,
3725.58 -> that's why this turned red.
3727.2 -> But then when all these
32 Lambdas
3728.997 -> are actually done
processing different areas
3731.4 -> of the manifest file,
3732.84 -> they all join out here and then
they will actually continue.
3736.38 -> But it so happens that this step has failed.
3738.75 -> That's why we're actually
taking the purple arrow out,
3741.3 -> which then ends up failing.
3743.34 -> And then we bubble that
to the parent workflow,
3745.8 -> which then takes the exit path
to notify our support team.
3749.31 -> But looking at this,
3750.267 -> now we can actually debug
things within minutes
3753.24 -> and we can now truly be a service
3755.7 -> and take all that complexity
away from you and we own it.
3760.5 -> - And what's unprecedented is
3763.17 -> they're taking on that
complexity through observability,
3767.37 -> but also transparency to us as a customer.
3770.696 -> And that makes a big difference
3771.75 -> in it not being a black box to us,
3774.75 -> so we can understand
how you're approaching
3776.4 -> the problem too.
3780.57 -> - That's it.
3781.53 -> All right, just on time.
3784.26 -> Or maybe a little over.
3786.059 -> (speakers chuckling)
3788.46 -> - Thank you all for coming.
3789.69 -> We really appreciate it.
3790.59 -> And we'll be up here if
you have any questions.
3792.392 -> (attendees applauding)
Source: https://www.youtube.com/watch?v=-dr71gKGZGc