AWS re:Invent 2022 - 3 innovations that redefine data protection for Amazon S3 (PRT315)

You rely on Amazon S3 to power your cloud-native applications, data lakes, analytics, and AI. While Amazon S3 is extremely durable, the resilience of the data itself is your responsibility. So how do you secure billions of objects from an ever-expanding list of potential threats? And how do you recover when your data is compromised? In this session, technologists from Amazon, Cox Automotive, and Clumio dive deep into Amazon S3 data protection and demonstrate how to swiftly recover petabytes of data in the event of an incident. Learn how to implement continuous immutable backup that is air-gapped and instantly recoverable. This presentation is brought to you by Clumio, an AWS Partner.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.


Content

1.53 -> - Welcome to "Three Innovations
3.03 -> that Redefine Data Protection for Amazon S3."
6.42 -> My name is Woon Jung,
7.998 -> I'm the Co-Founder and CTO of Clumio.
9.75 -> Today with me, I have Peter Imming.
12 -> - Hello.
12.833 -> - Principal Product Manager from Amazon S3.
15.42 -> And Mark Huber, Senior Director of Engineering
17.73 -> at Cox Automotive.
19.4 -> (attendees applauding)
20.25 -> First, I'll have Peter join me to talk about
22.38 -> how S3 is being used and why you should protect the data
25.38 -> in your S3 bucket.
26.67 -> And then after that I'll have Mark join me
28.86 -> to talk about their journey in AWS
31.38 -> and the partnership with Clumio.
33.51 -> Last, I'll come in and talk about the details
36.03 -> of how things are implemented in the backend.
38.73 -> And I'll give you a live demo
40.08 -> that I'm pretty sure you'll all like.
42.18 -> Peter.
43.77 -> - Thanks, Woon.
44.79 -> So if you thought this was data protection for security
50.37 -> or IAM, or encryption, this is not the session for you.
54.54 -> We're gonna be talking a lot about data protection
56.94 -> in terms of backup.
59.82 -> In terms of replication and high availability,
65.13 -> durability, availability.
67.26 -> So if you're in the wrong session,
69.18 -> definitely now's the time to head on out.
72.33 -> But again,
73.163 -> I think we wanna say thank you for coming to the session.
74.88 -> We know you had a lot of choices out there
76.38 -> for different sessions to attend,
77.55 -> including the bar sessions out there right now.
80.79 -> So thank you all.
81.72 -> I think from our perspective for coming here
84.12 -> and listening to us today.
85.74 -> We'll try and keep it as interactive as possible.
87.39 -> If you have questions,
88.29 -> I think we're all comfortable taking questions
90.66 -> as you might have them.
92.43 -> So we'll go ahead and get started here,
94.53 -> and talk about data protection for S3.
99.03 -> Now Amazon S3,
101.01 -> we really have come a long way in 16 years with Amazon S3.
105.24 -> We're now storing over 280 trillion objects in S3,
111.39 -> over a hundred million transactions per second.
114.66 -> And really, what we're seeing S3 kind of transform
117.39 -> into is really now the production,
120.72 -> the primary production storage for customers
123.45 -> to create new data in.
125.01 -> It's no longer just a place to store data
127.38 -> or back up data to.
129.09 -> In my time at AWS, I've been there for three years on S3.
133.53 -> That's probably the biggest transformation I've seen is that
136.38 -> we now have the primary,
139.26 -> the bulk of the data coming into S3
141.03 -> being natively created inside of AWS.
144.27 -> And this is really a change over the last couple of years
147.78 -> that we've just seen accelerate.
149.61 -> And that comes from applications that you're running,
152.34 -> such as data lakes, machine learning.
154.92 -> It could be EMR, it could be our services, it could be
159.966 -> Datadog, Databricks, it could be Snowflake DB.
163.32 -> You're starting to see now S3 and object storage really
166.29 -> become the primary de facto class of storage
169.29 -> that you're creating new content with.
171.9 -> And that includes cloud native applications
173.97 -> that you may be running in a container,
175.68 -> that could be classic virtual machines
177.48 -> that are now writing out to object storage directly.
180.54 -> It could be new databases that are gonna be shipping
182.67 -> from traditional database vendors
184.5 -> that are gonna now run natively on object storage
187.05 -> for the first time in their long history.
189.6 -> And we're also seeing obviously a tremendous growth
192 -> in log files, machine generated log files,
195.72 -> machine generated data.
197.07 -> Everything from IP cameras, everything from factory sensors
201.03 -> that are all generating machine logs
202.92 -> and then rapidly sending those to S3
204.99 -> as the production storage.
207.36 -> And we take that very seriously at Amazon.
210.15 -> So when we've got 280 trillion objects to store,
214.56 -> that takes a lot of effort to store durably,
218.4 -> to store with availability, to store with redundancy.
222.51 -> And we take that very seriously.
223.83 -> And if you've ever had to manage storage at scale,
227.37 -> you understand that that takes a lot of work
229.5 -> to manage all three of those for virtually,
233.4 -> essentially unlimited scale.
235.95 -> So we wake up every day on S3 and go look after
240.57 -> the availability, durability,
243.09 -> and the resiliency of that data,
244.74 -> of those 280 trillion objects so that you don't have to.
248.22 -> You don't have to worry that your data is durably stored.
250.83 -> You can go take that time now back and go innovate
253.92 -> on top of S3 rather than having to craft durable storage
258.09 -> that's highly available.
259.8 -> When we look at S3, there are though
264.66 -> a different type of data protection question
267.21 -> that we get today, and that is a delete request.
270.51 -> We don't know customer intent
272.94 -> when you ask us to delete an object.
276.06 -> We have to look at that request as unambiguous.
278.61 -> We don't know if it's accidental.
280.89 -> We don't know if it's intentional.
282.75 -> We don't know if it's perhaps even malicious.
285.54 -> So when we have that type of application data,
287.88 -> whether it's user generated data, media configuration data,
290.79 -> again, we talked about the data lakes,
292.41 -> sensitive information, all of this different data
295.38 -> has different value to your organization.
297.9 -> And what we're looking at here is
300 -> different layers of data protection
302.01 -> on top of different types of data.
304.29 -> And what I'll be talking about before handing it off to Mark
307.23 -> is really kinda looking at the different layers
309.9 -> of data protection that are most appropriate
311.79 -> for the different types of data
313.26 -> that you're storing in S3 today.
315.75 -> If that's compliance data,
317.04 -> that's gonna require a different level of protection
319.26 -> than what you might have for user data
323.25 -> that might be uploaded.
324.27 -> Cat.jpeg, right?
325.41 -> How many copies of Cat.jpeg do we need to store?
329.82 -> That's a very different type of data to protect
333.12 -> than data that contains payment card information,
336.3 -> HIPAA data, personally identifiable information.
339.66 -> So this is the type of data that we're storing.
342.09 -> This is the type of data that you are storing in S3,
344.58 -> I should say.
345.48 -> And so what we're gonna be talking about today
347.07 -> is kinda crafting the right layers of protection
349.53 -> on top of S3 and how you can set that up
354.57 -> for the right types of data that you're storing.
356.43 -> When we look at the types of risks that you have in S3,
360.18 -> the difference versus traditional storage
363.33 -> that you may have been running for the last 10, 20 years
366.96 -> is that you no longer have to worry about, again,
369.51 -> the durability and the physical access to the storage.
372.27 -> It's now about an API, it's that request to delete data.
377.22 -> Again, we don't know what the intent is of that request,
381.39 -> but we do know, and it's important to S3
384.87 -> that we honor that request.
386.88 -> So whether that's a human error,
388.53 -> whether that's an inadvertent deletion, mistakes happen.
392.01 -> Is it a natural disaster? Is it a fire, a flood?
394.5 -> Is it Godzilla coming in
396.09 -> and wrecking some undersea communication cables?
399.12 -> At the end of the day,
400.14 -> these are all things to consider when we're talking about
403.53 -> your data storage in S3.
405.617 -> Again, the deletions though,
407.55 -> those are really what we're gonna kind of focus on today
409.68 -> for the most part.
410.55 -> Software errors, whether it's a human
412.5 -> or a piece of software.
413.58 -> If it's a script, maybe it's a lifecycle policy
416.85 -> that's been misconfigured.
418.38 -> Again, we cannot distinguish
420.33 -> between a correct deletion and an incorrect deletion.
423.33 -> We have to honor that request just as we honor the request
427.29 -> to put data into S3.
429.66 -> So when we look at that, we now have accidental data loss,
432.54 -> we have software errors,
433.95 -> and we also have what we call bad actors.
436.71 -> These could be employees that have, again,
439.56 -> authorization to the data through our access controls,
443.73 -> our IAM policies,
445.65 -> but their intent as a bad actor is to do something malicious
449.16 -> with that data.
449.993 -> Again, we can't differentiate
451.2 -> between their delete request
453.21 -> and a properly authorized delete request
456.03 -> that's not coming from a bad actor.
458.22 -> So we have to look at all of these possibilities
460.95 -> and then offer additional layers of protection inside of S3.
465.42 -> And that's really where this kinda,
466.95 -> it looks like an everlasting gobstopper type approach here,
469.62 -> but your data is at the center.
471.21 -> And the first layer of protection that we always talk about
473.91 -> is, again, your access controls.
475.86 -> S3 by default is secure.
477.96 -> You're not gonna access that data
479.91 -> unless you have been authorized through IAM.
483.24 -> Now, when we look at that though,
485.73 -> if you are properly authorized,
487.44 -> once you've got access to the data, what can you do with it?
490.35 -> An accidental deletion, again,
492.33 -> is no different to S3 than a malicious deletion.
495.15 -> We can't distinguish between the two.
497.55 -> So that's really where something like object versioning
500.01 -> is the beginning of that journey
502.05 -> where you can now set S3 to go ahead
504.69 -> and start preventing accidental deletions
508.32 -> by keeping every version that is created.
511.5 -> Is it an overwrite? We will create a new version.
513.69 -> Is it a deletion? We will then put a delete marker
516.84 -> as the current version in your S3 versioned object stack.
522.27 -> We keep track of every single version.
524.82 -> We always essentially append in object versioning
528.51 -> once that's enabled on your bucket.
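The append-only behavior described above can be sketched as a small in-memory model. This is a toy illustration, not S3's implementation: the `Version` and `VersionedKey` names are hypothetical, and integer version IDs stand in for S3's opaque IDs. It shows that an overwrite adds a version, a plain delete adds a delete marker, and removing that marker makes the previous version current again.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    version_id: int
    data: Optional[bytes]  # None means this entry is a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.data is None


@dataclass
class VersionedKey:
    """Toy append-only version stack for one object key (hypothetical model)."""
    versions: List[Version] = field(default_factory=list)

    def _append(self, data: Optional[bytes]) -> Version:
        version = Version(len(self.versions) + 1, data)
        self.versions.append(version)
        return version

    def put(self, data: bytes) -> Version:
        # An overwrite never replaces data in place; it adds a new version.
        return self._append(data)

    def delete(self) -> Version:
        # A plain delete adds a delete marker on top of the stack.
        return self._append(None)

    def get(self) -> Optional[bytes]:
        # The key looks "gone" when the current version is a delete marker.
        if not self.versions or self.versions[-1].is_delete_marker:
            return None
        return self.versions[-1].data

    def undelete(self) -> None:
        # Removing the delete marker makes the previous version current again.
        if self.versions and self.versions[-1].is_delete_marker:
            self.versions.pop()


key = VersionedKey()
key.put(b"v1")
key.put(b"v2")
key.delete()
print(key.get())   # None: hidden behind the delete marker
key.undelete()
print(key.get())   # b'v2'
```

The older data is still in the stack after the delete, which is why versioning covers accidental deletions but not a deliberate removal of specific versions.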
530.37 -> From there, we then need to talk about malicious deletions.
534.15 -> So we've got accidental deletions covered.
535.92 -> What about malicious deletions?
537.3 -> Malicious deletions, that is the layer that we
540.99 -> then bring to bear.
542.58 -> This is, again, an opt-in feature called Object Lock.
545.16 -> And Object Lock has really become sort of a de facto standard
547.74 -> for immutable storage in the cloud.
550.41 -> Object Lock can be enabled at the bucket level
552.51 -> or at a per-object level.
554.4 -> And once you have an object lock placed on an object,
557.91 -> it's a retain-until date.
559.65 -> That object cannot be deleted by AWS personnel.
563.22 -> It cannot be deleted by your root account.
566.58 -> So we can now prevent even malicious bad actors
569.79 -> from coming in and performing an intentional delete,
573.72 -> even though they're properly authorized
575.82 -> through IAM policies.
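A minimal sketch of the retain-until idea, under stated assumptions. The first function only builds keyword arguments in the shape that boto3's S3 `put_object_retention` call accepts; it does not contact AWS. The bucket name, key, and version ID are hypothetical. The `COMPLIANCE` mode is chosen here because it is the Object Lock mode in which no user, including the root account, can delete the protected version before the date passes. The second function is a toy check, not S3's logic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_request(bucket: str, key: str, version_id: str,
                      days: int, now: Optional[datetime] = None) -> dict:
    """Build kwargs shaped like boto3's s3.put_object_retention(...)."""
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "VersionId": version_id,
        "Retention": {
            "Mode": "COMPLIANCE",
            "RetainUntilDate": now + timedelta(days=days),
        },
    }


def delete_allowed(retain_until: datetime, now: datetime) -> bool:
    """Toy check: a locked version can be deleted only once the date has passed."""
    return now >= retain_until


start = datetime(2022, 12, 1, tzinfo=timezone.utc)
request = retention_request("example-bucket", "reports/q4.csv", "v1",
                            days=30, now=start)
retain_until = request["Retention"]["RetainUntilDate"]
print(delete_allowed(retain_until, start + timedelta(days=29)))  # False
print(delete_allowed(retain_until, start + timedelta(days=30)))  # True
```

With real credentials, the dictionary could be passed as `s3_client.put_object_retention(**request)`; that step is left out so the sketch runs anywhere.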
578.58 -> We'll also be talking a little bit about replication.
581.43 -> Now, if you need to prevent,
584.55 -> and essentially if you need to have that data available
588.36 -> for compliance reasons or have that data available
591.54 -> in a different region for resiliency,
594.15 -> that's where S3 replication comes in.
596.55 -> Now, replication is there, but it's not going to be a backup.
599.703 -> It's not gonna be centrally managed
601.53 -> or independent from S3.
603.63 -> So when we look at all three of these here,
607.08 -> object versioning, object lock and replication,
610.02 -> these are all opt-in features
611.46 -> that you have to turn on in S3.
614.67 -> They're not necessarily centrally managed.
616.89 -> They are enabled essentially per bucket inside of S3.
621.09 -> Now, to look at all of that together,
622.86 -> Mark's gonna be talking a little bit about
624.54 -> another feature that we launched.
625.89 -> It's one of my favorite things to talk about
627.39 -> called S3 Storage Lens.
629.46 -> It gives you a bird's eye view, a central view
631.89 -> of all of those features and essentially what percentage
634.83 -> of your data is versioned?
637.26 -> What percentage of your data is object locked?
640.14 -> What percentage of your data is replicated?
643.05 -> And Storage Lens is something
644.25 -> that you can actually get up and running by dinner tonight.
648.42 -> It has 14 days of historical data ready to go.
651.09 -> Mark's gonna show you how Cox Automotive uses it,
654.27 -> but it gives you a bird's eye view
656.94 -> of all three of those data protection capabilities.
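The three percentages Peter lists can be illustrated with a small calculation. This is a simplified sketch, not how Storage Lens computes its metrics: it assumes a hypothetical per-bucket record (`BucketStats`) with one flag per feature and weights each bucket by its stored bytes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BucketStats:
    """Hypothetical per-bucket summary used only for this illustration."""
    name: str
    total_bytes: int
    versioned: bool
    object_lock: bool
    replicated: bool


FEATURES = ("versioned", "object_lock", "replicated")


def coverage(buckets: Iterable[BucketStats]) -> Dict[str, float]:
    """Return the percentage of stored bytes covered by each feature."""
    buckets = list(buckets)
    total = sum(b.total_bytes for b in buckets)
    if total == 0:
        return {feature: 0.0 for feature in FEATURES}
    result = {}
    for feature in FEATURES:
        covered = sum(b.total_bytes for b in buckets if getattr(b, feature))
        result[feature] = round(100.0 * covered / total, 1)
    return result


fleet = [
    BucketStats("orders", 600, versioned=True, object_lock=True, replicated=False),
    BucketStats("listings", 300, versioned=True, object_lock=False, replicated=True),
    BucketStats("scratch", 100, versioned=False, object_lock=False, replicated=False),
]
print(coverage(fleet))
# {'versioned': 90.0, 'object_lock': 60.0, 'replicated': 30.0}
```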
660.99 -> But what we've talked about so far is all native to S3,
664.62 -> but mistakes again still happen.
667.35 -> None of those are a replacement
668.7 -> for a true traditional backup that is independent
671.34 -> and centrally managed.
672.69 -> And that's really where we talk about solutions
676.11 -> with partners such as Clumio.
678.21 -> And that's where we'll be talking about
680.64 -> Cox Automotive's journey with Mark here today
682.92 -> about why they chose Clumio to supplement
686.22 -> those existing data protection features
688.05 -> that they're already using in S3,
689.88 -> but why Cox wanted to cover that specifically
693.12 -> and look at a solution like Clumio
695.73 -> to actually create an independent backup copy of S3, Mark.
701.4 -> - All right, thanks Peter.
703.89 -> All right, my name is Mark Huber.
705.99 -> I am the Director of Engineering enablement
709.53 -> at Cox Automotive.
711.24 -> So Cox Automotive, who are we?
714.18 -> Not maybe a household name that you would know,
717 -> but you probably know some of our public faces
719.7 -> like Autotrader.com and Kelley Blue Book.
723.12 -> You look to see all those logos across the bottom.
725.1 -> There's 20 more that you probably have never heard of
727.17 -> unless you're in the automotive industry.
729.39 -> And that's where we are.
730.62 -> We are on a mission to transform the way the world buys,
734.04 -> sells, owns and uses vehicles.
738.27 -> If you've really kind of paid attention to what's going on
740.25 -> in the industry in the last five years,
743.07 -> we're seeing a whole new model.
744.6 -> Not the classic model of going out and buying a car
747.27 -> at your local dealership,
748.95 -> everything you see from ride share to fleet rental,
753.84 -> it's completely changing the dynamics
755.73 -> of the automotive industry.
757.26 -> And Cox Automotive is there leading the charge
760.5 -> in how the industry operates,
763.77 -> how people access transportation, whether it's clean energy,
769.2 -> and green vehicles, or how they use and lease vehicles,
775.41 -> not a month at a time, but maybe an hour at a time.
779.88 -> We're looking at all the new ways
781.41 -> that the automotive industry is starting to think about
784.59 -> how we utilize vehicles.
788.43 -> In doing that, we're actually, you may not know this,
792.63 -> the Cox organization is over 120 years old,
795.87 -> originally started in the newspaper industry
799.17 -> and we've evolved through many different parts.
800.97 -> You see Cox out here, that's Cox Communications,
804.33 -> one of our sister organizations.
806.31 -> And we're in many other parts of the industries as well.
809.91 -> Cox Automotive has come together probably over more
812.73 -> of the last 50 years.
814.62 -> And we are really an aggregation of many teams
818.43 -> all over the United States,
820.98 -> all over North America, South America,
825.24 -> all over the world, internationally.
828.39 -> And so that has come together over so many years.
831.09 -> Remember, you know, everyone knows Kelley Blue Book.
833.22 -> Remember it's the actual book,
835.5 -> how long has that been around? That's us.
837.96 -> And we've evolved that into KBB.com, instant cash offer,
841.32 -> these things you see today.
843.69 -> Behind the scenes,
845.25 -> that's over 500 software engineering teams
848.64 -> that have come together
849.96 -> to start collaborating more as one engineering organization.
852.807 -> And that's a journey that we have been on
855.623 -> and we've been on a journey to the cloud, into AWS,
859.83 -> for approximately the last seven years.
861.9 -> And I've had the pleasure of being a part of that exercise
865.8 -> of bringing disparate engineering teams together
869.67 -> to operate more as one engineering organization
872.49 -> to fulfill that vision and mission of Cox Automotive.
876.48 -> In doing that, we brought a lot of software along the way
880.89 -> to operate more as one engineering ecosystem,
884.73 -> modernizing and making more cloud native applications,
887.61 -> moving from on-premise data center environments,
890.82 -> mainframe systems, up into systems now based in Amazon S3,
895.17 -> Lambda, using DynamoDB. We have an internal initiative
899.61 -> called Serverless First.
902.04 -> And really driving the change in the way we build
904.08 -> and architect software,
905.85 -> but not just which technologies we use.
909.72 -> About four years ago we made a big shift in focus
913.32 -> and adopted the AWS Well-Architected Framework
916.89 -> as really sort of our guidestone for how do we think about
919.65 -> what does good software look like?
922.05 -> The pillars of operational excellence,
923.88 -> security, reliability, performance, cost,
927.06 -> and now sustainability,
929.04 -> really shaped the way we look at what does a good,
932.82 -> well-architected piece of software look like?
935.55 -> And we've used that as a benchmark
938.82 -> for all of these different teams,
940.23 -> all these different software systems.
942.27 -> And now deploying all of that software
945.12 -> in over 1300 AWS accounts,
948.45 -> that's really allowed us to come from sort of all walks
952.05 -> of engineering life, all different engineering cultures,
955.62 -> and start to operate more like one engineering organization.
959.67 -> That's been our journey.
960.51 -> That's been our story.
962.22 -> And I've had the pleasure to be a part of that, so.
966.96 -> Data is a huge part of what makes that mission
970.59 -> and that journey a possible reality.
974.22 -> We got some numbers up on here and these are pulled
976.5 -> from Storage Lens like Peter was talking about.
980.76 -> I have the line of sight over those 1300 AWS accounts
988.02 -> to look at our data real estate at scale.
992.19 -> We store close to 20 petabytes in S3
997.05 -> across those 1300 accounts,
999.51 -> that is across those accounts.
1001.16 -> That's not 20 petabytes in one bucket,
1003.68 -> that's not 20 petabytes in one account.
1006.26 -> That's 20 petabytes spread across 1300 AWS accounts,
1010.43 -> spread across over 33,000 different buckets.
1016.16 -> That's 153 billion
1019.67 -> objects. - It is. The G
1021.5 -> on this slide is a B, so that's billions.
1024.74 -> - So we are one tiny little drop in that bucket of,
1028.04 -> no pun intended, of 280 trillion objects stored
1031.16 -> in S3. - We appreciate that, Mark.
1032.605 -> - (chuckles) The average object size is 142 kilobytes.
1040.76 -> How could I even begin to understand that
1043.37 -> and wrap my head around that
1045.56 -> without the tool sets that Storage Lens gives me
1047.87 -> to understand how those 500 different Scrum teams
1052.43 -> are building software and operating on data.
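As a quick sanity check on the slide figures, the object count and the average object size quoted above can be multiplied back into a total. This sketch assumes "kilobytes" and "petabytes" are binary units (KiB and PiB); the talk does not say which convention the dashboard uses.

```python
KIB = 1024
PIB = 1024 ** 5

object_count = 153_000_000_000   # "153 billion objects"
avg_object_bytes = 142 * KIB     # "142 kilobytes", read here as KiB (assumption)

total_bytes = object_count * avg_object_bytes
total_pib = total_bytes / PIB

print(f"{total_pib:.2f} PiB")    # 19.76 PiB
```

Under that assumption the product lands just under 20 PiB, which lines up with the "close to 20 petabytes" figure Mark quotes.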
1056.27 -> The ability first to just understand your state
1059.54 -> is where we started.
1060.8 -> Because 10 years ago,
1062.63 -> I can't say that we did understand our state,
1065.24 -> moving to the cloud, using tools like S3 and Storage Lens
1068.48 -> has made it possible to just begin to even understand
1071.54 -> what that looks like.
1073.4 -> Now, I know, Peter, you just had a launch
1074.98 -> of some new features with Storage Lens.
1077.115 -> I'll confess I haven't even had a chance
1078.92 -> to read the blog post yet.
1080.33 -> Do you wanna talk a little bit about some of the new metrics
1082.07 -> that come out?
1082.903 -> - Sure, thanks Mark.
1083.736 -> So 34 new metrics in Storage Lens.
1086.72 -> I think I have that right.
1088.01 -> 34 new metrics.
1089.39 -> And again, the beauty of Storage Lens is
1092.51 -> you can get it up and running in just a matter of minutes.
1095.3 -> All those dashboards are ready to go
1097.28 -> with those metrics with 14 days of historical data.
1100.01 -> So if you're not using Storage Lens today,
1103.52 -> that would be my one ask is look at Storage Lens
1106.7 -> and then look at enabling object versioning.
1109.34 -> Now I'm gonna sit down, I'm done with the commercial,
1112.43 -> but thank you. - Well,
1114.41 -> that's a good commercial.
1116.18 -> But that is actually a true story for us.
1118.61 -> So getting into talking about,
1121.85 -> how we started talking about a backup strategy,
1124.19 -> those hundreds of teams
1125.882 -> are doing their local backup strategy.
1128.84 -> They have operational backups,
1130.58 -> they're thinking about what happens in the event
1132.32 -> of a disaster.
1134.57 -> But we started really doing some threat modeling
1137.78 -> to understand some of these more nefarious scenarios
1141.56 -> like ransomware.
1143.03 -> And the starting question was,
1146.87 -> how much data do we even have
1148.16 -> that we have to even be concerned about?
1150.44 -> And it was, someone asked me a question
1152.69 -> and I went scrambling for an answer,
1154.73 -> and a couple hours later we turned on Storage Lens
1156.89 -> at the organization's level
1158.75 -> and we had starting dashboards right outta the box
1161.21 -> and helped me start thinking about
1162.86 -> how to organize a backup strategy.
1166.88 -> Very quickly, we got into that backup strategy
1169.28 -> and we came to a couple of interesting conclusions.
1172.64 -> Cox Automotive is not trying to transform the way
1175.28 -> the world backs up data, that's what Peter does.
1178.16 -> That's what Woon is doing.
1180.74 -> We are trying to transform the automotive industry.
1182.63 -> It's not our core competency.
1184.67 -> We did actually briefly have the conversation
1187.4 -> about should we build this ourselves?
1189.74 -> Maybe we could build this,
1190.97 -> but it's not what we're here to do.
1193.37 -> So we like to partner with organizations like Clumio
1196.97 -> to bring them, waking up every morning
1200.66 -> thinking about that problem,
1201.95 -> about how to maximize the efficiency of a backup.
1205.52 -> And we partner with them so that we can focus
1208.49 -> on our core competency.
1210.68 -> The ransomware risks are real.
1212.72 -> You read what's going on in the industry.
1216.38 -> We ran a lot of threat scenarios internally
1218.48 -> and saw that this was a real potential risk.
1220.73 -> Our AWS multi-account strategy running across 1300 accounts
1225.59 -> really helps mitigate that risk.
1228.08 -> But that wasn't enough
1229.13 -> 'cause it only takes one account
1231.05 -> on that critical public facing property
1235.61 -> to put you on the headlines of the Wall Street Journal.
1242.42 -> It's a very complex problem, technically speaking.
1245.87 -> We did the simple math.
1246.89 -> We did the simple way of thinking about it.
1248.42 -> We have 20 petabytes of data and it costs us this much,
1252.44 -> we need to back it all up.
1254.75 -> So we were gonna spend twice as much in what we're doing.
1258.26 -> The simple backup strategies would say yes.
1262.34 -> But what we saw with Clumio was that they thought
1265.13 -> about that problem a lot harder than we had.
1268.37 -> And the results we're getting with Clumio
1271.28 -> in the efficiencies, even in the dollars,
1273.59 -> in how we are approaching our backup strategy
1275.75 -> now change that 2X dynamic. It's not like that.
1283.37 -> Thinking about when we first met Clumio,
1285.5 -> when I first met Woon,
1288.47 -> what we saw was exceptional competency
1291.62 -> in the space of data and thinking about
1294.11 -> how to handle and manage data.
1298.19 -> The things that Woon's gonna show in the demo in a bit,
1300.65 -> the way that they manage Clumio partitions,
1304.13 -> the way they think about reorganizing data
1306.53 -> for optimal storage and cost efficiency in it,
1310.07 -> is something that we had never even considered.
1314.3 -> And here's the more important part, that was simple to use.
1318.77 -> Part of the hardest thing I have to focus on
1320.96 -> when thinking about a backup strategy
1322.55 -> is bringing those 500 teams together
1325.46 -> and getting them to take action.
1328.22 -> Getting them to put a story in their backlog.
1331.19 -> Getting them to all collectively focus on and engage
1334.22 -> in a backup strategy.
1335.48 -> Whatever we brought them, it had to be simple.
1340.16 -> It had to solve our problems
1343.07 -> around needing a truly air-gapped solution
1347.59 -> for that ransomware scenario.
1350 -> That was what kicked off our focus on data.
1353.51 -> Now, air gap is a word that's been around our industry
1357.89 -> probably for 30 plus years.
1360.26 -> It's this very traditional idea of no wires connecting
1364.13 -> these two environments.
1365.54 -> No way in or out.
1366.86 -> Someone compromises your operational environment,
1372.74 -> there's no way that they can compromise
1375.41 -> your data vault environment.
1377.9 -> And that's important.
1379.07 -> It's an important component of our strategy
1382.04 -> to mitigate the risk around a ransomware event.
1385.94 -> But as we looked at solutions, there's a tough part of that.
1390.38 -> Many of the ways of doing that
1392.33 -> take it outside the four walls of AWS.
1396.35 -> It takes it outside the technology
1399.44 -> that gives us the durability and the resiliency guarantees,
1403.31 -> traditionally speaking.
1405.98 -> We didn't wanna lose the benefits of how we store
1408.74 -> and manage data in S3 in our own accounts
1412.25 -> when working with a backup solution.
1414.71 -> And so we very specifically went searching for a partner
1417.77 -> who was focused in natively developing
1420.74 -> air gap style technologies working inside the AWS ecosystem.
1426.23 -> And that's what we found when we looked at Clumio.
1429.5 -> It had to perform when we're talking about this much backup
1432.44 -> at this much scale,
1433.7 -> these are running operational systems
1436.19 -> with very tight RTO and RPO concerns.
1439.19 -> We need to make sure that we can get those levels of return
1443.24 -> to operation and recovery point
1445.34 -> without impacting 24/7 running production workloads.
1451.04 -> And finally we needed good partners.
1454.16 -> And this is probably the most important part to us.
1459.2 -> When we first met the Clumio team,
1461.72 -> it did not do everything that we needed it to do.
1465.56 -> It didn't work the way we needed it to work
1468.74 -> to easily roll it out to 500 teams across 1300 accounts.
1473.75 -> So we partnered and we worked closely with their team,
1476.66 -> worked personally with Woon,
1478.07 -> worked with much of the Clumio team here today.
1480.77 -> And very rapidly, we innovated.
1482.66 -> They understood our problem, they understood our objectives
1486.92 -> and they shaped their product quickly.
1489.98 -> And I can tell you, working with other partners, "quickly"
1491.99 -> is not always the operative word in that sentence.
1495.997 -> So that partnership, thank you,
1497.96 -> thank you to the Clumio team.
1499.64 -> It's been excellent.
1502.76 -> So what are the results of all of this?
1504.41 -> We've been working with them for roughly the past year,
1508.16 -> being able to standardize a data protection capability
1511.88 -> in over 1300 accounts.
1515 -> I talked about that Well-Architected Framework
1517.46 -> and the idea of workloads.
1519.26 -> These are all the different collective software systems
1522.08 -> that we have based on something we call our CORE program,
1526.34 -> which is Cox Automotive's Observability
1528.71 -> and Resiliency Engineering Program.
1531.59 -> We've been able to rate those 600 workloads
1534.26 -> to understand which of them are most critical
1537.86 -> for these resiliency capabilities.
1539.66 -> The first in the line to be backed up
1542.39 -> to have full ransomware protection.
1545.06 -> And that's made us a real actionable game plan
1547.79 -> to start rolling out the technology
1550.34 -> to these workloads in the accounts.
1553.25 -> We worked with the Clumio team
1554.57 -> to develop a Terraform provider.
1556.49 -> We Terraform all 1300 AWS landing zones in those accounts.
1561.71 -> And so now our provisioning process that provisions
1564.71 -> a new account, we do that multiple times a week.
1568.37 -> Every single account comes fully integrated
1571.64 -> with the Clumio Stack.
1574.28 -> We use a tagging strategy developed
1577.73 -> to match the protection group patterns within Clumio.
1580.7 -> That means it's as simple for a team
1583.43 -> as putting a tag on a bucket to start backing up.
1587.03 -> No plugging in the tool,
1589.01 -> no integrating, no setup,
1592.34 -> they simply drop a tag and it starts backing up.
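The tag-driven opt-in Mark describes can be sketched as a simple selection rule. Everything here is hypothetical: the talk does not name the tag key, the tag values, or how Clumio protection groups evaluate them, so `BACKUP_TAG_KEY` and the policy names are placeholders for illustration only.

```python
from typing import Dict, List

# Hypothetical tag key; the real key and values are not given in the talk.
BACKUP_TAG_KEY = "backup-policy"


def buckets_to_protect(bucket_tags: Dict[str, Dict[str, str]],
                       policy: str) -> List[str]:
    """Return the buckets whose tag opts them into the given policy."""
    return sorted(
        name
        for name, tags in bucket_tags.items()
        if tags.get(BACKUP_TAG_KEY) == policy
    )


inventory = {
    "orders-prod": {"backup-policy": "critical", "team": "orders"},
    "listings-prod": {"backup-policy": "critical"},
    "logs": {"backup-policy": "standard"},
    "scratch": {"team": "data"},  # no tag, so never selected
}
print(buckets_to_protect(inventory, "critical"))
# ['listings-prod', 'orders-prod']
```

The point of the pattern is that a team changes only the tag on its bucket; the selection rule lives in one central place.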
1595.58 -> And that was the kind of simplicity
1597.14 -> that we were looking for.
1600.922 -> The way that Clumio looks at cost in their credit system
1604.88 -> allows us to consume as we go and clearly track which teams
1610.25 -> are consuming how much of the backup capacity.
1613.76 -> It makes us effective in that cost pillar
1616.61 -> in the Well-Architected Framework,
1617.87 -> really to understand how that money's being spent,
1620.48 -> not spend and waste while the product sits on a shelf,
1624.77 -> but buy as we go and buy as we need and that's been great.
1630.14 -> And we've developed a lot of new interesting features
1632.36 -> with them, especially around that last one.
1634.37 -> B-Y-O-K, Bring Your Own Key,
1636.95 -> talked about that native part of a solution
1641.15 -> working within the AWS space,
1643.4 -> working directly with AWS KMS.
1647.09 -> We are able to mutually share a pair of KMS keys,
1651.68 -> one owned by Cox Automotive, one owned by Clumio,
1655.49 -> which ensures the right level of balance.
1658.1 -> The idea that I can take my critical data
1661.1 -> and put it in the hands of a trusted partner like Clumio,
1665.18 -> but still have the power to have control
1667.73 -> while it's left my four walls,
1670.7 -> to shut down the access to that data
1672.98 -> by revoking that encryption key.
1674.81 -> Patterns like that developed mutually,
1677.66 -> give us the right balance where I can trust
1680.6 -> what is an outside entity
1682.79 -> to handle my most sensitive and critical data.
1687.38 -> So I'm gonna stop there,
1689.78 -> I'm gonna hand it over to Woon
1691.37 -> and he's gonna show us how the magic happens.
1693.02 -> - Thank you.
1694.33 -> So I'll start with, what's Clumio?
1696.5 -> So Clumio is a data protection service created on AWS
1700.13 -> to support the workloads and data sources running on AWS.
1703.52 -> To that end, we wanted to create a service
1705.59 -> that is as scalable and elastic as all the data services
1709.16 -> and applications that you're running
1710.81 -> on the public cloud on AWS.
1715.01 -> Before I jump in,
1716.42 -> I just wanna share a few observations
1718.94 -> that make data protection for S3 so challenging.
1722.09 -> Like Peter mentioned,
1723.32 -> S3 is being used very widely
1725.75 -> because of the flexibility, simplicity, scalability,
1728.96 -> performance, and the various point of integration
1731.75 -> makes S3 a natural choice when it comes to be
1734.75 -> the application primary data store,
1737 -> especially for those modern applications
1739.04 -> that were actually born in the cloud.
1741.92 -> What that means to us is that now,
1744.26 -> we have buckets where objects are being created
1748.13 -> by hundreds of thousands, if not by millions.
1750.74 -> They're always created per hour.
1754.01 -> We also see buckets that are huge,
1756.41 -> they easily contain one or two billion objects.
1759.53 -> Moreover, we have customers coming to us
1761.63 -> with 10, 20, 30 billion objects per bucket.
1765.14 -> We also see a lot of variety
1767.15 -> because of the different use cases they have data
1770.12 -> with different characteristics and different requirements.
1772.97 -> Let me just give you one example.
1774.56 -> Let's say you have a bucket with 1 billion objects.
1776.99 -> I'm pretty sure that within that bucket
1779.06 -> you will have some objects,
1780.53 -> that you probably don't care about backing up,
1782.99 -> but you'll have another subset of the objects
1784.94 -> that you are required to keep a second copy
1787.43 -> and you wanna retain it for one year.
1789.74 -> Even more, you may have another subset of objects
1792.47 -> within the same bucket that you would like to back it up
1796.7 -> and this time, you wanna retain it for seven years.
1799.88 -> And we wanted to create a service that is as flexible
1802.46 -> so that you can actually satisfy
1804.35 -> all the compliance requirement and at the same time,
1807.11 -> optimize the cost by not backing up what you don't need to.
1812.78 -> While looking at all of these things,
1814.94 -> you see that it is very hard for a single solution
1818.15 -> to be the magic bullet that solves all of the problems.
1821.3 -> Another example is depending on the use cases
1823.88 -> and how you use your buckets, many times it is not ideal,
1827.42 -> or really practical for enabling things like object locking.
1831.35 -> And at Clumio, what we try to do is to create a service,
1834.26 -> another tool in your tool arsenal, in your toolbox
1837.68 -> along the features that Peter talked about earlier,
1841.28 -> which you can use and achieve data protection
1843.53 -> for your data in S3.
1848.362 -> So I'm gonna start with the high level architecture overview
1851.96 -> and then we'll keep going down the road.
1854.24 -> First of all, on the left hand side, we have ACME.
1857.48 -> That's a hypothetical customer
1859.67 -> and they're looking to protect a big bucket in the middle.
1862.76 -> The way that they install Clumio
1864.77 -> is by deploying a CloudFormation template or Terraform.
1868.34 -> Once that CloudFormation template is deployed,
1870.68 -> we go ahead and install an IAM Role and an EventBridge.
1874.88 -> That's all.
1876.08 -> In your account, the only Clumio footprint
1878.36 -> is literally just that IAM Role and that EventBridge.
1881.99 -> While that is getting deployed on Clumio side,
1884.78 -> we actually go ahead and create a dedicated AWS account
1887.72 -> for the customer, ACME.
1889.34 -> The way that we segregate data across customers
1891.74 -> is by actually creating dedicated AWS accounts
1894.44 -> for every customer that onboards with Clumio.
1897.53 -> - And Woon, I'll say,
1900.47 -> there was hesitation with the idea of moving data
1904.91 -> into your platform because it could be co-mingled
1907.73 -> with other customers.
1910.19 -> You're using the same pattern that we use
1912.44 -> to create segmentation to create isolation
1915.89 -> for Cox Automotive's data separate
1918.17 -> from your other customers.
1920.69 -> And still get all the perimeter security
1923.12 -> that it gives in isolation.
1924.89 -> So this was a game changer that made many parts
1928.58 -> of our organization much more comfortable
1931.25 -> with the idea of working with a third-party for data backup.
1934.85 -> - Yes, a lot of the things that we do
1936.41 -> is really security first.
1939.77 -> And all the access that happens from the Clumio side
1942.89 -> back to the customer accounts
1944.11 -> is really through that IAM Role
1946.16 -> and we will use that EventBridge
1947.78 -> to capture the changes that are happening in the bucket
1950.93 -> that we're backing up.
1952.13 -> And we're also integrating with S3 inventory
1954.5 -> to capture the catalog of the object
1956.84 -> that you have in that bucket.
1959.12 -> And then as you can see,
1960.92 -> and then a lot of the data processing happens
1963.29 -> through Lambda functions that they all run
1965.36 -> on our side of the account.
1967.34 -> Everything starting from data movement, verification,
1970.91 -> optimizations and indexing.
1972.89 -> As you can see, a lot of the complexity
1975.02 -> is on the Clumio side and that's by design.
1977.45 -> We wanna deliver Clumio as a service,
1979.55 -> to that end, we wanna keep everything that is complicated
1982.16 -> on our end and leave the customer side of the account
1985.91 -> pretty simple.
1986.743 -> Just IAM Role and an EventBridge.
1990.546 -> - And it's worth pointing out.
1991.379 -> That means near zero operational cost.
1993.95 -> Our AWS bill, we see near zero cost
1997.16 -> for you operating within our accounts.
1999.74 -> The compute resides with you and we buy credits from you.
2004.3 -> - Correct.
2005.133 -> All the patching, troubleshooting, all the observability
2008.38 -> is actually built right there in our side.
2013.48 -> So let's talk a little bit about backup
2015.19 -> and a little bit about the flexibility
2017.08 -> around what to back up.
2020.38 -> We wanna optimize costs for our customers
2022.51 -> and one of the easiest ways to do it
2024.31 -> is to allow the customers to tell us
2026.23 -> what's important to them.
2028.09 -> What we allow you to do is to specify a set of filters.
2031.69 -> You tell us what prefix, what timestamps,
2034.99 -> and what storage classes that you wanna back up.
2037.84 -> And this is the way that you optimize both for cost
2040.75 -> and also for time that it takes us to do the backup.
2043.45 -> 'Cause again, given a bucket with a billion objects,
2046.42 -> you may not wanna back up everything.
2049.48 -> Conceptually, it's pretty simple.
2051.22 -> You take a long list of objects
2052.96 -> and you apply a Lambda function
2054.67 -> that goes ahead and applies those filter functions.
2057.16 -> And then it outputs another smaller, shorter list
2060.61 -> that contains the list of objects that passes the filter.
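The filtering step described here can be sketched in a few lines of Python. This is only an illustration of the idea (a long listing in, a shorter listing out); the `ObjectEntry` and `BackupFilter` names, their fields, and the sample keys are assumptions made for the example and are not Clumio's actual data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ObjectEntry:
    """One row of an object listing (hypothetical shape)."""
    key: str
    last_modified: datetime
    storage_class: str


@dataclass(frozen=True)
class BackupFilter:
    """Hypothetical filter: empty fields mean 'no restriction'."""
    include_prefixes: tuple = ()
    exclude_prefixes: tuple = ()
    modified_after: Optional[datetime] = None
    storage_classes: frozenset = frozenset()

    def matches(self, obj: ObjectEntry) -> bool:
        # str.startswith accepts a tuple of candidate prefixes.
        if self.include_prefixes and not obj.key.startswith(self.include_prefixes):
            return False
        if self.exclude_prefixes and obj.key.startswith(self.exclude_prefixes):
            return False
        if self.modified_after and obj.last_modified < self.modified_after:
            return False
        if self.storage_classes and obj.storage_class not in self.storage_classes:
            return False
        return True


def select_for_backup(entries, flt):
    """Return the shorter list of entries that pass the filter."""
    return [e for e in entries if flt.matches(e)]


ts = datetime(2022, 11, 30, tzinfo=timezone.utc)
listing = [
    ObjectEntry("important/a.csv", ts, "STANDARD"),
    ObjectEntry("junk/b.tmp", ts, "STANDARD"),
    ObjectEntry("important/c.csv", ts, "ONEZONE_IA"),
]
flt = BackupFilter(include_prefixes=("important/",),
                   storage_classes=frozenset({"STANDARD"}))
selected = select_for_backup(listing, flt)
```

At the scale discussed in the talk, the listing would be split into chunks and many workers would run the same predicate in parallel; the predicate itself stays this simple.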
2065.17 -> We also validate the inventory on a daily basis.
2067.99 -> Remember we're actually doing continuous backup,
2070.06 -> we're integrating with EventBridge and that's how we know
2072.79 -> when objects are getting created or deleted.
2077.62 -> But we wanna make sure that the backup really captures
2081.49 -> 100% of all the objects that you have.
2084.85 -> If for whatever reason we were to drop one message, it means
2089.14 -> that one message or that one object never gets backed up
2092.71 -> because we are doing continuous backups based on that event.
2096.88 -> What we wanna do is to actually go ahead and check,
2100.09 -> crosscheck against the S3 inventory on a daily basis
2103.48 -> and then detect if there's any objects that are missing,
2106.45 -> we'll go ahead and do what's called a catch up backup
2109.3 -> or fixed backup to make sure that everything is captured.
2113.92 -> In other words, a backup that captures 99.99%
2117.28 -> of your objects, it is still a failed backup.
2119.53 -> You do want your backup software to capture 100%
2123.31 -> of the objects.
2126.19 -> The way that we achieve this is that
2128.07 -> we developed a technology that allows us to compare
2131.35 -> the catalog of the objects that we have in our backup
2134.11 -> against the S3 inventory.
2135.82 -> We will actually get two lists
2137.59 -> and then we will detect whether there's a missing object
2140.53 -> and if there is one,
2141.82 -> it will output again, a shorter list,
2143.92 -> which it will be a shorter list and it will actually do
2146.47 -> the closed backup based on that shorter list.
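One common way to compare two very large listings without loading either into memory is a streaming merge over sorted keys. The sketch below assumes both the S3 inventory and the backup catalog can be read as key streams in ascending order; it illustrates the general technique and is not a description of Clumio's implementation.

```python
def missing_from_backup(inventory_keys, backup_keys):
    """Yield keys that appear in the sorted inventory stream
    but not in the sorted backup-catalog stream."""
    backup_iter = iter(backup_keys)
    current = next(backup_iter, None)
    for key in inventory_keys:
        # Advance the backup stream until it catches up with the inventory key.
        while current is not None and current < key:
            current = next(backup_iter, None)
        if current != key:
            yield key


inventory = ["a", "b", "c", "d"]
catalog = ["a", "c", "e"]
to_catch_up = list(missing_from_backup(inventory, catalog))
```

Because each stream is read once, memory use stays constant regardless of how many billions of keys are compared; the cost moves into producing the sorted inputs.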
2150.16 -> Again, think about it,
2151.18 -> the chances of dropping that event is very small.
2153.55 -> It will actually be 0.001%,
2156.52 -> but if you have 10 billion objects,
2158.8 -> that actually is a decent number.
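To put a number on that remark, here is the arithmetic as a small Python snippet. The 0.001% drop rate and the 10 billion object count are the figures quoted in the talk; the helper name is just for illustration.

```python
from fractions import Fraction


def expected_missed(total_objects, drop_rate_percent):
    """Expected number of missed objects for a given drop rate in percent."""
    return int(Fraction(drop_rate_percent) / 100 * total_objects)


missed = expected_missed(10_000_000_000, "0.001")
print(missed)  # 100000
```

So even a one-in-100,000 drop rate leaves about 100,000 objects unprotected in a 10-billion-object bucket unless something reconciles against the inventory.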
2162.73 -> And again, conceptually things are very simple,
2165.19 -> but if you're processing tens of billions of objects,
2167.413 -> it's not so easy
2169.06 -> because even the list of objects, they're terabytes in size.
2172.51 -> And then we need to run those filters concurrently
2174.73 -> using multiple Lambda functions at the same time
2177.7 -> so that we can actually perform backups in a timely manner.
2182.77 -> Moreover, comparing a very long list of objects is, again,
2187.06 -> very, very challenging.
2188.35 -> If you have two lists with 30 billion objects on each side
2191.89 -> and finding which one is missing,
2193.6 -> it's actually very, very challenging to do.
2197.05 -> Some of the technologies that we've built here also allow us
2200.05 -> to support and do backups in buckets
2203.59 -> that do not have versioning enabled, for instance.
2206.38 -> Because again,
2207.28 -> it may not be the most practical thing to do in some cases.
2212.77 -> Let's talk about ingest.
2214.3 -> So now we know that we collected,
2216.1 -> we have the list of objects that needs to be backed up.
2219.43 -> So how hard could it be?
2221.11 -> You know that the objects are organized in prefixes
2223.75 -> and all we need to do is to actually fire up
2226.33 -> a bunch of Lambda functions
2228.07 -> and get the objects from one side
2230.26 -> and move them to the other side.
2232.84 -> Actually, it turns out to be that it's not that easy.
2235.42 -> If you fire up all those Lambda functions
2237.46 -> and they all start working on the same prefix or partition,
2240.88 -> you're gonna get a lot of API throttle.
2243.22 -> You can increase the Lambda functions,
2245.11 -> but you're not gonna make things any faster.
2246.91 -> In fact, all you're gonna get is just API throttles.
2250.18 -> So what we ended up doing is that we introduce
2252.97 -> a notion of Clumio partition.
2255.28 -> We look at the list of objects that need to be backed up
2258.52 -> and based on some heuristics,
2260.29 -> we determine the Clumio side of partition
2263.2 -> and we actually schedule the Lambda functions
2265.21 -> across the different partitions.
2266.92 -> What that allows us to do is now,
2269.02 -> we can actually have these Lambda functions
2271.36 -> working on different parts of the key space
2274.323 -> without choking a single prefix.
2277.99 -> But again, at the same time, remember,
2280 -> we're not the only one using that bucket.
2282.19 -> We're a backup.
2283.24 -> There's the primary application that is currently using
2286.3 -> that bucket and we're continuously monitoring API throttle.
2290.35 -> And if we see that we're actually getting API throttles,
2293.41 -> it could be because the primary application is using it,
2296.5 -> we go ahead and reschedule things.
2298.54 -> What that means is that we can actually give a little bit
2301.21 -> of a break to that blue partition out there
2303.28 -> and while we actually do a little bit more work
2305.56 -> on that yellow partition.
2307.12 -> And we continuously do that as part of the backup
2310.03 -> so that we can actually satisfy the backup performance
2313.21 -> and at the same time, allow the primary workload
2316.18 -> to actually have enough request per second
2319.15 -> to carry out the work.
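The "give one partition a break, do more on another" behavior resembles an additive-increase, multiplicative-decrease control loop. The sketch below is one plausible way to express that idea; the `PartitionBudget` type, the rate limits, and the step sizes are all illustrative assumptions, not values or logic taken from Clumio.

```python
from dataclasses import dataclass


@dataclass
class PartitionBudget:
    """Backup request budget for one key-space partition (hypothetical)."""
    name: str
    rate: float               # requests per second currently allotted to backup
    min_rate: float = 50.0    # illustrative floor
    max_rate: float = 3000.0  # illustrative ceiling


def rebalance(budgets, throttled, step=100.0, backoff=0.5):
    """Halve the rate of partitions that reported throttling and
    add `step` requests per second to the ones that did not."""
    for b in budgets:
        if b.name in throttled:
            b.rate = max(b.min_rate, b.rate * backoff)
        else:
            b.rate = min(b.max_rate, b.rate + step)
    return budgets


budgets = [PartitionBudget("blue", 1000.0), PartitionBudget("yellow", 1000.0)]
rebalance(budgets, throttled={"blue"})
rates = {b.name: b.rate for b in budgets}
```

Run on every monitoring interval, a loop like this backs off quickly where the primary application needs the request capacity and recovers gradually elsewhere.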
2321.46 -> - And that's for, I'll say,
2322.66 -> that was one of our early concerns.
2324.64 -> There would be no worse outage bridge to be on to say,
2329.38 -> our software is down because the backup is running.
2332.62 -> And to have it intelligently understand our workload,
2336.64 -> and adjust accordingly,
2338.53 -> removes the need for us to have that,
2340.69 -> to factor in that concern.
2341.89 -> We no longer think about,
2343.591 -> well, I have to schedule these at 3:00 AM
2345.46 -> when my workload is low
2347.26 -> 'cause we're competing for resources.
2349.3 -> - Yeah, but remember,
2350.413 -> S3's being used as the primary data store.
2353.38 -> So you have the application that is continuously reading
2356.11 -> and writing to that bucket.
2357.76 -> And if the backup comes in and it takes up all the APIs,
2361.39 -> all the provisioned APIs, then yes, we can't do that.
2364.84 -> So we allow you to, you know,
2366.31 -> we monitor the API throttle, we reschedule things,
2368.86 -> we allow the customers to even specify
2370.93 -> what is the maximum allowed API rate
2373.66 -> that we're allowed to consume.
2376.18 -> So let's talk about restore.
2377.65 -> It's very similar to the backup side of the house.
2380.32 -> Again, we're super focused in actually optimizing things
2383.83 -> for our customers.
2384.94 -> And the way that you do that is to actually restore
2387.34 -> only what you need.
2388.69 -> Like if you have 10 billion objects backed up in Clumio,
2393.01 -> restoring all that, it does take time
2395.137 -> and it is actually time consuming and resource consuming.
2399.16 -> The best way to actually avoid that
2400.66 -> is to restore exactly what you need.
2402.97 -> You can restore objects by prefix.
2404.98 -> You can restore objects based on specific timestamps, tags,
2409.42 -> there's a variety of the filters that you can specify
2412.33 -> and we will only restore those objects that are required.
2415.6 -> This allows you to actually restore the objects
2418.21 -> that you needed right now
2419.71 -> and restore some of the other objects
2421.35 -> at a later point in time.
2424.09 -> The way that we achieve that is to basically
2426.52 -> working through the metadata.
2427.99 -> For every object that we back up,
2429.76 -> we maintain a metadata entry for every object.
2433.3 -> For every metadata blob,
2434.68 -> we actually maintain the metadata in Parquet files
2437.38 -> and we heavily use AWS Athena to query that metadata engine.
2442 -> We know that Athena behaves better
2444.1 -> if you have metadata objects that are somewhat bigger
2446.62 -> in the order of hundreds of megabytes.
2449.53 -> So what we do in the backend
2451.21 -> is that using various Lambda functions,
2453.13 -> we're continuously optimizing that metadata payload.
2455.89 -> We're combining the Parquet files,
2457.87 -> we're actually partitioning them differently
2459.85 -> so that when it is time to query,
2461.35 -> it is actually readily available
2463.42 -> and we can return that list of objects
2465.37 -> to be restored very, very quickly.
2467.59 -> And that's something that we continuously do in the backend
2470.62 -> for our customers.
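Combining many small metadata files into larger ones starts with deciding which files go together. A minimal planning sketch follows, assuming each file is described by a name and a size in bytes; the 256 MiB target is an illustrative stand-in for "hundreds of megabytes", and the greedy grouping is a generic approach, not Clumio's compaction logic.

```python
def plan_compaction(file_sizes, target_bytes=256 * 1024 * 1024):
    """Greedily group (name, size) pairs into batches; a batch is closed
    as soon as its combined size reaches target_bytes."""
    batches, current, current_size = [], [], 0
    for name, size in file_sizes:
        current.append(name)
        current_size += size
        if current_size >= target_bytes:
            batches.append(current)
            current, current_size = [], 0
    if current:
        # Leftover files form a final, smaller batch.
        batches.append(current)
    return batches


small_files = [("a", 40), ("b", 70), ("c", 30), ("d", 30)]
plan = plan_compaction(small_files, target_bytes=100)
```

Each batch would then be rewritten as a single larger file, so the query engine opens fewer, bigger objects.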
2472.63 -> As you can see, these are all challenges that happen
2475.81 -> when you have a lot of objects.
2477.25 -> And many times these are things you don't think about
2480.88 -> when you start thinking about implementing your own.
2486.76 -> So let's talk about observability.
2488.92 -> I mentioned that we are a service,
2491.74 -> we wanna own all the complexity ourselves
2494.38 -> because we don't wanna give it to you.
2496.87 -> We wanna be the ones that detect the failure.
2499.21 -> We wanna be the ones that troubleshoots every failure.
2501.67 -> We wanna be the one that monitors
2503.59 -> and actually patches everything for you
2505.63 -> so that you don't have to do anything,
2507.91 -> you just use it as a service.
2509.89 -> But at the same time, I just mentioned
2512.26 -> that we use thousands of Lambda functions,
2514.39 -> of different sizes and types.
2516.4 -> And at the same time, we also have hundreds of customers,
2519.55 -> meaning hundreds of AWS accounts
2521.89 -> and all of these Lambda functions,
2523.6 -> they're running across all of these hundreds of AWS accounts.
2527.68 -> And we have backups continuously happening
2530.59 -> all over the place.
2531.82 -> So how do we control all this?
2533.77 -> 'Cause if something fails, how do we know?
2536.35 -> Because remember, we need to be the one
2538.63 -> that detects the failure and we need to be the one
2541.06 -> that troubleshoots it and fixes it
2542.47 -> so that you don't have to.
2544.69 -> For us to achieve this,
2545.77 -> we actually implemented an internal framework
2547.93 -> that we call a Clumio Workflow Engine.
2550.45 -> So for the sake of time,
2551.89 -> I'm not gonna be able to spend too much time on it,
2553.93 -> but I promise I'll actually give a live demo of this
2556.36 -> and I'll show you how it works in real life.
2561.58 -> I wanna talk a little bit about cost optimizations
2564.58 -> and also some of the confusions
2565.93 -> and questions that I get from our customers.
2568.51 -> A lot of the time, I get questions about, this is great,
2571.78 -> but it looks very expensive.
2573.52 -> And I also get question, how is this different
2575.47 -> from replication? Is this replication?
2578.95 -> So to answer that question, let me just use an analogy.
2581.89 -> I'm a big fan of books.
2583.78 -> Let's say you have a bookshelf full of books
2585.76 -> and the way you organize your books
2587.71 -> is basically based on how you would actually access
2590.5 -> those books every day.
2592.18 -> Now let's say we need to create a second copy
2595.06 -> or create a backup of those books that you have at home.
2598.06 -> One way to do that is to actually buy
2600.7 -> an exact same looking bookshelf and replicate the format
2604.06 -> to the second bookshelf.
2605.59 -> Sure, you'll end up having two copies of the books.
2608.65 -> However, if what you're looking is backup,
2611.08 -> I'll argue that is not the best way
2612.79 -> to actually achieve backup.
2614.41 -> If you wanna back it up and what you truly want is backup,
2617.38 -> the best way and the most efficient way to do it
2620.11 -> is to buy a box, stack all of those books
2622.78 -> and put that box away in the storage.
2625.78 -> That is a more efficient way to actually to do backups.
2630.79 -> Just like replication is not, you know,
2633.46 -> just like backup cannot be a replication.
2636.82 -> Replication is not an ideal backup solution either.
2639.79 -> So it is again, the right tool for the right problem.
2646.72 -> So what I'm gonna do is I'm gonna very quickly
2648.73 -> go over three announcements
2650.32 -> and then we'll actually switch it over to the live demo.
2654.01 -> So the first one is about 15-minute RPO.
2657.94 -> I'm actually super happy to announce that we fully,
2660.88 -> with the help of the S3 folks,
2662.74 -> we finished fully integrating with EventBridge
2666.13 -> that allows us to support 15-minute RPO.
2669.79 -> What we do is that we actually perform
2672.49 -> 15-minute micro backups.
2674.29 -> So every 15 minutes we will actually go
2676.57 -> and capture those objects and we'll back it up.
2679.24 -> And then just like I told you before, every day,
2682.42 -> once we receive the daily S3 inventory,
2684.88 -> we will actually cross examine all of the micro backups
2687.97 -> that we took in the last 24 hours
2690.01 -> and we will actually compare it with the S3 inventory.
2693.04 -> And if there's anything missing, we'll at that point,
2695.89 -> do what we call the close backup and fix everything up
2699.01 -> so that you do have the guarantee
2701.05 -> that we capture all of the objects.
2703.9 -> What this means to you is that it means that
2706.39 -> in the worst case,
2707.32 -> you're losing up to 15 minutes' worth of data in your bucket
2710.29 -> once this is enabled.
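The 15-minute micro backup idea amounts to bucketing change events into fixed windows. The sketch below shows that bucketing with standard-library datetime arithmetic; the event shape (timestamp, key) and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts):
    """Round a timezone-aware timestamp down to its 15-minute boundary."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


def group_into_micro_backups(events):
    """Group (timestamp, key) change events by 15-minute window."""
    batches = defaultdict(list)
    for ts, key in events:
        batches[window_start(ts)].append(key)
    return dict(batches)


def at(hour, minute, second=0):
    return datetime(2022, 11, 30, hour, minute, second, tzinfo=timezone.utc)


events = [(at(15, 1), "a"), (at(15, 14, 59), "b"), (at(15, 15), "c")]
batches = group_into_micro_backups(events)
```

An object written just after a window closes waits at most one full window before it is captured, which is where the 15-minute worst case comes from.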
2714.01 -> Next, with a lot of the data optimization
2717.04 -> and the partitioning, and the scheduling,
2719.65 -> we can now support up to 30 billion objects per bucket.
2724.57 -> What this means to you is that if you have a large bucket
2727.6 -> and you're struggling with data protection, come talk to us.
2732.85 -> Okay, this last one is something
2734.65 -> that is actually my favorite one.
2736.87 -> We're talking about instant access.
2739.15 -> Clumio, we are actually very, very motivated
2741.85 -> to actually optimize cost and performance for our customers.
2745.66 -> Now if you wanna restore a billion objects, what do you do?
2749.8 -> The first thing that you could do is apply filter
2752.23 -> and reduce that object count down.
2754.27 -> That's one way to do that.
2755.59 -> But at the end of the filter,
2756.7 -> if you're still left with hundreds of millions of objects,
2759.61 -> restoring that, it will actually take time.
2763.72 -> It will take time and resources.
2765.34 -> And what we want you to do is for some specific cases
2768.1 -> such as DR testing, backup testing, or true emergency,
2772.09 -> we wanted to make the data readily available
2774.79 -> and also at a fraction of the cost.
2777.82 -> The way we do that is by a feature called instant access
2780.61 -> where we expose the data
2782.17 -> that we store in the backup directly.
2785.29 -> So we expose an endpoint that is S3 compatible,
2789.4 -> where you have the backup data.
2792.01 -> So for example, we can take a backup
2795.07 -> or a specific prefix back, let's say six hours ago,
2799.06 -> and we will return back to you an S3 endpoint.
2802 -> If you actually connect to that S3 endpoint,
2804.22 -> it will contain all the objects in that bucket or prefix
2807.64 -> as of six hours ago.
2810.97 -> If you want it to be as of three hours ago,
2813.52 -> we can repeat the same thing.
2815.53 -> We can actually take you back to any point in time
2818.56 -> and we can actually share an S3 endpoint
2821.23 -> that contains all the data at that specific point in time.
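Conceptually, a point-in-time view can be derived from per-object version history: for each key, take the newest version at or before the chosen time and drop keys whose newest entry is a delete. The sketch below illustrates that idea only; the tuple layout and names are assumptions, and the talk does not say this is how instant access is built.

```python
def view_as_of(versions, as_of):
    """Return {key: version_id} as the bucket looked at time `as_of`.

    `versions` is an iterable of (key, timestamp, version_id, is_delete_marker);
    timestamps only need to be mutually comparable.
    """
    latest = {}
    for key, ts, version_id, is_delete in versions:
        if ts > as_of:
            continue  # written after the requested point in time
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, version_id, is_delete)
    # Keys whose most recent entry is a delete marker are not visible.
    return {k: v[1] for k, v in latest.items() if not v[2]}


history = [
    ("report.csv", 1, "v1", False),
    ("report.csv", 5, "v2", False),
    ("old.log", 2, "v1", False),
    ("old.log", 4, "d1", True),
    ("new.txt", 9, "v1", False),
]
six_hours_ago = view_as_of(history, as_of=6)
three_hours_ago = view_as_of(history, as_of=3)
```

Changing `as_of` yields a different consistent view from the same history, which matches the "six hours ago" versus "three hours ago" example.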
2825.49 -> And we can do all this at a fraction of the cost
2828.85 -> and nearly instantaneously.
2831.22 -> Now, if you're doing things like DR testing, backup testing,
2834.79 -> or you're in a true emergency, this is something
2837.61 -> that could really be a life changer for you.
2841.66 -> All right, we'll do the demo.
2843.67 -> By the way, it is a live demo.
2848.56 -> All right, so what I'm gonna do is, first of all,
2851.92 -> I'm gonna log into my test cluster.
2858.16 -> Got all my passwords there.
2865.21 -> So this is kind of the home screen that you're greeted with
2869.26 -> once you log in.
2870.19 -> We have the ver-
2871.023 -> oops, what happened. (chuckles)
2873.16 -> We have the various dashboards and stuff,
2874.81 -> but again, for the sake of this discussion,
2876.43 -> since we're talking about data protection
2878.92 -> we'll quickly skip over.
2880.66 -> But this kinda gives you a dashboard,
2882.82 -> a visibility into what's happening in your environment
2885.4 -> in terms of what's protected,
2887.95 -> how much it's costing you and so on and so forth.
2891.58 -> One of the first things that you will have to do
2893.79 -> is to actually register your environment.
2896.173 -> Like I mentioned, you can actually do it through
2898.133 -> a CloudFormation template or via Terraform.
2901.66 -> Once you register that environment,
2903.43 -> essentially you specify the AWS account and a region,
2906.82 -> and an optional name and you will actually,
2908.92 -> the whole thing will take you no more than 15 minutes
2911.47 -> to get up and running
2913.33 -> 'cause all we are doing again is creating that role
2915.4 -> and that EventBridge on your account.
2918.22 -> Once that's done, typically what you do
2920.883 -> is that you create a policy, you name it,
2924.49 -> and essentially, let's first name it.
2928.985 -> And then you can enable the various data sources
2934.69 -> that we support.
2935.74 -> In the case of S3, we support two tiers,
2938.62 -> standard and frozen with different pricing points
2941.83 -> and different retention and access performance.
2946.21 -> And then you can actually set the retention.
2948.73 -> For the sake of demo we'll skip that
2950.44 -> and we will use some of the policies that we already have.
2955.09 -> Next, if you wanna protect your S3 buckets,
2959.65 -> what you'll end up doing,
2961.96 -> the first thing that you'll end up doing
2963.4 -> is to actually create what's called the protection group.
2966.7 -> A protection group is really a combination of buckets
2969.7 -> along with the filters.
2971.26 -> This is how you're telling us what's important to you.
2974.47 -> So the way we do that, let's just name the protection group,
2978.01 -> let's call it reinvent.
2979.42 -> And then there are multiple ways
2980.44 -> that you can actually add a bucket, you can add it by tag.
2983.59 -> Let's just do it by tag.
2988.3 -> So then what happens is that
2989.8 -> it automatically searches your environment.
2991.93 -> It automatically adds all of the buckets
2994.24 -> that contains the tag reinvent demo.
2997.72 -> And then it is not only just one time,
2999.7 -> if after the protection group is created,
3001.68 -> you create another bucket with such tag,
3003.36 -> they get automatically added.
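Tag-driven membership can be pictured as a query that is re-evaluated over time rather than a fixed list. In the sketch below, the tag key `backup`, the value `reinvent-demo`, and the bucket names are made up for the example; only the idea of "every bucket carrying the tag belongs to the group" comes from the demo.

```python
def buckets_matching_tag(bucket_tags, key, value):
    """Return the sorted names of buckets whose tags contain key=value.

    `bucket_tags` maps bucket name -> dict of tag key/value pairs.
    """
    return sorted(name for name, tags in bucket_tags.items()
                  if tags.get(key) == value)


bucket_tags = {
    "bucket-one": {"backup": "reinvent-demo"},
    "bucket-two": {"team": "analytics"},
}
before = buckets_matching_tag(bucket_tags, "backup", "reinvent-demo")

# A bucket created later with the same tag is picked up on the next evaluation.
bucket_tags["bucket-three"] = {"backup": "reinvent-demo"}
after = buckets_matching_tag(bucket_tags, "backup", "reinvent-demo")
```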
3007.26 -> You can also add these buckets manually.
3009.78 -> Let's say I just picked these two.
3014.13 -> From here, now that I select the buckets,
3016.5 -> I need to apply the filters.
3018.12 -> The way that you apply the filters,
3019.8 -> you can actually tell us whether you wanna protect
3021.75 -> all the storage classes or you wanna leave some out.
3025.2 -> Maybe you're not interested in protecting
3027.6 -> some of the One Zone-IA.
3031.5 -> You can actually tell us whether you wanna back up
3033.54 -> all the versions of the object or just the latest version.
3037.17 -> And you can also tell us which prefixes
3039.33 -> are interesting for you.
3040.95 -> Let's say protect and then within protect we can exclude,
3046.26 -> let's say the prefix junk
3049.38 -> and we can also back up the prefix important.
3057.18 -> Next, now that we specify the bucket and the filter,
3060.15 -> you have defined the what.
3062.1 -> Now we go ahead and select the policy
3064.143 -> that you wanna apply to.
3065.91 -> Whether you wanna back it up every day,
3067.65 -> retain it for seven years.
3069.15 -> Now you're configuring the frequency
3071.1 -> and the retention period.
3073.26 -> So let me cancel that for the sake of demo
3075.15 -> and then we'll just use some of the protection group
3077.67 -> that I created before the demo.
3079.41 -> I have a few protection groups that I created,
3081.57 -> some of them medium in size, which is about 6 billion,
3084.48 -> some of them that are somewhat large
3086.22 -> in the order of 30 billion objects.
3088.26 -> And then we're gonna concentrate
3089.88 -> in these two protection groups.
3093.54 -> These two protection groups contain the same two buckets.
3096.3 -> They contain the reinvent2022clumio1 and clumio2
3100.17 -> both added by tag.
3103.23 -> Over here, this protection group, the seven year one,
3106.44 -> again, contains the exact same two buckets.
3109.26 -> And you may ask the question, why?
3111.93 -> Why do we have two protection groups
3113.82 -> for the same two buckets?
3116.58 -> The reason is that they're actually protecting
3119.7 -> different prefixes within those buckets.
3123.15 -> This one is actually protecting everything,
3125.16 -> the prefix year one, except what's under year one/temp.
3130.68 -> And with that, we're actually applying a yearly retention
3134.19 -> because that is my compliance,
3135.72 -> that is my retention requirement.
3138.93 -> Now if I actually go back and I look at the seven year one,
3143.82 -> again, it is exact same bucket.
3145.98 -> However, they are the two different prefixes.
3148.53 -> This time I'm actually protecting the seven year
3151.47 -> except the temp.
3155.22 -> And this time I'm actually retaining it for seven years,
3159.18 -> unlike the previous one that I was actually retaining
3161.46 -> just for one year.
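The two protection groups in the demo can be modeled as prefix rules with different retention attached. The sketch below is an assumption-laden illustration: the prefix spellings (`year1/`, `year7/`, and their `temp/` children) are guesses at how the demo prefixes might be written, and picking the longest retention when groups overlap is a design choice made for this example, not documented product behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionGroup:
    """Hypothetical rule: back up include_prefix except exclude_prefix."""
    name: str
    include_prefix: str
    exclude_prefix: str
    retention_years: int


def retention_for(key, groups):
    """Return the longest retention among groups that cover `key`,
    or None when no group protects it."""
    years = [g.retention_years for g in groups
             if key.startswith(g.include_prefix)
             and not key.startswith(g.exclude_prefix)]
    return max(years) if years else None


groups = [
    ProtectionGroup("one-year", "year1/", "year1/temp/", 1),
    ProtectionGroup("seven-year", "year7/", "year7/temp/", 7),
]
```

With these rules, `retention_for("year7/ledger.parquet", groups)` gives 7, while anything under a `temp/` prefix or outside both prefixes is simply not backed up.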
3163.62 -> So that's on the backup side of the house.
3165.93 -> Let's just explore the different options that we have
3169.2 -> for the restore process.
3173.04 -> The simple one is to restore at a single prefix
3176.52 -> or entire buckets.
3177.75 -> You come in and you pick the buckets that you're interested
3182.01 -> both of them or just one of them.
3184.41 -> And then you apply filters,
3186.12 -> you tell us exactly the point in time to the second.
3190.98 -> And then moreover, you can even apply filters.
3193.98 -> And if you don't, it happens to be the entire bucket.
3197.04 -> But you can actually apply filters and we can do things
3200.55 -> like demo and we can even apply filter based on size
3206.16 -> and then do a preview.
3208.83 -> So what we do, we now query the metadata engine
3212.1 -> and you will return a preview of the object
3214.2 -> that gets restored before we actually execute it.
3218.82 -> Once you click on summary,
3220.32 -> then now you're telling us where to restore it to.
3224.07 -> You're telling us whether you wanna restore
3225.66 -> all the versions.
3228.24 -> It tells you about the protection group,
3229.8 -> the bucket that we selected, the filters.
3232.38 -> We can actually restore it back
3233.85 -> to the original source account,
3235.35 -> but we can actually restore it
3236.58 -> to any other registered source account.
3239.37 -> It doesn't have to be in the same account.
3242.01 -> You tell us to which bucket to restore it to.
3244.92 -> And you can also tell us what is the storage class
3248.04 -> that you wanna use when we're restoring it back.
3251.01 -> From there, you can also tell us whether you wanna add
3253.77 -> some prefixes in front of this object
3256.41 -> and we can even add some of the object tags.
3261.3 -> Another option is to actually restore a single object.
3264.54 -> It is pretty much the same.
3265.8 -> You specify the time and different filters,
3268.56 -> but the difference is that you get to specify
3271.92 -> the specific version in the chain of the versions
3274.68 -> for the object.
3276.87 -> Lastly, let me actually quickly show a demo,
3280.02 -> the instant access demo.
3283.17 -> So let's just do an instant access.
3285.63 -> Let's do reinventLive.
3290.25 -> So what I'm doing here is that I'm actually creating
3293.43 -> an endpoint that contains the list of objects
3296.22 -> as of that backup on the 30th that I clicked
3299.31 -> as of 3:00 PM or 5.
3301.71 -> And if I click the endpoint,
3303.78 -> we will actually go to the instant access
3306 -> and we will see the reinventLive.
3308.52 -> That being prepared as we speak.
3312.42 -> Let me just quickly switch it over, 0%.
3316.2 -> Oh, it's actually done.
3318.24 -> Okay, that was pretty quick.
3319.59 -> So it's done.
3320.82 -> So this is the live mount point,
3324.84 -> the access point that we have created.
3326.94 -> And then the way that we do it,
3328.08 -> we just copy that URL
3329.237 -> 'cause again, that URL is the S3 point,
3332 -> S3 endpoint that contains the objects at that point in time.
3335.97 -> So let me just quickly switch it over.
3339.36 -> I have a quick cheat sheet out here
3342.45 -> to help myself doing the demo.
3344.52 -> I'm just exporting a couple of environment variables.
3348.57 -> So this will be the endpoint and then the URL
3354.3 -> that I need to copy is reinventLive.
3357.36 -> Copy, copy, paste that.
3359.79 -> And that's it.
3361.92 -> Now good to go.
3363.66 -> And this is all the AWS CLI tool that we all love and know.
3371.19 -> So now I'm able to go ahead and do things like list objects.
3378.39 -> So as you can see, we can actually list the objects.
3381.75 -> And this is basically no different than you listing
3384.51 -> the original bucket,
3385.38 -> except that we are actually listing the bucket
3387.699 -> as of 30th at 3:00 PM.
3393.3 -> We can do things like head object.
3407.16 -> Let's do that...
3411.63 -> That's the head object.
3413.46 -> We can actually do head object with the different objects.
3421.62 -> That will work.
3423.158 -> As you can see, the ETag,
3424.5 -> will actually go ahead and match this 134c.
3429.33 -> We can actually even do get objects.
3437.215 -> And let me just do this one out here.
3441.69 -> And just to prove it to you, I have nothing.
3450.09 -> So that will be the get object,
3453 -> if you do ls, we do see that object out here
3456.243 -> that we downloaded and...
3466.119 -> And yeah, that's kinda the picture
3467.76 -> that I took out of our slides.
3469.2 -> But that's the object that I put in the S3 bucket
3471.12 -> and it showed up.
3473.64 -> That's about the instant access.
3475.83 -> It gives you point in time, immediate access
3478.74 -> to the data in the bucket.
3480.06 -> You can take it to 3:00 PM, 4:00 PM,
3482.43 -> we will actually give you that S3 compatible endpoint,
3485.04 -> nearly instantaneously and at a fraction of the cost.
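The demo used the AWS CLI for the list, head, and get calls. For readers who script this instead, here is a rough Python equivalent using boto3's `endpoint_url` option. It is a sketch under assumptions: the environment variable name `INSTANT_ACCESS_ENDPOINT` and the helper names are hypothetical, the bucket and key must come from your own instant access endpoint, and credentials are expected to come from the default AWS credential chain.

```python
import os


def s3_client_kwargs(env=None):
    """Build boto3 client arguments from a (hypothetical) environment variable."""
    env = os.environ if env is None else env
    return {
        "service_name": "s3",
        # URL copied from the instant access page; the variable name is an assumption.
        "endpoint_url": env["INSTANT_ACCESS_ENDPOINT"],
    }


def inspect_snapshot(bucket, key, dest, env=None):
    """List, head, and download against the point-in-time endpoint."""
    import boto3  # third-party dependency, imported lazily

    s3 = boto3.client(**s3_client_kwargs(env))

    # Same call you would make against the original bucket (first page only).
    listing = s3.list_objects_v2(Bucket=bucket)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]

    # The ETag can be compared with the one on the source object.
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"]

    # Download the object as it existed at the chosen point in time.
    s3.download_file(bucket, key, dest)
    return keys, etag
```

Only the endpoint changes; the calls themselves are the ordinary S3 ones, which is why existing tools keep working against the point-in-time view.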
3489.36 -> Lastly, like I promised,
3490.71 -> let me just do a quick demo about the whole observability
3493.86 -> that I talked about a little earlier.
3497.49 -> So this is truly live.
3499.8 -> So what I'm gonna do is what you guys see as a customer
3503.34 -> is this UI, you don't see the backend,
3506.4 -> but what we see in the backend is again,
3508.8 -> the whole workflow engine that we manage day in, day out.
3512.85 -> So let me copy out this task ID here,
3515.67 -> and then it is an internal debugger that we implemented.
3521.46 -> I have to go to task.
3524.1 -> And then we'll have the different task IDs
3527.25 -> and let me just paste this one out.
3537.6 -> So this one allows us to see exactly what's going on
3543.03 -> in every single backup.
3544.86 -> We know what's happening.
3546.39 -> We're getting what we call the container information.
3549.48 -> We're setting the progresses,
3550.89 -> we're waiting for bucket configurations.
3553.35 -> We're actually doing some of the CDC queries.
3555.75 -> I mean some of the data capture,
3557.58 -> change data captures that we talked about.
3559.74 -> This is a if step.
3561.03 -> So it's actually a little bit of a programming language
3563.16 -> that I actually put together.
3564.48 -> If this step is successful, it will take the black arrow out
3567.42 -> and you execute the next step.
3569.04 -> If this step fails, this will turn red
3571.44 -> and it will actually take the red arrow.
3573.51 -> And if the user cancels at that point,
3575.79 -> we will actually take the purple arrow out.
3579.3 -> This is an if statement where you have then and else,
3584.28 -> and this would actually be something like a subroutine call.
3588.48 -> I can actually click on it and I can go deeper
3591.45 -> as to see what's happening in that subroutine.
3593.97 -> But if anything fails, this is exactly how we know
3596.97 -> where it failed, why it failed, what was the input
3600 -> and what was the step that was executed
3601.71 -> before and what did it output.
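The arrows described above (one for success, one for failure, one for user cancellation) and the per-step record of input and output can be sketched as a small state machine. This is an illustrative toy, not the internal workflow language shown in the demo. `Step`, `Cancelled`, and `run_workflow` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


class Cancelled(Exception):
    """Raised by a step when the user cancels the task."""


@dataclass
class Step:
    name: str
    action: Callable[[Any], Any]
    on_success: Optional[str] = None  # the "black arrow"
    on_failure: Optional[str] = None  # the "red arrow"
    on_cancel: Optional[str] = None   # the "purple arrow"


def run_workflow(steps: Dict[str, Step], start: str, payload: Any) -> List[dict]:
    """Run steps until an arrow leads nowhere; return a trace for debugging."""
    trace = []
    current = start
    while current is not None:
        step = steps[current]
        entry = {"step": step.name, "input": payload}
        try:
            result = step.action(payload)
            entry.update(status="ok", output=result)
            payload = result  # the output feeds the next step
            current = step.on_success
        except Cancelled:
            entry.update(status="cancelled", output=None)
            current = step.on_cancel
        except Exception as exc:
            entry.update(status="failed", output=repr(exc))
            current = step.on_failure
        trace.append(entry)
    return trace
```

Because every entry in the trace carries the step name, its input, its status, and its output, a failed run shows where it stopped and what the preceding step produced.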
3604.14 -> And with every failure, this is how we can troubleshoot it
3607.29 -> within minutes and not within hours.
3609.66 -> And when something fails, we get to update that one step.
3614.04 -> That's much easier.
3615.21 -> That's why we can troubleshoot it
3617.04 -> in a matter of hours and not days.
3619.41 -> And we can actually do all this for you
3621.66 -> and you don't have to do it.
3623.19 -> This is a live example.
3624.51 -> So let me actually give you a bad example
3628.86 -> that I had collected.
3630.42 -> In this case, I actually introduced an error
3634.38 -> intentionally and I'll show you what it does.
3638.88 -> So as you can see, yellow...
3643.98 -> As you can see, this is red.
3645.81 -> We know that it has failed.
3647.43 -> Where did it fail?
3648.45 -> We know that all of these steps
3649.8 -> are actually executed correctly,
3651.75 -> but it ended up failing right here.
3654.12 -> You'll see the red border out there.
3656.67 -> So then because this has failed,
3658.44 -> we're not executing the steps below it,
3660.45 -> but instead, we're actually taking the red arrow out.
3663.78 -> And then what we are doing is such as,
3666.27 -> we're generating an event that something has failed.
3668.61 -> That's what the X is for.
3670.02 -> We're terminating the task as a failure.
3672.51 -> We're actually sending a notification to our support team
3675.15 -> that they know that something has failed
3677.1 -> and we have all of the steps that have failed.
3679.92 -> And then now, because this is the subroutine
3683.58 -> that has failed with the Lambda functions,
3685.62 -> we can actually click on it and look at it even deeper.
3689.64 -> This is the function that what it does,
3691.98 -> remember we're processing large number of manifest file.
3696.03 -> So we apply the filter
3697.35 -> using multiple Lambda functions concurrently.
3700.35 -> So we have what we call the fork and join step.
3707.1 -> So at this point,
3708.09 -> we compute how many Lambda functions to use.
3710.76 -> In this particular example,
3712.23 -> we're using 32 Lambdas concurrently
3714.99 -> to actually filter different areas of the manifest file
3718.56 -> to apply the filter and know what to back up
3720.96 -> and not to back up.
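The fork-and-join idea (split the manifest into areas, filter each area concurrently, then join the results) can be sketched in a few lines. This is a simplified stand-in under stated assumptions: local threads take the place of Lambda invocations, the manifest is an in-memory list of keys, and `split_ranges` and `fork_join_filter` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def split_ranges(total: int, workers: int) -> List[Tuple[int, int]]:
    """Split range(total) into at most `workers` contiguous, near-equal slices."""
    workers = max(1, min(workers, total))
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def fork_join_filter(
    manifest: Sequence[str],
    keep: Callable[[str], bool],
    workers: int = 32,
) -> List[str]:
    """Filter a manifest concurrently, preserving the original order."""

    def work(bounds: Tuple[int, int]) -> List[str]:
        lo, hi = bounds
        return [entry for entry in manifest[lo:hi] if keep(entry)]

    # Fork: one task per slice. Join: collecting the map results waits for
    # every worker and re-raises the first worker exception, if any.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, split_ranges(len(manifest), workers)))
    return [entry for part in parts for entry in part]
```

If any worker raises, the exception surfaces at the join, which is the point where a parent workflow could take its failure path.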
3722.73 -> So what happens is that I introduce an error on this step,
3725.58 -> that's why this turned red.
3727.2 -> But then when all these 32 Lambdas
3728.997 -> are actually done processing different areas
3731.4 -> of the manifest file,
3732.84 -> they all join out here and then they will actually continue.
3736.38 -> But it so happens that this step has failed.
3738.75 -> That's why we're actually taking the purple arrow out,
3741.3 -> which then ends up failing.
3743.34 -> And then we bubble that to the parent workflow,
3745.8 -> which then takes the exit path to notify our support team.
3749.31 -> But looking at this,
3750.267 -> now we can actually debug things within minutes
3753.24 -> and we can now truly be a service
3755.7 -> and take all that complexity away from you and we own it.
3760.5 -> - And what's unprecedented is
3763.17 -> they're taking on that complexity through observability,
3767.37 -> but also transparency to us as a customer.
3770.696 -> And that makes a big difference
3771.75 -> in it not being a black box to us,
3774.75 -> so we can understand how you're approaching
3776.4 -> the problem too.
3780.57 -> - That's it.
3781.53 -> All right, just on time.
3784.26 -> Or maybe a little over.
3786.059 -> (speakers chuckling)
3788.46 -> - Thank you all for coming.
3789.69 -> We really appreciate it.
3790.59 -> And we'll be up here if you have any questions.
3792.392 -> (attendees applauding)

Source: https://www.youtube.com/watch?v=-dr71gKGZGc