AWS re:Invent 2022 - 3 innovations that redefine data protection for Amazon S3 (PRT315)

Aug 16, 2023

You rely on Amazon S3 to power your cloud-native applications, data lakes, analytics, and AI. While Amazon S3 is extremely durable, the resilience of the data itself is your responsibility. So how do you secure billions of objects from an ever-expanding list of potential threats? And how do you recover when your data is compromised? In this session, technologists from Amazon, Cox Automotive, and Clumio dive deep into Amazon S3 data protection and demonstrate how to swiftly recover petabytes of data in the event of an incident. Learn how to implement continuous immutable backup that is air-gapped and instantly recoverable. This presentation is brought to you by Clumio, an AWS Partner.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.

Content

1.53 -> - Welcome to "Three Innovations

3.03 -> that Redefine Data Protection for Amazon S3."

6.42 -> My name is Woon Jung,

7.998 -> I'm the Co-Founder and CTO of Clumio.

9.75 -> Today with me, I have Peter Imming.

12 -> - Hello.

12.833 -> - Principal Product Manager from Amazon S3.

15.42 -> And Mark Huber, Senior Director of Engineering

17.73 -> at Cox Automotive.

19.4 -> (attendees applauding)

20.25 -> First, I'll have Peter join me to talk about

22.38 -> how S3 is being used and why you should protect the data

25.38 -> in your S3 bucket.

26.67 -> And then after that I'll have Mark join me

28.86 -> to talk about their journey in AWS

31.38 -> and the partnership with Clumio.

33.51 -> Last, I'll come in and talk about kinda the details

36.03 -> about how things are implemented in the backend.

38.73 -> And I'll give you a live demo

40.08 -> that I'm pretty sure you'll all like.

42.18 -> Peter.

43.77 -> - Thanks, Woon.

44.79 -> So if you thought this was data protection for security

50.37 -> or IAM, or encryption, this is not the session for you.

54.54 -> We're gonna be talking a lot about data protection

56.94 -> in terms of backup.

59.82 -> In terms of replication and high availability,

65.13 -> durability, availability.

67.26 -> So if you're in the wrong session,

69.18 -> definitely now's the time to head on out.

72.33 -> But again,

73.163 -> I think we wanna say thank you for coming to the session.

74.88 -> We know you had a lot of choices out there

76.38 -> for different sessions to attend,

77.55 -> including the bar sessions out there right now.

80.79 -> So thank you all.

81.72 -> I think from our perspective for coming here

84.12 -> and listening to us today.

85.74 -> We'll try and keep it as interactive as possible.

87.39 -> If you have questions,

88.29 -> I think we're all comfortable taking questions

90.66 -> as you might have them.

92.43 -> So we'll go ahead and get started here,

94.53 -> and talk about data protection for S3.

99.03 -> Now Amazon S3,

101.01 -> we really have come a long way in 16 years with Amazon S3.

105.24 -> We're now storing over 280 trillion objects in S3,

111.39 -> over a hundred million transactions per second.

114.66 -> And really, what we're seeing S3 kind of transform

117.39 -> into is really now the production,

120.72 -> the primary production storage for customers

123.45 -> to create new data in.

125.01 -> It's no longer just a place to store data

127.38 -> or back up data to.

129.09 -> In my time at AWS, I've been there for three years on S3.

133.53 -> That's probably the biggest transformation I've seen is that

136.38 -> we now have the primary,

139.26 -> the bulk of the data coming into S3

141.03 -> being natively created inside of AWS.

144.27 -> And this is really a change over the last couple of years

147.78 -> that we've just seen accelerate.

149.61 -> And that comes from applications that you're running,

152.34 -> such as data lakes, machine learning.

154.92 -> It could be EMR, it could be our services, it could be

159.966 -> Datadog, Databricks, it could be Snowflake DB.

163.32 -> You're starting to see now S3 and object storage really

166.29 -> become the primary de facto class of storage

169.29 -> that you're creating new content with.

171.9 -> And that includes cloud native applications

173.97 -> that you may be running in a container,

175.68 -> that could be classic virtual machines

177.48 -> that are now writing out to object storage directly.

180.54 -> It could be new databases that are gonna be shipping

182.67 -> from traditional database vendors

184.5 -> that are gonna now run natively on object storage

187.05 -> for the first time in their long history.

189.6 -> And we're also seeing obviously a tremendous growth

192 -> in log files, machine generated log files,

195.72 -> machine generated data.

197.07 -> Everything from IP cameras, everything from factory sensors

201.03 -> that are all generating machine logs

202.92 -> and then rapidly sending those to S3

204.99 -> as the production storage.

207.36 -> And we take that very seriously at Amazon.

210.15 -> So when we've got 280 trillion objects to store,

214.56 -> that takes a lot of effort to store durably,

218.4 -> to store with availability, to store with redundancy.

222.51 -> And we take that very seriously.

223.83 -> And if you've ever had to manage storage at scale,

227.37 -> you understand that that takes a lot of work

229.5 -> to manage all three of those for virtually,

233.4 -> essentially unlimited scale.

235.95 -> So we wake up every day on S3 and go look after

240.57 -> the availability, durability,

243.09 -> and the resiliency of that data,

244.74 -> of those 280 trillion objects so that you don't have to.

248.22 -> You don't have to worry that your data is durably stored.

250.83 -> You can go take that time now back and go innovate

253.92 -> on top of S3 rather than having to craft durable storage

258.09 -> that's highly available.

259.8 -> When we look at S3, there are though

264.66 -> a different type of data protection question

267.21 -> that we get today, and that is a delete request.

270.51 -> We don't know customer intent

272.94 -> when you ask us to delete an object.

276.06 -> We have to look at that request as unambiguous.

278.61 -> We don't know if it's accidental.

280.89 -> We don't know if it's intentional.

282.75 -> We don't know if it's perhaps even malicious.

285.54 -> So when we have that type of application data,

287.88 -> whether it's user generated data, media configuration data,

290.79 -> again, we talked about the data lakes,

292.41 -> sensitive information, all of this different data

295.38 -> has different value to your organization.

297.9 -> And what we're looking at here is

300 -> different layers of data protection

302.01 -> on top of different types of data.

304.29 -> And what I'll be talking about before handing it off to Mark

307.23 -> is really kinda looking at the different layers

309.9 -> of data protection that are most appropriate

311.79 -> for the different types of data

313.26 -> that you're storing in S3 today.

315.75 -> If that's compliance data,

317.04 -> that's gonna require a different level of protection

319.26 -> than what you might have for user data

323.25 -> that might be uploaded.

324.27 -> Cat.jpeg, right?

325.41 -> How many copies of Cat.jpeg do we need to store?

329.82 -> That's a very different type of data to protect

333.12 -> than data that contains payment card information,

336.3 -> HIPAA data, personally identifiable information.

339.66 -> So this is the type of data that we're storing.

342.09 -> This is the type of data that you are storing in S3,

344.58 -> I should say.

345.48 -> And so what we're gonna be talking about today

347.07 -> is kinda crafting the right layers of protection

349.53 -> on top of S3 and how you can set that up

354.57 -> for the right types of data that you're storing.

356.43 -> When we look at the types of risks that you have in S3,

360.18 -> the difference versus traditional storage

363.33 -> that you may have been running for the last 10, 20 years

366.96 -> is that you no longer have to worry about, again,

369.51 -> the durability and the physical access to the storage.

372.27 -> It's now about an API, it's that request to delete data.

377.22 -> Again, we don't know what the intent is of that request,

381.39 -> but we do know, and it's important to S3

384.87 -> that we honor that request.

386.88 -> So whether that's a human error,

388.53 -> whether that's an inadvertent deletion, mistakes happen.

392.01 -> Is it a natural disaster? Is it a fire, a flood?

394.5 -> Is it Godzilla coming in

396.09 -> and wrecking some undersea communication cables?

399.12 -> At the end of the day,

400.14 -> these are all things to consider when we're talking about

403.53 -> your data storage in S3.

405.617 -> Again, the deletions though,

407.55 -> those are really what we're gonna kind of focus on today

409.68 -> for the most part.

410.55 -> Software errors, whether it's a human

412.5 -> or a piece of software.

413.58 -> If it's a script, maybe it's a lifecycle policy

416.85 -> that's been misconfigured.

418.38 -> Again, we cannot distinguish

420.33 -> between a correct deletion and an incorrect deletion.

423.33 -> We have to honor that request just as we honor the request

427.29 -> to put data into S3.

429.66 -> So when we look at that, we now have accidental data loss,

432.54 -> we have software errors,

433.95 -> and we also have what we call bad actors.

436.71 -> These could be employees that have, again,

439.56 -> authorization to the data through our access controls,

443.73 -> our IAM policies,

445.65 -> but their intent as a bad actor is to do something malicious

449.16 -> with that data.

449.993 -> Again, we can't differentiate

451.2 -> between their delete request

453.21 -> and a properly authorized delete request

456.03 -> that's not coming from a bad actor.

458.22 -> So we have to look at all of these possibilities

460.95 -> and then offer additional layers of protection inside of S3.

465.42 -> And that's really where this kinda,

466.95 -> it looks like an everlasting gobstopper type approach here,

469.62 -> but your data is at the center.

471.21 -> And the first layer of protection that we always talk about

473.91 -> is, again, your access controls.

475.86 -> S3 by default is secure.

477.96 -> You're not gonna access that data

479.91 -> unless you have been authorized through IAM.

483.24 -> Now, when we look at that though,

485.73 -> if you are properly authorized,

487.44 -> once you've got access to the data, what can you do with it?

490.35 -> An accidental deletion, again,

492.33 -> is no different to S3 than a malicious deletion.

495.15 -> We can't distinguish between the two.

497.55 -> So that's really where something like object versioning

500.01 -> is the beginning of that journey

502.05 -> where you can now set S3 to go ahead

504.69 -> and start preventing accidental deletions

508.32 -> by keeping every version that is created.

511.5 -> Is it an overwrite? We will create a new version.

513.69 -> Is it a deletion? We will then put a delete marker

516.84 -> as the current version in your S3 object version stack.

522.27 -> We keep track of every single version.

524.82 -> We always essentially append in object versioning

528.51 -> once that's enabled on your bucket.
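
A minimal sketch of the append-only version stack described above, assuming a toy in-memory bucket. The class and method names (`VersionedBucket`, `put`, `delete`, `get`) and the `v1`, `v2` identifiers are hypothetical and are not the S3 API; the sketch only shows that an overwrite adds a version and a delete adds a marker, so earlier data stays recoverable.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    version_id: str
    body: Optional[bytes]  # None marks this entry as a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.body is None


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        self._stacks: Dict[str, List[Version]] = {}
        self._ids = count(1)

    def _append(self, key: str, body: Optional[bytes]) -> str:
        version = Version(f"v{next(self._ids)}", body)
        self._stacks.setdefault(key, []).append(version)
        return version.version_id

    def put(self, key: str, body: bytes) -> str:
        """An overwrite never replaces data; it appends a new version."""
        return self._append(key, body)

    def delete(self, key: str) -> str:
        """A plain delete appends a delete marker as the current version."""
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        stack = self._stacks.get(key, [])
        if version_id is None:
            candidates = stack[-1:]  # current version only
        else:
            candidates = [v for v in stack if v.version_id == version_id]
        if not candidates or candidates[0].is_delete_marker:
            raise KeyError(key)
        return candidates[0].body


bucket = VersionedBucket()
first = bucket.put("cat.jpeg", b"original")
second = bucket.put("cat.jpeg", b"overwritten")
marker = bucket.delete("cat.jpeg")

# The current version is now a delete marker, so a plain read fails...
try:
    bucket.get("cat.jpeg")
    current_visible = True
except KeyError:
    current_visible = False

# ...but every earlier version is still addressable by its ID.
recovered = bucket.get("cat.jpeg", version_id=first)
```

In this model an accidental delete is undone by reading an older version ID; nothing in the stack is ever removed.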

530.37 -> From there, we then need to talk about malicious deletions.

534.15 -> So we've got accidental deletions covered.

535.92 -> What about malicious deletions?

537.3 -> Malicious deletions, that is the layer that we

540.99 -> then bring to bear.

542.58 -> This is again, an opt-in feature called object lock.

545.16 -> And object lock really has become sort of a de facto standard

547.74 -> for immutable storage in the cloud.

550.41 -> Object lock can be enabled at the bucket level

552.51 -> or a per object level.

554.4 -> And once you have an object lock placed on an object,

557.91 -> with a retain-until date,

559.65 -> that object cannot be deleted by AWS personnel.

563.22 -> It cannot be deleted by your root account.

566.58 -> So we can now prevent even malicious bad actors

569.79 -> from coming in and performing an intentional delete,

573.72 -> even though they're properly authorized

575.82 -> through IAM policies.
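
A small sketch of the retain-until check described above, assuming a toy model rather than the S3 API. The names `LockedVersion`, `delete_version`, and `RetentionError` are hypothetical. The behavior modeled, where no caller can delete before the date, corresponds to what S3 calls compliance mode; governance mode, which allows a specially permissioned bypass, is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    """Raised when a delete targets a version still under retention."""


@dataclass(frozen=True)
class LockedVersion:
    key: str
    version_id: str
    retain_until: datetime  # hypothetical field for the retain-until date


def delete_version(version: LockedVersion, now: datetime) -> str:
    """Permit the delete only once the retain-until date has passed.

    The caller's identity is deliberately not an input: in this model,
    being authorized does not matter while the lock is in force.
    """
    if now < version.retain_until:
        raise RetentionError(
            f"{version.key} is locked until {version.retain_until.isoformat()}"
        )
    return f"deleted {version.key}@{version.version_id}"


locked_at = datetime(2022, 11, 30, tzinfo=timezone.utc)
audit_log = LockedVersion("audit.log", "v1", locked_at + timedelta(days=365))

# A delete 30 days in is refused, whoever sends it.
try:
    delete_version(audit_log, now=locked_at + timedelta(days=30))
    blocked = False
except RetentionError:
    blocked = True

# After the retention period the same request succeeds.
result_after_expiry = delete_version(audit_log, now=locked_at + timedelta(days=366))
```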

578.58 -> We'll also be talking a little bit about replication.

581.43 -> Now, if you need to prevent,

584.55 -> and essentially if you need to have that data available

588.36 -> for compliance reasons or have that data available

591.54 -> in a different region for resiliency,

594.15 -> that's where S3 replication comes in.

596.55 -> Now, replication is there, it's not going to be a backup.

599.703 -> It's not gonna be essentially managed

601.53 -> or independent from S3.

603.63 -> So when we look at all three of these here,

607.08 -> object versioning, object lock and replication,

610.02 -> these are all opt-in features

611.46 -> that you have to turn on in S3.

614.67 -> They're not necessarily centrally managed.

616.89 -> They are enabled essentially per bucket inside of S3.

621.09 -> Now, to look at all of that together,

622.86 -> Mark's gonna be talking a little bit about

624.54 -> another feature that we launched.

625.89 -> It's one of my favorite things to talk about

627.39 -> called S3 Storage Lens.

629.46 -> It gives you a bird's eye view, a central view

631.89 -> of all of those features and essentially what percentage

634.83 -> of your data is versioned?

637.26 -> What percentage of your data is object locked?

640.14 -> What percentage of your data is replicated?

643.05 -> And Storage Lens is something

644.25 -> that you can actually get up and running by dinner tonight.

648.42 -> It has 14 days of historical data ready to go.

651.09 -> Mark's gonna show you how Cox Automotive uses it,

654.27 -> but it gives you a bird's eye view

656.94 -> of all three of those data protection capabilities.

660.99 -> But what we've talked about so far is all native to S3,

664.62 -> but mistakes again still happen.

667.35 -> None of those are a replacement

668.7 -> for a true traditional backup that is independent

671.34 -> and centrally managed.

672.69 -> And that's really where we talk about solutions

676.11 -> with partners such as Clumio.

678.21 -> And that's where we'll be talking about

680.64 -> Cox Automotive's journey with Mark here today

682.92 -> about why they chose Clumio to supplement

686.22 -> those existing data protection features

688.05 -> that they're already using in S3,

689.88 -> but why Cox wanted to cover that specifically

693.12 -> and look at a solution like Clumio

695.73 -> to actually create an independent backup copy of S3, Mark.

701.4 -> - All right, thanks Peter.

703.89 -> All right, my name is Mark Huber.

705.99 -> I am the Director of Engineering enablement

709.53 -> at Cox Automotive.

711.24 -> So Cox Automotive, who are we?

714.18 -> Not maybe a household name that you would know,

717 -> but you probably know some of our public faces

719.7 -> like Autotrader.com and Kelley Blue Book.

723.12 -> You look to see all those logos across the bottom.

725.1 -> There's 20 more that you probably have never heard of

727.17 -> unless you're in the automotive industry.

729.39 -> And that's where we are.

730.62 -> We are on a mission to transform the way the world buys,

734.04 -> sells, owns and uses vehicles.

738.27 -> If you've really kind of paid attention to what's going on

740.25 -> in the industry in the last five years,

743.07 -> we're seeing a whole new model.

744.6 -> Not the classic model of going out and buying a car

747.27 -> at your local dealership,

748.95 -> everything you see from ride share to fleet rental,

753.84 -> it's completely changing the dynamics

755.73 -> of the automotive industry.

757.26 -> And Cox Automotive is there leading the charge

760.5 -> in how the industry operates,

763.77 -> how people access transportation, whether it's clean energy,

769.2 -> and green vehicles or how they use and lease vehicles,

775.41 -> not a month at a time, but maybe an hour at a time.

779.88 -> We're looking at all the new ways

781.41 -> that the automotive industry is starting to think about

784.59 -> how we utilize vehicles.

788.43 -> In doing that, we're actually, you may not know this,

792.63 -> the Cox organization is over 120 years old,

795.87 -> originally started in the newspaper industry

799.17 -> and we've evolved through many different parts.

800.97 -> You see Cox out here, that's Cox Communications,

804.33 -> one of our sister organizations.

806.31 -> And we're in many other parts of the industries as well.

809.91 -> Cox Automotive has come together probably over more

812.73 -> of the last 50 years.

814.62 -> And we are really an aggregation of many teams

818.43 -> all over the United States,

820.98 -> all over North America, South America,

825.24 -> all over the world, internationally.

828.39 -> And so that has come together over so many years.

831.09 -> Remember, you know, everyone knows Kelley Blue Book.

833.22 -> Remember it's the actual book,

835.5 -> how long has that been around? That's us.

837.96 -> And we've evolved that into KBB.com, Instant Cash Offer,

841.32 -> these things you see today.

843.69 -> Behind the scenes,

845.25 -> that's over 500 software engineering teams

848.64 -> that have come together

849.96 -> to start collaborating more as one engineering organization.

852.807 -> And that's a journey that we have been on

855.623 -> and we've been on a journey to the cloud, into AWS,

859.83 -> for approximately the last seven years.

861.9 -> And I've had the pleasure of being a part of that exercise

865.8 -> of bringing disparate engineering teams together

869.67 -> to operate more as one engineering organization

872.49 -> to fulfill that vision and mission of Cox Automotive.

876.48 -> In doing that, we brought a lot of software along the way

880.89 -> to operate more as one engineering ecosystem,

884.73 -> modernizing and making more cloud native applications,

887.61 -> moving from on-premise data center environments,

890.82 -> mainframe systems, up into systems now based in Amazon S3,

895.17 -> Lambda, using DynamoDB. We have an internal initiative

899.61 -> called Serverless First.

902.04 -> And really driving the change in the way we build

904.08 -> and architect software,

905.85 -> but not just which technologies we use.

909.72 -> About four years ago we made a big shift in focus

913.32 -> and adopted the AWS Well-Architected Framework

916.89 -> as really sort of our guidestone for how do we think about

919.65 -> what does good software look like?

922.05 -> The pillars of operational excellence,

923.88 -> security, reliability, performance, cost,

927.06 -> and now sustainability,

929.04 -> really shaped the way we look at what does a good,

932.82 -> well-architected piece of software look like?

935.55 -> And we've used that as a benchmark

938.82 -> for all of these different teams,

940.23 -> all these different software systems.

942.27 -> And now deploying all of that software

945.12 -> in over 1300 AWS accounts,

948.45 -> that's really allowed us to come from sort of all walks

952.05 -> of engineering life, all different engineering cultures,

955.62 -> and start to operate more like one engineering organization.

959.67 -> That's been our journey.

960.51 -> That's been our story.

962.22 -> And I've had the pleasure to be a part of that, so.

966.96 -> Data is a huge part of what makes that mission

970.59 -> and that journey a possible reality.

974.22 -> We got some numbers up on here and these are pulled

976.5 -> from Storage Lens like Peter was talking about.

980.76 -> I have the line of sight over those 1300 AWS accounts

988.02 -> to look at our data real estate at scale.

992.19 -> We store close to 20 petabytes in S3

997.05 -> across those 1300 accounts,

999.51 -> that is across those accounts.

1001.16 -> That's not 20 petabytes in one bucket,

1003.68 -> that's not 20 petabytes in one account.

1006.26 -> That's 20 petabytes spread across 1300 AWS accounts,

1010.43 -> spread across over 33,000 different buckets.

1016.16 -> That's 153 billion

1019.67 -> objects. - It is. The G

1021.5 -> in this slide is a B, so that's billions.

1024.74 -> - So we are one tiny little drop in that bucket of,

1028.04 -> no pun intended, of 280 trillion objects stored

1031.16 -> in S3. - We appreciate that, Mark.

1032.605 -> - (chuckles) The average object size is 142 kilobytes.

1040.76 -> How could I even begin to understand that

1043.37 -> and wrap my head around that

1045.56 -> without the tool sets that Storage Lens gives me

1047.87 -> to understand how those 500 different Scrum teams

1052.43 -> are building software and operating on data.
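
As a rough check on the figures quoted above, the sketch below recomputes the average object size from the rounded totals in the talk (close to 20 petabytes, 153 billion objects, over 33,000 buckets, 1300 accounts). The variable names are illustrative, and because the inputs are rounded the results are approximations; whether the slide used decimal or binary units is not stated, so both are shown.

```python
# Rounded figures as quoted in the talk; none of them are exact.
OBJECTS = 153 * 10**9   # "153 billion objects"
BUCKETS = 33_000        # "over 33,000 different buckets"
ACCOUNTS = 1_300        # "1300 AWS accounts"
TOTAL_PB = 20           # "close to 20 petabytes"
QUOTED_AVG_KB = 142     # "average object size is 142 kilobytes"

# Average object size under decimal (PB, KB) and binary (PiB, KiB) units.
avg_kb = TOTAL_PB * 10**15 / OBJECTS / 1000
avg_kib = TOTAL_PB * 2**50 / OBJECTS / 1024

# Total implied by the quoted average, treating it as 142 KiB.
implied_total_pib = QUOTED_AVG_KB * 1024 * OBJECTS / 2**50

# Spread of the estate across buckets and accounts.
objects_per_bucket = OBJECTS / BUCKETS
buckets_per_account = BUCKETS / ACCOUNTS

print(f"average size: {avg_kb:.1f} KB (decimal), {avg_kib:.1f} KiB (binary)")
print(f"total implied by a 142 KiB average: {implied_total_pib:.2f} PiB")
print(f"objects per bucket: {objects_per_bucket:,.0f}")
print(f"buckets per account: {buckets_per_account:.1f}")
```

Under binary units the quoted 142 KB average lines up with a total slightly under 20 PiB, which fits the "close to 20 petabytes" wording.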

1056.27 -> The ability first to just understand your state

1059.54 -> is where we started.

1060.8 -> Because 10 years ago,

1062.63 -> I can't say that we did understand our state,

1065.24 -> moving to the cloud, using tools like S3 and Storage Lens

1068.48 -> has made it possible to just begin to even understand

1071.54 -> what that looks like.

1073.4 -> Now, I know, Peter, you just had a launch

1074.98 -> of some new features with Storage Lens.

1077.115 -> I'll confess I haven't even had a chance

1078.92 -> to read the blog post yet.

1080.33 -> Do you wanna talk a little bit about some of the new metrics

1082.07 -> that come out?

1082.903 -> - Sure, thanks Mark.

1083.736 -> So 34 new metrics in Storage Lens.

1086.72 -> I think I have that right.

1088.01 -> 34 new metrics.

1089.39 -> And again, the beauty of Storage Lens is

1092.51 -> you can get it up and running in just a matter of minutes.

1095.3 -> All those dashboards are ready to go

1097.28 -> with those metrics with 14 days of historical data.

1100.01 -> So if you're not using Storage Lens today,

1103.52 -> that would be my one ask is look at Storage Lens

1106.7 -> and then look at enabling object versioning.

1109.34 -> Now I'm gonna sit down, I'm done with the commercial,

1112.43 -> but thank you. - Well,

1114.41 -> that's a good commercial.

1116.18 -> But that is actually a true story for us.

1118.61 -> So getting into talking about,

1121.85 -> how we started talking about a backup strategy,

1124.19 -> those hundreds of teams

1125.882 -> are doing their local backup strategy.

1128.84 -> They have operational backups,

1130.58 -> they're thinking about what happens in the event

1132.32 -> of a disaster.

1134.57 -> But we started really doing some threat modeling

1137.78 -> to understand some of these more nefarious scenarios

1141.56 -> like ransomware.

1143.03 -> And the starting question was,

1146.87 -> how much data do we even have

1148.16 -> that we have to even be concerned about?

1150.44 -> And it was, someone asked me a question

1152.69 -> and I went scrambling for an answer,

1154.73 -> and a couple hours later we turned on Storage Lens

1156.89 -> at the organization's level

1158.75 -> and we had starting dashboards right outta the box

1161.21 -> and helped me start thinking about

1162.86 -> how to organize a backup strategy.

1166.88 -> Very quickly, we got into that backup strategy

1169.28 -> and we came to a couple of interesting conclusions.

1172.64 -> Cox Automotive is not trying to transform the way

1175.28 -> the world backs up data, that's what Peter does.

1178.16 -> That's what Woon is doing.

1180.74 -> We are trying to transform the automotive industry.

1182.63 -> It's not our core competency.

1184.67 -> We did actually briefly have the conversation

1187.4 -> about should we build this ourselves?

1189.74 -> Maybe we could build this,

1190.97 -> but it's not what we're here to do.

1193.37 -> So we like to partner with organizations like Clumio

1196.97 -> to bring them, waking up every morning

1200.66 -> thinking about that problem,

1201.95 -> about how to maximize the efficiency of a backup.

1205.52 -> And we partner with them so that we can focus

1208.49 -> on our core competency.

1210.68 -> The ransomware risks are real.

1212.72 -> You read what's going on in the industry.

1216.38 -> We ran a lot of threat scenarios internally

1218.48 -> and saw that this was a real potential risk.

1220.73 -> Our AWS multi-account strategy running across 1300 accounts

1225.59 -> really helps mitigate that risk.

1228.08 -> But that wasn't enough

1229.13 -> 'cause it only takes one account

1231.05 -> on that critical public facing property

1235.61 -> to put you on the headlines of the Wall Street Journal.

1242.42 -> It's a very complex problem, technically speaking.

1245.87 -> We did the simple math.

1246.89 -> We did the simple way of thinking about it.

1248.42 -> We have 20 petabytes of data and it costs us this much,

1252.44 -> we need to back it all up.

1254.75 -> So we were gonna spend twice as much in what we're doing.

1258.26 -> The simple backup strategies would say yes.

1262.34 -> But what we saw with Clumio was that they thought

1265.13 -> about that problem a lot harder than we had.

1268.37 -> And the results we're getting with Clumio,

1271.28 -> in the efficiencies, even in the dollars,

1273.59 -> in how we are approaching our backup strategy,

1275.75 -> now change that 2X dynamic. It's not like that.

1283.37 -> Thinking about when we first met Clumio,

1285.5 -> when I first met Woon,

1288.47 -> what we saw was exceptional competency

1291.62 -> in the space of data and thinking about

1294.11 -> how to handle and manage data.

1298.19 -> The things that Woon's gonna show in the demo in a bit,

1300.65 -> the way that they manage Clumio partitions,

1304.13 -> the way they think about reorganizing data

1306.53 -> for optimal storage and cost efficiency in it,

1310.07 -> is something that we had never even considered.

1314.3 -> And here's the more important part, that was simple to use.

1318.77 -> Part of the hardest thing I have to focus on

1320.96 -> when thinking about a backup strategy

1322.55 -> is bringing those 500 teams together

1325.46 -> and getting them to take action.

1328.22 -> Getting them to put a story in their backlog.

1331.19 -> Getting them to all collectively focus on and engage

1334.22 -> in a backup strategy.

1335.48 -> Whatever we brought them, it had to be simple.

1340.16 -> It had to solve our problems

1343.07 -> around needing a truly air-gapped solution

1347.59 -> for that ransomware scenario.

1350 -> That was what kicked off our focus on data.

1353.51 -> Now, air gap is a word that's been around our industry

1357.89 -> probably for 30 plus years.

1360.26 -> It's this very traditional idea of no wires connecting

1364.13 -> these two environments.

1365.54 -> No way in or out.

1366.86 -> Someone compromises your operational environment,

1372.74 -> there's no way that they can compromise

1375.41 -> your data vault environment.

1377.9 -> And that's important.

1379.07 -> It's an important component of our strategy

1382.04 -> to mitigate the risk around a ransomware event.

1385.94 -> But as we looked at solutions, there's a tough part of that.

1390.38 -> Many of the ways of doing that

1392.33 -> take it outside the four walls of AWS.

1396.35 -> It takes it outside the technology

1399.44 -> that gives us the durability and the resiliency guarantees,

1403.31 -> traditionally speaking.

1405.98 -> We didn't wanna lose the benefits of how we store

1408.74 -> and manage data in S3 in our own accounts

1412.25 -> when working with a backup solution.

1414.71 -> And so we very specifically went searching for a partner

1417.77 -> who was focused in natively developing

1420.74 -> air gap style technologies working inside the AWS ecosystem.

1426.23 -> And that's what we found when we looked at Clumio.

1429.5 -> It had to perform when we're talking about this much backup

1432.44 -> at this much scale,

1433.7 -> these are running operational systems

1436.19 -> with very tight RTO and RPO concerns.

1439.19 -> We need to make sure that we can get those levels of return

1443.24 -> to operation and recovery point

1445.34 -> without impacting 24/7 running production workloads.

1451.04 -> And finally we needed good partners.

1454.16 -> And this is probably the most important part to us.

1459.2 -> When we first met the Clumio team,

1461.72 -> it did not do everything that we needed it to do.

1465.56 -> It didn't work the way we needed it to work

1468.74 -> to easily roll it out to 500 teams across 1300 accounts.

1473.75 -> So we partnered and we worked closely with their team,

1476.66 -> worked personally with Woon,

1478.07 -> worked with much of the Clumio team here today.

1480.77 -> And very rapidly, we innovated.

1482.66 -> They understood our problem, they understood our objectives

1486.92 -> and they shaped their product quickly.

1489.98 -> And I can tell you, working with other partners, "quickly"

1491.99 -> is not always the operative word in that sentence.

1495.997 -> So that partnership, thank you,

1497.96 -> thank you to the Clumio team.

1499.64 -> It's been excellent.

1502.76 -> So what are the results of all of this?

1504.41 -> We've been working with them for roughly the past year,

1508.16 -> being able to standardize a data protection capability

1511.88 -> in over 1300 accounts.

1515 -> I talked about that Well-Architected Framework

1517.46 -> and the idea of workloads.

1519.26 -> These are all the different collective software systems

1522.08 -> that we have based on something we call our CORE program,

1526.34 -> which is Cox Automotive's Observability

1528.71 -> and Resiliency Engineering Program.

1531.59 -> We've been able to rate those 600 workloads

1534.26 -> to understand which of them are most critical

1537.86 -> for these resiliency capabilities.

1539.66 -> The first in the line to be backed up

1542.39 -> to have full ransomware protection.

1545.06 -> And that's made us a real actionable game plan

1547.79 -> to start rolling out the technology

1550.34 -> to these workloads in the accounts.

1553.25 -> We worked with the Clumio team

1554.57 -> to develop a Terraform provider.

1556.49 -> We terraform all 1300 AWS landing zones in those accounts.

1561.71 -> And so now our provisioning process that provisions

1564.71 -> a new account, we do that multiple times a week.

1568.37 -> Every single account comes fully integrated

1571.64 -> with the Clumio Stack.

1574.28 -> We use a tagging strategy developed with,

1577.73 -> to match the protection group patterns within Clumio,

1580.7 -> that means it's as simple for a team

1583.43 -> as putting a tag on a bucket to start backing up.

1587.03 -> No plugging in the tool,

1589.01 -> no integrating, no setup,

1592.34 -> they simply drop a tag and it starts backing up.

1595.58 -> And that was the kind of simplicity

1597.14 -> that we were looking for.
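
A minimal sketch of the tag-driven opt-in described above, assuming a plain dictionary of bucket names to tags. The tag key `backup-policy`, the value `protected`, the bucket names, and the function `buckets_to_protect` are all hypothetical; the talk does not name the real tags or show how Clumio protection groups evaluate them.

```python
from typing import Dict, List

# Hypothetical tag key and value; the talk does not name the real ones.
BACKUP_TAG_KEY = "backup-policy"
BACKUP_TAG_VALUE = "protected"


def buckets_to_protect(bucket_tags: Dict[str, Dict[str, str]]) -> List[str]:
    """Return, in sorted order, the buckets whose tags opt them in to backup."""
    return sorted(
        name
        for name, tags in bucket_tags.items()
        if tags.get(BACKUP_TAG_KEY) == BACKUP_TAG_VALUE
    )


# Illustrative inventory: a team opts a bucket in just by tagging it.
inventory = {
    "team-a-listings": {"team": "a", "backup-policy": "protected"},
    "team-a-scratch": {"team": "a"},
    "team-b-payments": {"team": "b", "backup-policy": "protected"},
    "team-b-logs": {"team": "b", "backup-policy": "none"},
}

selected = buckets_to_protect(inventory)
```

The design point is that the team's only action is the tag; selection logic lives centrally, so nothing has to be installed or configured per bucket.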

1600.922 -> The way that Clumio looks at cost in their credit system

1604.88 -> allows us to consume as we go and clearly track which teams

1610.25 -> are consuming how much of the backup capacity.

1613.76 -> It makes us effective in that cost pillar

1616.61 -> in the well-architected framework,

1617.87 -> really to understand how that money's being spent,

1620.48 -> not spend and waste while the product sits on a shelf,

1624.77 -> but buy as we go and buy as we need and that's been great.

1630.14 -> And we've developed a lot of new interesting features

1632.36 -> with them, especially around that last one.

1634.37 -> B-Y-O-K, Bring Your Own Key,

1636.95 -> talked about that native part of a solution

1641.15 -> working within the AWS space,

1643.4 -> working directly with AWS KMS.

1647.09 -> We are able to mutually share a pair of KMS keys,

1651.68 -> one owned by Cox Automotive, one owned by Clumio,

1655.49 -> which ensures the right level of balance.

1658.1 -> The idea that I can take my critical data

1661.1 -> and put it in the hands of a trusted partner like Clumio,

1665.18 -> but still have the power to have control

1667.73 -> while it's left my four walls,

1670.7 -> to shut down the access to that data

1672.98 -> by revoking that encryption key.

1674.81 -> Patterns like that developed mutually,

1677.66 -> give us the right balance where I can trust

1680.6 -> what is an outside entity

1682.79 -> to handle my most sensitive and critical data.

1687.38 -> So I'm gonna stop there,

1689.78 -> I'm gonna hand it over to Woon

1691.37 -> and he's gonna show us how the magic happens.

1693.02 -> - Thank you.

1694.33 -> So I'll start with, what's Clumio?

1696.5 -> So Clumio is a data protection service created on AWS

1700.13 -> to support the workloads and data sources running on AWS.

1703.52 -> To that end, we wanted to create a service

1705.59 -> that is as scalable and elastic as all the data services

1709.16 -> and applications that you're running

1710.81 -> on the public cloud on AWS.

1715.01 -> Before I jump in,

1716.42 -> I just wanna share a few observations

1718.94 -> that make data protection for S3 so challenging.

1722.09 -> Like Peter mentioned,

1723.32 -> S3 is being used very widely

1725.75 -> because the flexibility, simplicity, scalability,

1728.96 -> performance, and the various points of integration

1731.75 -> make S3 a natural choice when it comes to being

1734.75 -> the application primary data store,

1737 -> especially for those modern applications

1739.04 -> that were actually born in the cloud.

1741.92 -> What that means to us is that now,

1744.26 -> we have buckets where objects are being created

1748.13 -> by hundreds of thousands, if not by millions.

1750.74 -> And those are created per hour.

1754.01 -> We also see buckets that are huge,

1756.41 -> they easily contain one or two billion objects.

1759.53 -> Moreover, we have customers coming to us

1761.63 -> with 10, 20, 30 billion objects per bucket.

1765.14 -> We also see a lot of variety

1767.15 -> because of the different use cases; they have data

1770.12 -> with different characteristics and different requirements.

1772.97 -> Let me just give you one example.

1774.56 -> Let's say you have a bucket with 1 billion objects.

1776.99 -> I'm pretty sure that within that bucket

1779.06 -> you will have some objects,

1780.53 -> that you probably don't care about backing up,

1782.99 -> but you'll have another subset of the objects

1784.94 -> that you are required to keep a second copy

1787.43 -> and you wanna retain it for one year.

1789.74 -> Even more, you may have another subset of objects

1792.47 -> within the same bucket that you would like to back up

1796.7 -> and this time, you wanna retain it for seven years.

1799.88 -> And we wanted to create a service that is flexible enough

1802.46 -> so that you can actually satisfy

1804.35 -> all the compliance requirements and at the same time,

1807.11 -> optimize the cost by not backing up what you don't need to.

1812.78 -> While looking at all of these things,

1814.94 -> you see that it is very hard for a single solution

1818.15 -> to be the magic bullet that solves all of the problems.

1821.3 -> Another example is depending on the use cases

1823.88 -> and how you use your buckets, many times it is not ideal,

1827.42 -> or really practical for enabling things like object locking.

1831.35 -> And at Clumio, what we try to do is to create a service,

1834.26 -> another tool in your tool arsenal, in your toolbox

1837.68 -> alongside the features that Peter talked about earlier,

1841.28 -> which you can use and achieve data protection

1843.53 -> for your data in S3.

1848.362 -> So I'm gonna start with the high level architecture overview

1851.96 -> and then we'll keep going down the road.

1854.24 -> First of all, on the left hand side, we have ACME.

1857.48 -> That's a hypothetical customer

1859.67 -> and they're looking to protect a big bucket in the middle.

1862.76 -> The way that they install Clumio

1864.77 -> is by deploying a CloudFormation template or Terraform.

1868.34 -> Once that CloudFormation template is deployed,

1870.68 -> we go ahead and install an IAM Role and an EventBridge.

1874.88 -> That's all.

1876.08 -> In your account, the only Clumio footprint

1878.36 -> is literally just that IAM Role and that EventBridge.

1881.99 -> While that is getting deployed, on the Clumio side,

1884.78 -> we actually go ahead and create a dedicated AWS account

1887.72 -> for the customer, ACME.

1889.34 -> The way that we segregate data across customers

1891.74 -> is by actually creating dedicated AWS accounts

1894.44 -> for every customer that onboards with Clumio.

1897.53 -> - And Woon, I'll say,

1900.47 -> there was hesitation with the idea of moving data

1904.91 -> into your platform because it could be co-mingled

1907.73 -> with other customers.

1910.19 -> You're using the same pattern that we use

1912.44 -> to create segmentation to create isolation

1915.89 -> for Cox Automotive's data separate

1918.17 -> from your other customers.

1920.69 -> And still get all the perimeter security

1923.12 -> that it gives in isolation.

1924.89 -> So this was a game changer that made many parts

1928.58 -> of our organization much more comfortable

1931.25 -> with the idea of working with a third-party for data backup.

1934.85 -> - Yes, a lot of the things that we do

1936.41 -> is really security first.

1939.77 -> And all the access that happens from the Clumio side

1942.89 -> back to the customer accounts

1944.11 -> is really through that IAM Role

1946.16 -> and we will use that EventBridge

1947.78 -> to capture the changes that are happening in the bucket

1950.93 -> that we're backing up.

1952.13 -> And we're also integrating with S3 inventory

1954.5 -> to capture the catalog of the object

1956.84 -> that you have in that bucket.

1959.12 -> And then as you can see,

1960.92 -> and then a lot of the data processing happens

1963.29 -> through Lambda functions that all run

1965.36 -> on our side of the account.

1967.34 -> Everything starting from data movement, verification,

1970.91 -> optimizations and indexing.

1972.89 -> As you can see, a lot of the complexity

1975.02 -> is on the Clumio side and that's by design.

1977.45 -> We wanna deliver Clumio as a service,

1979.55 -> to that end, we wanna keep everything that is complicated

1982.16 -> on our end and leave the customer side of the account

1985.91 -> pretty simple:

1986.743 -> just an IAM Role and an EventBridge.

1990.546 -> - And it's worth pointing out,

1991.379 -> that means near zero operational cost.

1993.95 -> Our AWS bill, we see near zero cost

1997.16 -> for you operating within our accounts.

1999.74 -> The compute resides with you and we buy credits from you.

2004.3 -> - Correct.

2005.133 -> All the patching, troubleshooting, all the observability

2008.38 -> is actually built right there on our side.

2013.48 -> So let's talk a little bit about backup

2015.19 -> and a little bit about the flexibility

2017.08 -> around what to back up.

2020.38 -> We wanna optimize costs for our customers

2022.51 -> and one of the easiest ways to do it

2024.31 -> is to allow the customers to tell us

2026.23 -> what's important to them.

2028.09 -> What we allow you to do is to specify a set of filters.

2031.69 -> You tell us what prefix, what timestamps,

2034.99 -> and what storage classes you wanna back up.

2037.84 -> And this is the way that you optimize both for cost

2040.75 -> and also for time that it takes us to do the backup.

2043.45 -> 'Cause again, given a bucket with a billion objects,

2046.42 -> you may not wanna back up everything.

2049.48 -> Conceptually, it's pretty simple.

2051.22 -> You take a long list of objects

2052.96 -> and you apply a Lambda function

2054.67 -> that goes ahead and applies those filter functions.

2057.16 -> And then it outputs another smaller, shorter list

2060.61 -> that contains the list of objects that pass the filter.
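
[Editor's sketch] The filtering step described above (a long object list in, a shorter list out) could look roughly like the following. This is a minimal illustration, not Clumio's implementation: the `InventoryEntry` and `BackupFilter` types and their field names are hypothetical, and a real inventory row carries more fields.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class InventoryEntry:
    """One row of an object listing (hypothetical, simplified shape)."""
    key: str
    last_modified: datetime
    storage_class: str


@dataclass(frozen=True)
class BackupFilter:
    """Hypothetical filter: an empty collection means 'no restriction'."""
    include_prefixes: tuple = ()
    exclude_prefixes: tuple = ()
    modified_after: Optional[datetime] = None
    storage_classes: frozenset = frozenset()

    def matches(self, entry: InventoryEntry) -> bool:
        # str.startswith accepts a tuple of candidate prefixes.
        if self.include_prefixes and not entry.key.startswith(self.include_prefixes):
            return False
        if self.exclude_prefixes and entry.key.startswith(self.exclude_prefixes):
            return False
        if self.modified_after is not None and entry.last_modified < self.modified_after:
            return False
        if self.storage_classes and entry.storage_class not in self.storage_classes:
            return False
        return True


def select_for_backup(entries: Iterable[InventoryEntry],
                      flt: BackupFilter) -> Iterator[InventoryEntry]:
    """Lazily yield only the entries that pass the filter."""
    return (e for e in entries if flt.matches(e))
```

Because the selection is a generator, the same function can be applied to independent chunks of a very large listing by separate workers.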

2065.17 -> We also validate the inventory on a daily basis.

2067.99 -> Remember we're actually doing continuous backup,

2070.06 -> we're integrating with EventBridge and that's how we know

2072.79 -> when objects are getting created or deleted.

2077.62 -> But we wanna make sure that the backup really captures

2081.49 -> a hundred percent of all the objects that you have.

2084.85 -> If for whatever reason we were to drop one message, that means

2089.14 -> that one message, or that one object, never gets backed up

2092.71 -> because we are doing continuous backups based on that event.

2096.88 -> What we wanna do is to actually go ahead and check,

2100.09 -> crosscheck against the S3 inventory on a daily basis

2103.48 -> and then detect if there's any objects that are missing,

2106.45 -> we'll go ahead and do what's called a catch up backup

2109.3 -> or fixed backup to make sure that everything is captured.

2113.92 -> In other words, a backup that captures 99.99%

2117.28 -> of your objects is still a failed backup.

2119.53 -> You do want your backup software to capture 100%

2123.31 -> of the objects.

2126.19 -> The way that we achieve this is that

2128.07 -> we developed a technology that allows us to compare

2131.35 -> the catalog of the objects that we have in our backup

2134.11 -> against the S3 inventory.

2135.82 -> We will actually get two lists

2137.59 -> and then we will detect whether there's a missing object

2140.53 -> and if there is one,

2141.82 -> it will output, again, a shorter list,

2143.92 -> and it will actually do

2146.47 -> the closed backup based on that shorter list.
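
[Editor's sketch] The comparison of the backup catalog against the S3 inventory can be pictured as a streaming merge of two sorted key lists. The sketch below is an assumption about one workable approach, not the technology described in the talk: it assumes both inputs are already sorted by key in the same order, and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def missing_from_backup(inventory_keys: Iterable[str],
                        backup_keys: Iterable[str]) -> Iterator[str]:
    """Yield keys present in the inventory but absent from the backup catalog.

    Both inputs must be sorted ascending. Only one key from each side is
    held in memory at a time, so the lists themselves can be arbitrarily long.
    """
    backup_iter = iter(backup_keys)
    current = next(backup_iter, None)
    for key in inventory_keys:
        # Advance the backup cursor until it reaches or passes this key.
        while current is not None and current < key:
            current = next(backup_iter, None)
        if current != key:
            yield key
```

The output is the "shorter list" that a follow-up backup would then work through.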

2150.16 -> Again, think about it,

2151.18 -> the chance of dropping that event is very small.

2153.55 -> It will actually be 0.001%,

2156.52 -> but if you have 10 billion objects,

2158.8 -> that actually is a decent number.
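
[Editor's sketch] To put a number on that remark: using the figures quoted in the talk (a 0.001% drop rate and 10 billion objects), the expected count of missed objects works out as below. The helper name is hypothetical, and the rate is illustrative rather than a measured value.

```python
from fractions import Fraction


def expected_missed(total_objects: int, drop_rate_percent: str) -> int:
    """Expected number of missed objects for a drop rate given in percent."""
    rate = Fraction(drop_rate_percent) / 100  # exact arithmetic, no float error
    return int(total_objects * rate)


print(expected_missed(10_000_000_000, "0.001"))  # 100000
```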

2162.73 -> And again, conceptually things are very simple,

2165.19 -> but if you're processing tens of billions of objects,

2167.413 -> it's not so easy

2169.06 -> because even the list of objects, they're terabytes in size.

2172.51 -> And then we need to run those filters concurrently

2174.73 -> using multiple Lambda functions at the same time

2177.7 -> so that we can actually perform backups in a timely manner.

2182.77 -> Moreover, comparing a very long list of objects is, again,

2187.06 -> very, very challenging.

2188.35 -> If you have two lists with 30 billion objects on each side,

2191.89 -> finding which one is missing

2193.6 -> is actually very, very challenging to do.

2197.05 -> Some of the technologies that we've built here also allow us

2200.05 -> to support and do backups in buckets

2203.59 -> that do not have versioning enabled, for instance.

2206.38 -> Because again,

2207.28 -> it may not be the most practical thing to do in some cases.

2212.77 -> Let's talk about ingest.

2214.3 -> So now we know that we collected,

2216.1 -> we have the list of objects that need to be backed up.

2219.43 -> So how hard could it be?

2221.11 -> You know that the objects are organized by prefix

2223.75 -> and all we need to do is to actually fire up

2226.33 -> a bunch of Lambda functions

2228.07 -> and get the objects from one side

2230.26 -> and move them to the other side.

2232.84 -> Actually, it turns out to be that it's not that easy.

2235.42 -> If you fire up all those Lambda functions

2237.46 -> and they all start working on the same prefix or partition,

2240.88 -> you're gonna get a lot of API throttling.

2243.22 -> You can increase the number of Lambda functions,

2245.11 -> but you're not gonna make things any faster.

2246.91 -> In fact, all you're gonna get is just API throttles.

2250.18 -> So what we ended up doing is that we introduced

2252.97 -> a notion of a Clumio partition.

2255.28 -> We look at the list of objects that need to be backed up

2258.52 -> and based on some heuristics,

2260.29 -> we determine the Clumio-side partitions

2263.2 -> and we actually schedule the Lambda functions

2265.21 -> across the different partitions.

2266.92 -> What that allows us to do is now,

2269.02 -> we can actually have these Lambda functions

2271.36 -> working on different parts of the key space

2274.323 -> without choking a single prefix.

2277.99 -> But again, at the same time, remember,

2280 -> we're not the only one using that bucket.

2282.19 -> We're a backup.

2283.24 -> There's the primary application that is currently using

2286.3 -> that bucket and we're continuously monitoring API throttle.

2290.35 -> And if we see that we're actually getting API throttles,

2293.41 -> it could be because the primary application is using it,

2296.5 -> we go ahead and reschedule things.

2298.54 -> What that means is that we can actually give a little bit

2301.21 -> of a break to that blue partition out there

2303.28 -> while we actually do a little bit more work

2305.56 -> on that yellow partition.

2307.12 -> And we continuously do that as part of the backup

2310.03 -> so that we can actually satisfy the backup performance

2313.21 -> and at the same time, allow the primary workload

2316.18 -> to actually have enough requests per second

2319.15 -> to carry out the work.
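
[Editor's sketch] The rescheduling idea (give a throttled partition a break and do more work on another) can be illustrated with a small rebalancing function. Everything here is hypothetical: the talk does not describe the actual heuristics, and the `step` and `min_workers` parameters are invented for the illustration.

```python
from typing import Dict, Set


def rebalance(workers: Dict[str, int], throttled: Set[str],
              step: int = 1, min_workers: int = 1) -> Dict[str, int]:
    """Move up to `step` workers off each throttled partition.

    Freed workers go to the least-loaded partitions that are not throttled.
    If every partition is throttled, the freed workers are simply dropped,
    which lowers the overall request rate against the bucket.
    """
    plan = dict(workers)
    freed = 0
    for name in throttled:
        if name in plan:
            take = max(0, min(step, plan[name] - min_workers))
            plan[name] -= take
            freed += take
    healthy = [name for name in plan if name not in throttled]
    if healthy:
        for _ in range(freed):
            target = min(healthy, key=lambda name: plan[name])
            plan[target] += 1
    return plan
```

Calling this each time throttling is observed shifts work gradually instead of all at once, which leaves request headroom for the primary application.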

2321.46 -> - And I'll say,

2322.66 -> that was one of our early concerns.

2324.64 -> There would be no worse outage bridge to be on than one where we say,

2329.38 -> our software is down because the backup is running.

2332.62 -> And to have it intelligently understand our workload,

2336.64 -> and adjust accordingly,

2338.53 -> removes the need for us

2340.69 -> to factor in that concern.

2341.89 -> We no longer think about,

2343.591 -> well, I have to schedule these at 3:00 AM

2345.46 -> when my workload is low

2347.26 -> 'cause we're competing for resources.

2349.3 -> - Yeah, but remember,

2350.413 -> S3's being used as the primary data store.

2353.38 -> So you have the application that is continuously reading

2356.11 -> and writing to that bucket.

2357.76 -> And if the backup comes in and it takes up all the APIs,

2361.39 -> all the provisioned APIs, then yes, we can't do that.

2364.84 -> So we allow you to, you know,

2366.31 -> we monitor the API throttle, we reschedule things,

2368.86 -> we allow the customers to even specify

2370.93 -> what is the maximum allowed API rate

2373.66 -> that we're allowed to consume.

2376.18 -> So let's talk about restore.

2377.65 -> It's very similar to the backup side of the house.

2380.32 -> Again, we're super focused on actually optimizing things

2383.83 -> for our customers.

2384.94 -> And the way that you do that is to actually restore

2387.34 -> only what you need.

2388.69 -> Like if you have 10 billion objects backed up in Clumio,

2393.01 -> restoring all that, it does take time

2395.137 -> and it is actually time consuming and resource consuming.

2399.16 -> The best way to actually avoid that

2400.66 -> is to restore exactly what you need.

2402.97 -> You can restore objects by prefix.

2404.98 -> You can restore objects based on specific timestamps, tags,

2409.42 -> there's a variety of filters that you can specify

2412.33 -> and we will only restore those objects that are required.

2415.6 -> This allows you to actually restore the objects

2418.21 -> that you need right now

2419.71 -> and restore some of the other objects

2421.35 -> at a later point in time.

2424.09 -> The way that we achieve that is basically by

2426.52 -> working through the metadata.

2427.99 -> For every object that we back up,

2429.76 -> we maintain a metadata entry.

2433.3 -> For every metadata blob,

2434.68 -> we actually maintain the metadata in Parquet files

2437.38 -> and we heavily use Amazon Athena to query that metadata engine.

2442 -> We know that Athena behaves better

2444.1 -> if you have metadata objects that are somewhat bigger

2446.62 -> in the order of hundreds of megabytes.

2449.53 -> So what we do in the backend

2451.21 -> is that using various Lambda functions,

2453.13 -> we're continuously optimizing that metadata payload.

2455.89 -> We're combining the Parquet files,

2457.87 -> we're actually partitioning them differently

2459.85 -> so that when it is time to query,

2461.35 -> it is actually readily available

2463.42 -> and we can return that list of objects

2465.37 -> to be restored very, very quickly.

2467.59 -> And that's something that we continuously do in the backend

2470.62 -> for our customers.
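
[Editor's sketch] Combining many small metadata files into larger ones starts with deciding which files to merge together. The planner below only groups file names by size; it does not read or write Parquet. The 256 MiB target is an assumption standing in for the "hundreds of megabytes" mentioned in the talk, and the function name is hypothetical.

```python
from typing import Dict, List

DEFAULT_TARGET_BYTES = 256 * 1024 * 1024  # assumed target, not from the talk


def plan_compaction(file_sizes: Dict[str, int],
                    target_bytes: int = DEFAULT_TARGET_BYTES) -> List[List[str]]:
    """Group files, in name order, into batches of roughly `target_bytes`.

    A batch is closed as soon as adding the next file would exceed the
    target. A single file larger than the target gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items()):
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch would then be rewritten as one larger file by a separate worker.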

2472.63 -> As you can see, these are all challenges that happen

2475.81 -> when you have a lot of objects.

2477.25 -> And these are things that many times you don't think about

2480.88 -> when you start thinking about implementing your own.

2486.76 -> So let's talk about observability.

2488.92 -> I mentioned that we are a service,

2491.74 -> we wanna own all the complexity ourselves

2494.38 -> because we don't wanna give it to you.

2496.87 -> We wanna be the ones that detect the failure.

2499.21 -> We wanna be the ones that troubleshoot every failure.

2501.67 -> We wanna be the one that monitors

2503.59 -> and actually patches everything for you

2505.63 -> so that you don't have to do anything,

2507.91 -> you just use it as a service.

2509.89 -> But at the same time, I just mentioned

2512.26 -> that we use thousands of Lambda functions,

2514.39 -> of different sizes and types.

2516.4 -> And at the same time, we also have hundreds of customers,

2519.55 -> meaning hundreds of AWS accounts

2521.89 -> and all of these Lambda functions,

2523.6 -> they're running across all of these hundreds of AWS accounts.

2527.68 -> And we have backups continuously happening

2530.59 -> all over the place.

2531.82 -> So how do we control all this?

2533.77 -> 'Cause if something fails, how do we know?

2536.35 -> Because remember, we need to be the one

2538.63 -> that detects the failure and we need to be the one

2541.06 -> that troubleshoots it and fixes it

2542.47 -> so that you don't have to.

2544.69 -> For us to achieve this,

2545.77 -> we actually implemented an internal framework

2547.93 -> that we call a Clumio Workflow Engine.

2550.45 -> So for the sake of time,

2551.89 -> I'm not gonna be able to spend too much time on it,

2553.93 -> but I promise I'll actually give a live demo of this

2556.36 -> and I'll show you how it works in real life.

2561.58 -> I wanna talk a little bit about cost optimizations

2564.58 -> and also some of the confusions

2565.93 -> and questions that I get from our customers.

2568.51 -> A lot of the time, I get questions about, this is great,

2571.78 -> but it looks very expensive.

2573.52 -> And I also get the question, how is this different

2575.47 -> from replication? Is this replication?

2578.95 -> So to answer that question, let me just use an analogy.

2581.89 -> I'm a big fan of books.

2583.78 -> Let's say you have a bookshelf full of books

2585.76 -> and the way you organize your books

2587.71 -> is basically based on how you would actually access

2590.5 -> those books every day.

2592.18 -> Now let's say we need to create a second copy

2595.06 -> or create a backup of those books that you have at home.

2598.06 -> One way to do that is to actually buy

2600.7 -> an identical-looking bookshelf and replicate the format

2604.06 -> to the second bookshelf.

2605.59 -> Sure, you'll end up having two copies and two books.

2608.65 -> However, if what you're looking for is backup,

2611.08 -> I'll argue that is not the best way

2612.79 -> to actually achieve backup.

2614.41 -> If you wanna back it up and what you truly want is backup,

2617.38 -> the best way and the most efficient way to do it

2620.11 -> is to buy a box, stack all of those books

2622.78 -> and put that box away in the storage.

2625.78 -> That is a more efficient way to actually do backups.

2630.79 -> Just like, you know,

2633.46 -> backup cannot be replication,

2636.82 -> replication is not an ideal backup solution either.

2639.79 -> So it is again, the right tool for the right problem.

2646.72 -> So what I'm gonna do is I'm gonna very quickly

2648.73 -> go over three announcements

2650.32 -> and then we'll actually switch it over to the live demo.

2654.01 -> So the first one is about 15-minute RPO.

2657.94 -> I'm actually super happy to announce that,

2660.88 -> with the help of the S3 folks,

2662.74 -> we finished fully integrating with EventBridge,

2666.13 -> which allows us to support a 15-minute RPO.

2669.79 -> What we do is that we actually perform

2672.49 -> 15-minute micro backups.

2674.29 -> So every 15 minutes we will actually go

2676.57 -> and capture those objects and we'll back them up.

2679.24 -> And then just like I told you before, every day,

2682.42 -> once we receive the daily S3 inventory,

2684.88 -> we will actually cross examine all of the micro backups

2687.97 -> that we took in the last 24 hours

2690.01 -> and we will actually compare it with the S3 inventory.

2693.04 -> And if there's anything missing, we'll at that point,

2695.89 -> do what we call the close backup and fix everything up

2699.01 -> so that you do have the guarantee

2701.05 -> that we capture all of the objects.

2703.9 -> What this means to you is that

2706.39 -> in the worst case,

2707.32 -> you're losing up to 15 minutes' worth of data in your bucket

2710.29 -> once this is enabled.

2714.01 -> Next, with a lot of the data optimization

2717.04 -> and the partitioning, and the scheduling,

2719.65 -> we can now support up to 30 billion objects per bucket.

2724.57 -> What this means to you is that if you have a large bucket

2727.6 -> and you're struggling with data protection, come talk to us.

2732.85 -> Okay, this last one is something

2734.65 -> that is actually my favorite one.

2736.87 -> We're talking about instant access.

2739.15 -> At Clumio, we are actually very, very motivated

2741.85 -> to actually optimize cost and performance for our customers.

2745.66 -> Now if you wanna restore a billion objects, what do you do?

2749.8 -> The first thing that you could do is apply a filter

2752.23 -> and bring that object count down.

2754.27 -> That's one way to do that.

2755.59 -> But at the end of the filter,

2756.7 -> if you're still left with hundreds of millions of objects,

2759.61 -> restoring that, it will actually take time.

2763.72 -> It will take time and resources.

2765.34 -> And what we wanted to do is, for some specific cases

2768.1 -> such as DR testing, backup testing, or true emergency,

2772.09 -> we wanted to make the data readily available

2774.79 -> and also at a fraction of the cost.

2777.82 -> The way we do that is by a feature called instant access

2780.61 -> where we expose the data

2782.17 -> that we store in the backup directly.

2785.29 -> So we expose an endpoint that is S3 compatible,

2789.4 -> where you have the backup data.

2792.01 -> So for example, we can take a bucket

2795.07 -> or a specific prefix back to, let's say, six hours ago,

2799.06 -> and we will return back to you an S3 endpoint.

2802 -> If you actually connect to that S3 endpoint,

2804.22 -> it will contain all the objects in that bucket or prefix

2807.64 -> as of six hours ago.

2810.97 -> If you want it to be as of three hours ago,

2813.52 -> we can repeat the same thing.

2815.53 -> We can actually take you back to any point in time

2818.56 -> and we can actually share an S3 endpoint

2821.23 -> that contains all the data at that specific point in time.

2825.49 -> And we can do all this at a fraction of the cost

2828.85 -> and nearly instantaneously.
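
[Editor's sketch] What "all the objects as of six hours ago" means can be stated precisely: for each key, take the newest version written at or before the chosen time, and hide the key if that version is a delete marker. The code below only illustrates those semantics over an in-memory version list; it says nothing about how the endpoint is actually served, and the `Version` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable


@dataclass(frozen=True)
class Version:
    """One backed-up version of an object (hypothetical, simplified)."""
    key: str
    version_id: str
    written_at: datetime
    is_delete_marker: bool = False


def view_as_of(versions: Iterable[Version], as_of: datetime) -> Dict[str, Version]:
    """Return the visible version of each key at the instant `as_of`."""
    latest: Dict[str, Version] = {}
    for v in versions:
        if v.written_at > as_of:
            continue  # written after the requested point in time
        seen = latest.get(v.key)
        if seen is None or v.written_at > seen.written_at:
            latest[v.key] = v
    # A key whose newest version is a delete marker did not exist at as_of.
    return {k: v for k, v in latest.items() if not v.is_delete_marker}
```

Asking for a different point in time is just another call with a different `as_of`; no data is copied in this model, which is why such a view can be cheap to produce.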

2831.22 -> Now, if you're doing things like DR testing, backup testing,

2834.79 -> or you're in a true emergency, this is something

2837.61 -> that could really be a life changer for you.

2841.66 -> All right, we'll do the demo.

2843.67 -> By the way, it is a live demo.

2848.56 -> All right, so what I'm gonna do is, first of all,

2851.92 -> I'm gonna log into my test cluster.

2858.16 -> Got all my passwords there.

2865.21 -> So this is kind of the home screen that you're greeted with

2869.26 -> once you log in.

2870.19 -> We have the ver-

2871.023 -> oops, what happened. (chuckles)

2873.16 -> We have the various dashboards and stuff,

2874.81 -> but again, for the sake of this discussion,

2876.43 -> since we're talking about data protection

2878.92 -> we'll quickly skip over.

2880.66 -> But this kinda gives you a dashboard,

2882.82 -> a visibility into what's happening in your environment

2885.4 -> in terms of what's protected,

2887.95 -> how much it's costing you, and so on and so forth.

2891.58 -> One of the first things that you will have to do

2893.79 -> is to actually register your environment.

2896.173 -> Like I mentioned, you can actually do it through

2898.133 -> a CloudFormation template or via Terraform.

2901.66 -> Once you register that environment,

2903.43 -> essentially you specify the AWS account and a region,

2906.82 -> and an optional name, and

2908.92 -> the whole thing will take you no more than 15 minutes

2911.47 -> to get up and running

2913.33 -> 'cause all we are doing again is creating that role

2915.4 -> and that EventBridge on your account.

2918.22 -> Once that's done, typically what you do

2920.883 -> is that you create a policy, you name it,

2924.49 -> and essentially, let's first name it.

2928.985 -> And then you can enable the various data sources

2934.69 -> that we support.

2935.74 -> In the case of S3, we support two tiers,

2938.62 -> standard and frozen with different pricing points

2941.83 -> and different retention and access performance.

2946.21 -> And then you can actually set the retention.

2948.73 -> For the sake of the demo we'll skip that

2950.44 -> and we will use some of the policies that we already have.

2955.09 -> Next, if you wanna protect your S3 buckets,

2959.65 -> what you'll end up doing,

2961.96 -> the first thing that you'll end up doing

2963.4 -> is to actually create what's called a protection group.

2966.7 -> A protection group is really a combination of buckets

2969.7 -> along with the filters.

2971.26 -> This is how you're telling us what's important to you.

2974.47 -> So the way we do that, let's just name the protection group,

2978.01 -> let's call it reinvent.

2979.42 -> And then there are multiple ways

2980.44 -> that you can actually add a bucket, you can add it by tag.

2983.59 -> Let's just do it by tag.

2988.3 -> So then what happens is that

2989.8 -> it automatically searches your environment.

2991.93 -> It automatically adds all of the buckets

2994.24 -> that contain the tag reinvent demo.

2997.72 -> And it is not just a one-time thing:

2999.7 -> if, after the protection group is created,

3001.68 -> you create another bucket with that tag,

3003.36 -> it gets automatically added.
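
[Editor's sketch] Tag-driven membership means the set of protected buckets is recomputed from tags rather than stored as a fixed list. The sketch below shows that idea over a plain dictionary of bucket tags. The bucket names, the tag key `backup`, and the tag value `reinvent-demo` are all hypothetical, since the transcript does not show the exact tag.

```python
from typing import Dict, List


def buckets_in_group(bucket_tags: Dict[str, Dict[str, str]],
                     tag_key: str, tag_value: str) -> List[str]:
    """Return the sorted names of buckets carrying the given tag."""
    return sorted(
        name for name, tags in bucket_tags.items()
        if tags.get(tag_key) == tag_value
    )


# Hypothetical environment: two tagged buckets and one untagged bucket.
environment = {
    "demo-bucket-1": {"backup": "reinvent-demo"},
    "demo-bucket-2": {"backup": "reinvent-demo"},
    "scratch": {},
}
print(buckets_in_group(environment, "backup", "reinvent-demo"))

# A bucket created later with the same tag is picked up on the next evaluation.
environment["demo-bucket-3"] = {"backup": "reinvent-demo"}
print(buckets_in_group(environment, "backup", "reinvent-demo"))
```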

3007.26 -> You can also add these buckets manually.

3009.78 -> Let's say I just picked these two.

3014.13 -> From here, now that I've selected the buckets,

3016.5 -> I need to apply the filters.

3018.12 -> The way that you apply the filters,

3019.8 -> you can actually tell us whether you wanna protect

3021.75 -> all the storage classes or you wanna leave some out.

3025.2 -> Maybe you're not interested in protecting

3027.6 -> some of the One Zone-IA.

3031.5 -> You can actually tell us whether you wanna back up

3033.54 -> all the versions of the object or just the latest version.

3037.17 -> And you can also tell us which prefixes

3039.33 -> are interesting for you.

3040.95 -> Let's say protect and then within protect we can exclude,

3046.26 -> let's say the prefix junk

3049.38 -> and we can also back up the prefix important.

3057.18 -> Next, now that we've specified the bucket and the filter,

3060.15 -> you have defined the what.

3062.1 -> Now we go ahead and select the policy

3064.143 -> that you wanna apply.

3065.91 -> Whether you wanna back it up every day,

3067.65 -> retain it for seven years.

3069.15 -> Now you're configuring the frequency

3071.1 -> and the retention period.

3073.26 -> So let me cancel that for the sake of the demo

3075.15 -> and then we'll just use some of the protection groups

3077.67 -> that I created before the demo.

3079.41 -> I have a few protection groups that I created,

3081.57 -> some of them medium size, which is about 6 billion,

3084.48 -> some of them that are somewhat large

3086.22 -> in the order of 30 billion objects.

3088.26 -> And then we're gonna concentrate

3089.88 -> on these two protection groups.

3093.54 -> These two protection groups contain the same two buckets.

3096.3 -> They contain the reinvent2022clumio1 and clumio2

3100.17 -> both added by tag.

3103.23 -> Over here, this protection group, the seven year one,

3106.44 -> again, contains the exact same two buckets.

3109.26 -> And you may ask the question, why?

3111.93 -> Why do we have two protection groups

3113.82 -> for the same two buckets?

3116.58 -> The reason is that they're actually protecting

3119.7 -> different prefixes within those buckets.

3123.15 -> This one is actually protecting everything

3125.16 -> under the prefix year one, except what's under year one/temp.

3130.68 -> And with that, we're actually applying a yearly retention

3134.19 -> because that is my compliance,

3135.72 -> that is my retention requirement.

3138.93 -> Now if I actually go back and I look at the seven year one,

3143.82 -> again, these are the exact same buckets.

3145.98 -> However, they are two different prefixes.

3148.53 -> This time I'm actually protecting the seven year

3151.47 -> except the temp.

3155.22 -> And this time I'm actually retaining it for seven years,

3159.18 -> unlike the previous one that I was actually retaining

3161.46 -> just for one year.

3163.62 -> So that's on the backup side of the house.

3165.93 -> Let's just explore the different options that we have

3169.2 -> for the restore process.

3173.04 -> The simple one is to restore a single prefix

3176.52 -> or entire buckets.

3177.75 -> You come in and you pick the buckets that you're interested in,

3182.01 -> both of them or just one of them.

3184.41 -> And then you apply filters,

3186.12 -> you tell us exactly the point in time to the second.

3190.98 -> And then moreover, you can even apply filters.

3193.98 -> And if you don't, it happens to be the entire bucket.

3197.04 -> But you can actually apply filters and we can do things

3200.55 -> like demo and we can even apply filter based on size

3206.16 -> and then do a preview.

3208.83 -> So what we do, we now query the metadata engine

3212.1 -> and it will return a preview of the objects

3214.2 -> that get restored before we actually execute it.

3218.82 -> Once you click on summary,

3220.32 -> then now you're telling us where to restore it to.

3224.07 -> You're telling us whether you wanna restore

3225.66 -> all the versions.

3228.24 -> It tells you about the protection group,

3229.8 -> the bucket that we selected, the filters.

3232.38 -> We can actually restore it back

3233.85 -> to the original source account,

3235.35 -> but we can actually restore it

3236.58 -> to any other registered source account.

3239.37 -> It doesn't have to be in the same account.

3242.01 -> You tell us to which bucket to restore it to.

3244.92 -> And you can also tell us what is the storage class

3248.04 -> that you wanna use when we're restoring it back.

3251.01 -> From there, you can also tell us whether you wanna add

3253.77 -> some prefixes in front of this object

3256.41 -> and we can even add some of the object tags.

3261.3 -> Another option is to actually restore a single object.

3264.54 -> It is pretty much the same.

3265.8 -> You specify the time and different filters,

3268.56 -> but the difference is that you get to specify

3271.92 -> the specific version in the chain of the versions

3274.68 -> for the object.

3276.87 -> Lastly, let me actually quickly show a demo,

3280.02 -> the instant access demo.

3283.17 -> So let's just do an instant access.

3285.63 -> Let's do reinventLive.

3290.25 -> So what I'm doing here is that I'm actually creating

3293.43 -> an endpoint that contains the list of objects

3296.22 -> as of that backup on the 30th that I clicked

3299.31 -> as of 3:00 PM or 5.

3301.71 -> And if I click the endpoint,

3303.78 -> we will actually go to the instant access

3306 -> and we will see the reinventLive.

3308.52 -> That being prepared as we speak.

3312.42 -> Let me just quickly switch it over, 0%.

3316.2 -> Oh, it's actually done.

3318.24 -> Okay, that was pretty quick.

3319.59 -> So it's done.

3320.82 -> So this is the live mount point,

3324.84 -> the access point that we have created.

3326.94 -> And then the way that we do it,

3328.08 -> we just copy that URL

3329.237 -> 'cause again, that URL is the S3 point,

3332 -> S3 endpoint that contains the objects at that point in time.

3335.97 -> So let me just quickly switch it over.

3339.36 -> I have a quick cheat sheet out here

3342.45 -> to help myself doing the demo.

3344.52 -> I'm just exporting a couple of environment variables.

3348.57 -> So this will be the endpoint and then the URL

3354.3 -> that I need to copy is reinventLive.

3357.36 -> Copy, copy, paste that.

3359.79 -> And that's it.

3361.92 -> Now good to go.

3363.66 -> And this is all the AWS CLI tool that we all love and know.

3371.19 -> So now I'm able to go ahead and do things like list objects.

3378.39 -> So as you can see, we can actually list the objects.

3381.75 -> And this is basically no different than you listing

3384.51 -> the original bucket,

3385.38 -> except that we are actually listing the bucket

3387.699 -> as of 30th at 3:00 PM.

3393.3 -> We can do things like head object.

3407.16 -> Let's do that...

3411.63 -> That's the head object.

3413.46 -> We can actually do head object with the different objects.

3421.62 -> That will work.

3423.158 -> As you can see, the ETag,

3424.5 -> will actually go ahead and match this 134c.

3429.33 -> We can actually even do get objects.

3437.215 -> And let me just do this one out here.

3441.69 -> And just to prove it to you, I have nothing.

3450.09 -> So that will be the get object,

3453 -> if you do ls, we do see that object out here

3456.243 -> that we downloaded and...

3466.119 -> And yeah, that's kinda the picture

3467.76 -> that I took out of our slides.

3469.2 -> But that's the object that I put in the S3 bucket

3471.12 -> and it showed up.
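
The demo commands are not captured in the transcript, so the Python sketch below only reconstructs their general shape. It builds standard AWS CLI `s3api` invocations (`list-objects-v2`, `head-object`, `get-object`) that point at a custom endpoint through the global `--endpoint-url` option. The endpoint URL, bucket name, and object key are placeholders, not the values used on stage.

```python
import shlex
from typing import List, Optional


def s3api_command(operation: str, bucket: str, endpoint_url: str,
                  key: Optional[str] = None,
                  outfile: Optional[str] = None) -> List[str]:
    """Build an `aws s3api` argument list that targets a custom endpoint."""
    cmd = ["aws", "s3api", operation,
           "--endpoint-url", endpoint_url,
           "--bucket", bucket]
    if key is not None:
        cmd += ["--key", key]
    if outfile is not None:
        # get-object takes the local output file as a positional argument.
        cmd.append(outfile)
    return cmd


# Placeholder values: the real endpoint is copied from the product UI.
ENDPOINT = "https://instant-access.example.invalid"
BUCKET = "example-bucket"

if __name__ == "__main__":
    commands = [
        s3api_command("list-objects-v2", BUCKET, ENDPOINT),
        s3api_command("head-object", BUCKET, ENDPOINT, key="slides/picture.png"),
        s3api_command("get-object", BUCKET, ENDPOINT,
                      key="slides/picture.png", outfile="picture.png"),
    ]
    for cmd in commands:
        # Print rather than execute; pass cmd to subprocess.run to run it.
        print(shlex.join(cmd))
```

Because the endpoint is S3 compatible, existing tooling only needs a different endpoint URL to read the point-in-time view.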

3473.64 -> That's about the instant access.

3475.83 -> It gives you point in time, immediate access

3478.74 -> to the data in the bucket.

3480.06 -> You can take it to 3:00 PM, 4:00 PM,

3482.43 -> we will actually give you that S3 compatible endpoint,

3485.04 -> nearly instantaneously and at a fraction of the cost.

3489.36 -> Lastly, like I promised,

3490.71 -> let me just do a quick demo about the whole observability

3493.86 -> that I talked about a little earlier.

3497.49 -> So this is truly live.

3499.8 -> So what I'm gonna do is what you guys see as a customer

3503.34 -> is this UI, you don't see the backend,

3506.4 -> but what we see in the backend is again,

3508.8 -> the whole workflow engine that we manage day in, day out.

3512.85 -> So let me copy out this task ID here,

3515.67 -> and then it is an internal debugger that we implemented.

3521.46 -> I have to go to task.

3524.1 -> And then we'll have the different task IDs

3527.25 -> and let me just paste this one out.

3537.6 -> So this one allows us to see exactly what's going on

3543.03 -> in every single backup.

3544.86 -> We know what's happening.

3546.39 -> We're getting what we call the container information.

3549.48 -> We're setting the progresses,

3550.89 -> we're waiting for bucket configurations.

3553.35 -> We're actually doing some of the CDC queries.

3555.75 -> I mean some of the data capture,

3557.58 -> change data captures that we talked about.

3559.74 -> This is a step.

3561.03 -> So it's actually a little bit of a programming language

3563.16 -> that I actually put together.

3564.48 -> If this step is successful, it will take the black arrow out

3567.42 -> and execute the next step.

3569.04 -> If this step fails, this will turn red

3571.44 -> and it will actually take the red arrow.

3573.51 -> And if the user cancels at that point,

3575.79 -> we will actually take the purple arrow out.

3579.3 -> This is an if statement where you have then and else,

3584.28 -> and this would actually be something like a subroutine call.

3588.48 -> I can actually click on it and I can go deeper

3591.45 -> as to see what's happening in that subroutine.

3593.97 -> But if anything fails, this is exactly how we know

3596.97 -> where it failed, why it failed, what was the input

3600 -> and what was the step that was executed

3601.71 -> before and what did it output.
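
The arrow semantics described above (black on success, red on failure, purple on user cancellation) can be sketched as a tiny step runner in Python. This is a simplified illustration under assumed names, not the internal workflow engine: `Step`, `Outcome`, `run_workflow`, and the sample steps are all hypothetical. The trace it records is the kind of per-step input, output, and error information the speaker says makes troubleshooting fast.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Outcome(Enum):
    SUCCESS = "black"     # normal path to the next step
    FAILURE = "red"       # the step raised an error
    CANCELLED = "purple"  # the user cancelled at this step


class Cancelled(Exception):
    """Raised by an action to signal a user cancellation."""


@dataclass
class Step:
    name: str
    action: Callable[[dict], dict]
    next_on: Dict[Outcome, Optional[str]] = field(default_factory=dict)


@dataclass
class TraceEntry:
    step: str
    outcome: Outcome
    input: dict
    output: Optional[dict]
    error: Optional[str]


def run_workflow(steps: Dict[str, Step], start: str, payload: dict) -> List[TraceEntry]:
    """Run steps until no arrow exists for the outcome; return the trace."""
    trace: List[TraceEntry] = []
    current: Optional[str] = start
    while current is not None:
        step = steps[current]
        output, error = None, None
        try:
            output = step.action(payload)
            outcome = Outcome.SUCCESS
        except Cancelled:
            outcome = Outcome.CANCELLED
        except Exception as exc:
            outcome, error = Outcome.FAILURE, repr(exc)
        trace.append(TraceEntry(step.name, outcome, dict(payload), output, error))
        if output is not None:
            payload = output
        current = step.next_on.get(outcome)
    return trace


def get_container_info(payload: dict) -> dict:
    return {**payload, "container": "c1"}


def cdc_query(payload: dict) -> dict:
    raise RuntimeError("injected error")  # simulate a failing step


def notify_support(payload: dict) -> dict:
    return {**payload, "support_notified": True}


STEPS = {
    "get_container_info": Step("get_container_info", get_container_info,
                               {Outcome.SUCCESS: "cdc_query",
                                Outcome.FAILURE: "notify_support"}),
    "cdc_query": Step("cdc_query", cdc_query,
                      {Outcome.FAILURE: "notify_support"}),
    "notify_support": Step("notify_support", notify_support),
}

if __name__ == "__main__":
    for entry in run_workflow(STEPS, "get_container_info", {"task_id": "t-1"}):
        print(entry.step, entry.outcome.name, entry.error)
```

Recording the input alongside the outcome for every step is what lets a failure be replayed or inspected without rerunning the whole task.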

3604.14 -> And with every failure, this is how we can troubleshoot it

3607.29 -> within minutes and not within hours.

3609.66 -> And when something fails, we get to update that one step.

3614.04 -> That's much easier.

3615.21 -> That's why we can troubleshoot it

3617.04 -> in a matter of hours and not days.

3619.41 -> And we can actually do all this for you

3621.66 -> and you don't have to do it.

3623.19 -> This is a live example.

3624.51 -> So let me actually give you a bad example

3628.86 -> that I had collected.

3630.42 -> In this case, I actually introduced an error

3634.38 -> intentionally and I'll show you what it does.

3638.88 -> So as you can see, yellow...

3643.98 -> As you can see, this is red.

3645.81 -> We know that it has failed.

3647.43 -> Where did it fail?

3648.45 -> We know that all of these steps

3649.8 -> are actually executed correctly,

3651.75 -> but it ended up failing right here.

3654.12 -> You'll see the red border out there.

3656.67 -> So then because this has failed,

3658.44 -> we're not executing the steps below it,

3660.45 -> but instead, we're actually taking the red arrow out.

3663.78 -> And then what we are doing is,

3666.27 -> we're generating an event that something has failed.

3668.61 -> That's what the X is for.

3670.02 -> We're terminating the task as a failure.

3672.51 -> We're actually sending a notification to our support team

3675.15 -> that they know that something has failed

3677.1 -> and we have all of the steps that have failed.

3679.92 -> And then now, because this is the subroutine

3683.58 -> that has failed with the Lambda functions,

3685.62 -> we can actually click on it and look at it even deeper.

3689.64 -> This is what the function does.

3691.98 -> Remember, we're processing a large number of manifest files.

3696.03 -> So we apply the filter

3697.35 -> using multiple Lambda functions concurrently.

3700.35 -> So we have what we call the fork and join step.

3707.1 -> So at this point,

3708.09 -> we compute how many Lambda functions to use.

3710.76 -> In this particular example,

3712.23 -> we're using 32 Lambdas concurrently

3714.99 -> to actually filter different areas of the manifest file

3718.56 -> to apply the filter and know what to back up

3720.96 -> and not to back up.

3722.73 -> So what happens is that I introduced an error on this step,

3725.58 -> that's why this turned red.

3727.2 -> But then when all these 32 Lambdas

3728.997 -> are actually done processing different areas

3731.4 -> of the manifest file,

3732.84 -> they all join out here and then they will actually continue.

3736.38 -> But it so happens that this step has failed.

3738.75 -> That's why we're actually taking the purple arrow out,

3741.3 -> which ends up failing.

3743.34 -> And then we bubble that to the parent workflow,

3745.8 -> which then takes the exit path to notify our support team.
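
The fork and join step described above can be illustrated with a short Python sketch. Threads stand in for the concurrent Lambda invocations, and a plain list of keys stands in for the manifest file; both are assumptions made to keep the example self-contained, and the function names are hypothetical. The sketch splits the manifest into ranges, filters each range concurrently, and joins the results in order. An error in any worker propagates to the caller, loosely mirroring how a failed subroutine bubbles up to the parent workflow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_ranges(total: int, workers: int) -> List[range]:
    """Split [0, total) into contiguous, near-equal index ranges."""
    workers = max(1, min(workers, total))
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for i in range(workers):
        # The first `extra` ranges take one additional item each.
        end = start + base + (1 if i < extra else 0)
        ranges.append(range(start, end))
        start = end
    return ranges


def fork_join_filter(manifest: Sequence[str], keep: Callable[[str], bool],
                     workers: int = 32) -> List[str]:
    """Filter manifest entries concurrently, then join results in order."""
    def work(indices: range) -> List[str]:
        return [manifest[i] for i in indices if keep(manifest[i])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fork: one task per range. Join: map yields results in input order
        # and re-raises the first worker exception when it is consumed.
        parts = list(pool.map(work, split_ranges(len(manifest), workers)))
    return [key for part in parts for key in part]


if __name__ == "__main__":
    # Made-up manifest: every third entry sits under an excluded temp prefix.
    manifest = [f"1year/temp/{i}.tmp" if i % 3 == 0 else f"1year/data/{i}.csv"
                for i in range(100)]
    kept = fork_join_filter(manifest, lambda k: not k.startswith("1year/temp/"))
    print(len(kept), "of", len(manifest), "objects selected for backup")
```

Splitting by contiguous ranges keeps the joined output in manifest order, which makes the result easy to compare against a sequential filter.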

3749.31 -> But looking at this,

3750.267 -> now we can actually debug things within minutes

3753.24 -> and we can now truly be a service

3755.7 -> and take all that complexity away from you and we own it.

3760.5 -> - And what's unprecedented is

3763.17 -> they're taking on that complexity through observability,

3767.37 -> but also transparency to us as a customer.

3770.696 -> And that makes a big difference

3771.75 -> in it not being a black box to us,

3774.75 -> so we can understand how you're approaching

3776.4 -> the problem too.

3780.57 -> - That's it.

3781.53 -> All right, just on time.

3784.26 -> Or maybe a little over.

3786.059 -> (speakers chuckling)

3788.46 -> - Thank you all for coming.

3789.69 -> We really appreciate it.

3790.59 -> And we'll be up here if you have any questions.

3792.392 -> (attendees applauding)

Source: https://www.youtube.com/watch?v=-dr71gKGZGc