AWS re:Invent 2022 - Threat detection and incident response using cloud-native services (SEC309)

Aug 16, 2023


Threat detection and incident response processes in the cloud have many similarities to on premises, but there are some fundamental differences. In this session, explore how cloud-native services can be used to support threat detection and incident response processes in AWS environments. In addition, learn how cloud-native security services can be integrated into security information and event management solutions and if a classic SIEM approach is still required. This session covers native services such as Amazon GuardDuty, AWS CloudTrail, AWS Security Hub, Amazon OpenSearch Service, AWS Shield Advanced, and more.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.


Content

0.81 -> - Hey everyone, welcome

2.67 -> to our session today on threat detection

4.53 -> and incident response using cloud-native services.

7.62 -> My name is Margo Cronin

9.09 -> and I'm a Solutions Architect specializing

11.13 -> in Security and Compliance.

13.08 -> - Yeah. Hello everybody.

13.95 -> My name is Armin Schneider.

15 -> I'm also a Specialist Solutions Architect

16.71 -> for Security and Compliance and looking forward

18.93 -> to the session today.

22.77 -> - So this is our agenda today.

25.23 -> Cybersecurity and cyber risk have always been topics

28.77 -> that customers have cared passionately about.

31.53 -> And now, with the widest breadth of services and tools

34.49 -> in the cloud,

35.97 -> there are even more actions the customer can take

38.76 -> to mitigate some risks in these spaces.

41.31 -> So myself, Armin,

42.66 -> and other solutions architects carry out many assignments

45.3 -> in this area, which brought together our session today.

49.92 -> So, today we're gonna be talking about what's different

52.82 -> in the cloud,

54 -> but also what remains the same

55.38 -> from what you would've experienced before.

57.96 -> And we're going to look at threat detection

59.61 -> and incident response

60.78 -> in phases, preparation, detection, containment,

65.31 -> collection, analysis, and then automation

68.64 -> and remediation and post-incident analysis.

74.79 -> Actually, what we're going to use today

76.86 -> to guide our session is the NIST 800-61 lifecycle.

83.04 -> You might be using a different one

84.42 -> in your organization, that's fine.

86.94 -> What we're trying to do is we're trying to begin

88.53 -> with technical capabilities, not beginning

90.9 -> with cloud-native services.

92.7 -> What are the capabilities and the requirements you're trying

95.04 -> to drive with those cloud-native services?

101.55 -> - All right, thanks Margo.

103.38 -> So I want to take over the first part and want

105.66 -> to talk about what's different in the cloud.

107.76 -> And I think the big thing is there is just

110.13 -> an additional layer in this whole picture.

112.71 -> And this is a control plane, right?

114.96 -> And in fact, I mean this is a paradigm shift

117.63 -> in how environments operate and how they exist.

121.2 -> There is quite a lot of additional log data, which we need

123.78 -> to consider in the incident response process.

126.54 -> But on the flip side, there's also a much better way

129.33 -> or automated way in order to react to incidents.

132.87 -> And then finally, there is also a more continuous iteration

136.38 -> between the life cycles, which we will show then

139.41 -> during the course of the session.

142.11 -> In order to start with this, I mean we wanted to start

144.63 -> and look into the AWS global infrastructure first

148.2 -> in order to elaborate a little bit more details

150.45 -> on what really is different.

153.96 -> This stuff might be known to some

155.52 -> of you, but we want to take a look at it in the context

159.57 -> of incident response. And we start

162.36 -> with the global infrastructure.

163.65 -> We want to talk about the concept of a region, right?

167.4 -> And a region for us is basically,

168.96 -> a physical location where we cluster data centers

172.23 -> and we call each group of data centers

174.51 -> an availability zone.

176.76 -> Each of our regions has at least three availability zones

180.69 -> and we currently have about 96 availability zones

185.61 -> across 30 different regions, right?

189.06 -> So why is that important in the incident response process?

193.8 -> Well, in the response process, right,

195.66 -> we might want to go to a different region

199.32 -> in case there is an incident

201.96 -> and something has happened, right?

203.52 -> We've often seen that compromised accounts have been used

207.54 -> in regions that people usually don't use.

211.53 -> Or attackers try to hide resources in those regions.

213.57 -> So, it's really important and we come to that

215.79 -> in more detail, to have that region concept in mind.
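As a minimal sketch of why the region concept matters for detection: assuming CloudTrail records have already been exported and parsed into dictionaries, a small helper can flag API activity in regions the organization does not normally use. The `awsRegion`, `eventName` and `eventTime` keys follow the CloudTrail record format; the allow-list, the function names and the sample records are assumptions made up for illustration, not something shown in the session.

```python
from collections import Counter

# Hypothetical allow-list: regions this organization actually operates in.
EXPECTED_REGIONS = {"eu-central-1", "eu-west-1"}


def events_in_unexpected_regions(records, expected=EXPECTED_REGIONS):
    """Return CloudTrail-style records whose awsRegion is not in the allow-list."""
    return [r for r in records if r.get("awsRegion") not in expected]


def count_by_region(records):
    """Count records per region for a quick triage overview."""
    return Counter(r.get("awsRegion", "unknown") for r in records)


# Made-up sample records, shaped like parsed CloudTrail events.
sample = [
    {"eventTime": "2022-11-28T10:00:00Z", "awsRegion": "eu-central-1", "eventName": "DescribeInstances"},
    {"eventTime": "2022-11-28T10:05:00Z", "awsRegion": "ap-southeast-2", "eventName": "RunInstances"},
    {"eventTime": "2022-11-28T10:06:00Z", "awsRegion": "ap-southeast-2", "eventName": "CreateSecurityGroup"},
]

suspicious = events_in_unexpected_regions(sample)
for region, count in count_by_region(suspicious).items():
    print(region, count)
```

In practice the allow-list would come from whatever the organization has decided about region usage; the point is only that the check is cheap once the logs are available.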

219.09 -> The next thing we want to take a look

220.34 -> at is basically the AWS account, right?

223.08 -> An AWS account is basically a natural security boundary

227.443 -> for billing and security access to your resources in fact.

233.07 -> So within an account we have resources, right?

237.3 -> And this could be databases, virtual machines,

240.51 -> higher level services, storage objects

242.91 -> and so on and so forth.

244.5 -> And then we have the virtual private cloud

247.59 -> or our network infrastructure.

249.81 -> Where we basically have subnets

252.57 -> and basically replace the traditional network terminology

256.855 -> with cloud-native functionality, basically also

259.68 -> with the scale of the cloud.

261.57 -> And then there are a couple of other services

263.52 -> in those regions such as gateways

266.43 -> and other kind of functionality.

268.77 -> And within an account, we can spread

271.44 -> across many regions again, right?

273.02 -> So it's important: an account by default allows us

275.88 -> to go to all of the native regions in the commercial platform.

279.57 -> There is one thing,

280.98 -> and it's worth mentioning here, which is going

283.35 -> across regions, and this is Identity

285.51 -> and Access Management, and we will look into that later

288.72 -> on, why this is important to get control of who is allowed

291.51 -> to do what in which kind of region,

294.18 -> but also on which kind of resources, especially

297.9 -> if you want to go with isolation technologies

300.87 -> in the later stage.

302.76 -> So while the account concept is our isolation layer,

306.12 -> what we're seeing on our customer side,

307.8 -> the customers are running multiple accounts.

309.57 -> And it's basically something we're guiding customers through

312.51 -> because typically we are saying this is a good way

314.37 -> of isolating your stuff.

316.41 -> And then basically customers are running hundreds

319.41 -> and sometimes thousands of accounts.

321.45 -> So in order to get that under control, we then started

324.45 -> with a service called AWS Organizations,

327.03 -> which is basically an account management service

329.82 -> that enables you to organize and manage accounts

332.85 -> across your entire estate.

335.79 -> And in order to do that, we have one specific account,

339.27 -> which we call the management account.

341.46 -> And then we have things like organizational units

343.824 -> and then we have sub-organizational units

346.53 -> and we're having accounts within those organizational units.

349.17 -> So this is basically the structure and how you can build it.

352.26 -> Still keeping

353.093 -> in mind though, the isolation boundary is still the account,

356.64 -> but what we can do with this account structure

359.43 -> and using Organizations is have centralized control

362.76 -> over identity and access management.

365.58 -> And we're having a concept of service control policies

368.43 -> and we will come to that in a later stage, which helps us

371.73 -> to control what can happen in certain accounts.

374.52 -> So we can really use service control policies later on

377.52 -> in order to build forensic environments, even

380.43 -> in your own infrastructure if you want to.

382.59 -> And service control policies will limit what can happen

385.47 -> in those accounts and so on and so forth.
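To make the idea of limiting what can happen in an account more concrete, here is a hedged sketch that builds a service control policy document denying requests outside a set of approved regions. The session does not show a policy; this one follows the commonly documented pattern based on the `aws:RequestedRegion` condition key. The region list, the exempted global services and the function name are assumptions for illustration only.

```python
import json


def build_region_deny_scp(allowed_regions, exempt_actions):
    """Build an SCP document that denies requests outside the allowed regions.

    exempt_actions lists actions (typically global services) that stay
    unaffected by the deny statement.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


# Hypothetical values: adjust to the regions and global services you rely on.
policy = build_region_deny_scp(
    allowed_regions={"eu-central-1", "eu-west-1"},
    exempt_actions={"iam:*", "organizations:*", "sts:*", "support:*"},
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The same building-block approach could be used for a forensic OU, where the policy would instead deny everything except the handful of actions the investigation needs.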

387.66 -> There is another item which is not on the slide,

390.36 -> but it's also quite important and we'll look into that

392.43 -> in more detail.

393.6 -> The logging can also be centralized across all

395.99 -> of those accounts.

396.99 -> Especially in the case of an incident, you might want

399.09 -> to have your logs all in one place.

402.12 -> So that's basically, you know,

403.77 -> the global infrastructure elaborating what is different.

407.88 -> So what remains the same, right?

409.32 -> I mean quite a lot of the task, like the general process

411.81 -> for performing incident response remains the same.

414.48 -> So the life cycle Margo showed comes

417.21 -> from older days and it's still valid.

419.85 -> We will see during the course

421.05 -> of the presentation, there is more iteration

423.03 -> in the cloud, definitely.

425.227 -> You still need your subject matter experts,

427.62 -> especially when it comes to forensics, right?

429.57 -> You need to have people who have this kind of skill.

431.91 -> There is no way you can go without those things.

434.905 -> However, there are things

437.49 -> like the collection of native logs

439.17 -> and endpoint data to be captured,

440.49 -> which are traditional things you had to do all

441.893 -> through the past.

443.58 -> This is where the cloud can really help and automate them.

446.34 -> Capturing the logs from your endpoints,

448.02 -> but also making snapshots and restoring stuff and so on

450.537 -> and so forth.

451.37 -> It's something we're digging deeper in, either

453.99 -> in the containment phase but also

455.7 -> in the remediation phase

457.2 -> where the cloud-native service tools

460.92 -> uses remaining processes.

463.62 -> All right, so let's start with our first part here.

466.347 -> And this is basically how cloud-native services are related

470.91 -> to this cycle, right?

472.56 -> And I won't go in all details

474.48 -> on what the services are doing.

476.01 -> We just wanna highlight

477.18 -> that there are different services which we are covering

479.61 -> in a later stage related to different phases of the cycle.

483.66 -> And if you start

484.493 -> with AWS CloudTrail where you can capture user activity

488.94 -> and API activity.

491.07 -> Well this falls into multiple areas already.

493.32 -> So obviously detection

495.21 -> and analysis is pretty obvious on that.

498.24 -> But also in the preparation phase, you need

500.01 -> to worry about it because you need

501.33 -> to make sure that it is set up.

503.07 -> Then we have things

503.903 -> like Amazon GuardDuty, classical threat detection.

506.49 -> Well it falls into the detection part.

509.49 -> Then we are using AWS Security Hub.

512.52 -> This one will basically fall

514.02 -> into triage and collection pretty much.

517.38 -> But also the analysis phase, right?

520.2 -> Then we have things like Systems Manager.

523.26 -> And Systems Manager comes up in multiple places here.

527.46 -> In the containment phase, it can definitely help us

530.1 -> to orchestrate things during the analysis, especially

533.07 -> in the forensic analysis,

535.2 -> but it also then comes into place into the remediation

538.32 -> and restoration phase at the later stage.

540.12 -> So it's basically the service which helps us

541.74 -> to automate, especially on instances.

545.04 -> Then we have Amazon Detective,

547.32 -> this is basically our analysis service,

550.56 -> which we will dig in deeper.

552.84 -> Which is definitely used in containment,

554.82 -> but it's also used in analysis and it might be used

558.63 -> and we're really recommending this

560.04 -> in the post-incident activity, right?

562.02 -> You might have already fixed your problems,

564.42 -> but you still want to take a look at other stuff

567.45 -> around which you might want to capture for further steps

570.09 -> in the future.

571.2 -> So no wonder this is coming up

573.87 -> in the post-incident activity.

575.82 -> And lastly on this slide, it's AWS Config.

578.05 -> And AWS Config can also help us

580.38 -> to, first, measure the state

583.08 -> of our environment, so is it configured properly,

585.48 -> but it can also help us

586.86 -> to trigger automated remediation tasks based on this.

590.07 -> So that's why it also comes up in detection,

592.493 -> but later on, also in remediation.

595.35 -> Now, as I said, we will dig deeper in all of this

598.44 -> in order to save a bit of time.

600.3 -> Let's go with the first phase of the cycle.

604.08 -> And in the preparation phase,

605.79 -> and I'll make this first one pretty short, right?

608.76 -> You have to keep in mind that the majority

610.56 -> of the cloud-native services,

612.06 -> but also the services you might use from third parties, need

615.57 -> to be configured or at least existing or enabled before.

620.88 -> Some of them you might be able to turn on later on,

624.12 -> but it's really important

626.04 -> to make sure these services are enabled

627.99 -> if you want to use it at a later stage.

630.75 -> As I said, some of them have an exception,

632.76 -> but the majority, keep in mind, they need

634.26 -> to be configured, enabled depending

636.57 -> on the use case, whether you use all of them

638.52 -> or not, it's a different question.

641.13 -> So the same thing is basically true

643.32 -> for the log data, right?

645.27 -> And we have quite a lot of additional log data compared

648.84 -> to the traditional product.

650.49 -> Don't wanna name all of them, right?

652.26 -> But you need to keep in mind, depending

654.6 -> on the service, you might need to enable the log data.

657.51 -> AWS CloudTrail, which tracks the user activity

660 -> and the API logging, we are enabling it by default

664.62 -> for 90 days.

665.76 -> But if you want to have the data longer

667.77 -> or you want to keep it for years

669.96 -> and even multiple years, right?

671.22 -> You need to make sure you have it configured.

673.68 -> The same thing is true, for example,

675.03 -> for VPC Flow Logs, which is our network flow data.

678.06 -> If you want to have this data available, you need

680.04 -> to configure this and need

681.69 -> to make sure it is stored somewhere where it's accessible

684.84 -> in the case you need it.
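Once flow logs are stored somewhere accessible, even a very small parser makes them usable during an incident. The sketch below assumes records in the default (version 2) VPC Flow Logs format, with space-separated fields in the documented default order; the helper names and the sample lines are made up for illustration.

```python
# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_log_line(line):
    """Split one default-format flow log line into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def rejected_sources_for_port(lines, port):
    """Return source addresses whose traffic to the given port was rejected."""
    sources = set()
    for line in lines:
        record = parse_flow_log_line(line)
        if record["action"] == "REJECT" and record["dstport"] == str(port):
            sources.add(record["srcaddr"])
    return sources


# Made-up sample lines in the default format.
sample_lines = [
    "2 123456789010 eni-0123456789abcdef0 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-0123456789abcdef0 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]

print(rejected_sources_for_port(sample_lines, 3389))
```

Custom flow log formats change the field order, so the `FIELDS` tuple would need to match whatever format was configured.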

686.34 -> Another one I just wanna mention here,

688.17 -> because it's sometimes really overlooked, very

690.51 -> often: load balancer logs.

692.67 -> I mean especially

693.503 -> in incident response processes, they are important

697.26 -> because they guide you to the real resources

699.57 -> behind the load balancer, right?

701.19 -> But they are the entry door.

702.9 -> So you really wanna make sure you have

704.64 -> the load balancer logs and at least store them

706.83 -> for a certain amount of time.

709.32 -> Not really considered as a security functionality,

712.05 -> but it's really important if you have stuff

713.7 -> behind load balancers.

715.86 -> Also worth mentioning WAF logs.

717.72 -> I mean, for everything on the edge, right?

719.85 -> You might want to take a look at things

722.07 -> like CloudFront or WAF logs for that purpose.

725.43 -> It all depends on what you're using on our platform,

728.01 -> but it's really important to have those logs enabled up front

731.22 -> because if you turn it on

732.33 -> after the incident happened, it's not gonna help.

735.63 -> All right.

736.463 -> And lastly here, prepare your forensic environment.

739.41 -> And I talked about it a bit in the organization structure.

742.59 -> So you might have an account environment

746.1 -> or an OU, depending on how you want to go, ready

748.8 -> for forensics, right?

750.36 -> And you might wanna really limit what can happen

752.61 -> in those accounts or make sure you use all

755.1 -> of our isolation capabilities,

757.14 -> so you can safely do forensics

759.24 -> and investigation in those accounts.

761.37 -> As I said, the cloud will really help you with that.

764.1 -> But again, you need to be ready and have it prepared.

767.22 -> It's not saying this needs to run all the time,

769.23 -> but you need to have the process in order to get it done.

772.44 -> Same thing is true for containment, right?

774.9 -> If you wanna isolate a machine on the network

777.15 -> and those kind of things, you need to make sure you know how

780.03 -> to do it and at which level you want to do it.

781.86 -> We'll look into that a bit deeper.

784.47 -> The forensics tools, right?

785.97 -> Systems Manager and similar things can help you to roll that out.

788.49 -> But again, you need to have it ready as a run command

792.36 -> or something like this in order to use it.

795.09 -> And then the last one,

796.05 -> and this is something I really, it's very close

798.33 -> to my heart, right?

799.59 -> Have your log analysis ready.

801.45 -> I mean we are seeing it way too often

803.22 -> that customers had something, right?

806.58 -> They wanna do a log analysis

808.14 -> and they had all the logs configured.

809.7 -> They had it for three years, four years, five years

812.37 -> in S3, in Glacier or in a SIEM, right?

815.13 -> But they had no idea, no process, how to analyze it, right?

818.575 -> Make sure you are ready for that analysis.

821.22 -> It's not saying you need

822.12 -> to have all the stuff running all the time,

824.34 -> but have a process to get it up quickly

826.59 -> because otherwise you can still build it afterwards.

829.92 -> It's nothing which you can't do after the fact,

832.53 -> but it will cost you a lot of time.

834.24 -> And sometimes we've seen customer delays of days

837.39 -> before they had something up and running in order

839.19 -> to look at their logs.

840.18 -> This is really something.

842.13 -> There's a lot of stuff out there in samples

845.28 -> on GitHub and other places where you can grab queries

847.98 -> and all those kind of things.

849.24 -> Just make sure you have it up and running and ready.

851.4 -> No matter which service you use by the way.
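As one example of what "having your log analysis ready" could look like, here is a small prepared query written as plain Python over parsed CloudTrail records: it tallies denied API calls per principal. The `errorCode` and `userIdentity.arn` keys follow the CloudTrail record format; the set of error codes, the function name and the sample records are assumptions chosen for illustration, and the same question could equally be asked through a query service or a SIEM.

```python
from collections import Counter

# Example error codes that indicate a denied call; extend as needed.
DENIED_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}


def denied_calls_by_principal(records):
    """Count denied API calls per principal ARN in CloudTrail-style records."""
    counts = Counter()
    for record in records:
        if record.get("errorCode") in DENIED_CODES:
            arn = record.get("userIdentity", {}).get("arn", "unknown")
            counts[arn] += 1
    return counts


# Made-up sample records, shaped like parsed CloudTrail events.
sample_records = [
    {"eventName": "ListBuckets", "errorCode": "AccessDenied",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventName": "RunInstances", "errorCode": "Client.UnauthorizedOperation",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventName": "DescribeInstances",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
]

for arn, count in denied_calls_by_principal(sample_records).most_common():
    print(arn, count)
```

A burst of denied calls from one principal is a typical sign of someone probing what a set of credentials is allowed to do.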

853.98 -> Okay, that's about it on the preparation phase.

859.23 -> We're moving on to detection

860.64 -> and I'm handing back to Margo.

862.86 -> - Thanks Armin.

864.96 -> So, we've carried out our preparation activities.

868.23 -> We've identified our organizational units.

870.84 -> We're aware of our AWS environment, we've enabled our logging.

874.5 -> Now we need to move

875.82 -> and identify how we can do our detection activities.

878.67 -> How we can detect those unwanted behaviors

881.25 -> or unexpected behaviors in those incidents.

884.16 -> Something that is possibly different in the cloud

886.95 -> or slightly different is the proliferation of data

889.59 -> and the proliferation of logs.

891.96 -> But as well as that, there are actually now different layers

895.59 -> or different areas of data.

897.72 -> This did exist before,

899.28 -> but it's become even more granular and powerful

901.47 -> in the cloud.

902.43 -> So if we examine these different areas.

904.38 -> First of all, we're talking about our log data.

907.98 -> So as we saw the series of log services, Armin had some

910.83 -> of them up earlier on.

912.15 -> So there's an awful lot of log data

913.617 -> and we use a service, GuardDuty.

915.57 -> Which is a threat detection service,

917.52 -> which continuously monitors your AWS accounts

920.13 -> for unusual behavior, unauthorized activity.

923.55 -> But then there's another layer

925.08 -> or area of data, which is the configuration

928.38 -> of your resources.

930.15 -> If you experience an incident

932.1 -> in your AWS environments, the configuration state

934.89 -> of your resource at that time is incredibly powerful data.

939.81 -> So AWS Config is a service which allows you

942.72 -> to assess, audit, and evaluate the configuration

946.32 -> of your resources at a point in time

948.66 -> and over periods of time.

951.6 -> And finally, then there's Inspector,

955.519 -> which is automated vulnerability management, scanning

959.7 -> for software vulnerabilities

961.77 -> and unintended network exposure.

965.13 -> So, these are three distinct areas of data which we need

968.76 -> to use to carry out our detection capabilities.

972.84 -> Let's look first of all at GuardDuty.

975.78 -> To detect unauthorized and unexpected activity

978.96 -> in your AWS environment.

980.91 -> GuardDuty analyzes data from various sources.

984.33 -> This data is unmeasured and unchecked.

987.81 -> It is not judged, if you will.

990.12 -> It's essentially data from various log sources

992.73 -> that you have enabled in the earlier preparation phases.

996.51 -> So for example, CloudTrail, capturing AWS API activity

1001.34 -> in your account.

1002.45 -> Or VPC flow logs, capturing traffic

1008.09 -> to your network interfaces in your VPCs

1010.502 -> or Kubernetes audit logs, capturing APIs from users

1014.48 -> and applications to your Kubernetes cluster.

1018.5 -> GuardDuty uses this log data to check, detect,

1022.88 -> and measure anomalies and unwanted behavior related

1026.637 -> to AWS resource types like EC2 instances,

1029.84 -> S3 buckets and IAM.

1032.3 -> Now, it's important to say

1033.77 -> with GuardDuty, you don't actually need

1035.72 -> to configure these log sources.

1037.49 -> GuardDuty will do this natively for you.

1039.92 -> So if you were using a partner tool or third party tool

1042.71 -> to do threat detection,

1044.18 -> you would be enabling the log sources.

1046.1 -> But in this particular scenario, GuardDuty does

1048.17 -> that natively for you.

1049.61 -> It extracts fields from the log files, they're encrypted

1053.45 -> in transit.

1054.44 -> It does this for profiling and then discards the logs.

1058.43 -> And so GuardDuty is a machine learning service

1060.86 -> that does threat intelligence and anomaly detection,

1064.4 -> identifying unwanted behavior, unexpected activity

1068.21 -> in your AWS accounts.

1070.34 -> And it finds then various finding types related

1073.1 -> to Bitcoin mining, command and control server activity,

1076.64 -> unusual user behavior and unusual traffic patterns.

1081.08 -> Now, a little bit

1082.19 -> like the log data, what can happen sometimes

1084.65 -> with these services is anomaly findings,

1087.32 -> a proliferation of findings.

1089.75 -> And so GuardDuty in that scenario gives you the opportunity

1093.56 -> to suppress findings, to filter findings

1096.35 -> and to sort findings.

1098.21 -> So for example, a finding type might be marked as high

1101.45 -> because this is maybe

1102.38 -> an EC2 instance which has been compromised

1105.38 -> or maybe a finding type is marked

1106.97 -> as low, which is something unsuccessful

1109.13 -> that you can look at at a later point, perhaps something

1111.98 -> like a port scan.
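The suppress, filter and sort idea can be sketched in a few lines once findings are available as dictionaries. The `Type` and `Severity` keys mirror the shape of GuardDuty findings, and the two finding type names are real GuardDuty types; the numeric severity bands used for the labels, the helper names and the sample values are assumptions for illustration rather than the service's own logic.

```python
def severity_label(severity):
    """Map a numeric severity to a label (assumed bands: >=7 high, >=4 medium)."""
    if severity >= 7.0:
        return "high"
    if severity >= 4.0:
        return "medium"
    return "low"


def triage(findings, suppressed_types=frozenset()):
    """Drop suppressed finding types and sort the rest by severity, highest first."""
    visible = [f for f in findings if f["Type"] not in suppressed_types]
    return sorted(visible, key=lambda f: f["Severity"], reverse=True)


# Made-up sample findings.
sample_findings = [
    {"Id": "f-1", "Type": "Recon:EC2/PortProbeUnprotectedPort", "Severity": 2.0},
    {"Id": "f-2", "Type": "Backdoor:EC2/C&CActivity.B!DNS", "Severity": 8.0},
]

for finding in triage(sample_findings):
    print(severity_label(finding["Severity"]), finding["Type"])
```

Passing a set of finding types as `suppressed_types` plays the role of a suppression rule: those findings still exist but no longer crowd the queue.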

1113.96 -> And so through our session today

1115.88 -> for a lot of these services, we're gonna look

1117.53 -> into the management console to see what it looks like.

1120.35 -> And if you look into the GuardDuty console,

1122.36 -> what you see are the finding types related

1125.18 -> to the resources when the event occurred,

1128.637 -> and the account ID it's associated with.

1132.05 -> And if you see this particular finding type here,

1135.68 -> it's marked as high on the left hand side.

1138.47 -> And so if we look into the finding type,

1141.05 -> what we see is information about the threat.

1143.78 -> And what we have here is an EC2 instance,

1146.51 -> which is querying a domain name that's associated

1151.79 -> with a known command and control activity.

1155.03 -> With information about where

1156.56 -> that instance resides, what region and the severity type.

1160.58 -> And then, this particular GuardDuty finding will

1163.73 -> automatically trigger a malware scan.

1166.58 -> And the malware scan itself then comes up

1168.62 -> with a finding which is associated back

1171.11 -> to the GuardDuty finding.

1172.697 -> And what we have here is we have identified the root cause

1176.06 -> of the threat, which is a virus.

1180.71 -> So that's analyzing the AWS logs.

1183.59 -> But if we move then to the next layer, the next area

1186.53 -> of data, we're looking at the configuration

1189.17 -> of our resources.

1191.66 -> Therefore, for AWS Config,

1194.54 -> our data types are actually the resources themselves.

1197.6 -> The configuration states of the resources.

1200.78 -> When you enable AWS Config in your account,

1203.69 -> the configuration recorder records the configuration state

1207.08 -> of each resource and this builds a configuration item.

1210.59 -> And this builds up an inventory of configuration state.

1214.34 -> And it allows AWS Config

1216.56 -> to evaluate the configuration states over time

1220.43 -> of your resources.

1223.16 -> So, AWS Config does this using Config rules.

1228.08 -> Evaluating the configuration state

1229.91 -> of the resources over time.

1233.15 -> And using config rules,

1234.77 -> you can reflect your desired configuration state

1237.62 -> of a resource.

1238.453 -> Maybe something you have committed to an auditor

1240.71 -> or regulator.

1241.79 -> And there are different types of config rules.

1243.95 -> There are AWS managed rules for many

1246.35 -> of our resources, which we recommend,

1248.69 -> or alternatively maybe you want

1250.04 -> to write your own custom rule.

1251.51 -> And so you can use Guard, which is a policy-as-code language,

1256.31 -> or using Lambda functions.

1258.95 -> And then you can bring together these rules

1261.56 -> into a conformance pack.

1263 -> So this is a group of rules of configurations, of resources

1266.66 -> that reflects the desired configuration state

1269.24 -> of your prepared environments.

1271.16 -> That maybe you have committed to a regulator or auditor.

1274.31 -> And the conformance pack then will

1276.23 -> also include automatic remediation so

1279.14 -> that you can start remediating a deviation. If somebody

1282.5 -> with maybe a permissive policy goes

1285.29 -> and changes the configuration of a key resource,

1287.93 -> it can automatically be remediated.

1290.12 -> We're gonna talk a little bit more about that later on.

1293.03 -> And these rules are triggered based on changes

1295.4 -> in the configuration states of your resources.
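The core of a custom rule is a function that takes a recorded configuration and returns a compliance verdict. The sketch below shows only that evaluation logic, outside of any Lambda handler. The resource type string and the `COMPLIANT`, `NON_COMPLIANT` and `NOT_APPLICABLE` values are the ones Config uses; the simplified configuration item shape (a `cidrs` list per permission) is an assumption for illustration and not the exact schema Config records.

```python
def allows_open_ssh(permissions):
    """Return True if any permission opens port 22 to 0.0.0.0/0."""
    for perm in permissions:
        from_port = perm.get("fromPort")
        to_port = perm.get("toPort")
        covers_ssh = (
            from_port is not None
            and to_port is not None
            and from_port <= 22 <= to_port
        )
        open_to_world = "0.0.0.0/0" in perm.get("cidrs", [])
        if covers_ssh and open_to_world:
            return True
    return False


def evaluate(configuration_item):
    """Evaluate a simplified, hypothetical configuration item for a security group."""
    if configuration_item.get("resourceType") != "AWS::EC2::SecurityGroup":
        return "NOT_APPLICABLE"
    permissions = configuration_item.get("configuration", {}).get("ipPermissions", [])
    return "NON_COMPLIANT" if allows_open_ssh(permissions) else "COMPLIANT"


# Made-up configuration items in the simplified shape described above.
open_group = {
    "resourceType": "AWS::EC2::SecurityGroup",
    "configuration": {"ipPermissions": [{"fromPort": 22, "toPort": 22, "cidrs": ["0.0.0.0/0"]}]},
}
closed_group = {
    "resourceType": "AWS::EC2::SecurityGroup",
    "configuration": {"ipPermissions": [{"fromPort": 443, "toPort": 443, "cidrs": ["0.0.0.0/0"]}]},
}

print(evaluate(open_group), evaluate(closed_group))
```

Because the rule runs whenever the configuration changes, the verdict flips at the moment the offending permission is added, which is what makes the timeline view useful during an investigation.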

1298.73 -> So if we look into AWS Config in the console, you can see it

1302.57 -> from two particular views.

1304.34 -> One from the resources or one from the rules themselves.

1308.06 -> And what we see here is a view of resources; we're filtering

1310.853 -> for security groups and we see that we have a series

1314.12 -> of security groups that have been found to be non-compliant.

1317.57 -> If we go into one

1318.74 -> of these non-compliant findings, we can actually see a

1321.98 -> per second deviation,

1323.75 -> which is incredibly powerful if we've experienced a threat

1327.16 -> in our environment.

1328.13 -> If we've experienced an incident in our environment

1330.83 -> to be able to identify the second

1333.93 -> that the configuration of a resource has changed.

1336.56 -> And to automatically trigger the remediation on

1339.17 -> that second as well.

1341.6 -> Now you can approach this from a resource perspective,

1344.84 -> or alternatively, maybe you wanna approach it

1347.12 -> from a rules perspective.

1348.44 -> That you have set rules for your environment

1350.39 -> that you've committed to an auditor or regulator.

1352.91 -> And you want to identify what resources are deviating

1356 -> from these rules and when.

1357.95 -> And so if we look at a rule,

1359.18 -> this is an AWS managed rule around KMS key rotation,

1362.69 -> and we see that we have four non-compliant resources.

1366.44 -> And if we click into that,

1368.068 -> we'll see the key IDs associated

1370.1 -> with the non-compliant resources,

1372.08 -> which we can then automate our response to.

1374.96 -> It might be, for example, re-encrypting the data

1377.33 -> with a new key.

1381.23 -> And finally then we have Amazon Inspector.

1383.51 -> We're moving into the third layer of data.

1385.43 -> Which is our instances and our container registries.

1389.63 -> And Amazon Inspector is

1390.98 -> an automated vulnerability management service

1393.56 -> that continuously scans your AWS workloads

1396.92 -> for software vulnerabilities

1398.3 -> and unintended network exposure.

1401.54 -> It does this

1402.62 -> in near real-time using 50 intelligence sources.

1406.22 -> And based on the network exposure it comes up

1408.98 -> with a contextualized risk-based score.

1414.5 -> These scans happen continuously and have next to no impact

1418.58 -> on the performance of your fleet.

1420.98 -> So this happens automatically and continuously and runs

1424.16 -> across those prepared environments that we did

1426.35 -> in the earlier stage of the cycle.

1428.57 -> So here we see Inspector scanning a hundred percent

1431.36 -> of the accounts, and it has identified

1434.03 -> for 16% of the instances

1436.678 -> four instances where we have found critical findings.

1442.008 -> We see that, excuse me, sorry, we've scanned four instances

1445.52 -> and we have found 21 critical findings,

1448.4 -> but we see that there are zero critical findings

1450.65 -> when it comes to network exposure.

1453.02 -> There's also a history of scans that have taken place

1456.29 -> across our environments as well.

1461.09 -> If we go into this, we find out more information related

1464.48 -> to one of the instances

1466.43 -> where we have experienced our problems.

1468.26 -> And we see it's a CVE, a Red Hat CVE,

1471.08 -> and we have a suggested remediation action.

1474.14 -> Inspector also gives us a highly contextualized risk score.

1478.67 -> This actually varies from the NVD score,

1481.4 -> the CVSS NVD industry score.

1484.31 -> And the reason for that is because the deviation...

1487.76 -> The reason for that is because we saw in the previous screen

1490.7 -> that this instance wasn't reachable outside of the network.

1494.18 -> There were zero network reachability scores.

1498.26 -> And so as a result, the suggested risk score

1500.66 -> from Inspector is slightly lower than the NVD score.

1504.56 -> And based on that you can automate your response

1507.5 -> or carry out a manual response at a later stage.
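To illustrate how reachability can lower a score, the sketch below computes a CVSS v3.1 base score (scope unchanged) twice: once with a network attack vector and once with an adjacent one. The weights and the formula come from the CVSS v3.1 specification. Treating "not reachable from outside the network" as a change of attack vector is an assumption made for this example; it is not claimed to be how Inspector derives its own score.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged only).
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_REQUIRED = {"N": 0.85, "L": 0.62, "H": 0.27}
USER_INTERACTION = {"N": 0.85, "R": 0.62}
IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place, following the CVSS v3.1 definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, c, i, a):
    """CVSS v3.1 base score for the scope-unchanged case."""
    iss = 1 - (1 - IMPACT[c]) * (1 - IMPACT[i]) * (1 - IMPACT[a])
    impact = 6.42 * iss
    exploitability = (
        8.22
        * ATTACK_VECTOR[av]
        * ATTACK_COMPLEXITY[ac]
        * PRIVILEGES_REQUIRED[pr]
        * USER_INTERACTION[ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Same vulnerability, scored with a network and an adjacent attack vector.
published = base_score("N", "L", "N", "N", "H", "H", "H")
contextual = base_score("A", "L", "N", "N", "H", "H", "H")
print(published, contextual)  # 9.8 8.8
```

The difference between the two numbers is the kind of gap the console shows between the vendor or NVD score and the contextualized one.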

1513.26 -> So, onto containment, collection and analysis.

1515.475 -> - Alright, thank you. Thank you very much Margo.

1517.52 -> Maybe just one thing to mention

1519.35 -> because this is the way it is

1520.94 -> at re:Invent: there's now also Inspector support

1523.22 -> for Lambda, as expected in these days

1525.74 -> where we're launching new things, right?

1526.883 -> Just to mention that, since we didn't know

1529.55 -> that when we were creating the slides.

1531.92 -> Alright, let's take a look to the next phases.

1534.77 -> There are multiple of them, so we're taking

1536.24 -> the collection, containment and analysis phases

1538.962 -> and putting them into one chapter

1541.22 -> because this is basically where we're seeing quite a lot

1543.56 -> of iterations.

1544.393 -> And then we'll talk

1546.14 -> through all three things together.

1548.12 -> But still starting

1548.953 -> with what we called triage and collection.

1551.69 -> And let's take a look at what we have seen so far

1554.39 -> in the detection phase.

1555.53 -> So typically,

1556.58 -> as Margo showed us, we have this whole raw data.

1559.67 -> The logs, the inventory data.

1561.65 -> All the stuff which is,

1562.7 -> basically, information which is not yet,

1564.83 -> basically, weighted or measured, and so on and so forth.

1567.2 -> And then we have this area check, detect, and measure.

1570.89 -> We talked about three of the services from AWS.

1574.13 -> There are more security services falling into that pillar,

1577.4 -> but there's also a high likelihood

1579.53 -> that there's quite a lot of third-party services

1581.75 -> in that area.

1583.49 -> So what we're seeing with customers here is then typically

1586.4 -> also this notion of having a security information

1589.55 -> and event management system running, right?

1592.19 -> And this is what people are using in order

1594.98 -> to put all the things together and do the analysis

1598.01 -> if they do so, right?

1599.09 -> Some people do, some don't.

1601.01 -> But typically what happens is there is quite a lot

1603.95 -> of raw data going into those systems and typically they're

1607.52 -> also putting in the findings data into that system.

1610.61 -> And this is completely fine to do so,

1613.07 -> but we'll have to keep

1613.94 -> in mind the cloud produces quite a lot of data, right?

1617.3 -> And it could be quite difficult and sometimes even expensive

1621.77 -> to do that and put all those things into a SIEM solution.

1624.68 -> So what we want to take a look at is an alternative

1627.98 -> or an additional approach, and we want to do that

1630.62 -> by introducing what Security Hub can do for that purpose.

1633.98 -> So let's take a quick look at what Security Hub is

1636.947 -> and how it works, right?

1638.232 -> So we have these two pillars already.

1640.61 -> Check, detect, and measure.

1641.903 -> We will see in a minute there is some functionality

1644.54 -> of Security Hub which falls into that pillar as well.

1647.51 -> But for the sake

1648.343 -> of simplicity, we have another pillar called consolidate

1651.32 -> and aggregate, right?

1652.607 -> And this is where Security Hub fits in, as it receives data

1657.29 -> from all those AWS services

1659.57 -> as well as from the third-party services.

1661.34 -> And right now I think we're having

1663.2 -> 15 AWS services natively integrated with Security Hub

1666.919 -> and more than 60 third-party services integrated

1671.66 -> with Security Hub.

1672.493 -> Feeding all the data in, in order to consolidate

1676.19 -> and aggregate from different sources, right?

1679.946 -> But it's not only from different sources, it gets it

1683.48 -> from different accounts and it gets it

1685.28 -> from different regions.

1686.45 -> So Security Hub is really that single pane of glass view

1690.41 -> on all the findings coming

1692.18 -> from other systems which are doing the measurement, right?

1695.63 -> There is another functionality

1697.07 -> in Security Hub, which we call

1698.33 -> cloud security posture management.

1701 -> And this is basically similar functionality

1703.55 -> to what Margo talked about with the conformance packs.

1706.73 -> And it's basically an orchestrated way

1708.8 -> of using services underneath in order to do

1711.53 -> that cloud posture management.

1713.51 -> So that's why I said, look,

1714.51 -> cloud posture management usually falls into

1717.11 -> that check, detect, and measure pillar because it tells you

1719.66 -> if your estate is compliant with your guidance, right?

1723.41 -> The other important thing here is,

1725.69 -> we're keeping all the findings data in a common format.

1729.62 -> So this is the AWS Security Finding Format.

1732.664 -> You can build your own connector and feed it into that,

1735.41 -> but it makes our life much, much easier

1737.57 -> to have all the findings in a common format

1739.532 -> for further usage, either in an analysis phase

1744.47 -> or in an automated response phase.
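
To make the "build your own connector" remark concrete, here is a minimal sketch of a finding in the AWS Security Finding Format (ASFF) with the fields the format requires. The account ID, instance ID, generator name, and texts are placeholder assumptions; a real connector would pass such a dictionary to Security Hub's `BatchImportFindings` API (for example boto3's `batch_import_findings(Findings=[...])`), which is not called here.

```python
import json
from datetime import datetime, timezone

# Top-level fields that ASFF marks as required.
REQUIRED_ASFF_FIELDS = {
    "SchemaVersion", "Id", "ProductArn", "GeneratorId", "AwsAccountId", "Types",
    "CreatedAt", "UpdatedAt", "Severity", "Title", "Description", "Resources",
}


def build_finding(account_id, region, instance_id, title, description, severity_label="HIGH"):
    """Build an ASFF-shaped finding for an EC2 instance from a custom (hypothetical) connector."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "SchemaVersion": "2018-10-08",
        "Id": f"custom-connector/{instance_id}/{title}",
        # Default product ARN used for custom findings in the importing account.
        "ProductArn": f"arn:aws:securityhub:{region}:{account_id}:product/{account_id}/default",
        "GeneratorId": "custom-connector",  # hypothetical name
        "AwsAccountId": account_id,
        "Types": ["Software and Configuration Checks/Vulnerabilities/CVE"],
        "CreatedAt": now,
        "UpdatedAt": now,
        "Severity": {"Label": severity_label},
        "Title": title,
        "Description": description,
        "Resources": [{
            "Type": "AwsEc2Instance",
            "Id": f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}",
        }],
    }


finding = build_finding("111122223333", "eu-central-1", "i-0aaa",
                        "Example vulnerability", "Placeholder description")
print(json.dumps(finding, indent=2))
```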

1749.45 -> If you go to the Security Hub console, just quickly, right?

1752.6 -> This is just giving a quick overview,

1754.64 -> the Security Hub summary dashboard.

1756.29 -> And here you can see this is how we are compliant

1759.14 -> to our posture management components, right?

1762.05 -> So you see there are best practices

1763.7 -> from the Center for Internet Security.

1765.22 -> There's also AWS foundations and there's a number of checks

1769.13 -> and rules behind the scenes which are basically measured

1772.25 -> for compliance.

1773.54 -> On the right hand side you see the resources

1775.49 -> with the most failed security checks

1777.35 -> and you can tailor those, what we call insights, visually,

1780.95 -> but you can also put in rules in order to act

1783.29 -> and forward those things.

1786.17 -> Important to mention here,

1787.124 -> that's why I said Security Hub is one

1789.8 -> of those services which is not only working

1792.44 -> for multi-account, it's also working cross-region.

1795.05 -> So really you get all your data into one place

1797.87 -> in case something happens

1799.07 -> in a region which you're not using that often, right?

1801.65 -> You still have a single place where you look into that data.

1803.99 -> It's really important to keep in mind.

1806.534 -> If you go one step further

1808.13 -> and let's say we're going to the findings area here.

1811.16 -> And here we basically have filtered for a GuardDuty finding.

1815 -> So this is basically a finding coming

1818.54 -> from GuardDuty, it's showing up in security hub.

1821.24 -> We'll still be able to go through all of the details, right?

1824.24 -> So what is the finding?

1826.07 -> What is the instance? Which subnets?

1828.14 -> And so on and so forth.

1829.58 -> And this is, I mean, important

1831.14 -> because we need to have this kind of information

1833.24 -> in the containment phase, right?

1834.861 -> And yes, you can get it directly from GuardDuty

1838.13 -> or you can get it from Security Hub.

1839.72 -> In Security Hub you just get it all in one place.

1842.33 -> Cross-regions, cross-account.

1843.71 -> So that's basically the same kind of information

1845.78 -> because the source in this case is definitely GuardDuty.
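
As a rough illustration of filtering for GuardDuty findings and pulling out the details needed for containment, the sketch below builds a filter in the shape accepted by Security Hub's `GetFindings` API and extracts the instance, subnet, and VPC from an ASFF finding. The sample finding and its IDs are invented, and no API call is made.

```python
def guardduty_filters(severity_labels=("HIGH", "CRITICAL")):
    """Filters for Security Hub GetFindings: active GuardDuty findings of the given severities."""
    def equals(value):
        return {"Value": value, "Comparison": "EQUALS"}

    return {
        "ProductName": [equals("GuardDuty")],
        "SeverityLabel": [equals(label) for label in severity_labels],
        "RecordState": [equals("ACTIVE")],
    }


def containment_details(finding):
    """Pull the instance, subnet and VPC out of an ASFF finding's EC2 resources."""
    details = []
    for resource in finding.get("Resources", []):
        if resource.get("Type") != "AwsEc2Instance":
            continue
        ec2 = resource.get("Details", {}).get("AwsEc2Instance", {})
        details.append({
            "instance": resource["Id"],
            "subnet": ec2.get("SubnetId"),
            "vpc": ec2.get("VpcId"),
        })
    return details


# Invented sample in ASFF shape; a real one would come back from GetFindings.
sample = {
    "Title": "Example GuardDuty finding",
    "Resources": [{
        "Type": "AwsEc2Instance",
        "Id": "arn:aws:ec2:eu-central-1:111122223333:instance/i-0aaa",
        "Details": {"AwsEc2Instance": {"SubnetId": "subnet-0123", "VpcId": "vpc-0456"}},
    }],
}
print(guardduty_filters())
print(containment_details(sample))
```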

1850.638 -> So let's take a look at the next step here.

1854.15 -> And this is basically the continuous containment

1856.37 -> in the cloud.

1857.203 -> And this is something where

1858.98 -> we're not talking about specific services.

1861.89 -> We're more talking about how you can leverage functionality

1865.01 -> of different services in order to help you.

1867.68 -> And I wanna start on the left hand side

1869.27 -> with the network isolation, and I've just put

1871.82 -> in a couple of logos, right?

1873.02 -> You see you have our virtual private cloud.

1875.66 -> You have security groups, you have network ACLs.

1877.831 -> You have Network Firewall.

1879.71 -> So these are all components which can help you

1882.32 -> to isolate your machine from the network.

1885.38 -> You need to make yourself aware that you're doing it

1887.69 -> on different levels, right?

1888.89 -> A security group can isolate things at the host level, right?

1895.208 -> A network ACL does it at the subnet level.

1897.59 -> I mean, you can use routing rules

1899.217 -> or a network firewall to do it at the VPC level.

1903.29 -> We have to keep in mind that it might have a side effect

1906.14 -> for other resources depending on where you're doing

1908.54 -> that network isolation, right?

1910.25 -> So it's important: we are giving you methods and tools

1912.74 -> and services which help you do the network isolation,

1915.68 -> but make yourself familiar with which level you do it at

1918.41 -> and basically which kind of side effects it would have.

1920.75 -> If you, say, cut off a complete VPC, right?

1923.87 -> Well maybe other machines in that VPC are not able

1926.36 -> to communicate anymore.
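
The side-effect question can be sketched as a small blast-radius check: given an inventory of instances, which other machines lose connectivity if you isolate at the host, subnet, or VPC level? The inventory shape, level names, and IDs are illustrative assumptions. The last helper only builds the arguments for EC2's `ModifyInstanceAttribute` call (boto3: `modify_instance_attribute`), which would swap the instance onto an isolation security group; it does not call AWS.

```python
def side_effects(level, target_id, inventory):
    """Return the IDs of other instances cut off when isolating `target_id` at `level`."""
    target = next(i for i in inventory if i["instance_id"] == target_id)
    if level == "security_group":
        # Swapping only this instance's security groups touches no other host.
        return []
    key = {"network_acl": "subnet_id", "vpc": "vpc_id"}[level]
    return sorted(
        i["instance_id"]
        for i in inventory
        if i["instance_id"] != target_id and i[key] == target[key]
    )


def host_isolation_request(instance_id, isolation_sg_id):
    """Arguments for ec2.modify_instance_attribute to replace the instance's security groups."""
    return {"InstanceId": instance_id, "Groups": [isolation_sg_id]}


# Invented inventory for illustration.
inventory = [
    {"instance_id": "i-0aaa", "subnet_id": "subnet-1", "vpc_id": "vpc-1"},
    {"instance_id": "i-0bbb", "subnet_id": "subnet-1", "vpc_id": "vpc-1"},
    {"instance_id": "i-0ccc", "subnet_id": "subnet-2", "vpc_id": "vpc-1"},
]
for level in ("security_group", "network_acl", "vpc"):
    print(level, side_effects(level, "i-0aaa", inventory))
```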

1928.07 -> So, why is network isolation here also important?

1932.18 -> In the case of an incident and you wanna step

1934.28 -> into a forensic process, you might not want

1937.1 -> to shut down the machine immediately.

1938.9 -> You might want to capture memory data or other things

1942.02 -> in your forensic process. So making a snapshot

1944.96 -> and putting it up

1946.01 -> in a forensic environment might not be the best idea first.

1949.73 -> First capture whatever you can.

1951.86 -> But in order to prevent further harm, right?

1954.17 -> Isolate the machine

1955.07 -> from the network. So, it's really important

1957.05 -> that you think about that kind of process

1959.3 -> in the forensic stages.

1960.68 -> And network isolation helps you to do things

1962.78 -> without shutting down the machine, right?

1966.08 -> The next one we already discussed, logical isolation.

1969.44 -> It's basically the place where you can do your forensics.

1973.49 -> Different accounts, different OUs, service control policies,

1976.52 -> all these kinds of things help you

1978.11 -> to build an environment where you could,

1980.36 -> basically, put your machines in order

1982.28 -> to go deep and do the forensic tasks.

1984.83 -> So we're using Organizations

1986.097 -> and these kinds of things to do that.

1988.61 -> And then the last one is basically the forensic automation.

1991.55 -> And we mentioned it also before, things

1994.37 -> like Systems Manager help you basically

1998.69 -> to automate tasks.

1999.68 -> This could be using Run Command.

2001.96 -> This could be, you know, executing forensic tools

2005.86 -> on your machines,

2006.91 -> but it could also basically go into things

2008.98 -> like EBS snapshots, where you are making a copy

2011.62 -> of your machine and storing it somewhere else, right?

2013.69 -> These are services and functionalities which help you

2017.8 -> in order to put the stuff in a different place.

2019.84 -> Where in the old days you had

2021.697 -> to have even different hardware, right?

2023.77 -> Here, this can be also fully automated, right?

2026.59 -> And these things can help,

2028.45 -> but again, this is where the skill component comes

2031.27 -> into place.

2032.23 -> Your people need to be familiar with those things,

2034.45 -> but then they can really help in order to do this.
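
One way to picture the forensic automation is as an ordered plan: capture volatile data first, then isolate the machine, then snapshot the volume. The sketch below only assembles the arguments for three real API operations (Systems Manager `SendCommand` with the `AWS-RunShellScript` document, EC2 `ModifyInstanceAttribute`, and EC2 `CreateSnapshot`). The capture script path, tag key, and all IDs are hypothetical, and nothing is sent to AWS.

```python
def forensic_plan(instance_id, volume_id, isolation_sg_id, case_id):
    """Return (operation, arguments) pairs in the order they should run."""
    return [
        # 1. Capture memory while the machine is still running and reachable by the agent.
        ("ssm.send_command", {
            "InstanceIds": [instance_id],
            "DocumentName": "AWS-RunShellScript",
            # Hypothetical script path; substitute your own forensic tooling.
            "Parameters": {"commands": ["/opt/forensics/capture-memory.sh"]},
        }),
        # 2. Isolate the host from the network without shutting it down.
        ("ec2.modify_instance_attribute", {
            "InstanceId": instance_id,
            "Groups": [isolation_sg_id],
        }),
        # 3. Copy the disk for analysis in the forensic environment.
        ("ec2.create_snapshot", {
            "VolumeId": volume_id,
            "Description": f"Forensic copy for {case_id}",
            "TagSpecifications": [{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "CaseId", "Value": case_id}],
            }],
        }),
    ]


plan = forensic_plan("i-0aaa", "vol-0123", "sg-iso", "CASE-001")
for operation, arguments in plan:
    print(operation, arguments)
```

Keeping the plan as data makes the order explicit and reviewable before anything touches the compromised machine.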

2037.72 -> Okay, if we go to the next one.

2040.93 -> If you want to do further analysis.

2043 -> So we come now to the analysis phase, right?

2046.214 -> So we've seen that before, right?

2047.98 -> We have the raw data, we have the check, detect,

2050.41 -> and measure data, we have the aggregated findings,

2053.32 -> and then we're putting in a layer

2054.49 -> we call analysis here, right?

2056.62 -> And obviously here a SIEM has its place, right?

2060.25 -> I mean, you can have a SIEM solution over there.

2063.85 -> Third party, build your own, and so on and so forth,

2066.88 -> but there are also services from us, right?

2069.16 -> And there's a fully managed service.

2070.72 -> We mentioned it: Amazon Detective.

2073.57 -> And there is Amazon OpenSearch,

2075.55 -> which is basically helping us

2077.23 -> to build SIEM-like analysis capabilities.

2081.241 -> And both do different things, and we want to take a look

2083.83 -> at that functionality.

2086.59 -> However, what we wanna show here

2088.54 -> as well, very often you might wanna start

2091.75 -> from an existing finding.

2093.49 -> So you already have a finding in either security hub

2096.76 -> or GuardDuty or any of the other services.

2099.19 -> You want to go up to this analysis tool

2101.53 -> and then go down to the source of it, right?

2104.11 -> Because you wanna do further steps and further analysis.

2106.9 -> So from an existing finding,

2108.958 -> sometimes makes stuff much easier, right?

2111.73 -> However, you might also be sitting on top of this

2114.7 -> and you wanna start from zero

2116.77 -> and still wanna do targeted analysis, right?

2120.25 -> We don't wanna search the needle in the haystack.

2123.04 -> We wanna do it as dedicated or as targeted as possible.

2127.48 -> And we will take a look at two examples.

2129.536 -> The first one, as I said, will be Amazon Detective,

2134.77 -> but we're here in the Security Hub console first, right?

2138.01 -> And we're going back to the same kind of findings

2140.382 -> of command and control server triggered by GuardDuty.

2143.71 -> But if you now take a look

2144.64 -> at the right-hand side, there is a little button

2146.77 -> which says "Investigate with Detective", right?

2149.92 -> And if I push that button, it opens the Detective console.

2154.66 -> So you'd say, "Well, it's rocket science."

2156.34 -> No, it's not. But it gives you quite a lot of benefits.

2159.4 -> The first thing is it shows me the related findings, right?

2164.26 -> It shows me a time window around this finding

2167.17 -> and you can narrow or widen this time window, right?

2170.14 -> But it gives you basically a scope

2171.82 -> around when this GuardDuty finding happened.

2175.93 -> And then it gives you what we call entities.

2178.54 -> And you see there are IP addresses.

2180.52 -> These, by the way, are internal

2181.72 -> and external IP addresses which have been involved

2185.38 -> with this finding.

2187.6 -> There's also then activity in a certain account, right?

2190.78 -> In this case there is not a lot of data in there,

2192.91 -> but that's good for us.

2194.17 -> So nobody used that key and compromised something.

2196.99 -> And then there's for example, the machine information.

2200.08 -> And when you click onto this, and you could click on any

2202.54 -> of them, it's a screenshot here, right?

2204.91 -> You can go even further.

2206.86 -> And here you could probably look into network data, right?

2210.7 -> From that place you could also look into API data where

2213.73 -> for this exercise or

2214.9 -> for this sample, we are using network data.

2217.18 -> So we're seeing the net flow data around the timeframe

2220.84 -> of the incident.

2222.059 -> And you can see basically if I go for inbound traffic

2225.4 -> and click onto it, I can even expand it

2226.989 -> so I can see which IP addresses

2229.06 -> or which ports have been communicating successfully

2232.33 -> or unsuccessfully with that instance.

2234.96 -> So this really helps you for the analysis

2237.25 -> and it's also helping you for the post incident component

2242.38 -> in order to dig deeper and look into this stuff.

2245.29 -> So it gives you context around, right?

2248.26 -> Detective sits on top of CloudTrail and VPC Flow Logs

2253.87 -> and it's also working the same way as GuardDuty.

2256.09 -> You don't have to configure those resources, we do

2258.73 -> that for you, right?

2260.74 -> But keep in mind you can only access that data

2262.99 -> from the Detective and the GuardDuty perspective.

2266.95 -> You'll not be able to use that kind of log data

2269.11 -> from another place, right?

2270.52 -> But same thing, you don't have to configure the log data

2273.76 -> if you want to use Detective for the analysis.

2276.64 -> Keep in mind you have to turn on Detective

2279.43 -> at a certain point of time.

2280.66 -> And from this point onwards, we're keeping the data for you.

2283.69 -> You just can't do it afterwards.

2285.31 -> We're not looking backwards in the logs, it's important.

2289.27 -> If you turn on Detective

2290.29 -> after an incident happened, not a good idea, right?

2294.49 -> Okay, let's use one other example.

2297.49 -> And in this case I want to use,

2299.32 -> it's the same picture by the way I want

2300.64 -> to use the OpenSearch service.

2302.62 -> And what we're having here is basically a solution built

2306.91 -> by our colleagues from Japan.

2308.38 -> So Solutions Architect colleagues of ours in Japan have built

2311.41 -> that based on an OpenSearch cluster.

2315.07 -> And they are using, well, a security data lake; we're using S3.

2320.32 -> I mean, today you've probably seen we announced

2322.36 -> another security data lake.

2323.38 -> This is still the native component,

2325.81 -> but this is now giving us visualizations on top of the data.

2329.65 -> This solution supports quite a lot

2331.33 -> of sources including raw data as well as findings data

2336.37 -> from Security Hub or GuardDuty.

2338.865 -> But it has VPC flow logs, directory service logs

2341.65 -> and so on and so forth,

2342.49 -> but what we're having here is CloudTrail data.

2344.8 -> And you see the visualizations

2346.57 -> or the panels are reflecting

2348.61 -> a certain key performance indicator, right?

2350.97 -> So you see which kind of accounts, which regions.

2353.89 -> We added a geo map where we're seeing where the origin

2356.95 -> of the API call is.

2358.75 -> But we also have other panels here.

2361.15 -> And I wanna highlight this one with a login fail count.

2363.85 -> So this is basically a panel with a filter behind it

2367.09 -> for a certain event type, when a login failed

2369.13 -> to the console or otherwise.

2372.19 -> And the idea is here, if you click now

2375.021 -> onto this panel, it's filtering down

2379.03 -> to only these API entries in CloudTrail.

2384.1 -> And it shows us, well it happened only in this one account.

2387.46 -> It could be happening in multiple accounts.

2389.272 -> In this case it happened in one account.

2391.3 -> It's been in this region and it's been

2393.88 -> from Germany, so it's likely me, by the way, right?

2396.37 -> So, it's basically this kind of information. So,

2399.221 -> while these panels are really helping us to go closer

2403.39 -> and closer to the data, we then can take a look

2405.88 -> and see, well, what are the actual entries?

2407.89 -> So we've blanked that out.

2409.45 -> So you're not seeing anything here.

2411.19 -> But then if you open it, you see really the full stack

2413.77 -> of the information coming from CloudTrail.

2415.66 -> So this helps us really with the visualization

2418.63 -> to see, "Hey, something is going wrong. Something is bad."

2421.6 -> And we can really drill down into that solution

2424.33 -> and really go into the details

2426.46 -> on what might have caused that, right?

2428.53 -> And this is just another example

2430.48 -> on using cloud-native services.

2432.48 -> In this case, not a fully managed service.

2434.74 -> So we'll see how that goes moving forward in order

2438.16 -> to do more detailed analysis at a later stage.
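
The "login fail count" drill-down can be approximated locally: filter CloudTrail records down to failed console logins, then count them by account, region, and source IP. The field names follow the CloudTrail record format (`eventName`, `responseElements.ConsoleLogin`, `recipientAccountId`, `awsRegion`, `sourceIPAddress`). The sample records are invented, and this is a plain-Python stand-in, not how the OpenSearch-based solution itself is implemented.

```python
from collections import Counter


def failed_console_logins(records):
    """Keep only CloudTrail ConsoleLogin events whose outcome was a failure."""
    return [
        r for r in records
        if r.get("eventName") == "ConsoleLogin"
        and (r.get("responseElements") or {}).get("ConsoleLogin") == "Failure"
    ]


def count_by(records, field):
    """Count records per value of a top-level field, like one dashboard panel."""
    return Counter(r.get(field) for r in records)


# Invented sample records; IPs are from a documentation range.
records = [
    {"eventName": "ConsoleLogin", "recipientAccountId": "111122223333",
     "awsRegion": "eu-central-1", "sourceIPAddress": "203.0.113.10",
     "responseElements": {"ConsoleLogin": "Failure"}},
    {"eventName": "ConsoleLogin", "recipientAccountId": "111122223333",
     "awsRegion": "eu-central-1", "sourceIPAddress": "203.0.113.10",
     "responseElements": {"ConsoleLogin": "Success"}},
    {"eventName": "DescribeInstances", "recipientAccountId": "111122223333",
     "awsRegion": "eu-central-1", "sourceIPAddress": "203.0.113.11",
     "responseElements": None},
    {"eventName": "ConsoleLogin", "recipientAccountId": "111122223333",
     "awsRegion": "eu-central-1", "sourceIPAddress": "203.0.113.12",
     "responseElements": {"ConsoleLogin": "Failure"}},
]
failures = failed_console_logins(records)
print(len(failures), count_by(failures, "recipientAccountId"), count_by(failures, "sourceIPAddress"))
```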

2442.96 -> All right, so now let's go over to the last part.

2445.15 -> Back to you Margo.

2446.68 -> - Thanks Armin.

2447.92 -> So in the last part of the NIST lifecycle, we're going

2450.64 -> to look at remediation, recovery, and post-incident activity.

2454.6 -> And there are a lot of requirements

2456.25 -> and commonality across these phases.

2458.53 -> Though interestingly, NIST calls out something

2460.69 -> that we need to be careful about.

2463.99 -> Now as security people, we all love order

2466.57 -> and structure and patterns.

2467.86 -> So to ensure that you're all still with me, I've thrown

2470.26 -> in a completely different type of slide,

2472.78 -> but in English we have a proverb

2474.91 -> and it says, "Don't shut the stable door

2477.55 -> after the horse has bolted."

2479.53 -> Meaning, in the critical preceding phases and everything

2482.41 -> that we've just gone through and what we've seen

2484.33 -> in OpenSearch, if you're seeing an incident, this means

2487.93 -> that the unwanted behavior, the threat is already

2491.86 -> in the process of occurring.

2493.81 -> And the need to respond is key.

2496.48 -> Automation becomes key, remediation becomes key.

2499.24 -> We don't want any delay, we don't want any lag.

2502.63 -> And so with cloud-native services,

2504.61 -> a key advantage is the speed and the automation

2507.19 -> and the repair of our environments.

2510.34 -> But when we talk about what remains the same, the challenges

2514.27 -> that we face pretty much remain the same.

2516.7 -> We still have auditors and regulators, third parties

2519.97 -> that we maybe need to report to

2521.47 -> if we've experienced a breach or a threat or an incident.

2526.42 -> If we experience an incident,

2527.98 -> our end customers now have ways and tools in social media

2532.42 -> to communicate broadly about this incident.

2535.09 -> So, we care about our customers and our customers can

2538.63 -> in turn inform other customers.

2542.17 -> And we still need to know what happened

2544.69 -> at exactly what time.

2546.55 -> So our challenges remain the same in these phases.

2550.45 -> We saw in the last section, Armin went

2552.79 -> over this single-pane view

2554.29 -> that Security Hub gives us, with this uniform finding format.

2559.15 -> And from that, we can already start

2561.31 -> in the previous phases automating our response.

2564.09 -> So our automation journey can actually begin already.

2567.64 -> Using services like EventBridge,

2569.62 -> which is a serverless event bus that lets you receive, filter,

2572.86 -> transform, route, and deliver events to other services

2576.52 -> like the Simple Notification Service, Lambda,

2579.37 -> or Systems Manager, for example, to perform an automated action

2582.58 -> to repair your environment.

2585.91 -> But likewise, we can also communicate to third party tools

2588.76 -> in our organization or indeed to the humans

2591.4 -> if that's necessary.

2592.81 -> So the automation can already begin iteratively

2596.29 -> in previous phases.
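
A hedged sketch of the routing idea: an EventBridge event pattern that selects high-severity GuardDuty findings arriving through Security Hub, plus a deliberately simplified local matcher for trying the pattern against a sample event. The pattern uses the Security Hub event source and detail type. The matcher covers only exact-value matching and is not EventBridge's full matching logic. The sample events are invented.

```python
import json

# Pattern for a rule that could target SNS, Lambda, or Systems Manager.
PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {
        "findings": {
            "ProductName": ["GuardDuty"],
            "Severity": {"Label": ["HIGH", "CRITICAL"]},
        }
    },
}


def matches(pattern, event):
    """Simplified matcher: nested dicts recurse, leaf lists mean "any of these values"."""
    if isinstance(event, list):
        # An array in the event matches if any element matches.
        return any(matches(pattern, item) for item in event)
    if isinstance(pattern, dict):
        return isinstance(event, dict) and all(
            key in event and matches(sub, event[key]) for key, sub in pattern.items()
        )
    return event in pattern


def sample_event(label):
    return {
        "source": "aws.securityhub",
        "detail-type": "Security Hub Findings - Imported",
        "detail": {"findings": [{"ProductName": "GuardDuty", "Severity": {"Label": label}}]},
    }


print(json.dumps(PATTERN))
print(matches(PATTERN, sample_event("HIGH")), matches(PATTERN, sample_event("LOW")))
```

The JSON form of `PATTERN` is what a rule definition would carry, for example as the `EventPattern` argument of EventBridge's `PutRule`.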

2599.86 -> Now, in the NIST lifecycle, there's an interesting call out

2603.58 -> in the documentation around remediation and recovery.

2607.3 -> And what we need to be careful about in remediation,

2609.91 -> if we think about our earlier stages,

2611.83 -> we identified an instance which has been compromised.

2614.95 -> It is calling a domain name associated

2616.93 -> with a known command and control server activity.

2620.53 -> What we want to do, first

2621.7 -> of all, is eradicate the behavior,

2624.13 -> is repair the instance.

2625.84 -> We don't want to recover a damaged instance

2628.54 -> or unpatched instance to the wrong region, sorry,

2632.32 -> to another region or within our same region.

2635.11 -> We want to eradicate the behavior first of all.

2638.29 -> So in NIST, they call out this idea of eradication

2641.41 -> and recovery being done

2642.61 -> in a phased approach to drive remediation.

2646 -> And there are various cloud-native services

2648.58 -> in both capabilities to support you.

2652.78 -> So for example, we're going to be looking at systems manager

2655.2 -> in the next couple of slides,

2656.5 -> but we already saw AWS Config,

2659.23 -> which allows you to put together conformance packs,

2662.08 -> reflecting the desired configuration state of your resources

2665.02 -> and allowing you to remediate

2666.73 -> and repair the resources immediately.

2670.54 -> In recovery, we could be looking at something

2672.34 -> like the AWS Cloud Development Kit, which allows you

2674.77 -> to describe your environments in code.

2676.9 -> Like for example, Python or TypeScript

2679.18 -> and restore your environments.

2682.93 -> So if we think back earlier on

2684.52 -> to this instance, which we had identified in GuardDuty

2686.737 -> and later on in security hub,

2689.17 -> and we want to look at remediating this instance.

2691.72 -> The first thing we need to do is to actually patch it

2693.967 -> and to repair it.

2695.47 -> And so this finding would've been identified

2698.41 -> in Security Hub and triggered a rule.

2700.6 -> Or alternatively, in Systems Manager, which is a collection

2704.11 -> of capabilities to help you manage your infrastructure

2706.75 -> and operational problems across your AWS environments

2710.11 -> and hybrid environments at scale.

2713.23 -> A patch in Systems Manager can be triggered

2715.57 -> via OpsCenter.

2717.25 -> And OpsCenter is a dashboard

2719.68 -> in Systems Manager, a central location to view, investigate,

2723.43 -> and resolve operational items.

2726.1 -> And so this instance could have triggered Systems Manager

2729.34 -> to then trigger Patch Manager to patch the instances

2734.17 -> and maintain the compliance of those instances.

2736.66 -> And this can be done with instances

2738.79 -> in your prepared environments in AWS,

2741.4 -> in your hybrid environments on premises

2743.5 -> or edge environments.

2746.53 -> And so if we look into systems manager...

2749.35 -> First of all, what we see on the

2750.43 -> left-hand side is that Systems Manager is a collection

2752.95 -> of capabilities.

2754.45 -> Operations management, application management,

2757.6 -> node management and change management.

2760.69 -> And we see in this environment

2762.07 -> that we have identified four instances that need to be patched.

2766.69 -> And we can see a history of scans

2769.06 -> that have already taken place.

2773.74 -> We can go into these instances and see deviation

2776.47 -> from compliance reports that we might be using

2778.9 -> like SOC reports for example.

2781.09 -> And we can trigger the patching of the instances.

2783.34 -> This can be done automatically by Systems Manager, triggered

2787.12 -> by Security Hub and a rule or by OpsCenter,

2790.33 -> or done manually.

2792.52 -> And then we can choose to scan

2794.05 -> and install, and reboot the instance if necessary.

2797.26 -> And using tagging, we can choose a fleet of instances

2800.8 -> or one instance or hybrid instances or devices.

2805.06 -> And so this way

2806.17 -> as a first step, what we can do is eradicate the behavior

2810.07 -> of that instance, okay?

2811.51 -> We can patch that instance.
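
The tag-based patching step could look roughly like this: build the arguments for Systems Manager's `SendCommand` using the `AWS-RunPatchBaseline` document, targeting instances by tag and choosing between scan-only and install with optional reboot. The tag key and values are placeholder assumptions, and the function only returns the request arguments rather than calling AWS.

```python
def patch_command(tag_key, tag_values, operation="Install", reboot=True):
    """Arguments for ssm.send_command to scan or patch every instance carrying the tag."""
    if operation not in ("Scan", "Install"):
        raise ValueError("operation must be 'Scan' or 'Install'")
    return {
        "Targets": [{"Key": f"tag:{tag_key}", "Values": list(tag_values)}],
        "DocumentName": "AWS-RunPatchBaseline",
        "Parameters": {
            "Operation": [operation],
            "RebootOption": ["RebootIfNeeded" if reboot else "NoReboot"],
        },
    }


# Placeholder tag; one value selects a single fleet, several values select more.
request = patch_command("Environment", ["compromised-fleet"])
print(request)
```

Running a `Scan` first and an `Install` afterwards gives a before-and-after compliance picture for the same set of tagged instances.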

2813.58 -> This then means we can move on to the next step,

2815.92 -> which is maybe recovering the volume associated

2818.68 -> to that instance.

2820.84 -> And so if we look at a service like AWS Backup.

2824.284 -> AWS Backup is a fully managed, policy-based backup service

2828.61 -> that makes it easy to centrally manage

2830.98 -> and automate the backup of data services across AWS.

2835.09 -> It supports nine stateful services.

2839.62 -> It acts as a management layer across these services.

2842.77 -> And using a backup plan creates backups

2846.07 -> into the AWS Backup vaults in your AWS accounts.

2850.42 -> And these backups are encrypted at rest.

2853.06 -> This can be scaled then

2854.44 -> across your AWS prepared environments

2857.02 -> using AWS Organizations.

2860.77 -> And it is integrated with Identity and Access Management

2864.37 -> to lock down which resources are performing which actions

2867.37 -> under which conditions.

2868.99 -> And logging and notifications are then done

2871.45 -> to SNS, CloudWatch and CloudTrail.

2876.1 -> And then, once the backup has taken place,

2879.13 -> the environment can be automatically restored using services

2882.55 -> like the CDK, the Cloud Development Kit,

2886.6 -> third-party tools like Terraform, or,

2888.01 -> for example, AWS CloudFormation.

2891.73 -> So if we think about earlier on,

2894.1 -> we have patched our instance.

2896.5 -> We have maybe looked at recovering the instance,

2899.92 -> but now we want to recover an associated volume.

2902.737 -> And one of the things AWS Backup gives us is cross-account

2906.43 -> and cross-region support.

2908.32 -> And it does this using cross-account management.

2911.38 -> So here we have our instance in region A

2913.99 -> and its associated volume.

2916.3 -> In this AWS account, we have the backup plan.

2919.99 -> Which describes the lifecycle of the backups

2922.66 -> and the vaults where the backup

2924.19 -> for this particular volume and instance will go.

2927.55 -> But then this account is associated with AWS Organizations

2931.51 -> and is in an organizational unit

2933.64 -> where we have a backup policy.

2935.83 -> And the backup policy is such that this means

2938.53 -> that nobody can change the backup plan in the AWS account.

2942.55 -> And we can define that the backups for this instance,

2946.24 -> and this volume can occur into a different AWS account

2950.17 -> into different regions.

2952.09 -> And this protects our environment

2954.25 -> against account compromise in our prepared environments

2956.8 -> from earlier on.
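
To give the backup policy a shape, here is a sketch that assembles an AWS Organizations backup policy document with a daily rule, a target vault, and a copy action into a vault in another account and Region. The key names follow the Organizations backup policy syntax as best understood here. The plan name, vault names, role name, tag, schedule, retention, account ID, and Regions are all illustrative assumptions, and the document is only built, not attached.

```python
import json


def assign(value):
    """Wrap a value in the policy's assignment operator."""
    return {"@@assign": value}


def backup_policy(plan_name, source_region, vault_name, copy_vault_arn,
                  role_arn, tag_key, tag_values, retain_days=30):
    """Build a backup policy that also copies each backup to another account/Region vault."""
    lifecycle = {"delete_after_days": assign(str(retain_days))}
    return {"plans": {plan_name: {
        "regions": assign([source_region]),
        "rules": {"daily": {
            "schedule_expression": assign("cron(0 5 ? * * *)"),
            "target_backup_vault_name": assign(vault_name),
            "lifecycle": lifecycle,
            # Copy destination: a vault in a different account and Region.
            "copy_actions": {copy_vault_arn: {
                "target_backup_vault_arn": assign(copy_vault_arn),
                "lifecycle": lifecycle,
            }},
        }},
        # Which resources the plan protects, selected by tag.
        "selections": {"tags": {"by_tag": {
            "iam_role_arn": assign(role_arn),
            "tag_key": assign(tag_key),
            "tag_value": assign(list(tag_values)),
        }}},
    }}}


policy = backup_policy(
    plan_name="bronze-plan",
    source_region="eu-west-1",
    vault_name="bronze-vault",
    copy_vault_arn="arn:aws:backup:eu-central-1:444455556666:backup-vault:recovery-vault",
    role_arn="arn:aws:iam::$account:role/BackupRole",
    tag_key="backup",
    tag_values=["bronze"],
)
print(json.dumps(policy, indent=2))
```

Because the policy lives in AWS Organizations rather than in the member account, someone who compromises that account cannot edit the resulting backup plan there, which is the protection described above.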

2959.5 -> So if we look at AWS Backup, first

2962.05 -> of all, we see we have this notion of a vault,

2964.03 -> and here we have a bronze vault in the EU central region

2967.405 -> and we have a series of recovery points for our resources.

2971.38 -> What we see when we go into the backup plan is

2973.39 -> that we cannot change it.

2974.74 -> So this cannot be changed in this AWS account.

2978.49 -> If somebody wanted to carry out suspicious activity

2981.52 -> and amend the backup plan, they cannot do it within

2984.28 -> that account.

2985.113 -> They have to go into AWS Organizations.

2988.06 -> Then we can see the resource assignment.

2989.89 -> The resources that this backup plan applies to.

2993.28 -> And we see it's the volumes and the instance IDs.

2996.22 -> These are the protected resources

2997.72 -> that we're doing the backups on.

3002.52 -> So if we click then into the policy, we're actually then

3005.82 -> in AWS Organizations.

3007.71 -> We have come out of AWS Backup.

3010.17 -> And only in this view can you see the policy.

3012.57 -> You need to have these rights, you need

3014.43 -> to have this assigned to you via IAM.

3018.12 -> And so if we look at the policy, we now see information as

3022.29 -> to where the backup from that backup plan will go.

3025.32 -> And we can define a different region

3026.97 -> or a different account here.

3028.5 -> And here we're sending it to EU Central.

3030.69 -> And this is the life cycle as well of the backups.

3035.79 -> And so, if we want to then carry out our restore.

3038.37 -> So we have patched the instance, we have repaired it,

3042.18 -> but now we want to restore the associated volume.

3045.54 -> We can choose an instance or an associated volume to restore

3049.95 -> and maybe it might be the latest one.

3052.38 -> Maybe we need to choose an earlier one

3054.36 -> because a compromise occurred in our environment

3056.3 -> in the last 24 hours.

3058.2 -> And so we need to choose an earlier volume to restore.

3061.41 -> We can choose the volume to restore,

3063.69 -> and then we can define information related

3065.91 -> to the subnet, related to the VPC and related

3068.85 -> to security groups that we want to deploy

3071.7 -> around the instance and volume.
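
Choosing a recovery point from before the compromise, as described above, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the recovery points are plain dictionaries whose `RecoveryPointArn` and `CreationDate` keys are modeled on what the AWS Backup listing APIs return, the ARNs and timestamps are made up, and the compromise time is assumed to be known from the earlier analysis phase.

```python
from datetime import datetime, timezone


def latest_clean_recovery_point(recovery_points, compromise_time):
    """Return the newest recovery point created before compromise_time.

    Returns None when every recovery point was taken at or after the
    suspected compromise.
    """
    clean = [rp for rp in recovery_points
             if rp["CreationDate"] < compromise_time]
    return max(clean, key=lambda rp: rp["CreationDate"], default=None)


# Hypothetical recovery points, one per day at 02:00 UTC.
SAMPLE_POINTS = [
    {"RecoveryPointArn": "arn:example:rp-1",
     "CreationDate": datetime(2022, 11, 27, 2, 0, tzinfo=timezone.utc)},
    {"RecoveryPointArn": "arn:example:rp-2",
     "CreationDate": datetime(2022, 11, 28, 2, 0, tzinfo=timezone.utc)},
    {"RecoveryPointArn": "arn:example:rp-3",
     "CreationDate": datetime(2022, 11, 29, 2, 0, tzinfo=timezone.utc)},
]

# Hypothetical time at which the compromise is believed to have started.
COMPROMISE_TIME = datetime(2022, 11, 28, 14, 0, tzinfo=timezone.utc)

chosen = latest_clean_recovery_point(SAMPLE_POINTS, COMPROMISE_TIME)
print(chosen["RecoveryPointArn"] if chosen else "no clean recovery point")
```

The latest recovery point is skipped here because it was taken after the suspected compromise; the restore itself (subnet, VPC, security groups) would then be started from the chosen recovery point.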

3076.17 -> And so this brings us

3077.1 -> into the post-incident activity analysis.

3080.07 -> Here, essentially what remains the same is one

3082.62 -> of the most important things that we need to do is to learn

3085.65 -> and improve from the previous stages.

3089.521 -> We have information coming in

3091.62 -> from containment, eradication and automation.

3094.11 -> And from this we can improve in our incident response.

3097.05 -> We can improve in our KPIs, we can improve in our metrics.

3100.41 -> We do have services to support us here,

3102.42 -> like Systems Manager, as Armin previously mentioned,

3105.6 -> which has an incident management console view,

3108.84 -> where you can define runbooks, automation

3111.69 -> and do incident collaboration across teams.

3116.64 -> But an important part of this,

3118.29 -> the post-incident activity is the iterative phase,

3121.23 -> through the NIST lifecycle where you can feed back

3123.93 -> to all of the phases, learnings related to the incident

3128.468 -> and improve your activities.

3133.89 -> As well as that, AWS Professional Services

3136.707 -> and Solutions Architects have various teams

3139.05 -> to support you in activities in this phase.

3144.96 -> - All right, thanks Margo.

3146.04 -> So let's probably go to the last slide of this.

3149.52 -> At the summary and conclusion.

3150.81 -> I think what we wanted to show today is

3152.88 -> that there's quite a lot of cloud-native services,

3155.1 -> which can help us in that overall lifecycle.

3157.77 -> That said, there's not only cloud-native services.

3160.26 -> There might be other services as well,

3162.24 -> but what we also wanted to show is

3163.65 -> that we could cover the entire landscape

3167.07 -> and especially how the cloud, with its flexibility

3170.1 -> and agility, can help to respond and recover quickly, right?

3175.14 -> We're seeing how cloud-native services are supporting system

3177.63 -> and analysis capabilities

3179.22 -> and we wanna highlight again, I mean the feedback

3181.56 -> into the process is really, really important in order

3185.85 -> to get better for the next time.

3188.07 -> With that, we wanna thank you and

3190.29 -> as another post-incident activity, you might do us a favor

3193.59 -> and fill out the survey for us to get better.

3195.6 -> Thank you very much.

3196.8 -> - [Margo] Thank you.

3197.729 -> (audience clapping)

Source: https://www.youtube.com/watch?v=lx4igENUPVg