AWS re:Invent 2022 - Threat detection and incident response using cloud-native services (SEC309)
Threat detection and incident response processes in the cloud have many similarities to on premises, but there are some fundamental differences. In this session, explore how cloud-native services can be used to support threat detection and incident response processes in AWS environments. In addition, learn how cloud-native security services can be integrated into security information and event management solutions and if a classic SIEM approach is still required. This session covers native services such as Amazon GuardDuty, AWS CloudTrail, AWS Security Hub, Amazon OpenSearch Service, AWS Shield Advanced, and more.
Content
0.81 -> - Hey everyone, welcome
2.67 -> to our session today on threat detection
4.53 -> and incident response using
cloud-native services.
7.62 -> My name is Margo Cronin
9.09 -> and I'm a Solutions Architect Specializing
11.13 -> in Security and Compliance.
13.08 -> - Yeah. Hello everybody.
13.95 -> My name is Armin Schneider.
15 -> I'm also a Specialist Solution Architect
16.71 -> for Security and Compliance
and looking forward
18.93 -> to the session today.
22.77 -> - So this is our agenda today.
25.23 -> Cybersecurity and cyber
risk has always been a topic
28.77 -> that customers have
cared passionately about.
31.53 -> And now, with the widest
breadth of services and tools
34.49 -> in the cloud,
35.97 -> there are even more actions
the customer can take
38.76 -> to mitigate some risks in these spaces.
41.31 -> So Armin,
42.66 -> other solutions architects, and I
carry out many assignments
45.3 -> in this area, which brought
together our session today.
49.92 -> So, today we're gonna be
talking about what's different
52.82 -> in the cloud,
54 -> but also what remains the same
55.38 -> from what you would've experienced before.
57.96 -> And we're going to look
at threat detection
59.61 -> and incident response
60.78 -> in phases: preparation,
detection, containment,
65.31 -> collection, analysis, and then automation
68.64 -> and remediation and
post-incident analysis.
74.79 -> Actually, what we're going to use today
76.86 -> to guide our session is
the NIST 800-61 lifecycle.
83.04 -> You might be using a different one
84.42 -> in your organization, that's fine.
86.94 -> What we're trying to do
is we're trying to begin
88.53 -> with technical capabilities, not beginning
90.9 -> with cloud-native services.
92.7 -> What are the capabilities and
the requirements you're trying
95.04 -> to drive with those cloud-native services?
101.55 -> - All right, thanks Margo.
103.38 -> So I want to take over
the first part and want
105.66 -> to talk about what's
different in the cloud.
107.76 -> And I think the big thing is there is just
110.13 -> an additional layer in this whole picture.
112.71 -> And this is a control plane, right?
114.96 -> And in fact, I mean
this is a paradigm shift
117.63 -> in how environments
operate and how they exist.
121.2 -> There is quite a lot of
additional log data, which we need
123.78 -> to consider in the
incident response process.
126.54 -> But on the flip side, there's
also a much better way
129.33 -> or automated way in order
to react to incidents.
132.87 -> And then finally, there is also
a more continuous iteration
136.38 -> between the lifecycle phases,
which we will show
139.41 -> during the course of the session.
142.11 -> In order to start with this,
I mean we wanted to start
144.63 -> and look into the AWS
global infrastructure first
148.2 -> in order to elaborate a
little bit more details
150.45 -> on what really is different.
153.96 -> This might be known to some
155.52 -> of you, but we want to take
a look at it in the context
159.57 -> of incident response. And we start
162.36 -> with the global infrastructure.
163.65 -> We want to talk about the
concept of a region, right?
167.4 -> And a region for us is basically
168.96 -> a physical location where
we cluster data centers,
172.23 -> and we call each group
of data centers
174.51 -> an availability zone.
176.76 -> Each of our regions has at
least three availability zones
180.69 -> and we currently have
about 96 availability zones
185.61 -> across 30 different regions, right?
189.06 -> So why is that important in
the incident response process?
193.8 -> First, in the response process, right?
195.66 -> We might want to go to a
different region in case we need to,
199.32 -> in case there is an incident
201.96 -> and something happened, right?
203.52 -> We've often seen that compromised
accounts have been used
207.54 -> in regions that people
usually do not use.
211.53 -> Or attackers try to hide their activity in those regions.
213.57 -> So it's really important,
and we'll come to that
215.79 -> in more detail, to have this
region concept in mind.
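The point about compromised accounts being used in regions nobody normally touches can be sketched as a small check over CloudTrail records. This is a minimal illustration, not the speakers' tooling: the approved-region set and the sample records are assumptions made up for the example, and only the `awsRegion` and `eventName` fields of a CloudTrail record are relied on.

```python
from collections import Counter

# Hypothetical allow-list: the regions this organization actually uses.
APPROVED_REGIONS = {"eu-central-1", "eu-west-1"}


def unexpected_region_activity(records, approved=APPROVED_REGIONS):
    """Count CloudTrail records per region for regions outside the allow-list."""
    counts = Counter(
        record["awsRegion"]
        for record in records
        if record.get("awsRegion") and record["awsRegion"] not in approved
    )
    return dict(counts)


# Hand-written sample records; real ones would come from CloudTrail log files.
sample_records = [
    {"eventName": "RunInstances", "awsRegion": "ap-southeast-2"},
    {"eventName": "DescribeInstances", "awsRegion": "eu-central-1"},
    {"eventName": "RunInstances", "awsRegion": "ap-southeast-2"},
    {"eventName": "CreateUser", "awsRegion": "us-east-1"},
]

region_hits = unexpected_region_activity(sample_records)
print(region_hits)
```

Calls to global services such as IAM are typically recorded under `us-east-1`, so a real check would probably need to treat that region separately rather than flag it outright.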
219.09 -> The next thing we want to take a look
220.34 -> at is basically the AWS account, right?
223.08 -> An AWS account is basically
a natural security boundary
227.443 -> for billing and security access
to your resources, in fact.
233.07 -> So within an account we
have resources, right?
237.3 -> And this could be
databases, virtual machines,
240.51 -> higher level services, storage objects
242.91 -> and so on and so forth.
244.5 -> And then we have the virtual private cloud,
247.59 -> or our network infrastructure,
249.81 -> where we basically have subnets
252.57 -> and where the traditional
network terminology is basically replaced
256.855 -> by cloud-native
functionality, basically also
259.68 -> with the scale of the cloud.
261.57 -> And then there are a
couple of other services
263.52 -> in those regions such as gateways
266.43 -> and other kind of functionality.
268.77 -> And within an account, we can spread
271.44 -> across many regions again, right?
273.02 -> So it's important: an account
by default allows us
275.88 -> to go to all of the native regions
in the commercial platform.
279.57 -> There is one thing,
280.98 -> and it's worth mentioning
here, which goes
283.35 -> across regions, and this is Identity
285.51 -> and Access Management, and
we will look later
288.72 -> at why this is important to
control who is allowed
291.51 -> to do what in which kind of region,
294.18 -> but also on which kind
of resources, especially
297.9 -> if you want to go with
isolation technologies
300.87 -> at a later stage.
302.76 -> So while the account concept
is our isolation layer,
306.12 -> what we're seeing on our customer side
307.8 -> is that customers are
running multiple accounts.
309.57 -> And it's basically something
we're guiding customers through
312.51 -> because typically we are
saying this is a good way
314.37 -> of isolating your stuff.
316.41 -> And then basically customers
are running hundreds
319.41 -> and sometimes thousands of accounts.
321.45 -> So in order to get that under
control, we then started
324.45 -> with a service called AWS Organizations,
327.03 -> which is basically an
account management service
329.82 -> that enables you to
organize and manage accounts
332.85 -> across your entire estate.
335.79 -> And in order to do that, we
have one specific account,
339.27 -> which we call the management account.
341.46 -> And then we have things
like organizational units
343.824 -> and then we have sub-organizational units
346.53 -> and we're having accounts within
those organizational units.
349.17 -> So this is basically the structure
and how you can build it.
352.26 -> Keep
353.093 -> in mind, though, that the isolation
boundary is still the account,
356.64 -> but with this
account structure
359.43 -> and using Organizations, we
can have centralized control
362.76 -> over identity and access management.
365.58 -> And we have the concept
of service control policies,
368.43 -> and we will come to that at
a later stage, which help us
371.73 -> to control what can happen
in certain accounts.
374.52 -> So we can really use service
control policies later on
377.52 -> in order to build forensic
environments, even
380.43 -> in your own infrastructure if you want to.
382.59 -> And service control policies will
limit what can happen
385.47 -> in those accounts and so on and so forth.
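To make the idea of a service control policy limiting what can happen in an account more concrete, here is a minimal sketch that builds a region-restricting policy document as JSON. The region names and the list of exempted global services are assumptions chosen for illustration, and the helper name `region_deny_scp` is hypothetical; only the general policy document shape and the `aws:RequestedRegion` condition key are taken from AWS. A real policy would need a reviewed exemption list before being attached to an OU.

```python
import json


def region_deny_scp(approved_regions, exempt_actions):
    """Build an SCP-style document denying requests outside the approved regions.

    Actions listed in exempt_actions (for example global services) stay usable
    regardless of the requested region.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(approved_regions)
                    }
                },
            }
        ],
    }


# Assumed values for illustration only.
policy = region_deny_scp(
    approved_regions={"eu-central-1", "eu-west-1"},
    exempt_actions={"iam:*", "organizations:*", "sts:*", "support:*"},
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The same pattern, with a much tighter statement list, could describe a locked-down forensic account: the policy is plain data, so it can be prepared and reviewed ahead of time and attached only when needed.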
387.66 -> There is another item
which is not on the slide,
390.36 -> but it's also quite important
and we'll look into that
392.43 -> in more detail.
393.6 -> The logging can also be
centralized across all
395.99 -> of those accounts.
396.99 -> Especially in the case of
an incident, you might want
399.09 -> to have your logs all in one place.
402.12 -> So that's basically, you know,
403.77 -> the global infrastructure
elaborating what is different.
407.88 -> So what remains the same, right?
409.32 -> I mean quite a lot of the
tasks, like the general process
411.81 -> for performing incident
response, remain the same.
414.48 -> So the lifecycle Margo showed comes
417.21 -> from older days and it's still valid.
419.85 -> We will see during the course
421.05 -> of the presentation that
there is more iteration
423.03 -> in the cloud, definitely.
425.227 -> You still need your subject matter experts,
427.62 -> especially when it comes
to forensics, right?
429.57 -> You need to have the people
who have this kind of skill.
431.91 -> There is no way you can
go without those things.
434.905 -> However, there are things
437.49 -> like the collection of native logs
439.17 -> and endpoints to be captured,
440.49 -> which are traditional
things you had to do all
441.893 -> through the past.
443.58 -> This is where the cloud can
really help and automate them:
446.34 -> capturing the logs from your endpoints,
448.02 -> but also making snapshots and
restoring stuff and so on
450.537 -> and so forth.
451.37 -> It's something we'll
dig deeper into, both
453.99 -> in the containment phase and
455.7 -> in the remediation phase,
457.2 -> where the cloud-native services and tools
460.92 -> support the remaining processes.
463.62 -> All right, so let's start
with our first part here.
466.347 -> And this is basically how
cloud-native services are related
470.91 -> to this cycle, right?
472.56 -> And I won't go into all the details
474.48 -> of what the services are doing.
476.01 -> We just wanna highlight
477.18 -> that there are different
services which we are covering
479.61 -> in a later stage related to
different phases of the cycle.
483.66 -> And if you start
484.493 -> with AWS CloudTrail, where
you can capture user activity
488.94 -> and API activity,
491.07 -> well, this falls into
multiple areas already.
493.32 -> So detection
495.21 -> and analysis are pretty obvious for that.
498.24 -> But also in the
preparation phase, you need
500.01 -> to worry about it because you need
501.33 -> to make sure that it is set up.
503.07 -> Then we have things
503.903 -> like Amazon GuardDuty,
classical threat detection.
506.49 -> Well it falls into the detection part.
509.49 -> Then we are using AWS Security Hub.
512.52 -> This one will basically fall
514.02 -> into triage and collection pretty much,
517.38 -> but also the analysis phase, right?
520.2 -> Then you have things
like Systems Manager.
523.26 -> And Systems Manager comes
up in multiple places here.
527.46 -> In the containment phase,
it can definitely help us
530.1 -> to orchestrate things, and during
the analysis, especially
533.07 -> in the forensic analysis,
535.2 -> but it also then comes into
play in the remediation
538.32 -> and restoration phase at a later stage.
540.12 -> So it's basically the
service which helps us
541.74 -> to automate, especially on instances.
545.04 -> Then we have Amazon Detective.
547.32 -> This is basically our analysis service,
550.56 -> which we will dig deeper into.
552.84 -> It is definitely used in containment,
554.82 -> but it's also used in
analysis and it might be used,
558.63 -> and we're really recommending this,
560.04 -> in the post-incident activity, right?
562.02 -> You might have already
fixed your problems,
564.42 -> but you still want to
take a look at other stuff
567.45 -> around it which you might want
to capture for further steps
570.09 -> in the future.
571.2 -> So don't wonder why this is coming up
573.87 -> in the post-incident activity.
575.82 -> And lastly on this slide, it's AWS Config.
578.05 -> And AWS Config can help us
580.38 -> to measure the state
583.08 -> of our environment, so
is it configured properly,
585.48 -> but it can also help us
586.86 -> to trigger automated
remediation tasks based on this.
590.07 -> So that's why it comes up in detection,
592.493 -> but later on, also in remediation.
595.35 -> Now, as I said, we will
dig deeper into all of this.
598.44 -> In order to save a bit of time,
600.3 -> let's go to the first
phase of the cycle.
604.08 -> And in the preparation phase,
605.79 -> and I'll make this first
one pretty short, right?
608.76 -> You have to keep in mind that the majority
610.56 -> of the cloud-native services,
612.06 -> but also the services you might
use from third parties, need
615.57 -> to be configured, or at least
exist or be enabled, beforehand.
620.88 -> Some of them you might
be able to turn on later,
624.12 -> but it's really important
626.04 -> to make sure these services are enabled
627.99 -> if you want to use them at a later stage.
630.75 -> As I said, some of them have an exception,
632.76 -> but the majority, keep in mind, they need
634.26 -> to be configured, enabled depending
636.57 -> on the use case, whether
you use all of them
638.52 -> or not, it's a different question.
641.13 -> So the same thing is basically true
643.32 -> for the log data, right?
645.27 -> And we have quite a lot of
additional log data compared
648.84 -> to the traditional product.
650.49 -> Don't wanna name all of them, right?
652.26 -> But you need to keep in mind, depending
654.6 -> on the service, you might
need to enable the log data.
657.51 -> AWS CloudTrail, which
tracks the user activity
660 -> and the API logging, we
are enabling it by default
664.62 -> for 90 days.
665.76 -> But if you want to have the data longer
667.77 -> or you want to keep it for years
669.96 -> and even multiple years, right?
671.22 -> You need to make sure
you have it configured.
673.68 -> The same thing is true, for example,
for VPC Flow Logs, which
are our network flow data.
678.06 -> If you want to have this
data available, you need
680.04 -> to configure this and need
681.69 -> to make sure it is stored
somewhere where it's accessible
684.84 -> in the case you need it.
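Having flow logs stored only helps if they can be read quickly during an incident. As a minimal sketch, assuming the default (version 2) VPC Flow Logs record layout of fourteen space-separated fields, a parser could look like this. The helper name and the sample lines are illustrative, and custom log formats would need a different field list.

```python
# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_log_line(line):
    """Parse one default-format flow log line into a dict."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for name in INT_FIELDS:
        # NODATA and SKIPDATA records carry "-" in place of missing values.
        if record[name] != "-":
            record[name] = int(record[name])
    return record


# Illustrative lines in the default format (account and ENI IDs are made up).
lines = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]
records = [parse_flow_log_line(line) for line in lines]
rejected = [r for r in records if r["action"] == "REJECT"]
print(rejected)
```

Filtering on `action` and `dstport` like this is one quick way to spot rejected connection attempts against a suspect interface.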
686.34 -> Another one I just wanna mention here,
688.17 -> because it's really
overlooked very
690.51 -> often: load balancer logs.
692.67 -> I mean especially
693.503 -> in incident response
processes, they are important
697.26 -> because they guide
you to the real resources
699.57 -> behind the load balancer, right?
701.19 -> They are the entry door.
702.9 -> So you really wanna make sure you have
704.64 -> the load balancer logs
and at least store them
706.83 -> for a certain amount of time.
709.32 -> Not really considered as
a security functionality,
712.05 -> but it's really important
if you have stuff
713.7 -> behind load balancers.
715.86 -> Also worth mentioning: WAF logs.
717.72 -> I mean, everything
on the edge, right?
719.85 -> You might want to take
a look at things
722.07 -> like CloudFront or WAF
logs for that purpose.
725.43 -> It all depends on what
you're using on our platform,
728.01 -> but it's really important to
have those logs enabled up front
731.22 -> because if you turn them on
732.33 -> after the incident happened,
it's not gonna help.
735.63 -> All right.
736.463 -> And lastly here, prepare
your forensic environment.
739.41 -> And I talked about it a bit in
the organization structure.
742.59 -> So you might have an account
746.1 -> or an OU, depending on
how you want to go, ready
748.8 -> for forensics, right?
750.36 -> And you might wanna really
limit what can happen
752.61 -> in those accounts or make sure you use all
755.1 -> of our isolation capabilities,
757.14 -> so you can safely do forensics
759.24 -> and investigation in those accounts.
761.37 -> As I said, the cloud will
really help you with that.
764.1 -> But again, you need to be
ready and have it prepared.
767.22 -> That's not to say this
needs to run all the time,
769.23 -> but you need to have the
process in order to get it done.
772.44 -> Same thing is true for containment, right?
774.9 -> If you wanna isolate
a machine on the network
777.15 -> and those kinds of things, you
need to make sure you know how
780.03 -> to do it and at which
level you want to do it.
781.86 -> We'll look into that a bit deeper.
784.47 -> The forensics tools, right?
785.97 -> Systems Manager and similar things can
help you to roll them out.
788.49 -> But again, you need to have
it ready as a run command
792.36 -> or something like this in order to use it.
795.09 -> And then the last one,
796.05 -> and this is something that's
really very close
798.33 -> to my heart, right?
799.59 -> Have your log analysis ready.
801.45 -> I mean we are seeing it way too often
803.22 -> that customers had something, right?
806.58 -> They wanted to do a log analysis
808.14 -> and they had all the logs configured.
809.7 -> They had them for three years,
four years, five years
812.37 -> in S3, in Glacier, or in a SIEM, right?
815.13 -> But they had no idea, no process,
how to analyze them, right?
818.575 -> Make sure you are ready for that analysis.
821.22 -> That's not to say you need
822.12 -> to have all the stuff
running all the time,
824.34 -> but have a process to get it up quickly.
826.59 -> Otherwise, you can
still build it afterwards.
829.92 -> It's not something that you
can't do after the fact,
832.53 -> but it will cost you a lot of time.
834.24 -> And sometimes we've seen
customer delays of days
837.39 -> before they had something
up and running in order
839.19 -> to look at their logs.
840.18 -> This is really something.
842.13 -> There are a lot of
samples out there,
845.28 -> on GitHub and other places,
where you can grab queries
847.98 -> and all those kinds of things.
849.24 -> Just make sure you have it
up and running and ready.
851.4 -> No matter which service
you use by the way.
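As one example of having an analysis ready before it is needed, the sketch below summarizes denied API calls per principal from a CloudTrail log file. It is an illustration under assumptions, not a recommended tool: it relies on the CloudTrail log file layout (a top-level `Records` array whose entries carry `userIdentity`, `eventName`, and `errorCode`), the helper name is hypothetical, and the sample log content is invented for the example.

```python
import json
from collections import Counter

# Error codes treated as "denied" here; an assumption that may need extending.
DENIED_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}


def denied_calls_by_principal(log_text):
    """Return ((principal ARN, event name), count) pairs, most frequent first."""
    records = json.loads(log_text).get("Records", [])
    counts = Counter()
    for record in records:
        if record.get("errorCode") in DENIED_CODES:
            arn = record.get("userIdentity", {}).get("arn", "unknown")
            counts[(arn, record.get("eventName"))] += 1
    return counts.most_common()


# Invented sample standing in for the contents of one CloudTrail log file.
sample_log = json.dumps({
    "Records": [
        {"eventName": "ListBuckets", "errorCode": "AccessDenied",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
        {"eventName": "ListBuckets", "errorCode": "AccessDenied",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
        {"eventName": "RunInstances", "errorCode": "Client.UnauthorizedOperation",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
        {"eventName": "DescribeInstances",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
    ]
})

denied = denied_calls_by_principal(sample_log)
print(denied)
```

A burst of denied calls from a single principal is a common starting point for triage. The same question could equally be answered with a saved query in whichever log service is in use, as long as it exists before the incident.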
853.98 -> Okay, that's it on
the preparation phase.
859.23 -> We're moving on to detection
860.64 -> and I'm handing back to Margo.
862.86 -> - Thanks Armin.
864.96 -> So, we've carried out our
preparation activities.