AWS re:Invent 2022 - AWS security services for container threat detection (SEC329-R1)

Containers are a cornerstone of many AWS customers’ application modernization strategies. The increased dependence on containers in production environments requires threat detection that is designed for container workloads. To help meet the container security and visibility needs of security and DevOps teams, new container-specific security capabilities have recently been added to Amazon GuardDuty, Amazon Inspector, and Amazon Detective. In this session, learn about these new capabilities and the deployment and operationalization best practices that can help you scale your AWS container workloads. Additionally, the head of cloud security at HBO Max shares container security monitoring best practices.



Content

0.39 -> - All right.
1.77 -> Good evening, everybody.
2.67 -> Thank you for taking the time out of your day,
5.37 -> especially late in the day,
6.33 -> to come out here for this session.
7.86 -> We appreciate your time,
9.78 -> and hopefully we can give you a good session here,
12.45 -> and some good content to take away.
14.88 -> My name is Scott Ward.
15.81 -> I'm a principal solutions architect at AWS.
19.05 -> I work on our External Security Services product team.
22.29 -> So this is a team that owns security services,
25.2 -> Security Hub, Macie, Detective, Inspector, GuardDuty,
29.19 -> the new AWS Security Lake service.
32.43 -> I work with our customers to help them understand
34.35 -> how our services work,
35.85 -> help them understand how they can integrate
37.65 -> our services into their environment,
39.51 -> and I also work with the product engineering teams
41.34 -> to work with them on, what do customers need,
43.98 -> and what should the services or features do
47.25 -> in order to allow our customers to be able to use them
49.65 -> and meet their security goals?
51.78 -> I'm joined here today by Mrunal Shah,
54.24 -> who's the head of container security
55.77 -> for Warner Bros. Discovery and HBO Max.
58.89 -> Mrunal's gonna join us a little bit later
60.54 -> to tell the HBO Max story about what they're doing
63.63 -> when it comes to containers and threat detection.
66.99 -> To start off with, we're just gonna start talking about,
69.51 -> what are the AWS services that exist
72.24 -> to help you when it comes to threat detection
74.34 -> and vulnerability management for your containers,
76.92 -> talk through what those services do
78.57 -> and how you might use them,
80.61 -> and then that'll back up what Mrunal talks about,
83.16 -> and how they're actually using them,
84.15 -> and the value they're getting out of these services.
88.23 -> So to start with,
89.063 -> let's just highlight some of the challenges
90.69 -> that customers have been facing
92.46 -> when it comes to containers and security.
95.34 -> Scale's a big one.
97.35 -> When you're running containers
98.43 -> to back your applications or your microservices,
101.28 -> there's generally a lot more containers running
103.23 -> than you might have
104.16 -> with traditional infrastructure components.
106.86 -> And even when you go down to just
107.94 -> the level of a single EC2 instance,
109.71 -> there's many containers running on that instance.
112.47 -> So being able to understand, which container is the problem,
115.53 -> and where is that container running,
117.72 -> and is that container actually the problem or not,
120.42 -> can be a challenge.
122.07 -> Containers are short lived.
123.69 -> Containers might run for a couple of minutes, maybe an hour.
126.93 -> Being able to detect a threat for a short-lived container
129.57 -> and actually be able to identify that
131.61 -> in a quick amount of time is very challenging.
135.36 -> Containers are just different than what you might see
137.76 -> with traditional infrastructure.
138.87 -> They have different configurations,
140.34 -> and understanding what's a good configuration
143.04 -> and what's a secure configuration is challenging.
147.06 -> There are various repositories out there
150.15 -> that offer up container images
151.95 -> that you can use to run an application
154.35 -> or build on on top of to run your own applications,
158.4 -> but really knowing where that container image came from,
162.6 -> and is that a trusted source or not,
164.07 -> and is the software in there something
165.48 -> you could actually feel comfortable
167.04 -> running in your environment can be a challenge.
170.01 -> There's a general lack of expertise out there.
171.69 -> So we all know that the security industry as a whole
174.18 -> is lacking personnel and bodies to hire.
178.77 -> Container knowledge and insight
180.21 -> and the ability to gain that knowledge can be a challenge.
186.42 -> Network configurations don't always account
189.24 -> for the presence of containers,
191.13 -> and that lack of container awareness
193.77 -> from a network perspective
195.48 -> can result in a malicious container
197.82 -> being able to reach out and compromise other containers
200.55 -> or parts of your environment.
202.89 -> And then, just a general lack of visibility,
204.54 -> the ability to understand,
206.61 -> what containers do I have, where are they running,
209.67 -> how are they interacting with each other,
211.98 -> can be a challenge as well.
213.54 -> Traditional tools don't account for containers,
215.85 -> and incorporating that level of visibility
218.64 -> with things that are so short-lived
220.92 -> and have such massive scale can be challenging.
224.76 -> So when we talk to our customers,
226.62 -> that often leads to discussions around, you know,
229.38 -> what are the best practices that I can be following
231.75 -> when it comes to securing my container workloads on AWS?
236.13 -> So I'm gonna take some time and talk through
237.39 -> some specific services and features that we have within AWS
240.48 -> that are gonna help with this part of the discussion.
243.84 -> So to start off with, I'm gonna focus on Amazon GuardDuty
246.48 -> and its EKS protection feature.
249.87 -> So this feature launched about a year ago for GuardDuty.
254.58 -> It's an extension of the existing GuardDuty service.
257.52 -> And for a quick overview of what GuardDuty does as a whole,
261.18 -> GuardDuty is our threat detection service for AWS.
264.93 -> It's one-click enablement.
266.25 -> With a click of a button or a single API call,
268.11 -> you can turn on GuardDuty.
270.03 -> As soon as you turn on GuardDuty,
271.83 -> it will immediately start consuming data
274.74 -> from the log sources that it supports
276.66 -> based on the features that you've enabled.
279.45 -> Those log sources include CloudTrail, VPC Flow Logs,
283.38 -> Route 53 DNS query logs,
286.11 -> S3 data event logs,
287.79 -> and now EKS audit logs.
291.09 -> GuardDuty consumes that information,
293.7 -> and then it's going to generate findings
296.91 -> in a couple of different ways.
298.02 -> One, it's gonna use threat intelligence,
300.72 -> AWS's own internal threat intelligence based on what we see
304.2 -> from the AWS and the Amazon platform.
307.41 -> And then we also have threat intelligence
308.91 -> coming from our third-party providers,
310.23 -> CrowdStrike and Proofpoint.
313.08 -> We also generate findings based on anomaly detections
316.77 -> that are backed by machine learning.
318.15 -> So GuardDuty understands, how does your environment run,
320.73 -> what's considered normal,
322.5 -> and then can generate findings
324.03 -> when it sees unusual behavior or patterns
327.48 -> that are not typically observed in your environment.
331.59 -> These are all there to help you
333.09 -> take action at the end of the day.
334.44 -> GuardDuty generates findings
335.79 -> with a high, medium, or low severity.
338.07 -> You can go consume those in the GuardDuty console,
340.8 -> or consume them through Amazon EventBridge,
343.385 -> which is our message bus service,
345.48 -> Amazon Security Hub, or the GuardDuty APIs.
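
As a rough sketch of that one-call enablement and the API-based consumption (the region, severity threshold, and printed fields below are illustrative assumptions, and this assumes GuardDuty isn't already enabled in the account), a minimal boto3 example might look like this:

import boto3

guardduty = boto3.client("guardduty", region_name="us-east-1")

# The "single API call" enablement; fails if a detector already exists in this region.
detector_id = guardduty.create_detector(Enable=True)["DetectorId"]

# Pull high-severity findings back out through the GuardDuty APIs.
finding_ids = guardduty.list_findings(
    DetectorId=detector_id,
    FindingCriteria={"Criterion": {"severity": {"GreaterThanOrEqual": 7}}},
)["FindingIds"]

if finding_ids:
    for finding in guardduty.get_findings(
        DetectorId=detector_id, FindingIds=finding_ids
    )["Findings"]:
        print(finding["Severity"], finding["Type"], finding["Title"])

The same findings would also flow to EventBridge and, if enabled, Security Hub, so this API path is just one of the consumption options mentioned above.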
350.275 -> So let's talk more on the Kubernetes side
352.23 -> and why this is needed.
353.73 -> So every Kubernetes cluster that you launch
357.78 -> has a control plane API,
360.12 -> and you typically interact with this
361.53 -> through the kubectl CLI,
363.577 -> but you can also interact with it
364.65 -> through straight REST calls.
366.66 -> And so this API is what the cluster uses,
371.73 -> what the pods and containers in a cluster use
373.74 -> to communicate within that cluster,
376.5 -> and it's also the API you use
378.18 -> to manage the configuration of the cluster,
380.58 -> to understand how things are configured
382.77 -> or to make configuration changes.
384.39 -> So this is a very powerful API.
387.09 -> You can do a lot with your cluster.
389.01 -> But it's also really important to understand
390.87 -> what's happening with this API
392.31 -> to identify if there's any malicious activity happening.
395.55 -> So that's where Kubernetes audit logs come in.
397.59 -> So these are sequential control plane log files
401.52 -> that help you with insight into,
404.07 -> who made a change in the cluster,
405.48 -> what API calls did they make,
407.01 -> what IP address did they come from,
409.38 -> what were the parameters they passed,
411.09 -> were they actually successful or not?
413.4 -> I look at this as CloudTrail for your Kubernetes clusters.
420 -> So GuardDuty consumes
422.94 -> these audit logs out of your EKS cluster.
427.02 -> GuardDuty consumes these automatically,
428.64 -> just like all the other log sources.
430.08 -> So whether you're running one EKS cluster
432.84 -> or hundreds of clusters,
434.67 -> GuardDuty will automatically consume those log files
437.1 -> and start looking through them and generating findings.
439.98 -> And we generate findings along three main areas,
442.8 -> policy, malicious access, and suspicious behavior.
447.45 -> Policy's gonna be looking at things
448.98 -> like granting anonymous access
452.85 -> or changing the dashboard for the cluster
455.52 -> so it's now public.
457.29 -> Malicious access is gonna be looking
459 -> at interaction from a known threat actor,
462.75 -> traffic coming from a Tor exit node
464.67 -> where somebody might be trying to hide their true identity,
468 -> or that anonymous access actually being able
471 -> to successfully connect to your environment.
474.93 -> And then the suspicious behavior aspect,
476.73 -> looking for things that are not expected
479.79 -> to actually happen in a Kubernetes environment
481.98 -> or a container-based environment.
483.42 -> So running executions in an actual Kubernetes pod,
487.53 -> a container being launched with very privileged access,
490.89 -> access that might even allow it to get
492.66 -> to the root operating system
494.4 -> of the EC2 instance that it's actually running on.
498.69 -> So from that high level,
499.86 -> we then generate findings in GuardDuty,
501.777 -> and those are aligned along the finding types
504.66 -> that you see at the bottom of the slide here.
506.1 -> These align to the MITRE ATT&CK framework.
508.83 -> So when you get a finding, you can understand,
511.32 -> in what phase of that attack framework
513.51 -> is this finding related to,
516 -> and what the actor or the suspicious behavior
519.24 -> might be related to,
520.32 -> so this will help guide you in how you might respond
523.02 -> or remediate that particular finding.
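
For reference, a hedged sketch of turning the EKS audit log data source on for an existing detector with boto3 (this uses the per-data-source form of the UpdateDetector API that matches the feature as described here, and the detector lookup assumes GuardDuty is already enabled):

import boto3

guardduty = boto3.client("guardduty")
detector_id = guardduty.list_detectors()["DetectorIds"][0]

# Enable consumption of EKS audit logs on the existing detector.
guardduty.update_detector(
    DetectorId=detector_id,
    DataSources={"Kubernetes": {"AuditLogs": {"Enable": True}}},
)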
526.68 -> When it comes to the findings themselves,
528.57 -> on the right-hand side,
529.59 -> that's a sample GuardDuty EKS finding.
532.11 -> I don't expect you to read that.
533.19 -> I'll just talk, walk you through the details
535.44 -> of what you're gonna get.
536.34 -> You're gonna get a description of what that finding is,
539.25 -> and some remediation guidance,
541.17 -> and I'm gonna talk about that guidance here
542.67 -> in just a minute.
544.14 -> You're gonna get a severity, so based on the activity
546.63 -> and what we believe the importance of that is,
549.36 -> we're gonna give you a high, medium, or low severity.
552.24 -> We'll give you a first and last observed.
553.8 -> So first is the first time
555 -> we actually saw this behavior happening,
557.55 -> and last observed is,
558.81 -> if we continue to see that behavior
560.73 -> happening on this particular cluster,
562.89 -> we're going to update that finding,
564.48 -> so you'll get some sense of,
566.13 -> if the first update and the last update are different,
568.53 -> this potentially could still be happening
571.17 -> in your environment.
572.94 -> We'll give you the resources impacted,
574.32 -> so it'll help you understand the EKS cluster,
576.63 -> the workload within that cluster,
578.4 -> and the user associated with that cluster
580.38 -> where this activity might be related.
582.78 -> And then, understanding the action.
584.19 -> So this would be the actual
585.12 -> API interaction that's happening.
587.01 -> So what was the API call, what were the parameters,
589.95 -> was it successful or not,
591.84 -> were items that we'll be giving you in those findings.
597.21 -> So when you get the actual finding,
598.71 -> you wanna respond to it and you wanna remediate it.
600.96 -> And with GuardDuty for EKS,
603.21 -> we've published remediation guidance to help you figure out
607.2 -> what you need to do with these findings.
608.76 -> So the very top link is the GuardDuty public documents,
612.45 -> and every finding that we generate,
614.7 -> we'll have a link to this public document
617.31 -> that highlights that specific finding,
619.17 -> so we'll help you understand, what does this finding mean,
622.5 -> why did we generate this particular finding,
625.23 -> and also give you some guidance on how you might go
627.36 -> remediating that in your Kubernetes cluster.
631.14 -> Often, we will point to the second link here,
633.48 -> which is a EKS security best practices.
636.45 -> So this is a public repository that AWS maintains around,
640.47 -> what are the best practices
641.55 -> to actually maintain that cluster
643.65 -> or change the configuration of your cluster
646.14 -> to address a particular security item?
648.12 -> And so things you might find in here,
650.25 -> we might walk you through how to actually
651.66 -> make that cluster endpoint private
653.4 -> or how to implement whitelisting so that you're minimizing
656.01 -> the interactions with that endpoint,
658.98 -> rotating credentials for a user
661.29 -> that maybe has been compromised, backing out changes,
664.74 -> terminating containers, patching containers,
667.11 -> things that would help you be able to remediate that.
670.83 -> Guidance I would give you is,
671.85 -> as you go through this process,
673.41 -> if you do have to deal with findings
675.06 -> and walk through the remediation,
677.25 -> think about where you might automate
678.69 -> the steps that you're doing in the future.
680.4 -> If you find yourself going through
681.417 -> the same steps again and again,
683.97 -> that might be a chance to investigate,
685.83 -> could we actually programmatically implement some of this?
688.56 -> Even if it's not the ultimate remediation,
691.38 -> automating some of the investigation
692.94 -> to get you the information you need
694.29 -> to answer the question of,
695.88 -> is this a true threat and how am I gonna respond to it?
699.06 -> But also, if you are making changes
701.43 -> to your Kubernetes clusters
702.69 -> that you actually would now consider to be best practice,
705.03 -> and how you want all your clusters
706.68 -> to be configured in the future,
708.9 -> can you fold that back in before anything is deployed
711.69 -> or any changes are made in the future
713.22 -> to actually validate that a cluster configuration
715.98 -> is aligning with your prescribed best practices?
719.46 -> So validating that in a potential deployment pipeline
723.45 -> or scanning it when it gets into a repository
725.67 -> before it's deployed.
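
One hedged sketch of that kind of automation is an EventBridge rule that matches GuardDuty findings on EKS cluster resources and routes them to a triage or remediation Lambda function; the function ARN is a placeholder, and the resource-type match is an assumption you'd verify against your own findings:

import boto3
import json

events = boto3.client("events")

# Match GuardDuty findings whose impacted resource is an EKS cluster.
events.put_rule(
    Name="guardduty-eks-findings",
    EventPattern=json.dumps({
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {"resource": {"resourceType": ["EKSCluster"]}},
    }),
)

# Route matched findings to a triage/remediation Lambda (hypothetical function;
# it also needs a resource policy allowing events.amazonaws.com to invoke it).
events.put_targets(
    Rule="guardduty-eks-findings",
    Targets=[{
        "Id": "eks-finding-triage",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:eks-finding-triage",
    }],
)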
729.03 -> Now, where could you use GuardDuty for EKS protection?
733.89 -> In my opinion, it could be across all of your AWS accounts.
736.89 -> I think there's great use cases
738.63 -> in many different situations.
740.58 -> For customers that...
742.08 -> There are customers who will give
744.24 -> every one of their engineers their own AWS account.
746.67 -> It's a sandbox environment.
747.9 -> It's where they're gonna do some experimentation,
750.12 -> do their initial development work.
752.16 -> You could have GuardDuty turned on
753.54 -> for that particular account, watching the EKS activity,
756.6 -> just looking out for a heads-up
757.83 -> of where an engineer might be putting a cluster
760.65 -> into a configuration state that might not be so desired,
763.95 -> and allow you to maybe get in front of what they might
767.64 -> be trying to actually move forward into production.
771.3 -> We have lots of customers who also have deployment pipelines
774.24 -> where they're going to build and deploy container images,
776.73 -> and they'll often move those through various environments
779.55 -> before they get into production.
781.59 -> So you could actually have GuardDuty turned on
783.57 -> in your development and your testing environment,
786.9 -> looking at clusters that are being deployed or updated,
789.96 -> and watching the activity of those clusters
792 -> while there may be some testing or validation going on
795.9 -> to determine if there are any potential threats
798.81 -> that are happening due to the way
800.97 -> that cluster is being configured,
803.25 -> and then also watching it
804.63 -> when it's under steady state in production,
807.3 -> looking for additional threats that would only be seen
810 -> from a production workload.
813.21 -> And then, because GuardDuty integrates
814.95 -> with AWS Organizations,
816.3 -> you can have a single delegated administrator account
818.85 -> that can have visibility around the findings
820.89 -> from all your other GuardDuty accounts that are enabled.
823.89 -> So in this example, I've got a security team account
827.01 -> that's enabled for GuardDuty,
828.27 -> and it has the ability to see all the findings
830.37 -> from all these other accounts.
831.87 -> It allows that centralized security team
833.82 -> to be able to take action across any one of those accounts
837.75 -> or be able to route the findings
839.16 -> onto the individual application owners
841.59 -> so they can go and resolve the issue,
843.48 -> but still ensuring that central security team
845.31 -> has the right level of visibility.
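
A minimal sketch of that Organizations wiring, with placeholder account IDs — the first call runs from the organization management account, the second from the delegated administrator account once its detector exists:

import boto3

# From the management account: delegate GuardDuty administration.
boto3.client("guardduty").enable_organization_admin_account(
    AdminAccountId="111122223333"
)

# From the delegated administrator account: auto-enable GuardDuty
# for accounts that join the organization.
guardduty = boto3.client("guardduty")
detector_id = guardduty.list_detectors()["DetectorIds"][0]
guardduty.update_organization_configuration(
    DetectorId=detector_id,
    AutoEnable=True,
)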
849.84 -> So moving on, but staying with the EKS story,
852.87 -> I'd like to talk about Detective
854.91 -> and its support for Amazon EKS
856.8 -> and what you can do with that functionality.
860.58 -> So to start with, Detective is our threat-hunting service.
864.66 -> It automatically consumes log files, just like GuardDuty.
868.89 -> It extracts time series events
871.02 -> and information about particular resources,
873.3 -> such as login attempts, API calls,
876.57 -> and the resources that are involved
878.64 -> in a particular log file.
880.62 -> It consumes data from CloudTrail, VPC Flow Logs,
885.75 -> GuardDuty findings, and EKS audit logs.
889.05 -> It's then gonna take that information and
892.08 -> put it into a graphical database
894.96 -> to help capture the relationships
897.39 -> between all of these resources
899.25 -> that are extracted from these log files,
901.62 -> and then we're going to give you
902.88 -> a visual representation of that information
906.18 -> so you can go and ask questions
908.4 -> and look at that time series information to understand,
911.76 -> is this operating normally, what's its current baseline,
916.62 -> is this spike that I'm seeing expected
919.02 -> or unexpected for this particular resource,
922.02 -> what other interactions has this resource had
925.23 -> with this particular IP address
927.09 -> or this particular EC2 instance?
930.93 -> Specific to EKS, it's all based on those EKS audit logs,
935.07 -> again, those are automatically consumed
936.96 -> when you have this feature turned on,
938.88 -> you're gonna get a unified view of your EKS cluster.
941.49 -> So you're gonna be able to see the cluster,
942.75 -> you're gonna be able to see the pods,
944.25 -> you're gonna be able to see the containers
945.57 -> that are deployed on those pods,
946.74 -> as well as the users that are assigned
948.96 -> in that particular cluster.
950.91 -> You're gonna be able to do an investigation
953.01 -> for your EKS clusters,
954.21 -> either through coming in
955.11 -> and just starting a straight search,
956.97 -> searching on a particular cluster,
958.41 -> a pod, a container, an IP address,
961.14 -> or you could start from a GuardDuty finding.
963.39 -> So you can start from GuardDuty and pivot to Detective,
965.91 -> or you can go into Detective
967.53 -> and find a particular GuardDuty finding
969.51 -> to start your investigation.
971.79 -> And then there are two new tabs
973.89 -> that have been added to Detective
975.51 -> to help you with the Kubernetes investigation.
978.39 -> Tabs are Kubernetes activity and Kubernetes API calls.
985.11 -> So with Detective, these were the current resources
987.21 -> that were available pre-EKS.
989.07 -> These are resources you could investigate,
990.81 -> understand how they're configured in their interactions,
993.81 -> and with EKS, we've given you these four extra ones,
996.69 -> EKS cluster, the pod, the image,
999.33 -> and the Kubernetes subject or the user.
1001.37 -> And so you can investigate these just with these four,
1004.34 -> or you can investigate interactions
1006.53 -> with the other resources that Detective has.
1010.28 -> And I think the best way to show this to you
1012.35 -> is to actually walk you through a brief investigation
1016.19 -> in Detective on an EKS cluster based on a GuardDuty finding,
1020.36 -> and kinda show you how you would go through
1021.86 -> identifying the root cause, identify how this happened,
1025.61 -> and understand, what's the impact,
1027.68 -> how far has this spread in my particular environment?
1032.09 -> So I'm gonna start off here.
1033.313 -> This is a screenshot from Detective.
1035.42 -> This is based on a GuardDuty finding.
1037.55 -> So the first thing I can see here is,
1039.38 -> I'm seeing some of the resources that are associated
1042.56 -> with this particular GuardDuty finding.
1044.9 -> And the first thing that pops out to me is,
1046.55 -> you can see at the bottom there,
1047.63 -> in the Actor, there's an IP address.
1050 -> I can see that this particular IP address was used
1054.59 -> to grant anonymous access into the cluster
1057.77 -> using the EKS admin user.
1061.22 -> So right away, that seems out of the ordinary.
1063.86 -> That's the administrative user granting anonymous access.
1066.38 -> So I'm gonna start an investigation.
1068.06 -> And so the first thing I'm gonna do
1069.83 -> is I'm actually gonna start
1071.03 -> and I'm gonna look at that EKS admin user.
1073.88 -> And so I can bring up the profile
1076.28 -> for that particular user in Detective.
1078.89 -> It's gonna help me understand
1080.57 -> the cluster that it's associated with
1082.1 -> and some key information about that particular user.
1085.52 -> But then, if I scroll down,
1087.05 -> this is where I'm gonna start using those new tabs.
1089.45 -> And so in this particular example,
1091.58 -> I'm using the Kubernetes API calls tab,
1094.76 -> and I can actually see for that particular IP address
1098.51 -> that was in question here, the 192.168.30.135,
1103.31 -> take a look at the API calls that were actually made.
1105.017 -> And in this case, I can actually see,
1106.91 -> there was a create API call
1109.01 -> to create a cluster role binding,
1110.75 -> which is creating that new anonymous user
1113.9 -> in the Kubernetes cluster.
1115.85 -> So at this point, there seems to be a compromise
1119.33 -> related to this particular administrative service account,
1122.36 -> so I'm gonna wanna investigate a little bit further
1124.34 -> to understand how this actually happened.
1128.63 -> So I'm gonna next pivot to looking
1130.34 -> at the IP address in question to understand,
1133.37 -> what is this IP address actually associated with?
1135.8 -> So I can go and choose that IP address,
1138.11 -> bring up its profile page, scroll down,
1140.72 -> and I can now actually go look at the Kubernetes activity.
1144.44 -> What is the activity of this IP address
1146.15 -> as it relates to my Kubernetes clusters?
1148.39 -> In this case, I can see that it's actually associated
1150.95 -> with a Kubernetes pod in my environment.
1153.95 -> Specifically, it's this Debian reader pod.
1157.25 -> So we have a pod named Debian reader assigned to this.
1160.94 -> We want to investigate this a little bit further now
1163.07 -> to understand what's up with this pod
1165.77 -> and why is this happening.
1167.36 -> So clicking on that pod will actually bring me
1169.85 -> to the profile page of that pod.
1172.94 -> So now I have the details about that pod
1174.89 -> and how it's configured,
1175.91 -> and the one thing that stands out here
1177.5 -> is the service account.
1179.36 -> It's been assigned this default reader account.
1181.82 -> That's supposed to be a read-only account,
1185.69 -> but I have this pod making executions
1188.9 -> to actually grant anonymous access,
1191.54 -> which is not what I would expect from a read-only account.
1194.39 -> So this is a potential indicator of compromise.
1197.21 -> I've got some things happening
1198.29 -> that I'm not expecting to happen
1200.24 -> in this particular pod or in my particular cluster.
1206.06 -> So what I'm gonna do next,
1207.2 -> I'm gonna investigate a little bit further.
1208.58 -> I'm gonna scroll down and I'm actually gonna take a look
1210.59 -> at the API activity in that particular pod.
1215.3 -> I can go and look in at the activity
1217.13 -> for that particular IP address,
1219.14 -> and I can see that there are
1220.52 -> some list activities that stand out, and the first one is,
1223.64 -> I can see that that pod is actually listing secrets.
1227 -> Now, on its own, that may not be,
1229.7 -> you know, that might be okay,
1231.38 -> because it does need that particular reader role
1233.84 -> to be able to authenticate itself with the pod.
1236.72 -> There may be some secrets that it needs
1238.7 -> to be able to do its job,
1241.37 -> so I need to dig a little bit deeper.
1242.81 -> And so what I can do is I can actually put a filter in,
1245.45 -> searching for the word "secret"
1247.67 -> to see if maybe there's some other items
1249.62 -> that would help me better understand
1251.45 -> the scope and what's actually happening here.
1254.72 -> And by doing that, I actually see that I now have a get call
1258.35 -> that's related to a secret, too,
1259.82 -> and I can clearly see here that this reader pod
1262.85 -> was actually able to get the token for that EKS admin user,
1267.92 -> which then allowed it to be able to make the call
1269.84 -> to grant anonymous access.
1271.4 -> So I'm getting a pretty good sense here
1275.06 -> that something attached to that pod
1277.16 -> is the ultimate indicator of what's happened here.
1281.39 -> What we can do here is, from the pod view,
1284.15 -> we also show you the containers
1286.37 -> that were deployed in that particular pod.
1288.38 -> And so in this example,
1290.09 -> I found that there's only one container
1292.43 -> that was deployed in the pod.
1293.51 -> It was this Debian pod,
1295.52 -> Debian container that was running in the pod.
1298.01 -> So this seems to be
1300.65 -> the suspicious source of this particular activity
1305.33 -> that I've identified.
1309.08 -> So what I wanna do now is I wanna see, you know,
1311.6 -> what did this anonymous user actually do?
1313.97 -> Did they do anything?
1314.9 -> They were granted access but, you know,
1316.46 -> have they actually been interacting with the environment?
1318.41 -> So from the Detective search functionality,
1321.35 -> I can actually bring up this particular user,
1324.32 -> bring up that user's profile page,
1326.657 -> and I can go and start taking a look at the API calls
1330.05 -> that that user might have made.
1331.97 -> And what I see here is I've actually got
1333.8 -> a couple of calls that stand out,
1335.93 -> the patch and the create call.
1338.02 -> So the patch call,
1339.65 -> this anonymous user was actually able to call
1341.84 -> the Kubernetes patch API for a webapp pod,
1345.38 -> and that allows it to actually deploy
1347.75 -> updates to a Kubernetes pod
1349.22 -> without actually having to redeploy the pod.
1352.19 -> And then, the create option actually was able
1354.77 -> to successfully create a CronJob inside of a container.
1360.23 -> So I understand that they're actually starting to do things
1364.19 -> in this environment that I'm not okay with
1366.29 -> that I need to investigate further,
1367.46 -> so I need to understand what's the impact.
1369.14 -> How far have they actually gone? How far has this spread?
1372.77 -> So I wanna go in and I wanna investigate
1376.294 -> that patch call to that webapp pod that they made.
1378.62 -> So I can actually go in and search for that particular pod,
1382.28 -> bring it up, and I can actually see
1383.93 -> that there are some containers here
1386.33 -> that are xmrig containers.
1389.06 -> These are containers that are used for crypto mining.
1392.42 -> So I now have a better understanding
1395.57 -> of what's actually happened in this environment
1399.5 -> and where I might need to go in and remove
1402.74 -> or remediate some particular items
1404.9 -> or start to build a plan on what I need to be able to do
1408.14 -> to address what this anonymous user's been doing.
1412.61 -> So overall, through that quick investigation,
1415.67 -> I've been able to actually identify that root cause,
1418.58 -> which was that Debian container that was creating the access
1423.02 -> and causing the unusual behavior.
1425.54 -> We can see how that scenario happened.
1427.28 -> We actually saw that the container was actually able
1429.29 -> to get the secret for an administrator account
1432.11 -> and actually create that anonymous user.
1434.33 -> And then we were able to start identifying that impact.
1437.54 -> We were able to see it created a CronJob.
1439.34 -> We were able to see it deployed some extra containers
1441.8 -> that were actually used for crypto mining.
1443.3 -> And so we can start to figure out what our remediation plans
1447.92 -> are gonna be based on this information.
1452.09 -> Now, what I just did was an investigation
1454.7 -> based on a single GuardDuty finding.
1458.99 -> Within Detective, we have a new feature
1460.94 -> that was recently launched called finding groups
1463.22 -> that helps you see activity that is related
1466.85 -> to similar findings against a common activity.
1469.88 -> And so what you see here is a screenshot of a finding group
1474.08 -> that is focused on findings that are doing,
1477.86 -> along the MITRE ATT&CK framework,
1479.63 -> discovery and impact actions.
1483.11 -> And if I scroll down in those finding groups,
1486.26 -> I see a lot of the same things
1487.52 -> I just showed you in that investigation,
1489.23 -> but I also see that there are two other GuardDuty findings
1493.01 -> that are related to this same group and type of activities.
1498.89 -> I can also see the IP address used for the anonymous access,
1501.92 -> the anonymous user,
1503.9 -> the IP address associated with that container or that pod,
1508.7 -> the fact that eks-admin was granting anonymous access,
1511.46 -> and that there was an attempt to get persistent access
1514.85 -> inside my EKS cluster.
1517.67 -> So these finding groups can be a really benefit
1522.08 -> as far as figuring out, what do I need to investigate,
1524.69 -> or what's actually happening
1526.88 -> inside my environments or within my Kubernetes cluster,
1530.42 -> and may actually help you better establish,
1532.7 -> where do I need to go next
1534.23 -> when it comes to an investigation?
1537.17 -> Looking at things in a bigger perspective,
1539.51 -> this has got three GuardDuty findings versus one,
1541.64 -> which may actually make it a higher priority as far as,
1543.98 -> where do I need to start my investigation?
1546.47 -> It may help you figure out what you need to do
1548.15 -> rather than starting from a single finding
1550.49 -> or searching across a single resource.
1556.67 -> Let's go back to GuardDuty for just a couple of minutes
1559.79 -> and talk about malware protection.
1561.77 -> So the malware protection feature was launched in GuardDuty
1566 -> July of this year at the re:Inforce conference.
1569.42 -> It's just another feature of GuardDuty,
1571.73 -> and because it's GuardDuty,
1573.8 -> the first thing that stands out here is
1576.17 -> it's a fully-managed feature.
1577.94 -> You are just responsible for accepting
1580.04 -> the service-linked role for this feature and turning it on.
1583.73 -> GuardDuty takes care of all the management and orchestration
1587.75 -> of this malware protection as part of the service.
1591.98 -> So what is it focusing on?
1593 -> It's focusing on right now on scanning
1595.67 -> AWS Amazon EC2 instances,
1598.64 -> and it's gonna be looking at instances
1600.23 -> that are related to a GuardDuty finding.
1603.14 -> So there are 29 different GuardDuty EC2 finding types
1608.3 -> that will trigger a malware scan of your EC2 instances.
1614.78 -> We're focusing not only on EC2
1618.2 -> but everything that's going on on EC2
1620.27 -> from a container perspective.
1621.65 -> So this service is container aware.
1623.78 -> So whether you're running your own Docker containers
1626.33 -> and you're self-managing them,
1627.98 -> you're running Amazon EKS or ECS on EC2,
1632.69 -> this service can identify malware
1634.61 -> that is related to your containers
1635.84 -> and report back that information to you,
1637.79 -> and I'll show you an example of that in a few minutes.
1642.08 -> When malware is detected,
1643.64 -> this is gonna show and help you identify it
1646.55 -> as a brand-new finding in GuardDuty.
1648.23 -> We'll be very clear that you have an additional finding
1650.99 -> around malicious files found on your EC2 instance.
1656.006 -> And we do this detection through a combination
1658.7 -> of our own threat intelligence,
1661.4 -> machine-learning algorithms,
1662.99 -> and third-party engines
1665.18 -> that we incorporate into the service.
1669.26 -> And one of the biggest things about this,
1671.03 -> it's a managed feature and it's agentless.
1673.46 -> You do not have to install or configure or bootstrap
1676.88 -> any security configuration or any software
1680.6 -> in order to be able to make this happen
1682.88 -> or to take advantage of it.
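
Since it's just another GuardDuty feature, turning it on is again a data-source toggle on the detector (once the service-linked role has been accepted); a minimal boto3 sketch, assuming GuardDuty is already enabled:

import boto3

guardduty = boto3.client("guardduty")
detector_id = guardduty.list_detectors()["DetectorIds"][0]

# Agentless malware scanning of EBS volumes attached to EC2 instances
# (including container hosts) that are involved in a qualifying finding.
guardduty.update_detector(
    DetectorId=detector_id,
    DataSources={
        "MalwareProtection": {"ScanEc2InstanceWithFindings": {"EbsVolumes": True}}
    },
)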
1685.37 -> So let me walk you through how this actually works
1688.22 -> so you have a better sense
1689.27 -> of what this managed service is doing for you.
1692.18 -> The first thing I wanna highlight.
1693.23 -> I mentioned earlier that you have to accept
1694.653 -> a service-linked role in order to be able
1696.53 -> to use the GuardDuty feature.
1698.45 -> That role is giving GuardDuty extra permissions
1701.75 -> to be able to manage and orchestrate
1704.33 -> some extra details within your account
1706.49 -> so that it can successfully scan and identify malware.
1710.27 -> And one of the key things here is around KMS keys.
1713.87 -> So if you are using KMS to encrypt your EBS volumes
1717.47 -> attached to your EC2 instances,
1719.81 -> that service-linked role is giving the GuardDuty service
1723.56 -> read permissions to be able to use those keys
1726.53 -> so they can then, you know, re-encrypt and view that data,
1730.55 -> or decrypt and view that data
1732.89 -> when it is doing a malware scan.
1736.07 -> So let's walk through a scenario
1737.57 -> where you have launched an EC2 instance.
1739.58 -> You've got a couple of EBS volumes
1741.62 -> that are encrypted with a KMS customer-managed key.
1746.39 -> GuardDuty generates a finding around that EC2 instance,
1749.12 -> and that finding is one of the 29 finding types
1752.51 -> that would trigger a malware scan.
1755.57 -> First thing that GuardDuty is gonna do
1757.16 -> is it's going to create,
1758.57 -> call the APIs to create a snapshot
1761.39 -> of each of the EBS volumes,
1763.67 -> and then it's going to share that snapshot
1765.92 -> with the GuardDuty service team account.
1770.24 -> We're then going to, in the GuardDuty service account,
1773.66 -> launch compute, and create new EBS volumes,
1776.96 -> and attach those to the compute,
1778.19 -> and those volumes are gonna be based on those EBS snapshots
1780.95 -> that were shared with the GuardDuty service team account.
1784.04 -> We'll use the KMS key that's been shared with the account
1787.46 -> to be able to decrypt and be able to read that data.
1791.03 -> If any malware is identified,
1792.89 -> we will then report that back as a new finding in GuardDuty.
1797.09 -> We'll then delete those snapshots from your account,
1800.72 -> and then we'll retire or terminate
1802.7 -> the compute infrastructure used
1805.07 -> to be able to do that scanning.
1808.25 -> So what do you get after all that's happened?
1810.32 -> What do you get out of that?
1811.4 -> So these are examples of findings.
1813.89 -> On the left is an EC2-related finding,
1816.26 -> focusing on an instance that we observed
1818.9 -> Bitcoin mining occurring.
1821.3 -> And then the right-hand side is a malware finding,
1824.84 -> malicious file finding type.
1827.21 -> And you can see at the bottom
1828.68 -> we're highlighting the threats detected.
1830.36 -> We'll actually highlight what was the malware that we found
1833.57 -> and where is it located on that particular EC2 instance,
1837.74 -> and we'll help you correlate that.
1839.39 -> You can see on the left and the right,
1840.74 -> it's the same EC2 instance,
1842.387 -> and so you have a stronger
1846.32 -> sense of what might be causing
1848.63 -> that particular EC2-related finding,
1850.58 -> because now you're also seeing malware
1852.56 -> related to that same EC2 instance.
1856.52 -> We'll also help you if you wanna understand specifically
1859.52 -> which GuardDuty finding triggered that malware event.
1863 -> Maybe you have multiple findings
1864.35 -> come in for the same instance.
1866.18 -> If you look on the malware finding,
1868.61 -> you will see the triggering finding ID,
1871.46 -> and you'll be able to use that to match up
1873.44 -> to a specific GuardDuty finding
1875.09 -> so you know which one of those triggered that malware scan,
1879.74 -> and be able to once again have more information
1881.96 -> that might give you more data
1883.25 -> on why a particular finding is occurring.
1888.35 -> And then we talked about this feature being container aware.
1892.13 -> So whether you're running self-managed ECS or EKS on EC2,
1897.17 -> we will report back the details about those containers
1900.44 -> as part of the finding types.
1901.64 -> So this is an example of an ECS cluster
1904.25 -> where we're reporting back the name of the cluster,
1906.92 -> the tasks,
1908.12 -> the task in the cluster that this is associated with,
1910.58 -> and the actual container that this malware was identified on
1914.45 -> so that you have very specific information
1918.09 -> for an investigation or a remediation
1920.33 -> related to this particular malware
1922.4 -> that's been identified in your containers.
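
As a hedged sketch of reading those container details back out programmatically — the malware finding type string used in the filter is an assumption, so adjust it to the finding types you actually see in your account:

import boto3

guardduty = boto3.client("guardduty")
detector_id = guardduty.list_detectors()["DetectorIds"][0]

finding_ids = guardduty.list_findings(
    DetectorId=detector_id,
    FindingCriteria={
        "Criterion": {"type": {"Equals": ["Execution:EC2/MaliciousFile"]}}
    },
)["FindingIds"]

if finding_ids:
    for finding in guardduty.get_findings(
        DetectorId=detector_id, FindingIds=finding_ids
    )["Findings"]:
        # ECS cluster/task/container context reported alongside the malware hit.
        ecs = finding["Resource"].get("EcsClusterDetails", {})
        print(
            finding["Id"],
            ecs.get("Name"),
            ecs.get("TaskDetails", {}).get("Containers"),
        )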
1927.47 -> Okay, let's move on to Amazon Inspector,
1930.26 -> and what Amazon Inspector can give you
1932.21 -> when it comes to containers and being able to identify
1936.38 -> vulnerabilities in your container images.
1939.71 -> So Inspector,
1941.42 -> we relaunched this service at re:Invent last year.
1944.57 -> It is a vulnerability management service focused on doing
1948.2 -> continuous assessments of your EC2 instances
1951.92 -> and your container images that are stored in Amazon ECR,
1955.28 -> the Elastic Container Registry.
1959.27 -> We produce findings
1962.45 -> for these vulnerabilities in the Inspector console.
1964.73 -> You can consume them there directly from the console.
1967.16 -> If you're using Amazon Security Hub,
1968.57 -> we will push a copy of those findings to Security Hub.
1971.21 -> You can consume the events through EventBridge.
1973.85 -> We have lots of third party partners
1976.13 -> that are integrated with Inspector
1977.45 -> to consume those and help you
1978.68 -> with your vulnerability management through those partners.
1981.86 -> And for containers,
1983.81 -> you can also go see the findings in Amazon ECR.
1986.78 -> If you have developers, engineers,
1988.94 -> who are pushing container images
1990.2 -> and need to see what those vulnerability findings are,
1993.11 -> you don't wanna give them access necessarily
1995.06 -> to the Inspector service,
1996.41 -> 'cause that's got a lot more about other vulnerabilities
1998.99 -> that are present in the organization.
2001.21 -> They could just go to
2002.68 -> the Amazon ECR console, or call the ECR APIs,
2005.86 -> and actually get the findings
2007.06 -> for their containers that they care about.
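
A small sketch of that ECR-side view, with a placeholder repository name and tag — the enhanced (Inspector-generated) findings for one image can be read straight from the ECR API:

import boto3

ecr = boto3.client("ecr")

response = ecr.describe_image_scan_findings(
    repositoryName="my-app",           # placeholder repository
    imageId={"imageTag": "latest"},    # placeholder tag
)

# Enhanced findings are the Inspector-produced results for this image.
for finding in response["imageScanFindings"].get("enhancedFindings", []):
    print(finding["severity"], finding["title"])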
2011.62 -> So when it comes to the Inspector scans for containers,
2015.43 -> what we're doing is that every time
2017.38 -> an image is pushed to Amazon ECR,
2020.83 -> Inspector will actually get a copy,
2022.24 -> pull down a copy of that particular container image.
2025.9 -> We'll extract each of the layers from that container image
2029.5 -> so we can look at 'em individually
2030.85 -> and help identify vulnerabilities individually.
2035.26 -> We're gonna first look at the actual operating system
2037.96 -> that's associated with that container
2039.97 -> and look at the packages that are installed
2042.52 -> through the package manager of that OS.
2045.43 -> But then we're gonna go a lot further.
2046.75 -> We're actually going to look through
2048.52 -> the entire file system of that container
2051.82 -> to identify other places where you might have dependencies
2056.29 -> or libraries that are vulnerable.
2058.27 -> Inspector has support for 10 different programming languages
2061.78 -> so that we can look at those additional files and understand
2064.3 -> where you might have additional vulnerabilities
2066.55 -> that are present on that container image.
2069.627 -> We're then gonna take everything that we extracted there,
2071.467 -> and we're gonna compare it
2072.46 -> against our vulnerability database
2074.5 -> to determine if you have any vulnerabilities present
2077.92 -> on that particular container image.
2082.06 -> So when you're using Inspector for container scanning,
2085.93 -> there's a couple of different ways you can configure it.
2087.25 -> The first way you can configure it is scan on push.
2090.25 -> Every time you push a new container image to ECR,
2093.01 -> Inspector will grab it and do those scanning steps
2095.05 -> that I just talked about.
2096.69 -> In ECR you can actually add filters
2098.77 -> as far as which repositories
2100.72 -> you would actually like to have scanned on push.
2103.45 -> You can also enable continuous scanning
2106.18 -> and implement filters on which repositories that applies to.
2109.21 -> And that continuous scanning means that every time,
2112.03 -> after Inspector has scanned initially,
2115.03 -> every time the vulnerability database
2116.89 -> for Inspector is updated,
2118.12 -> it will re-scan those container images
2120.07 -> to see if there are any new vulnerabilities
2121.75 -> present on that particular image.
2124.78 -> Now, the way you can configure that continuous scanning
2128.59 -> is you can set it to a finite timeframe, 180 days or 30 days.
2133.45 -> We have some customers who are regularly
2136.21 -> deploying new container images,
2137.65 -> and they don't want the older ones looked at.
2140.47 -> Or you can set it for lifetime.
2141.88 -> So as long as that container image is there in ECR,
2145.09 -> we will continue to re-scan it for new vulnerabilities.
2148.69 -> Now, some things to keep in mind,
2149.74 -> if you've set a finite timeframe, the 180 or 30 days,
2153.55 -> when that timeframe expires,
2155.62 -> we'll set the status
2157.51 -> for that particular image to "inactive"
2159.91 -> and we'll set the reason code to "expired"
2161.68 -> so you know exactly why we are no longer scanning
2164.47 -> that particular container image.
2166.45 -> We'll also flip the findings status
2169.12 -> for all the findings for that image to "closed,"
2171.79 -> and after 30 days we'll delete
2173.11 -> the closed findings from Inspector.
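
One way those knobs can be expressed in code, assuming ECR enhanced scanning is what's backing Inspector here (the repository filter patterns are placeholders):

import boto3

# Which repositories get continuous scanning vs. scan-on-push only.
boto3.client("ecr").put_registry_scanning_configuration(
    scanType="ENHANCED",
    rules=[
        {
            "scanFrequency": "CONTINUOUS_SCAN",
            "repositoryFilters": [{"filter": "prod-*", "filterType": "WILDCARD"}],
        },
        {
            "scanFrequency": "SCAN_ON_PUSH",
            "repositoryFilters": [{"filter": "*", "filterType": "WILDCARD"}],
        },
    ],
)

# How long Inspector keeps re-scanning an image after it was pushed:
# DAYS_30, DAYS_180, or LIFETIME.
boto3.client("inspector2").update_configuration(
    ecrConfiguration={"rescanDuration": "DAYS_30"}
)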
2177.82 -> Now, when you're actually in the Inspector console,
2179.83 -> right away from the dashboard,
2181.06 -> we're gonna give you a couple of key pieces of information
2183.19 -> to help you identify where you should be spending your time.
2186.58 -> We're gonna give you the list
2188.32 -> of the most critical vulnerabilities identified,
2190.75 -> both for repositories in ECR
2193.9 -> as well as individual container images.
2196.51 -> So this is intended to help you understand,
2198.43 -> where might I start,
2199.75 -> where am I gonna have the most impact
2201.67 -> when it comes to addressing my container vulnerabilities?
2205.72 -> But you can also go in and look at an individual container
2208.99 -> within Inspector to understand its vulnerabilities more.
2211.87 -> So we're gonna give you details about that container,
2214.45 -> summary of the various vulnerabilities identified,
2217.48 -> and you can filter through or look
2219.1 -> at every single one of those vulnerabilities
2221.86 -> and the details related to them.
2225.07 -> You could also choose the by layer tab,
2227.11 -> where we'll actually break down
2228.79 -> each layer of the container image
2230.71 -> and tell you which vulnerabilities exist
2233.05 -> in each one of the layers.
2233.98 -> And this can be important if you're trying to figure out,
2236.38 -> where am I going to fix my Docker build,
2239.2 -> which parts of that do I need to actually update
2241.75 -> in order to be able to remediate those vulnerabilities?
2246.04 -> Now, sticking with the remediation,
2248.68 -> one of the recent additions we've had to Inspector
2250.78 -> is we're providing you more remediation guidance
2253.3 -> when we're telling you that you have vulnerable packages
2256.09 -> as part of your container images.
2258.01 -> We'll now tell you, what are the affected packages,
2260.41 -> we'll tell you, what's the affected version
2262.15 -> that we identified,
2263.44 -> and what version is it fixed in,
2266.17 -> and we'll also give you remediation steps,
2268.96 -> steps you would run to actually upgrade that package
2272.17 -> so that you are now running on a non-vulnerable version.
2275.92 -> This is really helpful if you are looking
2278.56 -> to assign a ticket to a particular team.
2281.44 -> You might be able to include this information in that ticket
2283.63 -> so they clearly know where they're gonna go
2285.88 -> to be able to remediate that.
2287.38 -> Or if you're building some sort of automation
2289.09 -> to be able to automatically address those vulnerabilities,
2292.21 -> you can use this information to help orchestrate
2294.64 -> that remediation automatically.
2298.72 -> Now, when it comes to the consumption of Inspector findings,
2302.11 -> there's a couple of different ways you can go about that.
2303.94 -> The first one we've talked about already
2305.56 -> is through the console itself,
2307.27 -> where you can go and look at the summary
2309.01 -> or the details about specific container images.
2312.31 -> You can also use the Inspector APIs to query that
2315.07 -> and get information about active vulnerabilities
2317.89 -> for across all of your container images
2320.89 -> or a particular container image.
2323.44 -> And the other ways that I really like
2324.85 -> and I encourage customers to take a look at
2326.68 -> are first of all the event-based approach.
2329.77 -> Every finding that Inspector generates,
2331.84 -> it will send a copy of that to Amazon EventBridge,
2334.24 -> which is our message bus service.
2336.31 -> You can then author rules in EventBridge
2339.34 -> with patterns to match all Inspector findings
2343.24 -> or a certain severity level
2344.74 -> or certain types of vulnerabilities,
2347.14 -> And then, when a pattern is matched,
2349.12 -> you can actually route that onto a specific target,
2351.61 -> and that can be 20-plus AWS services
2355.18 -> as well as a third-party API endpoint,
2358.18 -> which allows you to be able to integrate that
2361.06 -> into your operational processes
2363.76 -> or to orchestrate a, you know, a backend response
2367.81 -> that you might be taking for that particular vulnerability.
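
A hedged sketch of that event-based path, matching only critical Inspector findings and routing them to a queue — the queue ARN is a placeholder, and you'd widen or narrow the pattern to suit your own thresholds:

import boto3
import json

events = boto3.client("events")

events.put_rule(
    Name="inspector-critical-findings",
    EventPattern=json.dumps({
        "source": ["aws.inspector2"],
        "detail-type": ["Inspector2 Finding"],
        "detail": {"severity": ["CRITICAL"]},
    }),
)

events.put_targets(
    Rule="inspector-critical-findings",
    Targets=[{
        "Id": "vuln-triage-queue",
        "Arn": "arn:aws:sqs:us-east-1:111122223333:vuln-triage",
    }],
)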
2371.02 -> The other thing I'd like you to consider
2372.76 -> and I point customers at
2374.08 -> is the generation of an actual findings report.
2376.66 -> There are customers who want a point-in-time report of,
2380.14 -> what are my current active vulnerabilities?
2383.02 -> Within Inspector, you can generate a findings report,
2385.18 -> either through clicking a button in a console
2387.13 -> or making an API call.
2389.35 -> That findings report will be produced
2391.3 -> as either a CSV or a JSON file
2393.91 -> written out to an S3 bucket of your choice.
2396.94 -> You can then take that in as your current report
2399.01 -> of active vulnerabilities in your environment.
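
That point-in-time report is a single API call; a minimal sketch with a placeholder bucket and KMS key (the key policy has to allow Inspector to use it):

import boto3

inspector = boto3.client("inspector2")

report_id = inspector.create_findings_report(
    reportFormat="CSV",  # or "JSON"
    filterCriteria={
        "findingStatus": [{"comparison": "EQUALS", "value": "ACTIVE"}]
    },
    s3Destination={
        "bucketName": "my-security-reports",
        "keyPrefix": "inspector/",
        "kmsKeyArn": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    },
)["reportId"]
print(report_id)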
2403.06 -> Now, when it comes to remediation, we've talked about,
2405.28 -> ultimately, you need to go in and update,
2409.03 -> patch the software that's part of your particular
2414.46 -> Docker configuration,
2415.69 -> and rebuild and redeploy that particular container image.
2418.9 -> Now, if you delete the container image from ECR
2421.29 -> as part of that remediation,
2423.55 -> Inspector will immediately close those findings
2425.95 -> so that you're not looking at them as still being active,
2428.65 -> 'cause that container image no longer exists.
2431.35 -> And then, when you publish an updated image,
2433.21 -> Inspector will re-scan that container image
2435.85 -> that you've updated.
2438.58 -> Now, one of the things our customers do,
2441.46 -> as I talked about earlier, is they have pipelines
2443.98 -> to be able to build and deploy container images,
2446.95 -> and with Inspector,
2449.02 -> one of the things you can actually do
2450.67 -> is integrate that as part of that pipeline.
2454.15 -> So in this example here,
2455.35 -> I have users pushing into a CodeCommit repository.
2458.92 -> It's triggering a build in CodePipeline.
2462.28 -> It's pushing that to Amazon ECR.
2465.01 -> It has a stage where it actually waits for approval.
2470.02 -> Inspector's gonna scan that container image out of ECR.
2473.98 -> It's going to receive a "scan complete" message
2476.92 -> in Amazon EventBridge.
2477.96 -> In this case, I have it calling a Lambda function
2480.91 -> that's actually evaluating that "scan complete" message
2483.49 -> against thresholds that are defined
2485.89 -> as part of that function.
2486.79 -> So it's looking for critical, high, medium,
2489.28 -> and low thresholds to confirm, is it in or out of bounds?
2493.96 -> We'll make an approve or reject decision
2495.97 -> based on those thresholds
2497.74 -> and then send that response back to the pipeline.
2500.5 -> If it's approved, that container image will actually
2502.57 -> be deployed into the next phase of the pipeline,
2506.74 -> or if it's rejected,
2507.76 -> that pipeline will complete with a rejection status.
2513.34 -> So how do you do all that?
2515.08 -> So we now have a blog out
2517.6 -> that will walk you through in detail,
2520.93 -> what were all those steps that were just there
2523.45 -> in that architecture diagram,
2525.58 -> give you a CloudFormation template
2527.05 -> to be able to deploy that solution in your own environment,
2530.17 -> as well as the code that goes with the Lambda functions
2532.96 -> to be able to actually do that investigation
2535.06 -> of your container images and make those decisions.
2537.64 -> So you can use this as a starting point to understand,
2540.01 -> what is the interaction,
2541.03 -> or what is the data that Inspector produces,
2543.67 -> and how might you incorporate that into your own pipelines
2547.51 -> to more automate being able to make
2549.61 -> the right security checks
2550.99 -> before your container images
2552.52 -> are actually deployed into your environments.
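
The evaluation step in that pipeline might look roughly like the Lambda handler below. This is a sketch under assumptions: the event is presumed to carry per-severity counts for the scanned image, and the field names and thresholds are illustrative rather than the blog's exact contract.

import json

# Illustrative thresholds: reject the image if any severity count exceeds its limit.
THRESHOLDS = {"critical": 0, "high": 0, "medium": 5, "low": 10}

def handler(event, context):
    # Assumed event shape: {"severityCounts": {"critical": 1, "high": 3, ...}}
    counts = event.get("severityCounts", {})
    violations = {
        severity: counts.get(severity, 0)
        for severity, limit in THRESHOLDS.items()
        if counts.get(severity, 0) > limit
    }
    decision = "Rejected" if violations else "Approved"
    print(json.dumps({"decision": decision, "violations": violations}))
    # The real pipeline would report this back, e.g. via
    # codepipeline.put_approval_result(...), before the deploy stage runs.
    return {"decision": decision, "violations": violations}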
2556.39 -> Okay, with that, I'm gonna invite Mrunal up,
2559.09 -> and he's gonna talk more about HBO Max
2560.98 -> and what they're doing from a container perspective.
2569.26 -> - Thank you, Scott.
2573.13 -> I'm Mrunal Shah.
2574.18 -> I'm the head of container security
2576.28 -> for Warner Bros. Discovery HBO Max.
2579.58 -> My team is in charge of securing
2581.89 -> the containers and the clusters
2583.69 -> across the whole Warner Bros. Discovery portfolio,
2587.23 -> including HBO Max.
2589.33 -> For those who are not familiar with HBO Max,
2591.76 -> we're a one-of-a-kind streaming service,
2594.34 -> delivering unique stories, complex characters,
2597.46 -> and immersive new worlds.
2598.99 -> We're home to some of the world's most loved content.
2603.19 -> We have an amazing sizzle for all of you here,
2606.07 -> so please enjoy.
2608 -> (dramatic music)
2612.925 -> - [Corlys] Our worth is not given.
2616.99 -> It must be made.
2618.882 -> (ominous music)
2621.43 -> - You have no idea what loss is.
2624.425 -> - [Speaker] It's very intoxicating.
2627.22 -> - To bend the rules, color outside the lines.
2631 -> - Would you be interested in having an affair?
2633.154 -> - Oh, wow! Surprise!
2635.373 -> - This is gonna be one of the best days of my life!
2641.5 -> - What are we doing?
2642.333 -> - [Everyone] Changing the world.
2643.84 -> - That's right.
2644.71 -> We're pushing forward!
2646.362 -> (dramatic music)
2647.524 -> (dog barks)
2651.34 -> - [Lizzo] I'm always chasing the music.
2653.5 -> - [Chef] I have the drive. I have the passion.
2655.75 -> - Wow!
2659.198 -> - Awesome! - Cool!
2662.848 -> (dragon roars)
2671.86 -> - All of this awesome content
2673.9 -> runs on massive AWS infrastructure.
2676.9 -> We have hundreds of clusters,
2679.45 -> which is a mix of native Kubernetes clusters
2682.33 -> and EKS clusters running hundreds of microservices
2686.26 -> that scale up to tens of thousands
2688.3 -> of containers during primetime.
2691.21 -> We receive more than a billion requests per day,
2694.33 -> and we generate petabytes of logs per year.
2697.06 -> It's truly a massive scale.
2700.69 -> Some of the common container risks and threats
2703.36 -> faced by all containerized workloads
2706.18 -> are vulnerable application code.
2709.12 -> Applications running in a containerized environment
2712.39 -> may have direct or indirect vulnerabilities
2715.6 -> through open source projects
2716.86 -> or older versions of software libraries.
2721.96 -> We could have poorly-configured containers and pods
2725.23 -> which are misconfigured to run as root,
2727.36 -> or allow privileged escalations
2729.04 -> within your containerized workloads.
2733 -> Insecure networking where you have services
2736.3 -> allowed to speak with all other services
2738.4 -> without any authentication or authorization.
2742.21 -> This could create a single point of failure
2744.19 -> in case one of the services is broken into.
2748.84 -> And vulnerable underlying hosts: unpatched hosts
2752.02 -> are always among the biggest risks.
2757.6 -> Additionally, there are the risks of overprivileged services
2762.52 -> and users on the cluster,
2765.61 -> and secrets that could be exposed
2768.25 -> if they're stored within the containers or in plain text
2771.79 -> on the volumes attached to the pods.
2777.1 -> To monitor and mitigate these threats,
2780.4 -> the traditional threat detection model includes
2783.46 -> taking all the cloud control plane logs and EKS control plane logs
2788.56 -> and the service logs and sending them to a central SIEM
2793.12 -> where security engineers build and run
2796.42 -> complex threat-hunting queries,
2798.85 -> and they scan across a massive amount of data
2802.39 -> to be able to detect if there are any anomalies
2805.72 -> happening within our environment.
2807.52 -> Additionally, every so often,
2809.62 -> you'll have service operators reach out to the engineers
2813.43 -> if they suspect there are anomalies within their services.
2819.4 -> There are challenges with this type of threat detection.
2822.61 -> The first challenge is, it's not always easy
2824.95 -> to detect anomalous or malicious patterns.
2831.07 -> There could be several data points that make it hard
2834.85 -> to distinguish real traffic from malicious traffic,
2838.24 -> especially with massive
2841.78 -> datasets that these engineers are scanning.
2847 -> Security engineers have to build complex queries
2849.58 -> on massive disjointed datasets
2853.57 -> and be able to see if there are any anomalies.
2858.58 -> Running all of this is complex,
2862.51 -> and you need complex infrastructure
2864.4 -> to be able to support constant threat detection,
2867.46 -> especially when you're in a multi-cluster environment.
2872.32 -> And real-time alerts are not always achievable.
2875.59 -> Security engineers are reactively searching for anomalies
2879.31 -> instead of proactively detecting them and fixing them.
2885.61 -> Keeping these drawbacks in mind,
2887.68 -> we fully embraced GuardDuty to protect our EKS clusters.
2893.83 -> GuardDuty seamlessly integrates with our environments,
2897.31 -> and it auto-ingests VPC Flow Logs,
2900.73 -> Amazon EKS audit logs, and DNS logs,
2905.02 -> and runs machine learning on top of these logs
2908.11 -> to be able to detect any anomaly
2909.79 -> and proactively send us alerts if something is detected.
2915.55 -> Additionally, GuardDuty helps us scale across the board,
2920.74 -> because there's no additional
2922.9 -> integration needed inside the cluster.
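As a rough sketch of how little wiring that takes, enabling EKS audit log monitoring on an existing GuardDuty detector is a single API call; the snippet below assumes one detector already exists in the account and Region, which is an assumption for the example.

```python
# Minimal sketch (not from the talk) of turning on GuardDuty EKS audit log
# monitoring on an existing detector, so Kubernetes findings start flowing
# without any in-cluster agents or extra integration.
import boto3

guardduty = boto3.client("guardduty")

# Assumes a single existing detector in this account/Region.
detector_id = guardduty.list_detectors()["DetectorIds"][0]

guardduty.update_detector(
    DetectorId=detector_id,
    DataSources={"Kubernetes": {"AuditLogs": {"Enable": True}}},
)
```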
2927.31 -> Once an anomaly is detected,
2929.35 -> we pivot into Amazon Detective.
2933.67 -> Amazon Detective, just like Amazon GuardDuty,
2937.06 -> collects and aggregates logs
2938.71 -> across a variety of data sources for us
2941.89 -> and provides us with a set of linked data
2946.09 -> to allow us to do faster and more efficient investigations.
2951.28 -> Once an anomaly is detected,
2954.01 -> we're able to pivot at the cluster, container, and pod level
2957.79 -> to be able to quickly find the root cause
2961.18 -> and quickly remediate it.
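For illustration, the Kubernetes context that drives that pivot is already attached to each GuardDuty finding. The sketch below (not HBO Max's code) pulls cluster, namespace, and workload details for one example EKS finding type; the finding type shown is just an example, and the field lookups are defensive because not every finding carries every detail.

```python
# Illustrative sketch: list active GuardDuty Kubernetes findings of one
# example type and print the EKS cluster / namespace / workload context,
# which is the same context used when pivoting into Amazon Detective.
import boto3

guardduty = boto3.client("guardduty")
detector_id = guardduty.list_detectors()["DetectorIds"][0]

finding_ids = guardduty.list_findings(
    DetectorId=detector_id,
    FindingCriteria={"Criterion": {
        "type": {"Equals": ["Policy:Kubernetes/AnonymousAccessGranted"]},  # example finding type
        "service.archived": {"Equals": ["false"]},                         # active findings only
    }},
)["FindingIds"]

if finding_ids:
    for finding in guardduty.get_findings(
        DetectorId=detector_id, FindingIds=finding_ids
    )["Findings"]:
        cluster = finding["Resource"].get("EksClusterDetails", {}).get("Name")
        workload = finding["Resource"].get("KubernetesDetails", {}) \
                                      .get("KubernetesWorkloadDetails", {})
        print(finding["Type"], cluster,
              workload.get("Namespace"), workload.get("Name"))
```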
2965.65 -> But we don't always wanna be in a position of
2969.535 -> reactively fixing issues.
2970.93 -> We wanna proactively prevent and reduce risks
2974.26 -> within our environment.
2976.33 -> And we're using Amazon Inspector to help us get there.
2980.92 -> Security engineers would commit
2983.2 -> their hardening scripts in a GitHub location,
2988.36 -> from where the pipeline picks up those hardening scripts
2992.89 -> and uses a combination of Packer and Ansible
2995.2 -> to bootstrap very secure hosts.
2999.49 -> These hosts are then scanned by Inspector,
3002.37 -> and we extract the AMI,
3004.71 -> and we publish it to a central AWS location,
3008.01 -> from where those AMIs are made available
3009.72 -> for consumption by developers
3011.16 -> using infrastructure as code of their choice,
3015.63 -> such as Terraform.
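One plausible final step of such a pipeline, sketched below with placeholder IDs and ARNs (this is not the team's actual pipeline code): after Packer builds the hardened image and Inspector has scanned the build host, share the resulting AMI with the organization and tag it so Terraform consumers can find it.

```python
# Hypothetical last step of a hardened-AMI pipeline: publish the AMI to the
# organization and tag it with its hardening baseline. IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

HARDENED_AMI = "ami-0123456789abcdef0"  # AMI produced by Packer (placeholder)

# Grant launch permission to every account in the organization.
ec2.modify_image_attribute(
    ImageId=HARDENED_AMI,
    LaunchPermission={"Add": [{
        "OrganizationArn": "arn:aws:organizations::111122223333:organization/o-exampleorgid"
    }]},
)

# Tag the AMI so consumers (and future scans) can tell which baseline it uses.
ec2.create_tags(
    Resources=[HARDENED_AMI],
    Tags=[{"Key": "hardening-baseline", "Value": "cis-level-1"}],
)
```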
3024.39 -> We also use Inspector to help us fix our images.
3029.57 -> So the first step of fixing images
3031.53 -> is being able to detect where the problems lie.
3035.1 -> We use Inspector to detect
3036.72 -> all the vulnerable images within our environment,
3040.26 -> and once we detect those images,
3042.96 -> we go and identify which layer is introducing
3045.93 -> the vulnerabilities within that image.
3050.22 -> Once we have the full picture of
3052.68 -> which images are vulnerable and
3056.49 -> which layer is introducing that vulnerability,
3058.98 -> we take remediation steps,
3061.05 -> such as swapping to a more secure base image,
3063.72 -> or upgrading the packages,
3065.49 -> or removing the build-time packages
3067.23 -> that are not needed for the runtime environment.
3071.88 -> And once those images are fixed
3073.95 -> and pushed to a registry, such as ECR,
3077.22 -> we're able to do a continuous verification
3080.67 -> that there are no additional vulnerabilities
3083.07 -> present in that image.
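A hedged sketch of that detect-and-attribute step: query Inspector for a repository's active findings and tally the vulnerable packages (and, where Inspector reports it, the source layer hash), which is the data you would then map back to the Dockerfile instruction that introduced them. The repository name is a placeholder.

```python
# Rough sketch of summarizing which packages (and image layers) are driving
# the vulnerabilities in one ECR repository. Repository name is a placeholder.
from collections import Counter
import boto3

inspector = boto3.client("inspector2")
by_layer_and_package = Counter()

paginator = inspector.get_paginator("list_findings")
for page in paginator.paginate(
    filterCriteria={
        "ecrImageRepositoryName": [{"comparison": "EQUALS", "value": "my-service"}],
        "findingStatus": [{"comparison": "EQUALS", "value": "ACTIVE"}],
    }
):
    for finding in page["findings"]:
        # Only package-vulnerability findings carry these details, hence .get().
        for pkg in finding.get("packageVulnerabilityDetails", {}).get("vulnerablePackages", []):
            layer = pkg.get("sourceLayerHash", "unknown-layer")
            by_layer_and_package[(layer, pkg["name"])] += 1

for (layer, name), count in by_layer_and_package.most_common(10):
    print(f"{count:4d}  {name}  (layer: {layer})")
```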
3086.46 -> Additionally, we also use multi-stage Docker builds
3091.05 -> on top of Inspector
3093.21 -> to help provide clean container images to our developers.
3099.15 -> And the way it works is...
3101.64 -> This is the traditional single-stage Docker build.
3105.72 -> If you look at the Docker file,
3107.25 -> you can see that it's picking up an open source
3112.14 -> base image from Docker Hub,
3116.16 -> and then it goes about bootstrapping the application
3118.44 -> by adding the additional packages needed
3122.07 -> and building the binaries for the application to run.
3126 -> Once we run a scan on an image such as this,
3128.82 -> we'll find out that there are tons of vulnerabilities
3133.61 -> that were pushed into this container through the base image.
3137.61 -> So for this specific container,
3139.86 -> we would find at least 431
3143.13 -> vulnerabilities that need to be fixed.
3146.67 -> So the team uses distroless secure base images,
3150.84 -> which is an open source project by Google
3153.99 -> for providing a very slim base image which has no packages
3159.39 -> besides the packages that you need
3161.4 -> to run your application.
3164.28 -> And all the build-time packages
3166.68 -> that are not needed for the runtime
3168.84 -> are removed from the application.
3170.64 -> And the way it works is,
3172.14 -> we allow you to use any image that you need
3175.59 -> for building your application,
3177.15 -> and then we take the build artifacts
3179.16 -> and copy them onto the secure base image,
3181.98 -> which has nothing but
3185.37 -> just enough packages for your software to run.
3188.25 -> It doesn't even have enough permissions
3190.47 -> to be able to execute within that base image
3193.5 -> or be able to download additional packages,
3195.96 -> so there are no package managers
3197.22 -> and no bash or shell within these images.
3201 -> And when we scan such an image,
3202.32 -> you'll find out that most of the vulnerabilities
3204.54 -> or all of them are gone.
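Concretely, the pattern described here looks roughly like the Dockerfile below: a full-featured builder stage, then only the compiled artifact copied onto a distroless base with no shell or package manager. The Go toolchain, module path, and binary name are placeholders; any build image works in the first stage.

```dockerfile
# Illustrative multi-stage build along the lines described above.
# Stage 1: build with whatever image you need (placeholder Go example).
FROM golang:1.19 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Stage 2: distroless base with no package manager or shell, running as nonroot.
FROM gcr.io/distroless/static-debian11:nonroot
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```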
3208.47 -> Now that we provide clean AMIs
3211.41 -> and clean base images to our developers,
3215.19 -> we wanna be able to continuously scan our environments.
3218.85 -> We wanna be able to scan our hosts
3220.5 -> and we wanna be able to scan our containers
3224.43 -> in case any vulnerability out in the wild,
3227.94 -> such as Log4Shell, comes into the picture.
3230.16 -> We wanna be able to inventory all of the packages and CVEs
3235.5 -> that are running within our environment,
3237.36 -> and we're using Inspector effectively
3239.22 -> to be able to do that within a single AWS account.
3242.82 -> But we don't stop there.
3245.34 -> We have a multi-account AWS footprint,
3248.46 -> and we're using AWS Organizations
3252.78 -> to funnel all the findings
3255.39 -> from GuardDuty, Inspector, and Detective
3260.37 -> into one consolidated AWS account,
3262.92 -> which is open to the security engineers
3265.53 -> to be able to query any host and image
3268.95 -> to detect the vulnerabilities present in them.
3271.35 -> This gives us a holistic view of everything
3273.33 -> that's happening in our environment.
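A minimal sketch of the Organizations wiring this implies, assuming it runs from the organization's management account and using a placeholder security account ID: delegate one account as the administrator for GuardDuty, Inspector, and Detective so that member-account findings roll up into that single consolidated account.

```python
# Hypothetical one-time setup, run from the Organizations management account:
# delegate a single security account as administrator for each service.
import boto3

SECURITY_ACCOUNT = "111122223333"  # delegated administrator (placeholder)

boto3.client("guardduty").enable_organization_admin_account(
    AdminAccountId=SECURITY_ACCOUNT
)
boto3.client("inspector2").enable_delegated_admin_account(
    delegatedAdminAccountId=SECURITY_ACCOUNT
)
boto3.client("detective").enable_organization_admin_account(
    AccountId=SECURITY_ACCOUNT
)
```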
3275.91 -> With that said, thank you for sticking around this late.
3282.926 -> And yeah, we will have a survey sent out after this,
3286.65 -> so please fill out that survey to let us know
3289.29 -> how the presentation went for you.
3291.3 -> Thank you.

Source: https://www.youtube.com/watch?v=KIGTCJiPrVI