AWS re:Inforce 2023 - How Okta empowers devs to find & fix security issues with Snyk (APS206-S)
As a cybersecurity company, Okta understands the importance of developing an iterative DevSecOps process to manage application risks across its container environment, empowering developers with the tools they need to integrate automated security across the application stack, decreasing false-positives, and fixing vulnerable containers and open-source code directly from the repository. In this session, learn how Okta integrated Snyk’s developer-first security controls across their development workflow and how you might use Snyk’s risk-based approach to prioritization to holistically assess risk and then prioritize and fix issues at scale. This presentation is brought to you by Snyk, an AWS Partner.
#reInforce2023 #AWSEvents
Content
0 -> - All right, well, welcome everybody,
2.85 -> and thank you for joining us
3.87 -> for one of the final
sessions of AWS re:Inforce.
8.4 -> We're honored, of course.
10.14 -> We know that every time, you know,
12.06 -> at these kinds of conferences,
the best is saved for last,
14.58 -> and so we're honored,
that re:Inforce has put us
17.1 -> in this spot just before we
break for the food trucks.
20.97 -> And I heard somebody out there
singing some Guns N' Roses,
24 -> and so we know there's
some excitement coming
26.19 -> shortly after this presentation.
28.44 -> We have a fairly short presentation here,
30.33 -> so we'll get you on your way,
but my name is Jim Armstrong.
33.214 -> I run the product marketing
organization at Snyk.
36.273 -> I've been with Snyk for about
three and a half years now.
39.99 -> I was at Docker before that,
and at VMware before that.
44.73 -> But this isn't about me.
45.9 -> This is really about
Okta, and how they run
49.89 -> container security, and how they use Snyk.
51.72 -> So I've got Zaher with
me, who's from Okta.
54.6 -> I'll let him introduce himself.
56.61 -> - Hey, I'm Zaher Jarjoura.
58.65 -> I'm a staff cloud
security engineer at Okta.
62.22 -> I work on the customer
identity cloud product,
64.32 -> which many of you may know as Auth0.
67.2 -> I've been in the tech industry
for about 10 years now,
70.11 -> and I feel like I've
done a bit of everything,
72.181 -> from systems engineering
to corporate security,
75.45 -> and for the last four years or so,
77.25 -> I've been in cloud
infrastructure security.
79.83 -> - Yeah, I think Okta's
one of those companies
81.57 -> that probably most people
are extremely familiar with.
84.24 -> Probably most of you either use Okta today
87.27 -> at your organization, or you
log into some other service
90.27 -> that uses Okta, and so most
people are probably familiar
93.568 -> with what Okta does, but what
we wanna dive in today is
97.5 -> a bit about their architecture,
99.69 -> and Auth0, they have a
really neat architecture
102.39 -> for how they deploy things to AWS
104.82 -> in a very automated fashion.
106.2 -> A lot of those things are
containerized running on EKS,
110.28 -> so really exciting kind of
architecture that
114.42 -> scales to support
thousands of deployments
118.942 -> if they need to, all
containerized, and of course,
121.44 -> a big part of that is making
sure that for developers,
124.8 -> it's simple enough that they
can do this very easily,
127.41 -> and do it very quickly, and
do it in a repeatable fashion.
130.56 -> Part of simplifying that
includes simplifying
133.68 -> the security process as well,
135.21 -> and that's what we're gonna
spend a lot of time in,
138.06 -> pretty much all of our time today talking
140.07 -> about is simplifying that
security process that they have,
143.7 -> and specifically, we're gonna talk a lot
145.23 -> about container security in this session.
147.57 -> All of this architecture
that we'll dive into here
149.875 -> in a little bit is containerized,
153.09 -> and like I said, it's running on EKS.
155.01 -> We'll talk about how they
handle container security,
157.47 -> and how they sort of abstract
that and make it simple
159.75 -> for developers to deploy
these applications
162.51 -> and ensure that they're secure,
164.52 -> and a little bit about the future plans,
166.44 -> what they're doing and thinking
about for the future at Okta
170.1 -> to make container security even simpler,
172.44 -> to make it simpler for
developers, and make it simpler
174.48 -> for the security team
as well to handle this
177.24 -> and scale up the container security,
179.13 -> and I'll talk a little bit
about what Snyk is doing,
182.46 -> and some of the things that
we've done in our product
184.32 -> to help make that simpler as well.
187.14 -> So that's our plan for today.
192.168 -> You know, when it comes
to people using Snyk,
193.59 -> I know a lot of our
customers want to have choice
196.71 -> as to how they use Snyk.
198.12 -> It's a SaaS just like Okta
and Auth0 are a SaaS service,
200.94 -> but a lot of people want to
have it in a particular region
204 -> of the world, or they want
to have a private instance,
207.18 -> and you've got customers who ask
209.1 -> for the same thing at Auth0 as well.
212.49 -> What are some of the, you know,
213.63 -> other reasons that customers
look for your service
216.75 -> to be sort of a private instance?
219.06 -> - Yeah, so there are a
number of reasons in addition
221.34 -> to those, you know,
few that you mentioned.
223.5 -> We offer greater performance guarantees
225.6 -> above what we could offer for
our public cloud deployments
228.36 -> when it comes to throughput,
230.7 -> and then you also have the added benefits
232.41 -> and peace of mind of
having your own, you know,
234.24 -> isolated infrastructure
dedicated to only your use case,
238.14 -> and of course, customers like
the fact that they could have
240.66 -> a private link back to their
own VPC so they can keep
243.57 -> their Auth0 traffic off
the public internet.
248.04 -> - And I think, you know, that's a really,
249.757 -> you know, popular option, right?
251.73 -> I know a lot of people
do use the SaaS service,
254.04 -> but the private service is
really popular as well, right?
256.887 -> You know, hundreds or thousands
258.42 -> of these things can be
deployed very quickly.
260.58 -> - Yeah, it's really popular,
so obviously, that means it,
263.85 -> you know, it all has to be automated.
265.44 -> We've set it up so that new
environments can be provisioned
267.84 -> for customers on the fly.
269.982 -> This is a really simple
diagram of the architecture,
272.702 -> but the main thing for today is that
274.89 -> the Auth0 app components
are all containerized
277.77 -> and running on EKS when
customers choose AWS,
281.64 -> and for me and my team, that
means that we need to help
283.68 -> our developers keep
their containers secure.
286.14 -> - Yeah, and one thing I'll
say too, there's a great blog.
288.12 -> In the resources at the
end, there's a great blog
290.79 -> that the Auth0 team
has authored that talks
293.43 -> about this architecture in detail,
295.38 -> and how they automate all of this, right?
297.39 -> So there's a control plane
in the middle that, you know,
300.3 -> that handles all of these
different deployments,
302.13 -> and you know, anytime a
new customer signs up,
304.32 -> it's very, you know, you
fill out a form, basically,
306.57 -> and you can get one of these
things deployed very quickly.
309.029 -> So it's great from an operational
standpoint, but again,
312.12 -> obviously, we wanna make
that as simple as possible
315.24 -> for developers to handle
all of that as well.
317.97 -> Now, like I mentioned at the beginning,
319.95 -> I used to work at Docker,
and you know, very familiar
322.74 -> with a lot of people trying
to secure containers.
325.86 -> Everybody that's scanned
containers before has probably seen
329.4 -> the list of vulnerabilities
that comes up with containers.
332.4 -> A lot of folks, this was, you
know, two or three years ago.
335.01 -> Hopefully, most folks
have sort of advanced
338.9 -> their action plans, but a
lot of folks I still see have
341.91 -> a plan where they scan a
container, they check a box,
344.22 -> and say, "Okay, we scanned a
container, and we'll just hope
346.32 -> and pray that the next time
we build the container,
348.21 -> some of those
vulnerabilities will go away.
350.76 -> Maybe Docker or Red Hat
or AWS or whoever supplies
354.33 -> sort of our foundational
images will just fix things,
356.55 -> and they'll just magically
disappear the next time."
358.83 -> We build our image and we ship it off.
361.05 -> But I know at Auth0, you're a
little bit more sophisticated
363.6 -> than this plan, and you're doing more work
368.16 -> to sort of keep those
containerized apps secure,
370.8 -> so kind of explain a little bit
372.15 -> about how you go about it today.
375.24 -> - Sure, and it's not just about security.
377.58 -> We want to abstract away
complexity for our developers
380.31 -> anywhere we can, so to start with,
382.53 -> deployments are
abstractions on top of EKS.
385.62 -> Developers just modify the
deployment details in YAML,
388.71 -> and our control plane actually takes care
390.45 -> of everything else.
392.49 -> The Dockerfile details, Kubernetes YAML,
394.597 -> Terraform configurations, et cetera,
396.69 -> all that is taken care of,
you know, under the hood.
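A minimal sketch of the abstraction described above, assuming a hypothetical three-field developer spec (`service`, `image`, `replicas`). The talk does not show Okta's actual schema or control plane, so every field and function name here is an illustration. The output follows the standard Kubernetes `apps/v1` Deployment shape.

```python
from typing import Any


def render_deployment(spec: dict[str, Any]) -> dict[str, Any]:
    """Expand a short developer-facing spec into a Kubernetes Deployment manifest.

    The input field names are hypothetical; only the output structure
    (apps/v1 Deployment) is standard Kubernetes.
    """
    labels = {"app": spec["service"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec["service"], "labels": labels},
        "spec": {
            # Platform-chosen default when the developer does not specify one.
            "replicas": spec.get("replicas", 2),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": spec["service"], "image": spec["image"]}
                    ]
                },
            },
        },
    }


# Hypothetical developer input: the only part a developer would edit.
developer_spec = {
    "service": "login-api",
    "image": "registry.example.com/login-api:1.4.2",
    "replicas": 3,
}
manifest = render_deployment(developer_spec)
```

The point of the design is that the developer touches only the short spec, while the platform team owns the template and its defaults.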
399.854 -> - So you essentially have a platform team
402.72 -> that sort of takes care
of all this, right?
404.97 -> They set up all the
basics, and the templates,
407.91 -> and all this just happens, so it's automatic
410.94 -> for all the developers.
411.93 -> - Right, yeah, it's super awesome.
414.27 -> It's a one-stop shop for anything
416.07 -> that we would have to do in the platform.
418.11 -> It makes it really easy to deploy
419.76 -> new environments or services.
425.19 -> And then, you know, we of
course, also need to simplify
427.29 -> and abstract security for our developers
429.175 -> and for our security team,
430.86 -> especially for
containerization at our scale.
434.19 -> I don't know if you've
ever heard the story
435.48 -> of how they maintain
the Golden Gate Bridge.
437.077 -> You know, they gotta keep
a coat of paint on it,
439.489 -> you know, to prevent
rust and stuff like that
441.72 -> from all the salt water that's around it.
444.6 -> I always thought it was true.
445.89 -> Somebody told it to me when I was a kid,
447.63 -> but it turns out that it's just a myth,
449.73 -> but I'll tell it anyways
because it fits in really well
452.01 -> with what we're talking about.
454.35 -> So they say that they start
painting from one side,
456.72 -> and then by the time they
get to the other side
458.58 -> of the bridge, they just turn
around and start painting
460.62 -> in the other direction.
462.27 -> So they're always playing catch up,
463.62 -> and it used to be a little
bit like that for us.
466.62 -> We were definitely doing
more than, you know,
468.45 -> what you described, but still,
470.28 -> a lot of the work had been manual,
471.69 -> and fell on the security team.
473.91 -> We would go through the scan results,
475.56 -> figure out if the container's
still running in production,
477.96 -> we'd assess the vulnerabilities,
figure out, you know,
480.42 -> how severe they are, and
if the vulnerabilities were
483.54 -> critical or high, and the
container was still running
485.94 -> in production, we'd cut tickets
for developers to fix it.
490.47 -> It still took a lot of
manual research and effort
492.57 -> for the developers, and of course,
494.34 -> you know, us as the security team,
495.57 -> we'd have to go back and
then verify those fixes.
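The manual triage rule described above (critical or high severity, and still running in production) can be written down as a small filter. This is a sketch under assumed data shapes; the `Finding` fields and sample values are hypothetical and do not come from Okta's tooling or Snyk's output format.

```python
from dataclasses import dataclass

# Severities that warranted a ticket in the process described in the talk.
TICKET_SEVERITIES = {"critical", "high"}


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape for one vulnerability on one image.
    image: str
    vuln_id: str
    severity: str
    running_in_production: bool


def needs_ticket(finding: Finding) -> bool:
    """Ticket only findings that are severe AND on an image still in production."""
    return (
        finding.severity.lower() in TICKET_SEVERITIES
        and finding.running_in_production
    )


findings = [
    Finding("login-api:1.4.2", "VULN-1", "critical", True),
    Finding("login-api:1.3.0", "VULN-1", "critical", False),  # old tag, not deployed
    Finding("mailer:2.0.1", "VULN-2", "low", True),  # deployed, but low severity
]
tickets = [f for f in findings if needs_ticket(f)]
```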
501.42 -> So the goal is to have a
single-click PR that takes care
504.81 -> of all the critical and
high severity issues
506.67 -> in our most important images.
508.53 -> We want to automate as much as possible
510.66 -> so instead of playing catch
up, we can stay ahead.
513.6 -> We also need accurate reporting
for our compliance teams
516.27 -> and for our executives.
517.89 -> - Yeah, and I think reporting's
another one of those things
519.69 -> too where, you know, if
you're doing all this manually,
524.49 -> I imagine the reporting is terribly,
527.79 -> maybe just terrible is the way to say it.
530.31 -> - Yeah, well, just like everything else,
532.5 -> it was unfortunately, manual,
535.47 -> spreadsheets and screenshots, basically.
537.78 -> It took a lot of time
that was better spent
539.55 -> on more important things.
543 -> So here's where we are today.
544.86 -> Images are added to
Snyk when a new version
547.29 -> of the abstraction platform
release manifest is committed.
550.38 -> You know, Snyk obviously scans the images,
552.21 -> and then surfaces any vulnerabilities.
554.82 -> From there, the data about the images
556.68 -> and the vulnerabilities is pulled
557.94 -> into our centralized
asset management system.
564.15 -> Then we have a job that
enriches all the image
566.94 -> and vulnerability data in
the asset management system
569.19 -> with a bunch of, you know,
different information,
571.59 -> such as where the images are running,
573.93 -> and an image source code URL.
576.45 -> - So this is pretty complex, right?
578.58 -> Because I think what you're doing is
580.26 -> you're essentially looking
in your deployment pipelines,
582.51 -> you figure out where something's deployed,
584.61 -> the fact that it is actually
deployed and running right now,
587.43 -> and mapping that location as well,
590.4 -> and then also going the other
direction and mapping back
593.22 -> to the source code repo so
that you have that ability
595.95 -> to connect the dots from
sort of the source of this
598.65 -> to its destination, and
the fact that right now,
601.32 -> it's actually running,
602.43 -> and you can track down
all those code owners.
603.96 -> That's a pretty tough
and complex thing to do.
606.72 -> - Right, so all this
information is available
609.63 -> in one way or another
in the control plane.
611.88 -> So the enrichment job just
asks the control plane
614.1 -> for all these bits and pieces,
and then it puts it together
616.47 -> in the asset management system.
618.48 -> And this makes it really
easy to, you know,
620.82 -> have just like, a view
of all of our images,
623.52 -> any vulnerabilities they might have,
625.05 -> what environments might be
affected, and who the owners are.
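A rough sketch of what such an enrichment job might look like, assuming the control plane can be queried as a simple lookup keyed by image reference. The record fields (`environments`, `source_url`, `owners`) mirror what the speakers mention, but the data shapes and names are assumptions for illustration.

```python
def enrich(scan_results, control_plane):
    """Join scan results with deployment and ownership data from the control plane.

    Both arguments use hypothetical shapes: `scan_results` is a list of dicts
    with an "image" key, and `control_plane` maps image references to metadata.
    """
    enriched = []
    for result in scan_results:
        info = control_plane.get(result["image"], {})
        enriched.append(
            {
                **result,
                # Where the image is running; empty means it is not deployed.
                "environments": info.get("environments", []),
                # Link back to the source repository, if known.
                "source_url": info.get("source_url"),
                "owners": info.get("owners", []),
            }
        )
    return enriched


scan_results = [
    {"image": "login-api:1.4.2", "vulns": ["VULN-1"]},
    {"image": "old-job:0.9.0", "vulns": ["VULN-7"]},
]
control_plane = {
    "login-api:1.4.2": {
        "environments": ["prod-us-1", "prod-eu-1"],
        "source_url": "https://github.example.com/org/login-api",
        "owners": ["team-identity"],
    }
}
assets = enrich(scan_results, control_plane)
```

With the joined records in one place, a single query can answer which images are vulnerable, where they run, and who owns them.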
630.99 -> And then tickets are opened
for the vulnerabilities using
633.33 -> the owners of the image's GitHub repo
634.95 -> to find the right assignee,
636.63 -> and then the responsible team will patch
638.55 -> any of the vulnerabilities
that are assigned to them.
641.46 -> Once that is done, there is a reconciler
643.77 -> that will actually bump the image version
645.57 -> in the release manifest based
647.34 -> on the latest image
pushed to the registry.
649.2 -> So like as a developer, you know,
650.43 -> you fix something, you push a new image,
652.26 -> it will actually just update
that in the release manifest,
655.02 -> so then the next time a deploy runs,
657.09 -> the images are deployed,
and then the cycle will,
659.04 -> you know, start all over again.
660.21 -> They get added to Snyk, they get scanned,
662.1 -> and then, you know,
everything else happens.
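The reconciler step can be sketched as follows, assuming the release manifest is a simple mapping of image name to tag and that the registry can report the most recently pushed tag per image. Both shapes are hypothetical; the real manifest format is not shown in the talk.

```python
def reconcile(
    manifest: dict[str, str], latest_in_registry: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Bump manifest entries to the latest pushed tag.

    Updates `manifest` in place and returns {image: (old_tag, new_tag)}.
    This sketch treats "latest" as whatever the registry reports and does
    not compare version numbers.
    """
    changes = {}
    for image, current_tag in manifest.items():
        latest = latest_in_registry.get(image)
        if latest is not None and latest != current_tag:
            changes[image] = (current_tag, latest)
    # Apply after the loop so the manifest is not modified while iterating.
    manifest.update({image: new for image, (_, new) in changes.items()})
    return changes


release_manifest = {"login-api": "1.4.2", "mailer": "2.0.1"}
registry_latest = {"login-api": "1.4.3", "mailer": "2.0.1", "unused": "0.1.0"}
changes = reconcile(release_manifest, registry_latest)
```

Images that exist in the registry but not in the manifest are ignored, so the reconciler only ever updates what is already part of a release.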
665.97 -> So you know, we talked a little bit
667.47 -> about where we are today,
669.048 -> and now I wanna look a
little bit into the future.
672.48 -> So this is gonna sound pretty cliche,
674.43 -> but we want to add more scanning left,
676.89 -> during the build and the
promotion-to-prod stages.
679.77 -> This way, we are actually
preventing the vulnerable images
682.11 -> from being deployed in the
first place, not just, you know,
684.69 -> doing what the Golden Gate
Bridge folks are doing,
686.25 -> and you know, playing catch
up, and just being reactive
688.5 -> to what's already deployed in production.
691.05 -> So basically, we want to
fix the hole in the boat
692.76 -> instead of just bailing
out water, so to speak.
698.37 -> We already maintain a set of
preferred golden base images
701.37 -> for our developers.
702.48 -> These are, you know, transparent to them,
704.04 -> maintained behind the scenes.
705.6 -> They're part of that build
in like, the control plane
707.46 -> that we mentioned earlier.
709.17 -> This means that many
710.003 -> of the running images share common layers.
715.23 -> So ideally, we would like to automate
717.27 -> as much of the patching as
possible for simple changes
719.91 -> as we start seeing patterns
in the vulnerabilities,
722.76 -> such as, you know, bumping
a base image version
724.68 -> or a package version, for example.
727.2 -> We have a ton of image scan results,
728.91 -> but really we only have about
600 or so unique images,
732.21 -> and those results relate
to different older tags
734.97 -> for those images, and
then instead of seeing
737.34 -> the same vulnerability reported
738.87 -> over and over for every image,
740.37 -> we wanna identify these build
patterns and common layers.
746.7 -> Then we can fix the
issue in the shared base.
749.61 -> Automation can open a PR
to the proper repo based
752.61 -> on that image source code that
we mentioned earlier that,
755.31 -> you know, it's populated
by that enrichment job,
757.5 -> and then the tickets
can go from, you know,
759.27 -> asking the developers for a fix, like,
760.597 -> "Hey, you have this vulnerable
container, please fix it."
763.08 -> It'll be more of just
having them review a PR,
766.08 -> test the changes and then
merge it when they're ready.
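One way to picture the "fix it in the shared base" idea is to group findings by the layer that introduced them rather than by image. The sketch below assumes each finding already carries a `layer` label; how that attribution is computed is not covered in the talk, and all names here are hypothetical.

```python
from collections import defaultdict


def group_by_layer(findings):
    """Collapse repeated findings into one entry per (vulnerability, layer).

    Returns {(vuln_id, layer): set of affected images}, so a vulnerability
    in a shared base layer appears once, with every image that inherits it.
    """
    groups = defaultdict(set)
    for finding in findings:
        groups[(finding["vuln_id"], finding["layer"])].add(finding["image"])
    return dict(groups)


findings = [
    {"image": "login-api:1.4.2", "vuln_id": "VULN-1", "layer": "golden-base"},
    {"image": "mailer:2.0.1", "vuln_id": "VULN-1", "layer": "golden-base"},
    {"image": "billing:3.1.0", "vuln_id": "VULN-1", "layer": "golden-base"},
    {"image": "mailer:2.0.1", "vuln_id": "VULN-5", "layer": "app"},
]
groups = group_by_layer(findings)
```

Here four scan results reduce to two work items: one fix in the golden base image that covers three services, and one fix that belongs to a single app team.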
770.4 -> - Yeah, and I think it's
a pretty common pattern.
773.19 -> I think I'm seeing it, you
know, emerge more and more
775.47 -> at customers where they do maintain
777.15 -> this set of internal base images.
778.65 -> It's one of the things
that Snyk's been working on.
780.84 -> I know Okta's not using this yet,
782.94 -> but one of our new features is
785.28 -> that ability to map those custom
base images that you have,
789.3 -> and the idea there is,
you know, as you think
791.97 -> about these containers
being built up on top
795.27 -> of other containers, that we can identify
797.347 -> the previous container before it, right?
799.89 -> And also for each of these teams, right?
801.39 -> There's sort of this shared
responsibility model,
803.55 -> so you know, a lot of people
take something from Docker,
806.46 -> or maybe it's from Red Hat,
or maybe it's from AWS,
808.71 -> or whoever sort of your first vendor is,
812.37 -> but then they add their own
custom stuff on top of it,
814.88 -> common frameworks, common tools
for monitoring and security,
819.33 -> and things that go into every image,
821.1 -> and then the app team gets it.
822.45 -> Well, the app team's
responsibility isn't to worry
824.94 -> about any of that stuff
that comes before it, right?
827.55 -> That's the platform team's responsibility,
829.38 -> or Docker's responsibility,
or somebody else, right?
832.124 -> So why give them a report
with 1,000 vulnerabilities,
836.19 -> when 999 of those have nothing to do
839.22 -> with anything they put into the container,
841.65 -> and nothing that's their
responsibility to fix?
844.05 -> So with this model, essentially,
845.88 -> what the app team gets
is a report that says,
847.837 -> "Well, as long as you're on the most,
850.537 -> you know, recent version
852.15 -> of the internal base
image, you're doing okay.
854.58 -> If you're not, we'll give
you a fix PR to get you
856.8 -> to that most recent internal base image."
859.29 -> Now, if you add something
bad to that container,
861.9 -> then we're gonna tell you
about the thing that you added
864.03 -> to the container that has a vulnerability,
866.16 -> but as a developer,
867 -> that's your responsibility to fix.
869.85 -> Same for the platform team, right?
871.26 -> Just a step before, so
anything that they're adding,
873.39 -> all those common tools and things,
874.74 -> that's their responsibility
to stay up with.
877.5 -> And maybe they will find some
things in the Docker images
880.32 -> or the Red Hat images
that they want to adjust
882.775 -> or they want to fix,
but generally speaking,
885.54 -> most people will sort of
trust Red Hat or Docker
888.51 -> or whoever to maintain that,
and just grab a new image
891.15 -> often so that they're
staying on top of those things
893.731 -> that are coming from those vendors.
895.68 -> But again, for the platform team,
896.88 -> it's important to know this
is the stuff you've added,
899.85 -> the stuff, you know, that comes
from Docker we'll identify,
902.28 -> so that's what Snyk is essentially doing.
903.84 -> Our container scanner as part
of its logic is building up
908.61 -> based on your own sort
of custom logic here.
911.1 -> Instead of saying, "There's
1,000 vulnerabilities,
913.5 -> good luck fixing them,"
and saying, you know,
915.963 -> "You built your containers this way,
918.87 -> and this is the part
that's your responsibility.
920.82 -> As long as the next step
921.87 -> before you is up to date, you're good,"
924.81 -> so it also helps with exactly
what you talked about,
928.83 -> where you've got the common base image.
931.02 -> If the problem is there,
932.73 -> then every image that a developer builds
935.31 -> that's on top of that is the same problem,
937.92 -> it's just repeated over and over again,
939.48 -> but we don't need to fix it 1,000 times.
941.13 -> We can fix it once,
and then scale that fix
943.29 -> across every image that every
app team has that's out there.
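The shared-responsibility report described here can be illustrated with a small function that shows an app team only what it introduced, plus whether its base image is current. This is a sketch of the idea, not Snyk's implementation or API; the `introduced_by` field and the image names are assumptions.

```python
def app_team_report(image, latest_base):
    """Build the app team's view of one image.

    `image` is a hypothetical record with a "base" reference and a list of
    vulnerabilities, each tagged with who introduced it ("vendor",
    "platform", or "app").
    """
    up_to_date = image["base"] == latest_base
    return {
        "base_up_to_date": up_to_date,
        # Only suggest a bump when the team is behind the latest internal base.
        "suggested_base": None if up_to_date else latest_base,
        # Vendor and platform findings are left to the teams that own them.
        "own_vulns": [v["id"] for v in image["vulns"] if v["introduced_by"] == "app"],
    }


image = {
    "name": "login-api:1.4.2",
    "base": "golden-node:2023-05",
    "vulns": [
        {"id": "VULN-1", "introduced_by": "vendor"},
        {"id": "VULN-2", "introduced_by": "platform"},
        {"id": "VULN-3", "introduced_by": "app"},
    ],
}
report = app_team_report(image, latest_base="golden-node:2023-06")
```

The app team sees one suggested base image bump and one vulnerability of its own, instead of the full list of three.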
947.04 -> So we don't need to go
fussing with that full list
950.1 -> of vulnerabilities and trying
to triage that and figure out
953.43 -> which of the criticals
is the most critical
955.41 -> or the most important.
957 -> All we need to focus on
is that I'm following
959.31 -> those guardrails that are
coming from the platform
and the security team that are
handling those other images
965.76 -> that I'm building on top of,
966.87 -> and then take care of my
own stuff that goes into it,
968.91 -> including the application pieces, right?
970.62 -> Like, the container
vulnerabilities are important,
972.72 -> but really as a developer,
especially in your instance,
975.18 -> like, the developers
aren't doing a whole lot
976.83 -> with the container itself,
977.991 -> but the code that goes into the container,
980.94 -> the packages that are part of the code,
982.59 -> that's their responsibility as well,
984.06 -> so we wanna make sure that
they know about those things
986.025 -> in the container too.
988.8 -> The other thing you
mentioned, which is really,
990.435 -> you know, it's pretty advanced,
992.61 -> I think a lot of customers
are thinking this way,
994.41 -> but it's really hard to
do the work of connecting
997.08 -> all those dots, which
is the ability to say,
999.66 -> we've scanned a container,
we've scanned it, you know,
1003.62 -> maybe the developer scanned it,
and we've got those results,
1005.72 -> maybe we scan it in, you know,
1007.67 -> in a pipeline and we've got those results.
1009.32 -> Maybe it's built 15 times,
1010.79 -> but only one of those is deployed,
1012.83 -> and the thing that's deployed is the one
1014.96 -> that's the most important from
the operational standpoint
1018.2 -> and probably from a security standpoint,
1019.76 -> 'cause the other ones aren't
gonna be attacked, right?
1022.7 -> And so keeping track of that,
1025.07 -> knowing which ones are deployed,
1026.72 -> is really a complex task as well, right?
1028.7 -> And again, not just the container image,
1030.92 -> but knowing what code is
inside that container image,
1034.31 -> what development team owns that code
1036.53 -> and can go fix those things.
1038.45 -> Yes, I want to know that the
container has vulnerabilities,
1040.73 -> but I also want to know
like, if this app has
1042.83 -> a SQL injection vulnerability in it,
1044.69 -> or one of the packages that
my developers add suddenly has
1047.99 -> a zero day, I need to
know all those things too,
1050.96 -> and I need to know if those are deployed,
1052.85 -> and where they're deployed
1054.35 -> so that I can prioritize
things correctly, right?
1057.04 -> And so Snyk added a new capability.
1059.6 -> We just, I know it's another
one that Okta's not using,
1062.12 -> we just launched it, like, last week,
1063.8 -> but it's called Insights,
and it provides exactly that.
1066.65 -> So I'm really excited to see
kind of what you've built,
1068.788 -> but also kind of excited to see, you know,
1072.26 -> how this kind of stacks
up, and if we can augment
1074.84 -> some of that data that you have,
1076.4 -> and kind of build this nice, you know,
1079.19 -> kind of application graph
so people can see from sort
1082.13 -> of source through pipelines
all the way into production,
1085.55 -> and then use that to prioritize
and go from, you know,
1088.82 -> hundreds of thousands or tens of thousands
1090.68 -> of vulnerabilities down
to just a small number
1093.26 -> that actually are impactful,
so really excited about that.
1098.21 -> All right, well, like I said,
1101.18 -> this is a short and sweet presentation.
1102.658 -> We really appreciate you joining us.
1104.75 -> Thank you, Zaher.
1105.59 -> Anything else you want to add?
1107.63 -> - [Zaher] No, I think you covered it all.
1108.74 -> - All right, fantastic.
1110.51 -> Well, again, we appreciate
everybody joining us.
1113.03 -> I mentioned a couple of
blogs that Auth0 has.