AWS re:Invent 2022 – Build & run it: Streamline DevOps capabilities with machine learning (DOP205)

While organizations have vastly improved how they deliver and operate software, development teams still run into issues when performing manual code reviews, looking for hard-to-find defects, and uncovering security-related problems. Developers have to keep up with multiple programming languages and frameworks and often have their productivity impaired when they have to search online for code snippets. Additionally, developers now require expertise in observability to successfully operate the applications they build. Join this session to learn how to use machine learning–powered tools like Amazon CodeWhisperer, Amazon CodeGuru, and Amazon DevOps Guru to boost your applications’ availability and write software faster and more reliably.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.

ABOUT AWS
Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.

AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#reInvent2022 #AWSreInvent2022 #AWSEvents


Content

0.06 -> - You're probably already using some DevOps capabilities,
2.64 -> some DevOps practices in your teams,
6.33 -> and as you mature your processes and tools,
9.24 -> you start to look for means
10.44 -> to streamline your software delivery workflow,
13.29 -> such as improving your coding productivity,
17.43 -> having better means to do code reviews,
20.07 -> or to operate applications at scale.
23.49 -> Today in this session we'll talk about how
25.65 -> to use artificial intelligence and machine learning,
28.92 -> across that software delivery workflow,
31.11 -> from the moment you write the code of your applications
33.72 -> to when you operate them in production.
36.39 -> My name is Rafael Ramos.
37.38 -> I'm a specialist solutions architect on the developer
39.84 -> acceleration team at AWS.
42.782 -> - I'm Shivansh Singh,
43.615 -> I'm a principal partner solutions architect.
46.68 -> - Hi, I'm Jared Reimer.
47.97 -> I'm the founder and president of Cascadeo.
50.64 -> We're an APN premier tier partner
52.65 -> and managed-services provider.
54.6 -> Gonna share some customer stories with these tools.
59.37 -> - Cool.
60.93 -> This is our plan for today.
62.73 -> We start with a quick recap
64.44 -> about the software delivery life cycle as it is today,
68.877 -> and some of the challenges we face when developing software.
72.57 -> Then we'll talk about artificial intelligence for DevOps
75.69 -> in different areas such as when you're writing code,
78.15 -> when you're doing code reviews,
79.65 -> when you're assessing the performance
81.15 -> of your applications at runtime,
83.61 -> and also when you're monitoring your applications
85.287 -> and cloud resources.
87.6 -> We are also going to have a live demo,
89.883 -> and Jared from Cascadeo is gonna talk
92.97 -> about his own journey
94.02 -> and how his team is operating applications
96.51 -> at scale in production.
101.91 -> All right, this is the typical software delivery life cycle
105.87 -> as is, as of today, right?
111.709 -> So we usually,
113.34 -> when you're practicing DevOps,
115.08 -> you have something similar to that
116.97 -> and usually you have it automated as a CI/CD pipeline,
120.63 -> so that when the developer writes code
124.11 -> in their local environment,
125.58 -> a peer might do a peer review,
128.61 -> and then this developer would push the code
130.853 -> into the (indistinct) into their repository, the main line,
137.55 -> and then a pipeline would kick off
140.04 -> to build the application and test it,
142.26 -> and then deploy in your various environments,
144.06 -> and then you start to observe those applications,
146.46 -> so that you can keep improving them by fixing bugs,
150.18 -> enhancing the performance, or implementing new features.
153.39 -> But there are some things we are doing manually here
155.58 -> in this whole process that prevent us from being as agile.
159.6 -> For example,
161.28 -> when you are writing the code of your applications,
164.1 -> it's very common that you manually search online
166.35 -> for code snippets of a corner case
168.09 -> that you're implementing or maybe a programming language
170.97 -> that you're not very familiar with.
174.84 -> When you're doing code reviews,
177.36 -> also, you usually, you do that manually,
179.88 -> looking for bugs or inefficient code in your applications.
184.62 -> Another thing that we usually do manually
187.32 -> is looking for performance bottlenecks
189.33 -> in our applications running in production.
191.67 -> You also have to monitor those applications,
193.68 -> so that you have the means to troubleshoot them
196.38 -> when an incident occurs.
198 -> That involves collecting metrics, defining thresholds,
201.06 -> creating alarms, dashboards, and so on.
205.62 -> From all those improvement opportunities,
207.51 -> let's start talking from the beginning,
209.31 -> which is when you're actually
210.36 -> writing the code of your applications.
213.99 -> A big challenge,
215.01 -> most of us as developers face today
217.65 -> is dealing with multiple programming languages.
220.62 -> In one of the projects I worked for as a developer,
223.5 -> we had to refactor one of our main systems
225.99 -> and adopt a microservices based architecture,
229.02 -> and we wanted to use the right tool
230.49 -> for the right job there, right?
231.87 -> That's what we always strive for.
233.7 -> So some of those microservices we decided
236.31 -> to implement in Python,
237.72 -> whereas others we implemented in JavaScript
240.39 -> running on Node.js.
243.39 -> But for most of my career over (indistinct),
245.58 -> I was a Java developer
247.47 -> and at that point I had to deal
249.03 -> with those multiple programming languages.
251.01 -> That was part of my daily life.
253.08 -> And if there are other Java developers here,
255.24 -> I think you would relate to that,
257.25 -> because many times when I would
259.71 -> implement something in Python,
261.39 -> I would then search online for,
262.77 -> hey, what is the Pythonic way to do that?
266.04 -> For instance,
266.873 -> what's the Pythonic way to apply a function
268.56 -> to all the items in a function,
272.22 -> all the items in a list?
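
As a brief illustration of the question above, the usual Pythonic answers are a list comprehension or the built-in map(). This is only a sketch: the function to_upper and the sample list are hypothetical and do not come from the talk.

```python
# Hypothetical helper and sample data, used only to illustrate the idiom.
def to_upper(item: str) -> str:
    """Return the upper-cased version of a single item."""
    return item.upper()


items = ["alpha", "beta", "gamma"]

# List comprehension: generally considered the most Pythonic form.
upper_items = [to_upper(item) for item in items]

# Equivalent result with the built-in map(), wrapped in list() to materialize it.
upper_items_via_map = list(map(to_upper, items))

print(upper_items)
```

Both forms produce a new list and leave the original list unchanged.
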
273.45 -> And another challenge we have is dealing
275.31 -> with multiple frameworks and libraries.
279.06 -> It could be that you're not very knowledgeable
280.77 -> about some of those frameworks or those libraries
282.903 -> that you have to use to build your applications,
285.51 -> but it could also be that you struggle
286.92 -> to remember certain features,
288.66 -> like a familiar lib that you use on a daily basis,
293.28 -> maybe because you're actually implementing a corner case.
296.277 -> And there is also a number of third party APIs
299.4 -> and cloud services that we have
301.95 -> to use to build our applications.
304.56 -> Like you may be using Amazon DynamoDB
307.5 -> to store and retrieve data,
309.33 -> or it could also be that if your application
311.46 -> is using streams,
312.87 -> you are using Boto3 in Python
315.18 -> to interact with Amazon Kinesis.
317.91 -> Searching for this all the time is time consuming,
321.48 -> so why not have a buddy that would help you
324.54 -> with code recommendations to increase your productivity?
329.34 -> That's where Amazon CodeWhisperer comes into play.
332.67 -> This service is a machine-learning-powered coding companion
336.12 -> that we just announced this year in preview.
338.49 -> And the idea of CodeWhisperer is
339.86 -> to improve your development productivity
341.969 -> by generating code recommendations
344.34 -> based on the existing code you've got in your IDE
347.25 -> and the comments you write in plain English.
352.65 -> CodeWhisperer is powered by a machine learning model
355.68 -> that we trained on billions of lines of code
358.32 -> and the model would generate code similar
360.33 -> to how you would write it on your own.
361.92 -> It's not a simple copy-paste from the web.
364.47 -> And the benefit of that is that you
366.087 -> get ready-to-use code that is already customized
368.97 -> for you based on your context.
372.9 -> CodeWhisperer understands natural language.
375.24 -> It means that you write in plain English
377.61 -> what you want to do and it generates entire blocks
380.61 -> of code, like function implementations.
383.67 -> In this case, you see on the picture, I wrote,
385.812 -> "Hey, write a function to upload a file to S3,"
389.37 -> and it generated all those lines at once
391.5 -> with the implementation I needed.
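
The slide's generated code is not reproduced in the transcript, so the following is only a minimal sketch of what a function for that prompt could look like. It assumes a Boto3 S3 client is passed in (for example one created with boto3.client("s3")); the function name and parameters are hypothetical.

```python
import os


def upload_file_to_s3(s3_client, file_path, bucket, key=None):
    """Upload a local file to S3 and return the object key that was used.

    s3_client is assumed to be a Boto3 S3 client. It is passed in as an
    argument so the function can be exercised with a stub and without
    AWS credentials.
    """
    if key is None:
        # Default to the file's base name when no key is given.
        key = os.path.basename(file_path)
    # upload_file(Filename, Bucket, Key) is the Boto3 S3 client call.
    s3_client.upload_file(file_path, bucket, key)
    return key
```

With real credentials, a call might look like upload_file_to_s3(boto3.client("s3"), "report.csv", "my-example-bucket"), where the file and bucket names are placeholders.
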
395.76 -> This is how CodeWhisperer works:
398.07 -> you install it as a plug-in in your IDE and activate it.
401.97 -> Then as you create new code
404.16 -> or write comments in plain English using natural language,
407.49 -> CodeWhisperer then sends to the backend
409.86 -> your edits, being the code you just created
412.65 -> or the comments you wrote,
414.6 -> and it also sends some contextual information
417.06 -> to the backend,
418.05 -> like the libs you have imported on that specific file
421.98 -> and also the versions that you're using
424.26 -> and some of the existing code you've got there.
428.31 -> On the backend then,
429.51 -> CodeWhisperer takes all that in consideration
431.85 -> to generate those coding recommendations
433.86 -> based on your context and based on your intent.
439.081 -> In terms of programming languages,
441.18 -> we already support Java, JavaScript, and Python,
444.39 -> and we just announced support for C# and TypeScript.
448.47 -> As far as IDEs, we already support VS Code,
451.38 -> also the JetBrains family,
452.88 -> including PyCharm, IntelliJ, WebStorm.
456.18 -> We also support AWS Cloud9,
458.19 -> and you can also benefit from CodeWhisperer
460.14 -> directly from the AWS Lambda console
462.18 -> if you're editing code from there.
465.6 -> CodeWhisperer also comes with a reference tracker,
468 -> so that you have transparency about where exactly
470.55 -> that code came from.
472.65 -> The way the reference tracker works is once it
476.04 -> detects that a piece of code recommendation CodeWhisperer
479.34 -> is about to give you, is similar to training data
484.89 -> that we use to train our model,
486.93 -> it'll provide you with that reference.
489.78 -> With that reference tracker then,
491.25 -> the idea of that is you get informed
493.484 -> about whether you can have,
496.71 -> you can make informed decisions
498.3 -> whether you feel comfortable or not using code
500.85 -> from that specific origin under that license.
506.464 -> CodeWhisperer also comes with a security scan
508.62 -> that's powered by Amazon CodeGuru,
510.24 -> which we are gonna talk about in a bit,
512.04 -> and the idea of that is,
513.6 -> in the IDE itself,
515.76 -> you can get informed about possible security vulnerabilities
519.75 -> you may have in your code.
524.73 -> And this is how your workflow would look like
527.34 -> with CodeWhisperer acting as your AI-based coding companion.
533.79 -> Switching gears, now,
534.75 -> let's talk about how to improve the quality of your code
537.27 -> and the performance of your applications.
542.67 -> Traditionally, code reviews may happen
544.869 -> in the local environment of the developer
547.38 -> and that's where everything works, right,
549.78 -> in your local environment,
551.22 -> or it can also happen somewhere
553.05 -> between the build and test stages,
554.52 -> during the pipeline execution.
556.65 -> But it's very common,
557.76 -> especially if you're using a feature branch approach,
560.55 -> that your releases get blocked waiting
564.09 -> for peer code reviews.
568.2 -> Even though necessary,
569.58 -> code reviews are also time consuming,
572.04 -> especially for less experienced developers.
574.77 -> And even if you're a more tenured developer,
577.26 -> it's not easy to catch certain defects
579.9 -> like data corruption, thread concurrency,
582.96 -> resource leaks, and so on.
585.48 -> Keeping consistency of those code reviews
587.76 -> is also a challenge,
588.87 -> because some folks are ultra careful,
591.6 -> whereas others, maybe because of time constraints,
594.57 -> they'll just take a quick look.
599.37 -> Another challenge when doing code reviews
601.38 -> is to find performance bottlenecks,
604.972 -> and ideally you want to evaluate the performance
607.89 -> of your applications as early as possible
610.11 -> in the development life cycle, because
612.78 -> it can reduce the customer escalations you may have,
616.08 -> but also potentially reduce the overall operational costs.
621.9 -> To help you with those challenges,
623.73 -> Amazon CodeGuru comes into play.
626.25 -> CodeGuru is yet another ML based static code analysis tool
630.42 -> that's tailored for Java and Python.
633.57 -> It helps you improve the quality of your code
636.69 -> by identifying hard-to-find defects,
639.09 -> as well as critical security issues.
642.45 -> Because of the way CodeGuru's machine learning model works,
645.66 -> it doesn't just point you to the defects
648.57 -> or to the possible improvement opportunities
650.88 -> you may have there.
652.05 -> It also gives you intelligent recommendations
654.33 -> about how you can fix those issues.
658.17 -> CodeGuru comes with two different functionalities.
660.58 -> The first one is called CodeGuru Reviewer,
663.54 -> and in a nutshell,
664.53 -> CodeGuru Reviewer is a functionality
666.15 -> that gives you automated code reviews
668.07 -> and static code analysis,
670.98 -> and here are some of the areas that CodeGuru
673.26 -> can help you with.
674.55 -> It helps to follow AWS best practices
677.58 -> and also some anti-patterns to avoid.
680.01 -> It also covers correct implementation
682.02 -> of concurrency constructs and resource handling,
685.2 -> and there is also a number of security issues
687.45 -> that CodeGuru can help you with,
689.1 -> like sensitive information leaks
690.87 -> or common web application breaches.
695.37 -> CodeGuru Profiler is the second functionality
697.77 -> of CodeGuru,
698.73 -> and what it does is to evaluate the performance
701.07 -> of your applications at runtime
702.703 -> to identify the most expensive lines of code.
708.84 -> Those functionalities are applied in different stages
711.27 -> of your software delivery workflow.
713.58 -> The CodeGuru Profiler is applied when your application
716.974 -> is already deployed and running on your compute instances,
720.87 -> and CodeGuru Reviewer is applied on the left side
724.44 -> of the software delivery life cycle.
726.66 -> It can be applied directly,
728.01 -> you can use that directly in your IDE
730.44 -> when you're developing code.
732.06 -> You can also integrate it as part of your pull request.
735.734 -> You can also automate it as part of your CI/CD pipeline.
740.46 -> The way it works is
741.81 -> when the developer pushes code to the GitHub repository,
745.38 -> in this case here GitHub,
746.927 -> GitHub Actions will pick it up, build the artifact,
750.12 -> and then upload to an S3 bucket the artifact
752.94 -> along with your source code.
755.16 -> And it's also going to invoke the CodeGuru service,
757.38 -> so that it can scan the code and the artifacts,
759.87 -> looking for defects and security vulnerabilities.
764.07 -> CodeGuru then provides you with recommendations
766.47 -> in case there are any,
768.24 -> and the developers can access those recommendations
770.342 -> from the CodeGuru console or also as comments
774.721 -> on the pull request.
778.59 -> This is how your workflow would look like
780.6 -> with CodeGuru Reviewer helping you with code analysis
783.66 -> and CodeGuru Profiler helping you
785.85 -> to identify the most expensive lines of code.
791.25 -> Before jumping to our next topic,
793.56 -> let's have a demo showing how CodeWhisperer works.
809.58 -> Hope you can see back there.
812.64 -> All right, before this session started,
815.1 -> I bootstrapped in my AWS account a serverless application,
819.93 -> and what I want this application to do
821.85 -> is to get an image and from that image,
824.55 -> identify the labels on that,
826.47 -> meaning which objects are on that image,
830.28 -> and everything in this application is,
833.55 -> happens asynchronously.
834.75 -> So, it's based also on Lambda functions,
837.6 -> and the users,
838.62 -> they can interact with this application
840.45 -> in two different ways.
842.22 -> They can,
843.75 -> there is an API endpoint
845.64 -> where they can list the existing images,
847.71 -> the images that we processed,
849.84 -> and for each image which labels we found.
853.05 -> And also there is an endpoint where the users
856.35 -> can provide the image they want to process.
860.31 -> And the bulk of this implementation happens here
862.89 -> on the code of this Lambda function that I have here.
866.52 -> What I'll do to implement it,
868.56 -> I'm gonna use Amazon Rekognition to detect the labels.
872.61 -> I already initialize this one by importing some libraries
876.03 -> and also I initialize some variables here.
879.807 -> I'm getting also here the event that comes from SQS,
884.43 -> from Amazon SQS,
885.99 -> and from this event I get the bucket name and the key,
889.2 -> meaning where the image
890.19 -> that I want to process is stored,
893.19 -> and I'm gonna use CodeWhisperer
894.99 -> to implement this functionality.
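
The event-handling code on screen is not captured in the transcript. As a rough sketch of how the bucket name and key might be read from the SQS event, the following assumes each SQS record body is an S3 event notification in JSON form; that message shape and the function name are assumptions, not details confirmed in the talk.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_locations(event):
    """Return (bucket, key) pairs for every S3 object referenced in an SQS event.

    Assumption: each SQS record body is an S3 event notification (JSON).
    """
    locations = []
    for sqs_record in event.get("Records", []):
        body = json.loads(sqs_record["body"])
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            locations.append((bucket, key))
    return locations
```

If the queue carried a custom message format instead, only the parsing inside the loop would need to change.
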
899.31 -> Using natural language, I'm just gonna type,
902.555 -> (typing) detect labels on image using Rekognition.
910.62 -> That's how I'm telling CodeWhisperer what I want.
913.92 -> When I type that here,
917.22 -> CodeWhisperer understands that what I want
919.16 -> to do is a function implementation,
921.66 -> and it gives me some suggestions.
924.3 -> If I navigate using the left and right arrows here,
926.94 -> I can see the different suggestions it's giving me.
930.87 -> In the case of the second one, as you can see here,
933.09 -> it says reference code under MIT.
936.09 -> It means that CodeWhisperer identified that piece of code
939.63 -> is similar to the training data we use to train the model.
943.98 -> If I click here, I would see,
948.18 -> I would see more details about that,
949.65 -> like the license of the repo again,
951.21 -> and which repo this code came from.
955.11 -> And then you decide,
957.27 -> I'm good with using this code or not.
961.08 -> Let me come back here.
967.2 -> I'm actually gonna take this one here.
970.08 -> I'll hit TAB to accept it,
972.3 -> and this is what it does.
973.71 -> It's using Rekognition, Amazon Rekognition,
976.02 -> and it's detecting the labels here on,
977.85 -> that's from the image that's stored on this location.
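
The accepted suggestion is not shown in the transcript. A minimal sketch of a function along those lines, assuming a Boto3 Rekognition client is passed in, could look like this; the function signature and the max_labels default are hypothetical.

```python
def detect_labels(rekognition_client, bucket, key, max_labels=10):
    """Detect labels on an image stored in S3 using Amazon Rekognition.

    rekognition_client is assumed to be a Boto3 client created with
    boto3.client("rekognition"). Returns the list of label dictionaries
    from the response, each of which includes a "Name" and a "Confidence".
    """
    response = rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
    )
    return response["Labels"]
```

Passing the client in as an argument keeps the function easy to test with a stub, which is a design choice of this sketch rather than something shown in the demo.
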
984 -> All right.
985.47 -> I will then create a variable here called labels,
989.61 -> And I'm gonna,
991.62 -> yeah, detect labels.
993.3 -> I'm gonna invoke the method that I just created,
995.19 -> the function I just created.
996.66 -> The next thing I want to do is to store the results
999.12 -> of this on Amazon DynamoDB,
1001.55 -> so that I can retrieve later, right?
1004.1 -> The user using one of those endpoints can retrieve.
1007.22 -> I would simply type,
1010.108 -> (typing) save Rekognition labels to DynamoDB.
1018.14 -> Def, again.
1027.83 -> Yeah, here are the method implementations.
1029.87 -> If I navigate in this case, there are multiple here,
1033.71 -> but I will stick with this first one here.
1036.2 -> I'm just gonna tweak a little bit.
1038.06 -> I'm gonna type, I'm gonna use,
1039.74 -> call it DynamoDB,
1041.06 -> and I'm gonna call this index as image.
1045.291 -> CodeWhisperer kind of guessed what I wanted to do.
1047.81 -> So it already transformed the labels in string,
1050.78 -> so that I can store on DynamoDB.
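
A sketch of what such a save function might look like follows. It assumes a Boto3 DynamoDB Table resource is passed in and that the table's key attribute is named image, as mentioned above; the labels attribute name and the function signature are hypothetical.

```python
def save_labels_to_dynamodb(table, image_key, labels):
    """Store the labels for one image as a single DynamoDB item.

    table is assumed to be a Boto3 Table resource, for example
    boto3.resource("dynamodb").Table("my-table") with a placeholder name.
    The labels are converted to a string before storing, mirroring the
    behavior described in the demo.
    """
    item = {"image": image_key, "labels": str(labels)}
    table.put_item(Item=item)
    return item
```

Storing the labels as a string keeps the example simple; a list attribute would be an alternative if the labels need to be queried individually.
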
1056.048 -> And, what I'll do,
1057.65 -> instead of storing all the labels, the whole labels object,
1062.12 -> I want just the names of each of those labels.
1065.27 -> So, I'll type here,
1068.294 -> (typing) get label names.
1078.74 -> In this case here,
1079.73 -> CodeWhisperer has obviously not generated an entire function,
1082.55 -> but just this one line here, doing what I want to do.
1086.33 -> It's creating a new collection containing all the names
1090.653 -> for each label in the labels collection.
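
A one-line suggestion of this kind is typically a list comprehension. The sketch below uses hypothetical sample data shaped like Rekognition label dictionaries, which carry a "Name" field.

```python
# Hypothetical sample shaped like the "Labels" list in a Rekognition response.
labels = [
    {"Name": "Dog", "Confidence": 98.7},
    {"Name": "Pug", "Confidence": 91.2},
]

# Build a new list containing only the name of each label.
label_names = [label["Name"] for label in labels]

print(label_names)
```
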
1095.12 -> Now I'm gonna call that,
1096.5 -> save labels to DynamoDB,
1100.85 -> but I don't want the labels,
1102.08 -> I want the label names.
1110.96 -> Okay, the last thing I'll do in this implementation
1113.96 -> is I don't want this event,
1116.81 -> this SQS event to be processed multiple times.
1119.63 -> So since I already processed here,
1121.4 -> I'll delete this object from the queue.
1124.406 -> (typing)
1139.16 -> Again, some implementations here,
1141.92 -> I'll hit TAB to accept this one.
1145.04 -> I'm gonna call that function,
1147.956 -> (typing) delete message.
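
The generated function is not visible in the transcript. A minimal sketch, assuming a Boto3 SQS client is passed in along with the queue URL and the receipt handle of the processed record, might be:

```python
def delete_message(sqs_client, queue_url, receipt_handle):
    """Delete one processed message from an SQS queue.

    sqs_client is assumed to be a Boto3 client created with
    boto3.client("sqs"). In a Lambda SQS event, the receipt handle is
    available on each record under the "receiptHandle" key.
    """
    sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
```

Inside a handler, this could be called once per record after the labels have been saved, using a queue URL taken from configuration (a placeholder in this sketch).
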
1151.58 -> Yes, so I have all the implementation I think I need,
1156.02 -> so I'm detecting the labels using Amazon Rekognition.
1158.75 -> I'm storing the results on DynamoDB.
1160.7 -> I'm deleting the message from SQS
1162.53 -> to avoid duplication in the processing of that.
1166.76 -> I'm gonna do something here
1168.02 -> that you shouldn't do in production.
1169.73 -> I'm gonna copy this one and paste here
1173.09 -> on the Lambda console directly.
1175.19 -> Again, don't do that.
1176.18 -> Use CI/CD pipelines instead,
1178.73 -> and I'm gonna deploy my function here.
1185.15 -> Okay, the new code that CodeWhisperer created for me
1187.88 -> is deployed and to test it,
1190.122 -> I'm gonna use this image here.
1192.83 -> Yes, we have dogs.
1194.3 -> So this is Keef,
1195.41 -> my best friend who passed early this year,
1197.66 -> and I'm gonna use his image to test it.
1202.61 -> As I said, there are two different endpoints here.
1206.15 -> This first one,
1211.73 -> this first one would list all the objects I've got,
1216.74 -> all the images that I previously processed,
1219.621 -> and for each image, the labels.
1222.95 -> The system should be empty here.
1225.02 -> Yes, it's returning an empty array.
1229.055 -> Now I'm gonna invoke the other endpoint,
1232.04 -> and from here I'm passing Keef's image,
1235.28 -> and also the name that I want to give to this image.
1238.46 -> I'm gonna hit ENTER,
1241.1 -> and it's sending the image to be processed.
1245.84 -> Now I'm gonna call that again,
1249.83 -> and the image is here, Keef,
1252.17 -> and this is the labels that Amazon Rekognition recognized.
1257.42 -> Pug is here, no dog, pug, yeah, I'm not sure.
1260.09 -> I'm not sure why it thinks my dog is a lion,
1262.67 -> but anyway.
1264.83 -> So, okay, yeah, that's it.
1267.92 -> That's how CodeWhisperer can help you
1269.84 -> to increase your productivity,
1271.31 -> so that you can stay focused on your IDE
1275.09 -> and implement whatever functionality you need.
1278.57 -> Before jumping to,
1280.55 -> now I hand over to Shivansh,
1282.8 -> who's gonna talk about our next topics.
1284.603 -> Shivansh, can take over.
1286.73 -> The clicker is here.
1301.147 -> (whispering) Oh, sorry, yeah, push here, take.
1312.83 -> - All right,
1313.663 -> so let's take a look
1314.496 -> at the software development life cycle again.
1317.33 -> Now in this software development life cycle,
1319.67 -> as we saw previously,
1321.47 -> now you have all your code written
1323.69 -> in whatever your preferred language is,
1325.79 -> and hopefully it has already been reviewed,
1328.49 -> both by Amazon CodeGuru Reviewer,
1330.77 -> as well as some of your peers.
1332.9 -> Now you are ready to deploy your code
1335.3 -> in some of the different environments,
1337.16 -> like end-to-end environment,
1339.14 -> integration testing environment,
1341.09 -> some other pre-prod environments,
1342.86 -> and eventually to production environments.
1345.98 -> Now, it's critically important to make sure
1349.22 -> that your code does not have any bottlenecks
1352.16 -> before it goes into production.
1355.37 -> You also wanna make sure that if it has any issues,
1358.43 -> you wanna catch it in those pre-production environments
1361.31 -> before it goes into production,
1363.14 -> and starts impacting your customers.
1366.2 -> So having a good observability for your application
1370.13 -> and the resources on which your application
1372.38 -> is being deployed is equally important.
1378.41 -> Now, raise your hand,
1379.67 -> how many of you
1380.57 -> are actually an application operations engineer,
1383 -> or you have been in the past, or a DevOps engineer?
1388.01 -> Yeah, yeah.
1389.18 -> There are several hands raised here, including myself,
1392 -> where I was working previously
1393.41 -> as an application operations engineer.
1395.6 -> And there are quite a number of challenges
1397.291 -> as you might have faced yourself
1399.083 -> in those different roles.
1401.42 -> So let's take a look at some of those challenges
1403.91 -> and how machine learning based service
1406.55 -> can actually help you.
1410.87 -> So, one of the biggest challenges
1413.84 -> is the large volume of data,
1415.64 -> which is coming from the disparate sources.
1419.12 -> You had to worry about all these volumes of metrics
1421.91 -> and logs coming across from different variety
1424.91 -> of applications.
1426.05 -> It could be distributed applications,
1427.94 -> it could be microservices,
1429.62 -> or you have multiple components of your application
1432.44 -> which are deployed across different resources.
1435.38 -> So, collecting all that information
1438.26 -> from all these applications is a huge pain,
1442.73 -> because of this disparate information,
1445.22 -> which is available across the different resources.
1451.52 -> Additionally, as the data is spread across resources,
1455.27 -> associating that data to the correct resource
1458.3 -> or the current issue which is going on,
1461.12 -> is actually an important challenge as well.
1464.39 -> The data association becomes very difficult
1467.81 -> when you have your application, which is distributed.
1471.08 -> And when you are having an issue,
1473.3 -> being able to correlate which data belongs
1476.27 -> to which application,
1478.01 -> and correlate those different logs and metrics together
1481.98 -> is a big challenge.
1488.39 -> And if that was not enough,
1491.39 -> as the application continues to be updated
1493.94 -> throughout its life cycle,
1496.16 -> you have to make sure that your application monitoring
1499.01 -> and the infrastructure monitoring is being updated as well.
1501.98 -> And you have to make sure that
1503.81 -> you are keeping up to date with those applications.
1507.05 -> And if you are introducing
1508.97 -> any new infrastructure resources,
1512.03 -> then you have to ensure that
1513.32 -> those new infrastructure resources
1515.578 -> are also being monitored.
1517.88 -> Therefore, it's a constant challenge
1520.04 -> to keep the monitoring up to date
1522.89 -> with respect to the application
1524.36 -> and the infrastructure updates.
1529.67 -> Now raise your hand.
1530.51 -> How many of you think that there are too many alarms
1533.93 -> and notifications you receive?
1536.63 -> Yeah, yeah.
1537.463 -> I think it's part of life,
1538.7 -> as application operations engineer, right?
1541.46 -> There are too many tools available.
1543.11 -> There are too many alarms you have set up
1548.14 -> just to cover all the bases of your application,
1552.89 -> so that you can monitor and you don't miss anything.
1556.43 -> Now, this causes alarm fatigue
1558.44 -> and an inability to identify what matters the most.
1563.87 -> Alarms and notifications from multiple tools
1566.608 -> can cause alarm fatigue
1568.58 -> and your inability to actually differentiate
1572.27 -> between the root cause and the side effects.
1577.583 -> So, the need to reduce the noise,
1581.21 -> bring to the surface the most critical issue
1583.67 -> that matters the most
1585.41 -> and help you identify the root cause quickly
1588.71 -> to reduce the MTTR
1590.78 -> is what Amazon DevOps Guru seeks to deliver.
1594.17 -> Amazon DevOps Guru is a fully-managed,
1596.39 -> machine-learning powered service
1598.46 -> that monitors your application automatically
1601.34 -> and improves the availability and reduces the downtime.
1608.48 -> So, how can you get started with Amazon DevOps Guru?
1612.23 -> There are a couple of ways you can do it.
1614.6 -> You have to define an application boundary
1617.06 -> for your application,
1618.35 -> so that Amazon DevOps Guru
1619.97 -> can start collecting logs and metrics
1623.15 -> and show it together in a single dashboard.
1626.42 -> And you can do it either using tags,
1629.3 -> or you can define application boundary
1632.03 -> using an AWS CloudFormation stack.
1635.48 -> And if you want,
1636.41 -> you can also enable Amazon DevOps Guru
1638.57 -> for your entire AWS account,
1640.49 -> so that it starts collecting logs and metrics
1643.55 -> for all the resources you have deployed in your AWS account.
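
Defining the application boundary can also be done programmatically. The sketch below assumes a Boto3 DevOps Guru client is passed in and uses its update_resource_collection call to add CloudFormation stacks to the boundary; the wrapper function and the stack names are hypothetical.

```python
def add_stacks_to_devops_guru(devops_guru_client, stack_names):
    """Add CloudFormation stacks to the DevOps Guru application boundary.

    devops_guru_client is assumed to be a Boto3 client created with
    boto3.client("devops-guru"). Returns the request that was sent so a
    caller can log or inspect it.
    """
    request = {
        "Action": "ADD",
        "ResourceCollection": {
            "CloudFormation": {"StackNames": list(stack_names)}
        },
    }
    devops_guru_client.update_resource_collection(**request)
    return request
```

A tag-based boundary would use a different ResourceCollection shape; this sketch covers only the CloudFormation case.
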
1651.08 -> Optionally, you can also connect Amazon DevOps Guru
1654.11 -> to an SNS topic or you can use EventBridge
1657.98 -> to communicate any insights to you.
1660.98 -> You can filter these insights by CloudFormation Stack,
1664.22 -> or you can do it through any AWS services as well.
1670.88 -> Now let's look at how DevOps Guru works behind the scenes.
1674.72 -> Now in this diagram,
1676.43 -> this is an architecture of a serverless application.
1680.15 -> You have Amazon API Gateway
1683.33 -> that invokes a Lambda function
1684.74 -> to process the business logic.
1686.54 -> And then you have a DynamoDB table
1688.82 -> where you're storing the data of your application.
1693.62 -> After it's deployed,
1694.922 -> hopefully you have deployed it using a CloudFormation stack,
1698.3 -> and you have,
1699.29 -> the next thing you have to do
1700.34 -> is define the application boundary for Amazon DevOps Guru.
1707.12 -> So let's say that someone accidentally
1709.07 -> updates the DynamoDB table to reduce its read capacity.
1714.74 -> At the same time, there is a surge in application traffic.
1721.19 -> Eventually your DynamoDB table will be throttled,
1724.19 -> because it cannot keep up with the new number
1726.77 -> of read requests coming from the users.
1733.1 -> So, in this hypothetical scenario,
1735.14 -> the admin or the operator
1736.91 -> would see all three components failing.
1739.13 -> The DynamoDB will be throttled,
1741.05 -> the Lambda function would have failed executions,
1744.26 -> and the API gateway endpoint would respond
1746.69 -> with HTTP 500 errors.
1749.24 -> To resolve this issue,
1750.45 -> the application admin or the operator would need
1754.88 -> to understand the root cause of the issue
1757.22 -> and what the side effects are.
1759.71 -> In this case,
1760.7 -> even though the Lambda function
1762.08 -> and the API gateway endpoint are both failing,
1765.26 -> they are just the side effects.
1767.45 -> The root cause is the DynamoDB table.
1771.65 -> And after the operator has resolved the incident,
1775.099 -> they have to spend time on postmortem
1778.88 -> to identify that the read capacity
1781.64 -> of the DynamoDB table was the root cause.
1788.45 -> This can take several hours
1790.25 -> or even days to get to the root cause.
1793.43 -> This is where DevOps Guru
1794.99 -> can help you with machine-learning based anomaly detectors.
1799.4 -> It monitors your application by ingesting operational data
1803.33 -> coming from Amazon CloudWatch and CloudTrail.
1807.35 -> In this example,
1808.46 -> DevOps Guru checks the relevant events in CloudTrail,
1812.12 -> and compares them with the operational metrics
1814.49 -> from CloudWatch.
1816.74 -> Based on this data, when it detects an anomaly,
1819.89 -> it creates an insight,
1821.93 -> which presents the relevant metrics,
1824.3 -> so the operator can quickly find what
1826.52 -> was the root cause of the incident.
1830.42 -> Taking it a step further,
1831.95 -> DevOps Guru can actually identify
1833.78 -> that someone has changed the DynamoDB table read capacity
1837.83 -> just before the anomaly was detected
1840.38 -> and recommend rolling back that particular change.
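The idea of tying a configuration change to an anomaly that follows it can be sketched as a simple time-window lookup. This is an assumption-laden illustration, not the service's actual algorithm: the event records, the 30-minute window, and the `changes_before` helper are all hypothetical.

```python
from datetime import datetime, timedelta


def changes_before(anomaly_start, change_events, window=timedelta(minutes=30)):
    """Return change events in the window before the anomaly, newest first."""
    candidates = [
        event
        for event in change_events
        if anomaly_start - window <= event["time"] <= anomaly_start
    ]
    return sorted(candidates, key=lambda event: event["time"], reverse=True)


# Hypothetical change history, loosely modeled on CloudTrail event names.
events = [
    {"name": "UpdateFunctionConfiguration", "time": datetime(2022, 11, 29, 8, 0)},
    {"name": "UpdateTable", "time": datetime(2022, 11, 29, 11, 50)},
]
anomaly_start = datetime(2022, 11, 29, 12, 0)

for event in changes_before(anomaly_start, events):
    print(event["name"], event["time"].isoformat())
```

Here only the `UpdateTable` call falls inside the window, which is the kind of evidence that would point an operator toward rolling back the capacity change.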
1844.37 -> So, it is not only pinpointing the root cause,
1847.64 -> but it is also providing you a recommendation
1850.31 -> on how to fix that issue.
1856.64 -> So there you have it, Amazon DevOps Guru,
1859.19 -> which is a fully machine-learning based service
1862.73 -> that helps you increase the availability
1865.19 -> of your application and reduce the downtime.
1868.31 -> It continuously analyzes the stream of disparate data
1871.94 -> and monitors relevant metrics
1873.83 -> to establish normal operating patterns.
1879.92 -> So, putting all the services into context
1882.35 -> of the entire development life cycle,
1885.44 -> you have Amazon CodeWhisperer,
1887.54 -> which helps you write code right into the IDE.
1891.53 -> You have Amazon CodeGuru Reviewer
1893.69 -> to help you with the static code analysis
1896 -> and find the most expensive lines of code.
1899.48 -> And for observability, you have Amazon DevOps Guru
1903.86 -> and Amazon CodeGuru Profiler,
1906.08 -> which can help you
1908.222 -> observe your application at runtime
1910.91 -> and provide you with recommendations
1913.285 -> on how to fix those issues.
1916.76 -> Amazon DevOps Guru does not only
1919.22 -> show you the reactive insights,
1921.23 -> but it also gives you proactive insights
1924.05 -> before the issue has occurred,
1926.06 -> so that you can reduce the downtime of your application
1929.18 -> and catch it early on,
1930.77 -> before it starts impacting your customer experience.
1937.37 -> Now let me hand it over to Jared who's gonna talk
1939.86 -> about how they're using Amazon DevOps Guru in production.
1943.58 -> Jared?
1945.59 -> - Thank you.
1947.03 -> So, my name's Jared Reimer.
1948.59 -> I'm the founder and president of Cascadeo.
1951.44 -> We have operations in Seattle and Manila.
1954.28 -> (microphone feedback)
1956.9 -> Wow, that'll wake you up.
1960.634 -> (indistinct) quote from several, many years ago,
1964.46 -> and it really stuck with me.
1965.66 -> So I've always worked in critical infrastructure operations,
1969.08 -> dating back to the bulletin board and dial-up ISP era,
1973.16 -> all the way up to today.
1975.17 -> And the quote at first seems a little bit odd.
1979.43 -> He said, you know,
1980.263 -> "We stopped hiring firefighters,
1981.59 -> because in the end some of them became arsonists."
1983.9 -> And I said, "Wait, wait a minute.
1984.92 -> I was the firefighter."
1986 -> I was the one who always would get up
1988.19 -> at 2:00 in the morning,
1989.023 -> fix the problem,
1990.02 -> I'd come into the office exhausted,
1991.67 -> I'd get a pat on the back,
1992.84 -> good job, you saved the day.
1995.3 -> I was perversely rewarded and incentivized
1999.2 -> for perpetuating things that were fundamentally broken,
2002.38 -> because I would just come in and fix them.
2004.45 -> And in the early days of operations,
2006.22 -> that was how the world worked, right?
2007.66 -> There really wasn't a better answer.
2009.07 -> We had monitoring systems that looked for faults.
2011.83 -> They looked for thresholds to be exceeded.
2014.11 -> When a threshold got exceeded,
2015.52 -> the disk was too full or the processor too busy,
2018.43 -> someone would get paged,
2019.63 -> someone would try to figure it out,
2021.04 -> and go in there and correct it.
2024.16 -> And it turns out that is entirely the wrong way
2026.62 -> to operate infrastructure in this day and age.
2028.93 -> Now, you may have legacy systems
2031 -> that you still have to operate that way.
2033.76 -> We would suggest that you modernize them.
2035.89 -> But for systems going forward,
2037.51 -> for things that are cloud platform native,
2039.34 -> for things that are built the right way
2041.35 -> on AWS platform services,
2042.618 -> there's a completely different way of doing this
2045.73 -> that does not involve firefighting.
2048.1 -> And that's what we're gonna talk about today.
2050.98 -> So, who are we?
2052.42 -> Cascadeo, as I mentioned,
2053.68 -> is a Premier Tier partner to AWS.
2055.66 -> We have been for a very long time.
2057.97 -> We also have a strategic collaboration agreement
2060.61 -> where we're developing our labor force
2062.38 -> and our customer base together with Amazon.
2064.75 -> They've been a fantastic partner and instrumental
2067.39 -> in the growth of the company we started 16 years ago.
2070.78 -> We're a managed-services provider.
2072.07 -> That's a huge part of our business.
2073.81 -> By this I mean, we
2075.117 -> operate other people's poorly implemented cloud deployments.
2079.845 -> Usually we inherit things that were done incorrectly,
2081.49 -> like lift and shift,
2082.55 -> move all the VMs,
2083.92 -> and get them to boot up.
2085.93 -> And we try to help customers iteratively improve
2088.51 -> and make decisions about where to invest.
2090.82 -> So, do you invest in modernization?
2093.4 -> Do you invest in deprecation, like getting rid of it?
2096.46 -> Do you leave it behind and not move it to the cloud?
2099.22 -> So lift and shift, just to be clear,
2101.35 -> is the last resort, not the first resort.
2103.883 -> You heard it here first,
2105.28 -> because many of the other companies in our field
2107.188 -> do that as a kind of quick and easy way
2109.99 -> to get the customer into the cloud
2111.97 -> without really thinking through
2113.59 -> what the downstream ramifications of that are gonna be.
2117.13 -> I'm particularly proud of the fact that we made it
2119.08 -> into the Gartner Magic Quadrant
2120.67 -> for public cloud IT transformation services,
2123.25 -> two years in a row.
2124.36 -> So, this is a new magic quadrant that Gartner put out.
2127.24 -> And we're a relatively small firm.
2129.58 -> Most of the other firms that are in that magic quadrant
2131.92 -> are orders of magnitude larger than we are.
2134.8 -> And the reason that we qualified really is
2137.41 -> because we focus on transformation, okay?
2139.54 -> So, we focus on helping companies build net new
2142.66 -> or refactor or modernize around AWS,
2146.23 -> instead of just moving VMs and hosting.
2150.4 -> Our company, as I mentioned, is based in Seattle.
2152.29 -> We have an awesome delivery org in Manila.
2154.51 -> It turns out the Philippines
2155.62 -> has some fantastic computer science departments,
2158.74 -> and so a huge percentage of our professional services
2161.647 -> and managed services teams are in the Philippines.
2165.1 -> I guess the last thing I'd say,
2166.238 -> our major client, Globe Telecom,
2168.31 -> which is the largest telco in the Philippines,
2170.26 -> became our major investor two years ago.
2172.21 -> So they are by far the largest AWS customer
2175.66 -> in the Philippines by a mile.
2177.79 -> They're an enormous consumer of all sorts of AWS services.
2182.2 -> We were brought in to help them with that
2183.838 -> because of our delivery capabilities there.
2186.22 -> And now they are our majority investor,
2188.2 -> so fantastic cool company,
2190.57 -> about 90 million subscribers,
2192.01 -> so, roughly Verizon scale.
2194.89 -> Okay, so what do we talk about transformation strategy?
2198.048 -> This is,
2198.881 -> this should be hopefully obvious to everyone in the room,
2200.89 -> because it seems like a lot of you have developer
2202.99 -> and operations backgrounds.
2205.03 -> First and most important,
2206.408 -> you want to build net new and cloud native, right?
2209.23 -> You don't want to just lift and shift VMs,
2211 -> that's the last resort.
2212.74 -> You don't wanna do it with every application,
2214.45 -> because you can't afford to.
2215.74 -> It takes too much time and too much money.
2217.72 -> You can't afford to fix everything all at once.
2220.6 -> Sometimes it's not worth fixing.
2222.52 -> If there's a legacy app that's running just fine,
2225.76 -> maybe it's better to just leave it alone.
2228.04 -> And I know that is not a popular opinion,
2230.26 -> because a lot of people will say,
2231.347 -> "Oh no, you have to go all in, cloud first,
2233.74 -> cloud everything."
2235.06 -> I think that sort of misses the point.
2237.07 -> The point is to get business agility, okay?
2239.68 -> To make your business better able and equipped to deal
2242.965 -> with competitive changes, with economic changes,
2246.31 -> with disruptions, with geopolitical problems,
2250 -> with pandemics, with whatever happens next.
2252.82 -> So why do this?
2254.23 -> Because you get business agility,
2256.15 -> but you have to be focused about it.
2257.92 -> So, our idea is we're gonna work with a client
2260.74 -> to figure out what needs to be modernized,
2262.96 -> what needs to be fixed,
2264.01 -> and maybe we're gonna leave a lot of it behind.
2266.44 -> At the end of the day, what really matters is the data.
2269.77 -> The infrastructure is gonna
2271.09 -> become increasingly less interesting and important
2273.91 -> as we adopt platform services.
2276.04 -> And that is why the suite of tools that we just heard about
2279.37 -> is so important.
2280.66 -> Those tools are useful with legacy stuff.
2283.33 -> They are infinitely more powerful
2285.242 -> when you use them in conjunction with AWS platform services,
2288.91 -> and that's what I'm gonna share with you next.
2291.97 -> So, we have MSP customers that sign up. With a lot of MSPs,
2296.2 -> it takes 60 days to get them onboarded.
2298.27 -> We can get them on in a single day.
2300.13 -> How do we do this?
2301.36 -> We have an API-integration with DevOps Guru.
2303.88 -> If the customer consents, which they almost always do,
2306.76 -> we can click through within a matter of minutes,
2309.34 -> activate DevOps Guru,
2310.99 -> and within 24 hours,
2312.34 -> we start receiving those insights
2314.11 -> into our platform, which we call cascadeo.io.
2317.44 -> This immediately gives our managed services team
2320.56 -> something to work with,
2321.82 -> even if the customer has no monitoring
2323.95 -> or a bunch of legacy Nagios implementations
2327.01 -> that have been running for 10 years unattended,
2329.2 -> or they wait for someone to call and say that it's broken.
2332.219 -> We can do this in under a day,
2334.24 -> and sometimes we can do it in under an hour,
2336.307 -> because the API integrations are so tight
2339.7 -> and literally within a matter of minutes,
2341.95 -> we can start to get results out of it.
2345.04 -> We have diverse customers of all sizes and shapes.
2348.04 -> One of them is a large water utility company.
2350.11 -> One of them does ERP systems,
2351.79 -> one of them is a giant telco.
2353.41 -> And then we ourselves, as it turns out,
2355.42 -> are also a major customer of these tools,
2357.52 -> because we develop a platform called cascadeo.io
2360.7 -> that we use to operate other people's cloud deployments.
2364.3 -> So, I'm gonna show you first
2366.52 -> major enterprise customer production service incident
2369.13 -> with DevOps Guru.
2371.05 -> This is a critical authentication, authorization,
2373.96 -> and accounting system
2374.95 -> that serves a very, very large number of customers.
2378.19 -> As you can see clearly in the graph,
2380.14 -> there was a period of great instability
2383.344 -> and a period of sort of moderate instability around that.
2387.79 -> Now, normally some threshold would get breached
2390.37 -> or some customers would scream,
2391.9 -> someone would get paged outta bed,
2393.55 -> and the typical triage process begins.
2397.27 -> It's been demonstrated over the years
2399.04 -> that 80% of most service incidents and outages,
2402.28 -> 80% is getting signed in,
2405.58 -> figuring out what's wrong,
2407.08 -> correlating the different signals,
2408.76 -> what's the root cause,
2409.87 -> what's the right course of action?
2411.67 -> 20% is remediation.
2413.92 -> So, if you can cram that 80% down,
2416.2 -> if you can reduce that,
2417.7 -> you can radically reduce the amount of time
2420.82 -> that the service is impaired or down.
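The 80/20 argument can be worked through with simple arithmetic. The sketch below assumes a hypothetical 100-minute incident split as described (80 minutes of triage, 20 minutes of remediation) and an assumed speedup factor for triage; the function name and numbers are illustrative only.

```python
def impaired_minutes(triage_minutes, remediation_minutes, triage_speedup):
    """Total incident duration when only the triage phase gets faster."""
    return triage_minutes / triage_speedup + remediation_minutes


# Hypothetical 100-minute incident: 80 minutes triage, 20 minutes remediation.
baseline = impaired_minutes(80, 20, triage_speedup=1)
faster = impaired_minutes(80, 20, triage_speedup=4)

print(baseline)  # 100.0
print(faster)    # 40.0
```

Making triage four times faster cuts the whole incident from 100 to 40 minutes, even though the remediation work itself has not changed at all.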
2424.03 -> So DevOps Guru allows you to become almost psychic.
2429.421 -> In this case, very early on, DevOps Guru said,
2433.157 -> "Hey, there's something weird with your NAT gateway.
2435.58 -> For some reason,
2436.54 -> the number of active connections is anomalous."
2439.39 -> Notice it didn't say it exceeded a threshold.
2442.39 -> It didn't say it's too high or too low.
2444.67 -> It said it's different.
2446.08 -> It's different from the way that it normally behaves.
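The difference between "exceeded a threshold" and "different from normal" can be sketched with a basic z-score check against recent history. DevOps Guru's models are certainly more sophisticated than this; the sketch only illustrates the general idea, and the connection counts, the threshold of 1000, and the z-score limit of 3 are assumptions.

```python
from statistics import mean, stdev


def exceeds_threshold(value, threshold):
    """Classic static alarm: fires only when the value is too high."""
    return value > threshold


def is_anomalous(value, history, z_limit=3.0):
    """Flag values that deviate strongly from recent behavior, high or low."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Hypothetical active connection counts for a NAT gateway.
history = [100, 104, 98, 101, 97, 103, 99, 102]

# A sudden drop to 20 connections never trips a "too high" alarm,
# but it is clearly different from the baseline.
print(exceeds_threshold(20, threshold=1000))  # False
print(is_anomalous(20, history))              # True
print(is_anomalous(101, history))             # False
```

The static alarm stays silent on the drop, while the baseline comparison flags it, which is the behavior described in the talk.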
2449.26 -> But where it gets really powerful
2451.99 -> is when you start to correlate different signals,
2454.93 -> so that you can dive deep and do it fast.
2457.36 -> So when you are under the gun,
2458.71 -> and many of you said you've worked
2460.09 -> in infrastructure operations,
2461.65 -> you know what that's like.
2462.76 -> People are upset.
2463.72 -> People are, you're losing money.
2465.4 -> You have to figure out very quickly what
2467.89 -> are all the things that are broken
2469.36 -> and what are the relationships between them,
2471.37 -> and figure out what the root cause is,
2473.2 -> rather than downstream symptoms.
2475 -> So one of the most amazing things about DevOps Guru
2478.21 -> is that it does this automatically,
2479.95 -> so that the minute you sign in,
2481.81 -> you get a visual representation
2483.73 -> showing you the different services,
2485.32 -> when they were impacted,
2486.4 -> and the apparent relationships between them.
2489.34 -> You can see here that there were huge numbers
2491.83 -> of problems of various forms with an elastic load balancer,
2496.06 -> and you can also see that they tended to cluster
2498.19 -> around the same time windows.
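Grouping anomalies that occur close together in time is one simple way to see the clustering described here. The sketch below buckets anomalies into fixed five-minute windows; the sample records, the window size, and the `cluster_by_window` helper are hypothetical and are not taken from the service.

```python
from collections import defaultdict


def cluster_by_window(anomalies, window_seconds=300):
    """Group anomaly metric names by the fixed time window they fall into.

    Returns a dict mapping each window's start time (in seconds) to the
    list of metrics that were anomalous during that window.
    """
    buckets = defaultdict(list)
    for anomaly in anomalies:
        buckets[anomaly["timestamp"] // window_seconds].append(anomaly["metric"])
    return {
        index * window_seconds: metrics
        for index, metrics in sorted(buckets.items())
    }


# Hypothetical anomalies; timestamps are seconds since some reference point.
anomalies = [
    {"timestamp": 900, "metric": "HTTPCode_ELB_5XX_Count"},
    {"timestamp": 1000, "metric": "TargetResponseTime"},
    {"timestamp": 1100, "metric": "ActiveConnectionCount"},
    {"timestamp": 4000, "metric": "TargetResponseTime"},
]

for window_start, metrics in cluster_by_window(anomalies).items():
    print(window_start, metrics)
```

Three of the four anomalies land in the same window, which suggests they are related symptoms rather than independent problems.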
2500.74 -> These are the kinds of things
2502.45 -> that normally a human would be doing,
2504.46 -> maybe in a post-mortem or maybe under the gun
2506.65 -> when you're trying to figure out what the heck is going on.
2509.17 -> All of this happens instantaneously.
2511.87 -> So, when we have a managed-services customer
2514.42 -> and there's a fault, when our operators log in,
2517.42 -> they're not kind of fumbling around in the dark,
2519.55 -> because it's someone else's environment.
2521.59 -> They get this level of correlation and drill down
2525.28 -> and kind of visualization
2526.96 -> of what's going on instantaneously.
2530.62 -> You also get recommendations.
2532.72 -> So not only does it tell you what's broken,
2535.45 -> when it was broken,
2536.29 -> what it's correlated with,
2537.52 -> it actually tells you how to fix it.
2540.16 -> In many cases, it'll say,
2541.45 -> look, if you fix this problem with your ELB,
2545.23 -> that will solve the root of the issue, right?
2548.71 -> There are also guides on troubleshooting 500 errors
2552.46 -> in ELBs and latency issues in ELBs.
2555.34 -> They have runbooks that are built right into it.
2557.71 -> So even if you yourself don't have a runbook recipe
2560.44 -> for this problem,
2561.52 -> they give you one that your engineer, your operator,
2564.01 -> or your developer can run with right out of the gate.
2568.24 -> Next thing I'm gonna talk about is our own platform.
2570.28 -> This is software we have developed in-house
2572.5 -> over the last four or five years.
2574.51 -> It's a large scale SaaS platform
2576.55 -> that our entire company runs on.
2578.68 -> It's built entirely on AWS platform tech
2581.74 -> with the sole exception being InfluxData,
2584.38 -> which we use
2585.213 -> for very large scale time series data analytics.
2588.34 -> So the SaaS platform for us
2590.47 -> is how we operate all of these customers, okay?
2593.23 -> Because otherwise every customer is a one-off.
2596.32 -> We, as an experiment,
2597.64 -> turned DevOps Guru on in our production environment
2601.364 -> and instantly learned more about our application
2605.093 -> and ways that it was underperforming or suboptimal
2609.82 -> in its scaling.
2610.78 -> We got a whole list of things and I thought,
2613.33 -> well, those can't all be right.
2614.71 -> And so we gave this to the developers and sure enough,
2617.38 -> these were really useful insights.
2619.27 -> These are reactive insights,
2620.62 -> meaning after the fact.
2622.3 -> But what's even more amazing
2623.92 -> is that you get proactive insights.
2625.99 -> The proactive insights tell you what's going to break next.
2629.8 -> They anticipate faults like,
2631.72 -> hey, you're trending dangerously close to a threshold,
2634.36 -> or the load on this thing is creeping up,
2636.97 -> and instead of just a reactive,
2638.663 -> oh, it's broken,
2640.18 -> you actually get an alert that something is amiss.
2642.94 -> So, maybe someone should take a look before it breaks.
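The "trending dangerously close to a threshold" idea can be sketched with a straight-line fit over recent samples, extrapolated forward to estimate when the limit would be crossed. This is a deliberately simple stand-in for whatever forecasting the service actually uses; the utilization numbers and the `samples_until_breach` helper are hypothetical.

```python
def samples_until_breach(values, threshold):
    """Estimate how many more sample intervals until `threshold` is reached.

    Fits a least-squares line through equally spaced samples and
    extrapolates it. Returns None if there are too few samples or the
    trend is flat or decreasing.
    """
    n = len(values)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    breach_x = (threshold - intercept) / slope
    # Distance from the latest sample (index n - 1) to the projected breach.
    return max(0.0, breach_x - (n - 1))


# Hypothetical disk utilization in percent, sampled once per hour.
utilization = [50, 55, 60, 65, 70]
print(samples_until_breach(utilization, threshold=90))  # 4.0
```

With utilization rising five points per hour, the projection gives about four hours of warning before the 90 percent mark, which is the window in which someone can look before it breaks.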
2646.06 -> This is truly transformative.
2648.13 -> It is also how the world's most sophisticated companies
2651.1 -> operate their infrastructure.
2652.72 -> If you wonder how Netflix achieves the degree
2655.81 -> of uptime that they do,
2657.37 -> it is because of predictive analytics.
2659.53 -> It's not just monitoring, alerting, and incident response.
2662.89 -> It's because they're able to get ahead of trouble,
2665.17 -> because they can predict the future based on past behavior.
2668.38 -> DevOps Guru does this for you, right outta the box.
2671.426 -> All you have to do is turn it on
2673.6 -> and you will start to get these insights.
2677.293 -> This was one that we found
2678.97 -> in our own production service platform.
2682 -> A lot of Lambda functions were taking too long,
2684.55 -> and we did not know this because it wasn't really broken.
2687.22 -> The app seemed to work, people weren't complaining,
2689.8 -> but sometimes it would fail in a way
2691.48 -> that was customer-facing, but not all the time.
2693.94 -> And it was, it's hard to debug those, right?
2695.59 -> It's hard to troubleshoot and understand why it
2697.75 -> works some of the time and not all of the time.
2699.79 -> Here, right outta the gate,
2700.96 -> it shows you exactly what the issue is,
2703.39 -> what its severity is, when it started and when it stopped,
2706.39 -> and what other signals correlated with it.
2710.95 -> Our platform delivers these insights to the customer
2715.15 -> in whatever venue they want.
2716.92 -> So on the left,
2717.753 -> you can see a Zendesk ticket
2719.98 -> that goes to our operation center.
2721.75 -> That's the complete insight from DevOps Guru,
2724.36 -> where our ops team will take that
2725.89 -> and start trying to debug and troubleshoot and triage.
2728.59 -> We might have to escalate to the customer,
2730.48 -> but at least we're coming in with something
2732.16 -> instead of just, it's broken.
2734.23 -> And on the right you can see a Slack notification,
2736.54 -> because we do Slack chat ops with customers
2739.27 -> who are friendly to that.
2740.631 -> We also have Teams in Office 365 for failover scenarios
2744.37 -> in case those primary systems for interfacing
2746.41 -> with customers go down.
2749.26 -> One thing that just came out
2750.61 -> that I think is really, really important,
2752.669 -> DevOps Guru now ties directly to EventBridge,
2757.06 -> which lets you do things like route these insights
2759.85 -> to Lambda functions that can fix things
2762.28 -> or route them to something like SNS and Chatbot,
2765.85 -> so that they can be delivered to Slack.
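Routing an insight from EventBridge to a Lambda function might look roughly like the handler below, which turns the incoming event into a short chat message. The `detail` field names (`insightSeverity`, `insightDescription`, `insightUrl`) are assumptions to check against the DevOps Guru event documentation, and actually posting to Slack or invoking a remediation step is left out.

```python
def format_insight(event):
    """Build a chat-style message from an EventBridge insight event.

    Field names inside `detail` are assumed for illustration; missing
    fields fall back to placeholder text.
    """
    detail = event.get("detail", {})
    severity = detail.get("insightSeverity", "unknown")
    description = detail.get("insightDescription", "no description")
    url = detail.get("insightUrl", "")
    text = f"[{severity.upper()}] {description}"
    if url:
        text += f"\n{url}"
    return {"text": text}


def lambda_handler(event, context):
    """Lambda entry point: format the insight and return the message.

    A real handler would send the message to a chat webhook or trigger
    a remediation step here; that part is intentionally omitted.
    """
    return format_insight(event)


if __name__ == "__main__":
    sample_event = {
        "source": "aws.devops-guru",
        "detail": {
            "insightSeverity": "high",
            "insightDescription": "DynamoDB table read throttling",
        },
    }
    print(lambda_handler(sample_event, None)["text"])
```

Keeping the formatting in a separate pure function makes it easy to test without deploying anything.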
2767.95 -> And you might be thinking, well, wait a minute,
2769.54 -> you just showed me how you deliver to Slack.
2771.85 -> That is our software and our platform.
2774.13 -> What if our platform is broken?
2776.8 -> The worst thing you can do
2778.81 -> is have your monitoring system monitoring itself,
2781.72 -> because at some point that will go horribly wrong, okay?
2785.41 -> So in-band monitoring of your own system with itself
2789.61 -> will work most of the time right up until it doesn't.
2792.49 -> This allows you to create completely out-of-band,
2795.46 -> a neutral witness that is studying your application
2797.86 -> and your infrastructure,
2799.06 -> generating these insights,
2800.47 -> and delivering them to you,
2801.94 -> in whatever venue and format helps you the most.
2805.87 -> So, this is production operations for cascadeo.io SaaS.
2810.49 -> These insights come in through Slack, out-of-band,
2814.66 -> outside of our system,
2816.205 -> go to the ops team for triage and for escalation,
2820.3 -> and they actually provide a (indistinct)
2822.37 -> that lets you click through to drill immediately down
2825.31 -> into the remediation instructions,
2827.83 -> the correlations, all of that work.
2829.6 -> So, in-band monitoring is not inherently worrying,
2832.84 -> but unsatisfactory things can and will happen.
2836.02 -> That's putting it very mildly.
2837.64 -> Don't count on your monitoring system
2839.68 -> to monitor its own health.
2842.775 -> Another thing,
2843.74 -> DevOps Guru team chose to focus on AWS platform services.
2849.13 -> They really spent a lot of time tuning this
2853.39 -> and optimizing it,
2855.37 -> not for the simple case of a single EC2 instance
2858.85 -> that's running Windows
2859.75 -> and running some old application, okay?
2861.82 -> That's useful, but it's not really the future.
2864.97 -> The integration with higher order complex AWS services,
2869.77 -> Lambda, Dynamo, queuing services, notification services,
2873.79 -> that is amazingly powerful,
2875.68 -> because it can be hard to troubleshoot these things
2877.69 -> when somebody else owns and operates most
2879.52 -> of the infrastructure.
2880.72 -> So, we think the serverless future is bright.
2883.72 -> We are huge proponents of that.
2885.22 -> We try to get customers to adopt serverless
2887.2 -> as much as we can.
2888.55 -> And DevOps Guru is a fantastic companion to this,
2892.03 -> because they've done so much work
2893.98 -> integrating those two products.
2896.83 -> So, quick summary, this is not an exaggeration.
2900.19 -> DevOps Guru has revolutionized the way we do business.
2903.55 -> It helps our professional services customers
2906.16 -> know what to do next.
2907.45 -> It helps managed services customers
2909.55 -> know what to fix, when to fix it, how to fix it,
2911.95 -> what's gonna break next.
2913.57 -> It's used to monitor our own infrastructure
2915.76 -> and our own business operations.
2917.71 -> And here's the secret,
2919.09 -> it's remarkably inexpensive.
2921.64 -> It's very, very, very inexpensive
2923.83 -> for what you get out of it.
2925.21 -> So, there really isn't any reason not to try it.
2927.52 -> It's out-of-band. It doesn't break anything.
2929.56 -> And if you don't like it, you can turn it off.
2931.66 -> But you will find, I think,
2933.28 -> it costs very, very little
2934.81 -> and it helps immensely, in every possible regard.
2938.62 -> So I would like to thank you all for hearing our story,
2941.98 -> and I'm gonna hand it back for the conclusion.
2945.464 -> (audience applauding)
2954.517 -> - All right, as Shivansh already showed,
2957.43 -> this is what your software delivery life cycle looks like
2960.43 -> with AI/ML-based tools
2962.11 -> to streamline your DevOps capabilities.
2966.13 -> Those tools would help you with things
2968.14 -> like context-based code recommendations,
2970.783 -> detecting hard-to-find defects,
2972.97 -> and also optimizing the code.
2974.74 -> And they are even able to identify operational anomalies
2978.37 -> in your applications,
2979.6 -> simply based on the way those applications
2981.7 -> behave in production.
2983.38 -> And all that comes with intelligent recommendations,
2986.8 -> so that you immediately know how to act.
2991.78 -> What's next?
2992.92 -> You can get started with Amazon CodeWhisperer
2995.68 -> and sign up for the free preview, as of today.
2998.86 -> Also, I would suggest that you identify a workload
3001.95 -> and enable DevOps Guru for that.
3004.38 -> You have a free tier,
3006.12 -> so that you can get a taste of what the service
3008.58 -> can also help you with
3010.11 -> and what kind of trouble it can identify.
3013.05 -> And I also suggest you get started with Amazon CodeGuru.
3015.48 -> It also helps,
3017.82 -> it also has a free tier.
3020.04 -> There are a number of sessions going on
3022.41 -> during this re:Invent.
3023.73 -> For instance, tomorrow and on Thursday,
3025.83 -> there will be a builder session about CodeWhisperer,
3027.99 -> so that you can get hands-on with that already.
3031.65 -> There is also a workshop tomorrow about DevOps Guru
3034.86 -> and AIOps, in general,
3036.12 -> so that you can also experiment
3037.86 -> and get hands-on with the service.

Source: https://www.youtube.com/watch?v=LJ3h37ThT3k