AWS Innovate 2022 - Ensure Reliability and Uptime with Observability Solutions | AWS Events


Running a business in a 24-by-7 world requires you have an observability solution for finding and fixing application and system issues quickly. With the growth of distributed systems running in the cloud, the need for a comprehensive open source observability solution is more acute. In this video, learn how you can use AWS Analytics and observability solutions, including machine learning services, to detect and resolve anomalies, and deliver exceptional customer experiences and service availability.

Learn more at: https://go.aws/3smA0Tm

Subscribe:
More AWS videos http://bit.ly/2O3zS75
More AWS events videos http://bit.ly/316g9t4

ABOUT AWS
Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.

AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#OpenSearchService #AWS #AmazonWebServices #CloudComputing


Content

0 -> (cheerful, electronic music)
9.38 -> - Hi everybody, my name is Jon Handler.
11.44 -> I'm a solutions architect with AWS
13.46 -> and I cover our OpenSearch Service
18 -> and the open source project.
20.13 -> Today we're gonna talk about reliability and uptime
23.96 -> and how to build observability with various solutions.
28.66 -> So let's start out by talking about what is observability.
33.68 -> So observability is collecting
36.92 -> and analyzing data from applications
41.23 -> so that you can understand what the issues are,
44.8 -> get alerts for those issues,
46.9 -> troubleshoot those issues and resolve those issues.
50.34 -> Observability really refers to the collection
53.66 -> of instrumentation and other metric data
57.25 -> around your application
58.73 -> that enables you to solve these issues.
63.84 -> So why do we really need observability?
66.36 -> Well, anytime we're building some kind of software solution,
70.39 -> there are various problems that you're gonna have.
73.78 -> There's denial of service, service outages,
77.23 -> cost overruns, component dependencies,
79.74 -> networking connectivity, all kinds of challenges
83.62 -> that you're gonna face.
85.42 -> Ultimately, those are gonna lead to downtime
88.67 -> for your application, for your end-users,
91.59 -> for your customers and really
94.15 -> there's a cost to that to your business.
96.94 -> Not only is there a financial cost,
99.64 -> which can be as much as 80 to $100,000 a day,
104.14 -> there's also a cost in fatigue for your developers,
109.71 -> in loss of trust with your customers,
113.52 -> and eventually, this downtime
116.31 -> is gonna lead to poor business outcomes.
120.53 -> So what are the goals for doing observability?
124.797 -> You really want to improve the overall end-user experience
131.03 -> for your applications and services.
133.49 -> And observability is the suite of tools
136.28 -> and the data you use to provide that better experience.
144.01 -> Getting specific, observability really rests
147.14 -> on three different kinds of information.
150.96 -> The first information is log information.
154.9 -> Anybody who's ever done any debugging knows
157.95 -> that when something goes wrong,
159.9 -> the first thing you need to do is go look at the logs.
163.09 -> The logs have the information that lets you know
167.11 -> what happened during the execution of the application
171.01 -> that ended up causing the problem.
174.13 -> The second kind of data is metrics.
176.78 -> Metrics give you the ability to monitor what's going on,
180.84 -> how your application is performing, and, with trend analysis,
184.61 -> to be able to predict and find trends
187.98 -> that will let you know
188.87 -> that you're headed in a poor direction.
192.13 -> Traces are logs that are produced from your application
197.75 -> that capture the course of the processing
201.5 -> of a single event through your system.
204.56 -> And with these three kinds of information,
208.12 -> you can go in and figure out exactly what's going on.
212.48 -> We have services
214.25 -> that address these various concerns
218.24 -> and we're gonna go into depth on all of them.
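
To make the three kinds of information concrete, here is a minimal Python sketch of what a log line, a metric point, and a trace span might look like, and how a shared trace ID lets you pull the log lines for one request. All class names, field names, and sample values are hypothetical and are not taken from any AWS or OpenSearch schema.

```python
from dataclasses import dataclass


@dataclass
class LogLine:
    timestamp: float
    level: str
    message: str
    trace_id: str  # hypothetical field tying the log line to one request


@dataclass
class MetricPoint:
    timestamp: float
    name: str
    value: float


@dataclass
class Span:
    trace_id: str
    name: str
    start: float
    end: float


def logs_for_trace(logs, trace_id):
    """Return the log lines emitted while handling one request."""
    return [line for line in logs if line.trace_id == trace_id]


logs = [
    LogLine(1.0, "INFO", "checkout started", "t-1"),
    LogLine(1.2, "ERROR", "payment timeout", "t-1"),
    LogLine(1.3, "INFO", "search ok", "t-2"),
]
cpu = MetricPoint(1.0, "cpu_utilization", 0.72)
span = Span("t-1", "POST /checkout", start=1.0, end=1.25)

# Errors logged during the request that the span describes.
errors = [line.message for line in logs_for_trace(logs, span.trace_id)
          if line.level == "ERROR"]
span_duration = span.end - span.start
```

The metric tells you that something is trending badly, the span tells you which request was slow, and the logs with the same trace ID tell you what happened during that request.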
224.69 -> You really want to, ultimately,
227.73 -> influence and improve both your operational
230.7 -> and your business outcomes.
232.78 -> So on the operational side,
234.74 -> by employing these tools to look at the information,
239.53 -> you'll achieve better visibility into
243.01 -> the underlying functioning of your application.
249.8 -> This is gonna enable you to troubleshoot problems
253.89 -> in near realtime.
256 -> Many of these systems, most of these systems,
257.93 -> can flow data in near realtime,
260.38 -> they can alert you when problems are occurring
263.69 -> or when problems, hopefully, are about to occur,
267.8 -> and you can go in and use these tools
270.97 -> to look at your log data, to look at your traces,
273.67 -> to look at your timing and to figure out
276.24 -> what it is that you need to fix.
279.62 -> On the business side, observability is gonna give you
283.3 -> a more resilient application.
285.53 -> By using these tools,
287.82 -> you'll be able to build out your applications
290.1 -> in a better way and keep them up and running longer.
296.18 -> The end goal, as I said, is to improve customer experience.
300.26 -> So your application is serving
302 -> whatever business need you have
303.95 -> and keeping your customers happy
306.26 -> by running a low-latency, low error-rate,
309.73 -> seamless, kind of application will, ultimately,
313.38 -> give your customers the best experience.
318.38 -> We have a number of different workloads and use cases
321.75 -> where observability is really emerging
323.82 -> as a very important tool.
326.33 -> Microservices and containers,
328.84 -> and especially the move to a service-oriented architecture,
332.05 -> is driving a lot of need to understand
335.82 -> how your application is performing
337.48 -> and how your services are performing
339.74 -> within that application.
341.22 -> So we see a lot of usage in the microservices architecture.
347.47 -> We also see a lot of use cases
349.16 -> around digital experience monitoring,
351.03 -> so when you have a web or a mobile application
355.64 -> that is providing some service to your end-users,
359.3 -> you wanna be able to dig in and understand
362.15 -> what are the dependencies and how is it working.
367.02 -> On the operational side, and of course,
368.6 -> observability really flows out of the DevOps
371.53 -> kind of methodology.
373.31 -> So when we're looking at an operational situation,
376.76 -> observability is really gonna, again,
378.21 -> help you dig in, troubleshoot, and fix issues.
382.77 -> And finally, looking at data lakes
385.44 -> enables you to understand how your information
388.18 -> is flowing in and how its pieces relate to one another.
395.92 -> So, on the mechanics, with observability, it enables you,
402.81 -> again, to detect when either issues are happening
407.06 -> or issues are about to happen so that you can investigate,
412.86 -> and investigate, here, is a kind of broader bucket,
417.63 -> where even if it's not a realtime problem that's happening,
421.93 -> you still have the ability to dig in
424.67 -> and figure out, over the past, what has happened
429.27 -> in order to remedy and improve your software.
434.44 -> And finally, again, remediating and improving the software
437.94 -> is one of the ultimate goals.
440.56 -> So to be able to remediate issues
443.9 -> and bring back that high quality user experience.
451.53 -> So this is a complicated question,
454.09 -> is there one observability tool to rule them all?
457.96 -> Sadly, the answer is no.
460.37 -> At the moment, this is an emerging segment
463.19 -> and we're seeing many different technologies
466 -> that enable this observability, kind of, use case
470.5 -> where there're competing possibilities,
474.09 -> a lot of different, you know,
476.16 -> both AWS services and open source technologies
479.57 -> that give you the fundamental capabilities
481.94 -> that you're looking for,
483.76 -> but you need a way to pick a single choice to go forward.
490.17 -> We're gonna review all of the different tools
493 -> that are out there and try and highlight
494.62 -> some of the pros and cons to help you make that decision.
501.25 -> At a high level,
502.14 -> what you need to accomplish with observability
505.21 -> is really an end-to-end pipeline
508.48 -> that's going to start out with instrumentation
512.049 -> to pull or collect the logs, metrics and traces
518.048 -> into a single stream.
520.82 -> That stream, you're gonna ingest that stream
524.1 -> into some kind of collection and storage layer
527.08 -> and then that storage layer can employ
530.38 -> monitoring and alerting for realtime interaction
535.07 -> with the underlying data,
537.32 -> but you also wanna be able to search and index that data
540.89 -> in order to go and find the problems that you're having.
544.01 -> And then, finally, we end up with a visualization layer
548.35 -> where you can build dashboards
550.55 -> that enable you to correlate data
552.14 -> that's coming in through this pipeline
554.58 -> and also figure out what's going on.
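
The pipeline stages described here (ingest, store, alert, search) can be sketched with a small in-memory stand-in. This is only an illustration under stated assumptions: the `Store` class, the `check_alert` function, and the error-count threshold are hypothetical, and a real deployment would use one of the storage services discussed in this talk instead.

```python
from collections import defaultdict


class Store:
    """In-memory stand-in for the collection and storage layer."""

    def __init__(self):
        self.events = []
        self.index = defaultdict(list)  # word -> positions in self.events

    def ingest(self, event):
        """Store the event and index each distinct word of its message."""
        self.events.append(event)
        position = len(self.events) - 1
        for word in set(event["message"].lower().split()):
            self.index[word].append(position)

    def search(self, word):
        """Return every stored event whose message contains the word."""
        return [self.events[i] for i in self.index.get(word.lower(), [])]


def check_alert(store, level="ERROR", threshold=2):
    """Fire when the number of events at the given level reaches the threshold."""
    count = sum(1 for event in store.events if event["level"] == level)
    return count >= threshold


store = Store()
for level, message in [("INFO", "login ok"),
                       ("ERROR", "db timeout"),
                       ("ERROR", "db refused")]:
    store.ingest({"level": level, "message": message})

alert_firing = check_alert(store)   # monitoring and alerting
db_events = store.search("db")      # search and indexing
```

A visualization layer would then read from the same store to build dashboards.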
558.05 -> Peeling back the onion just a tiny bit,
560.61 -> we have a number of our different technologies here.
564.64 -> So on the instrumentation side,
566.85 -> we have our CloudWatch agent, we have an X-Ray agent,
570.46 -> we have AWS Distro for OpenTelemetry.
573.86 -> We also have Fluent Bit that enables you, again,
578.96 -> to pull in those data sources and start to work with them.
584.74 -> Of course, if we have data in CloudWatch,
587.17 -> if you have data in CloudWatch,
588.53 -> there's Amazon CloudWatch ServiceLens
591.57 -> that enables you to get Container Insights,
594.95 -> Lambda Insights, Contributor Insights,
599.04 -> and other synthetic kind of data.
602.45 -> In the collection and storage layer,
604.23 -> we have CloudWatch metrics, CloudWatch logs,
607.69 -> X-Ray, Prometheus, and Amazon Managed Service for Prometheus,
612.51 -> as well as Amazon OpenSearch Service.
614.3 -> So these are, kind of, the containers
616.42 -> where you can flow this data.
619.03 -> On the dashboarding side, then,
621.07 -> we have Grafana and Amazon Managed Grafana,
624.45 -> as well as OpenSearch Dashboards
626.45 -> that enable you to build visualizations
628.95 -> and dig in with service maps and trace graphs
633.94 -> and the richer kinds of experiences around this data.
640.22 -> So let's talk through some of the open-source alternatives.
644.57 -> First of these we're gonna talk about is OpenSearch
646.61 -> and OpenSearch dashboards.
649.686 -> OpenSearch is a community-driven, open-source,
653.18 -> search and analytic suite.
654.71 -> It's derived from Apache 2.0
657.48 -> licensed Elasticsearch 7.10.2 and Kibana.
662.93 -> The OpenSearch project is really the way that we are
668.03 -> carrying forward that core search and analytics capability
672.81 -> into the open-source world.
676.95 -> The project itself consists of OpenSearch,
680.09 -> which is a search and analytics engine,
685.11 -> as well as OpenSearch dashboards,
686.97 -> which is a UI and visualization layer,
690.11 -> and includes plugins that enable things like alerting,
693.96 -> anomaly detection, field-level security,
698.61 -> a bunch of different functionality
700.38 -> that's provided by those plugins.
703.3 -> It enables you to ingest your data,
707.17 -> you can secure it, and analyze it with aggregations,
711.49 -> view it, and bring all the logs,
714.33 -> metrics and traces into one place.
718.03 -> It does have an observability suite that's built into it,
721.6 -> along with anomaly detection and alerting, as I mentioned,
725.24 -> and this enables you to identify
727.43 -> and react to issues in your solutions.
732.3 -> Prometheus, also community-driven, open-source.
735.81 -> It's a systems monitoring and alerting toolkit.
740.37 -> It collects and stores metrics, primarily,
743.64 -> they are time series and they have key value pairs,
747.76 -> called labels, that enable you to bring those metrics in.
752.61 -> It has a server, where it's gonna store that data,
755.92 -> an alert manager and a push gateway
758.65 -> to facilitate viewing and working with that data.
764.42 -> Usually you will handle visualizations with Grafana.
769.96 -> It really, again, is more on the metric side.
773.2 -> So it enables you to gain insights
775.45 -> from numeric measurements.
777.16 -> So this is stuff like your CPU use,
780.35 -> your latencies, et cetera.
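
As a sketch of what a labeled time series sample looks like, the following Python function renders one sample in the Prometheus text exposition style, `name{label="value",...} value`. The metric name, labels, and value are hypothetical, and the sketch does not escape special characters in label values.

```python
def render_sample(name, labels, value):
    """Render one sample as name{key="value",...} value, with labels sorted by key."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


# Hypothetical CPU metric, labeled by host and region.
line = render_sample("cpu_usage_ratio", {"region": "us-east-1", "host": "web-1"}, 0.42)
print(line)
```

Each distinct combination of metric name and label values is its own time series, which is what lets you slice the same metric by host, region, or any other label.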
786.24 -> Grafana is a visualization tool
789.3 -> that is, again, community-driven, open-source,
792.36 -> it's an analytics platform.
794.64 -> You can query your data, visualize your data,
796.83 -> alert on your data, in order to understand
799.54 -> what's going on with your metrics, logs and traces.
802.72 -> It does connect to many different back-ends.
805.81 -> So the data source plugins enable you
807.84 -> to connect to things like MongoDB, Dynatrace,
811.01 -> ServiceNow, Amazon OpenSearch Service,
814.02 -> Prometheus,
815.65 -> a lot of different back-ends that you can connect it to.
819.91 -> You can define alert rules.
821.82 -> So again, alerting is a key capability
825.62 -> within the observability stack.
827.16 -> You want to be able to send out alerts when things go wrong.
831.74 -> With Grafana, you can send alerts to places like Slack,
834.87 -> PagerDuty, and OpsGenie.
839.97 -> You can also work with different time ranges.
843.07 -> That gives you the ability to kind of drill in
845.44 -> and understand now versus then.
851.57 -> So one of the foundational elements of observability
855.46 -> is the OpenTelemetry standard, and the OpenTelemetry stack.
860.88 -> It's an open-source project
863.66 -> designed for the creation and management of telemetry data,
866.67 -> such as traces, metrics and logs.
869.75 -> It does support a lot of wire formats,
872.03 -> Jaeger, Zipkin and Prometheus.
875 -> This enables you to bring that data in
878.78 -> and bring it into a common format.
884.51 -> It's an evolving project.
887.388 -> A lot of different people are contributing to it,
889.34 -> and we currently have, as we'll get to,
893.92 -> the AWS Distro for OpenTelemetry,
897.25 -> which brings it into a more managed situation.
902.3 -> It currently works with traces, and metrics and logs are in progress
907.01 -> but not entirely there yet.
910.948 -> Fluent Bit is in the collection layer,
913.44 -> again, this is the instrumentation. It's also open-source,
917.3 -> very fast, lightweight, and scalable, and it
920.83 -> forwards logs and metrics from their sources.
927.42 -> It has a plugin-based ecosystem
929.62 -> that enables you to collect and filter
932.63 -> and transform and augment your data.
937.38 -> It is true open-source, so it's vendor agnostic
941.81 -> and really comes as a derivative of Fluentd.
947.9 -> It integrates with Prometheus and OpenSearch,
951.46 -> Amazon CloudWatch, Amazon X-Ray, Amazon S3,
954.43 -> a lot of different destinations
956.67 -> where you can send this data and forward it on.
962.37 -> In the OpenSearch project, we have Data Prepper.
965.88 -> Data Prepper is in the collection and transformation layer.
969.34 -> So it's also open-source
971.8 -> and it processes observability data.
976.06 -> So it gives you features that enable you to filter
978.92 -> and enrich and transform the data
981.839 -> that's coming through the pipe
984.71 -> and enable downstream analytics and visualization.
991.18 -> It, right now, supports the processing
993.32 -> of distributed trace data, log ingestion,
996.53 -> and is moving towards supporting metric data in the future.
1003.37 -> It also integrates with Jaeger, Zipkin,
1005.84 -> OpenTelemetry and Fluent Bit.
1011.44 -> So, we're gonna take all of those pieces
1014.03 -> and build them into one example architecture,
1018.36 -> that is an open-source-focused architecture.
1021.09 -> On the collections side,
1022.87 -> we have our OpenTelemetry collector
1025.61 -> that's going to bring in traces
1028.32 -> and get those to Data Prepper, which is gonna prepare them.
1033.61 -> We also have Fluent Bit collecting metrics
1037.33 -> and bringing those into Data Prepper.
1041.43 -> Fluent Bit also can tail and collect log data,
1044.62 -> which it's gonna flow directly to OpenSearch.
1047.76 -> So we have these two pathways that come into OpenSearch,
1050.44 -> the Data Prepper pathway for traces and metrics,
1054.02 -> and logs flowing from Fluent Bit directly to OpenSearch.
1058.14 -> We can also take metrics out to Prometheus from Fluent Bit
1063.13 -> for our metric evaluation.
1067.66 -> On the visualization side,
1069.29 -> then OpenSearch dashboards will connect with OpenSearch
1072.5 -> and enable you to build visualizations
1074.55 -> and employ the richer experience within OpenSearch,
1079.16 -> specifically around traces and log data.
1085.14 -> We can use Grafana to connect both to OpenSearch
1088.47 -> and Prometheus and that gives us additional ability
1092.5 -> to bring that metric data into the same location, visually,
1096.52 -> as the log and trace data.
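
The pathways in this example architecture can be written down as a small routing table. The sketch below only restates the flow described in the talk as Python data; the component strings are labels for illustration, not configuration values for any of these tools.

```python
def pathways(signal_type):
    """Return the hops each signal type takes, from collector to storage."""
    routes = {
        # Traces: OpenTelemetry collector -> Data Prepper -> OpenSearch.
        "trace": [("otel-collector", "data-prepper", "opensearch")],
        # Metrics: Fluent Bit -> Data Prepper -> OpenSearch, and also out to Prometheus.
        "metric": [("fluent-bit", "data-prepper", "opensearch"),
                   ("fluent-bit", "prometheus")],
        # Logs: Fluent Bit tails them and sends them directly to OpenSearch.
        "log": [("fluent-bit", "opensearch")],
    }
    return routes[signal_type]


def storage_backends(signal_type):
    """The last hop of each pathway is where the data ends up."""
    return sorted({path[-1] for path in pathways(signal_type)})


for signal in ("trace", "metric", "log"):
    print(signal, "->", storage_backends(signal))
```

Grafana connecting to both OpenSearch and Prometheus corresponds to reading from every backend that `storage_backends` can return.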
1105.15 -> So we've gone through
1106.39 -> a number of the different open-source alternatives.
1110.55 -> Many of those have a managed solution within AWS
1114.38 -> that will make it easier to deploy and run and operate
1118.71 -> those observability components.
1121.5 -> The first of those we'll talk about is
1122.78 -> Amazon Managed Service for Prometheus.
1126.56 -> It's Prometheus compatible.
1128.55 -> It's a monitoring and alerting service, and again,
1131.81 -> you use it for monitoring containerized applications
1134.62 -> and infrastructure at scale.
1137.32 -> It automatically scales for ingestion and storage
1140.72 -> and alerting and querying of the metrics
1143.39 -> as your workload grows or shrinks.
1146.97 -> It's integrated with Amazon EKS,
1149.21 -> Elastic Kubernetes Service, Elastic Container Service,
1153.14 -> and AWS Distro for OpenTelemetry.
1158.78 -> We said a little bit about Amazon OpenSearch service,
1161.1 -> but Amazon OpenSearch service is a managed service
1163.97 -> that makes it easy to deploy, operate and scale OpenSearch
1167.83 -> and legacy Elasticsearch clusters in the AWS Cloud.
1171.84 -> It has some built-in observability tooling
1174.29 -> with a trace analytics panel,
1176.98 -> an event analytics panel and a log analytics panel.
1181.62 -> It also comes with anomaly detection features
1184.73 -> that automatically learn the normal behavior of your system
1188.33 -> and can generate alerts
1189.84 -> when that behavior leaves the normal.
1194.7 -> It's integrated with Kinesis Data Firehose,
1197.29 -> CloudWatch logs and other tooling.
1202.599 -> ServiceLens enables you to run observability
1206.47 -> on top of your CloudWatch data,
1208.87 -> integrate traces, metrics, logs and alarms into one place.
1214.29 -> It integrates CloudWatch with AWS X-Ray,
1217.72 -> also to provide an end-to-end service map
1220.5 -> and view of your application.
1223.24 -> You can do correlation against Lambda functions,
1226.13 -> API Gateways, and Java applications,
1229.41 -> running either in containers or on EC2.
1234.46 -> Amazon Managed Grafana is a secure production-ready,
1237.76 -> open-source distribution, supported by AWS.
1241.77 -> We developed it in collaboration with Grafana Labs
1244.9 -> and it enables you to connect to data sources
1246.93 -> like Amazon OpenSearch Service,
1248.84 -> Amazon Managed Service for Prometheus,
1251.27 -> X-Ray, CloudWatch, and Timestream.
1253.86 -> It has security built in and enables you
1257.51 -> to connect securely with all of these data sources
1260.5 -> in a way that preserves their integrity.
1263.1 -> There are some pre-built dashboards
1265.25 -> that give you faster access
1267.44 -> to the insights that you're looking for.
1270.19 -> AWS X-Ray collects data about requests
1272.52 -> that your application serves and gives you tools
1275.68 -> to view and gain insights into that data.
1279.72 -> X-Ray receives traces from your application,
1282.64 -> in addition to AWS services, like AWS Lambda,
1286.39 -> that enable you to bring that trace data in
1289.23 -> and view service maps and trace graphs
1291.79 -> and the other tools that you're used to.
1297.94 -> Amazon OpenSearch service is a managed service
1301.47 -> that enables you to deploy, scale and operate OpenSearch
1305.88 -> within the AWS Cloud.
1309.49 -> It supports OpenSearch versions 1.0 and 1.1,
1313.6 -> as of today, as well as legacy Elasticsearch versions
1317.74 -> from 1.5 to 7.10, with visualization capabilities
1322.07 -> provided by OpenSearch Dashboards, and by Kibana
1325.625 -> for the 1.5 to 7.10 versions.
1331.5 -> Finally, we'll talk a little bit about AWS Distro
1333.56 -> for OpenTelemetry, a secure, production-ready,
1337.03 -> AWS-supported distribution of the OpenTelemetry project.
1340.81 -> It is backed by AWS support
1343.43 -> and gives you one-click deploy
1346.137 -> from the ECS and Lambda consoles.
1350.75 -> There are exporters that enable monitoring solutions,
1353.58 -> like AMP, Amazon Managed Service for Prometheus,
1357.436 -> CloudWatch, X-Ray, OpenSearch Service
1359.92 -> and other third party solutions.
1363.9 -> We're gonna dig in, just a touch,
1365.18 -> on Amazon OpenSearch service,
1366.81 -> just to give you a feel for some of the specifics
1369.37 -> around what observability looks like in practice.
1374.26 -> So again, we have our logs, metrics and traces,
1377.7 -> we're gonna look at, first, logs.
1380.86 -> So within Amazon OpenSearch service
1382.92 -> and from OpenSearch dashboards,
1386.29 -> you can do all kinds of visualizations
1388.35 -> that enable you, first of all,
1390.71 -> to see in aggregate what's happening.
1394.4 -> We have live tailing of the logs,
1397.05 -> including surrounding events.
1398.43 -> So this enables you to dig in
1400.8 -> and really look at the logs and figure out,
1403.33 -> okay, here's some kind of gross statistics
1406.5 -> about what's happening
1407.72 -> and here are some specific log lines, you know,
1409.98 -> OpenSearch is ultimately a search engine,
1411.93 -> I can search for my errors, I can look at what's going on
1415.02 -> and what's happening around that in my log files,
1418.42 -> in a time-based kind of way.
1423.83 -> Amazon OpenSearch Service and OpenSearch Dashboards
1426.89 -> provide a complete, sort of, trace analytics experience,
1433.19 -> and when we talk about, you know, sort of trace analytics,
1435.62 -> there are a couple of major components that we see.
1439.02 -> The first of those is trace spans.
1442.13 -> Again, traces provide that end-to-end view
1445.24 -> of the processing of a request within your system.
1448.68 -> They're all connected together by a single trace ID,
1451.58 -> so a request comes in, it's assigned a trace ID,
1454.89 -> you carry that trace ID through your software,
1458.2 -> instrumenting with calls to send out that trace data,
1462.57 -> or log that trace data, all again, based on that trace.
1466.48 -> So, with this, we can get a hierarchical,
1468.92 -> you can get a hierarchical view of the processing
1471.97 -> of your request and especially the latencies
1475.27 -> and any errors that occurred
1476.83 -> in the processing of that request.
1479.15 -> This enables you to really drill in
1480.56 -> and figure out where is the time going in my application?
1484.33 -> Is my database very slow?
1486.66 -> Or perhaps there's a particular code section
1488.99 -> that is taking most of my latency.
1491.78 -> That gives you the opportunity to dig in
1493.71 -> and really look at that piece of code
1496.51 -> to figure out where the bottleneck is
1499.24 -> and really to remediate that.
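
To illustrate how spans that share one trace ID answer "where is the time going," here is a minimal Python sketch that computes each span's self time: its duration minus the time spent in its direct children. The span fields and values are hypothetical, and the calculation assumes that child spans do not overlap one another.

```python
TRACE_ID = "t-1"  # hypothetical ID assigned when the request came in

spans = [
    {"trace_id": TRACE_ID, "span_id": "a", "parent_id": None,
     "name": "GET /checkout", "start_ms": 0, "end_ms": 120},
    {"trace_id": TRACE_ID, "span_id": "b", "parent_id": "a",
     "name": "auth-service", "start_ms": 5, "end_ms": 25},
    {"trace_id": TRACE_ID, "span_id": "c", "parent_id": "a",
     "name": "db query", "start_ms": 30, "end_ms": 110},
]


def self_time_ms(spans):
    """Duration of each span minus the time spent in its direct children."""
    duration = {s["span_id"]: s["end_ms"] - s["start_ms"] for s in spans}
    self_time = dict(duration)
    for s in spans:
        if s["parent_id"] is not None:
            self_time[s["parent_id"]] -= duration[s["span_id"]]
    return self_time


times = self_time_ms(spans)
slowest = max(times, key=times.get)
slowest_name = next(s["name"] for s in spans if s["span_id"] == slowest)
print(slowest_name, times[slowest], "ms")
```

In this made-up trace the database query accounts for most of the latency, which is the kind of answer the hierarchical span view is meant to surface.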
1502.18 -> Your service map gives you a higher level view
1505.49 -> that's an end-to-end view of all of the microservices
1509.58 -> that you've touched in the processing of your request.
1512.68 -> And again, this all aggregates up
1514.08 -> so you get a view of where is the latency going?
1517.8 -> Again, if there are errors, they'll show up here.
1521.17 -> So this lets you look at your components,
1522.87 -> figure out how they're connected,
1524.61 -> and figure out, you know,
1526.12 -> the dependencies and where there might be a challenge
1529.91 -> with latency around those dependencies.
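
One way a service map could be derived from span data is to turn each parent-to-child span relationship that crosses a service boundary into an edge, and aggregate calls, errors, and latency per edge. The Python sketch below assumes hypothetical span fields; it is not how OpenSearch Dashboards computes its service map.

```python
from collections import defaultdict

spans = [
    {"span_id": "1", "parent_id": None, "service": "frontend", "duration_ms": 120, "error": False},
    {"span_id": "2", "parent_id": "1", "service": "cart", "duration_ms": 40, "error": False},
    {"span_id": "3", "parent_id": "2", "service": "database", "duration_ms": 30, "error": True},
    {"span_id": "4", "parent_id": "1", "service": "payment", "duration_ms": 60, "error": False},
]


def service_map(spans):
    """Aggregate caller -> callee edges with call count, error count, and total latency."""
    by_id = {s["span_id"]: s for s in spans}
    edges = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0})
    for s in spans:
        parent = by_id.get(s["parent_id"])
        # Skip root spans and calls that stay inside one service.
        if parent is None or parent["service"] == s["service"]:
            continue
        edge = edges[(parent["service"], s["service"])]
        edge["calls"] += 1
        edge["errors"] += int(s["error"])
        edge["total_ms"] += s["duration_ms"]
    return dict(edges)


edges = service_map(spans)
for (caller, callee), stats in edges.items():
    print(f"{caller} -> {callee}: {stats}")
```

The edge from `cart` to `database` carries the error here, which shows how a dependency problem becomes visible at the service level.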
1533.92 -> And finally, you have trace groups,
1535.32 -> trace groups enable you to bring trace information
1539.67 -> into a grouped format around
1542.69 -> particular activities in the application.
1545.07 -> So this way, you can again, look and figure out
1547.76 -> where is the latency going,
1549.8 -> and where do I need to figure out and fix something?
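
As a rough sketch of the trace group idea, the following Python code groups traces by the activity they belong to and summarizes count, average latency, and error rate per group. The group names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

traces = [
    {"trace_id": "t1", "group": "POST /checkout", "latency_ms": 200, "error": False},
    {"trace_id": "t2", "group": "POST /checkout", "latency_ms": 400, "error": True},
    {"trace_id": "t3", "group": "GET /search", "latency_ms": 50, "error": False},
]


def summarize(traces):
    """Summarize traces per group: count, average latency, and error rate."""
    grouped = defaultdict(list)
    for trace in traces:
        grouped[trace["group"]].append(trace)
    return {
        name: {
            "count": len(items),
            "avg_latency_ms": mean(t["latency_ms"] for t in items),
            "error_rate": sum(t["error"] for t in items) / len(items),
        }
        for name, items in grouped.items()
    }


summary = summarize(traces)
worst_group = max(summary, key=lambda name: summary[name]["avg_latency_ms"])
```

Sorting groups by average latency or error rate points you at the activity that needs attention first.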
1556.17 -> We recently have added application analytics.
1559.7 -> This enables you to build application views
1563.95 -> across log, trace, and metric data.
1567.61 -> You select log sources or trace groups
1569.62 -> or services to be part of an application.
1572.44 -> It enables you to monitor availability
1574.52 -> and drill into detailed views
1576.57 -> on the traces and service logs.
1580.05 -> This gives you, again, the span ID and trace ID,
1583.34 -> you can trace into what's going on
1585.1 -> and figure out any issues that you're having.
1589.63 -> One of the features that's a kind of sideways feature,
1592.3 -> but super useful, with OpenSearch dashboards,
1595.87 -> we support a feature called Notebooks.
1598.06 -> Notebooks are documents that enable you
1600.71 -> to put cards onto that document
1603.22 -> to bring all kinds of different information together
1606.53 -> and really tell a story about a particular event
1609.5 -> or something that happened.
1611.46 -> You can export those as PDFs or PNGs
1614.64 -> and you can share them around
1617.41 -> and enable everybody to know what's going on.
1623.74 -> With OpenSearch and Amazon OpenSearch service,
1626.9 -> we provide machine learning kind of innovations,
1630.8 -> and chief amongst these,
1631.92 -> and especially within the observability space,
1635.19 -> is our streaming anomaly detection.
1637.94 -> With streaming anomaly detection,
1640.4 -> the system will automatically learn the correct behavior
1643.87 -> or the normal behavior of a metric that you're sending in,
1647.49 -> metric or metrics.
1649.587 -> It uses Random Cut Forest to predict
1653.4 -> when things are going off the rails
1655.72 -> and is integrated with alerting to send you alerts
1658.95 -> when things like your CPU is suddenly spiking
1661.45 -> or your traffic is suddenly spiking.
1664.2 -> It brings that information to you.
1667.75 -> Recent improvement enables you to,
1671.01 -> basically collect or group by categories within your data.
1675.04 -> So, the typical use case for this would be
1677.45 -> if you're running 1,000 servers,
1680.56 -> and you wanna look at the CPU utilization of,
1684.71 -> you know, you actually wanna group that down by the host
1688.25 -> so that you can see if there's a particular host
1690.81 -> that's exhibiting anomalous behavior,
1693.17 -> you can get an alert for that specific host.
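
To show the shape of per-host grouping, here is a deliberately simplified Python sketch. It does not implement Random Cut Forest; it swaps in a plain z-score against each host's own history, and the host names, values, and threshold are hypothetical.

```python
from statistics import mean, pstdev


def anomalous_hosts(history, latest, threshold=3.0):
    """Flag hosts whose latest value is more than `threshold` standard
    deviations away from that host's own historical mean."""
    flagged = []
    for host, values in history.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # no variation in history, z-score is undefined
        if abs(latest[host] - mu) / sigma > threshold:
            flagged.append(host)
    return flagged


# CPU utilization (percent) per host; each host is judged against itself.
history = {
    "host-1": [20, 22, 21, 19, 20],
    "host-2": [30, 31, 29, 30, 30],
}
latest = {"host-1": 21, "host-2": 95}

alerts = anomalous_hosts(history, latest)
print(alerts)
```

Because each host has its own baseline, one misbehaving host stands out even when the fleet-wide average barely moves, which is the point of grouping by category.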
1699.46 -> Just a quick pass at, sort of,
1702.1 -> how all of this would work in a container service:
1706.73 -> you have your VPC with your availability zones,
1712.6 -> you have your user sending application traffic in
1716.45 -> via a load balancer,
1718.05 -> it's really all hitting your Kubernetes application,
1722.52 -> which potentially is running with Amazon RDS,
1725.7 -> as a database layer, you're gonna use,
1729.675 -> whether it's the OTEL Collector or Fluent Bit
1732.86 -> or one of the other architectures,
1734.92 -> to bring that data out to Amazon Managed Grafana,
1738.9 -> managed Prometheus, Amazon OpenSearch Service,
1742.79 -> and then use Grafana for dashboarding
1745.95 -> or OpenSearch Dashboards for dashboarding.
1750.33 -> Just looking at a little bit deeper,
1753.26 -> we have our metrics, traces and logs,
1755.86 -> within the worker node then,
1757.8 -> we have our application pod and container.
1761.3 -> Fluent Bit is gonna tail standard out and standard error,
1765.41 -> and forward that to Amazon OpenSearch Service.
1769.17 -> We have our OTEL Collector container.
1771.57 -> OTEL is OpenTelemetry, and it is going to
1775.13 -> send metrics and traces off to Prometheus
1779.32 -> as well as Amazon OpenSearch service.
1784.78 -> In the prior diagram, I had a box that said,
1787.617 -> "Buffering and delivery,"
1789.9 -> I wanted to share one of our more cost-efficient
1794.06 -> and good architectures
1796.62 -> in terms of bringing that data across.
1798.4 -> So if you have either Fluentd or Fluent Bit,
1801.24 -> we generally would look at S3 as a staging area
1804.8 -> where you're sending all of your metrics, traces and logs.
1809.86 -> You can trigger off of the bucket notification
1814.4 -> and use a Lambda to queue up the object-created event into SQS.
1821.5 -> We then have a Lambda that's gonna pull
1823.78 -> the original object, parse it,
1827.2 -> and prepare it to deliver to Amazon OpenSearch service,
1831.05 -> using the bulk API.
1832.88 -> This is a very common pattern that we use.
1836.28 -> It is, again, cost-efficient,
1838.71 -> and it gives you, basically, a backup of all of your data,
1842.66 -> which exists in S3, so that makes it easy to replay
1845.86 -> or work with other tools like Athena,
1850.964 -> and this enables you to, again,
1852.95 -> keep that data over the long term at low cost.
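
The parsing step of this pattern can be sketched in plain Python: read the bucket and key out of an S3 event notification, then turn log lines into a newline-delimited bulk request body. The index name, the `message` field, and the sample event are hypothetical, and the sketch leaves out the AWS SDK calls to fetch the object and the signed HTTP request that would send the body to the `_bulk` endpoint.

```python
import json
from urllib.parse import unquote_plus


def object_location(s3_event):
    """Extract (bucket, key) from the first record of an S3 event notification."""
    record = s3_event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


def bulk_body(index, log_lines):
    """Build a bulk request body: one action line plus one document line per log line."""
    parts = []
    for line in log_lines:
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps({"message": line}))
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(parts) + "\n" if parts else ""


# Hypothetical event and object contents.
event = {"Records": [{"s3": {"bucket": {"name": "my-log-staging"},
                             "object": {"key": "app/2022/02/logs+1.txt"}}}]}
bucket, key = object_location(event)
body = bulk_body("app-logs", ["GET /cart 200", "GET /checkout 500"])
print(body)
```

Batching many documents into one bulk request is what keeps the delivery step efficient, and because the original objects stay in S3 the same function can be rerun to replay data.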
1859.41 -> So three takeaways here,
1861.18 -> observability really is gonna allow you
1864.53 -> to measure and monitor the behavior
1867.27 -> of your applications and infrastructure.
1870.12 -> And the end-goal of all of this observability
1872.6 -> is, again, to bring a better end-user experience
1877.13 -> for better business outcomes for your software.
1882.425 -> We have open-source technologies across the board,
1887.74 -> also powered by AWS, in many cases,
1891.33 -> that enable you to measure and monitor this behavior.
1896.49 -> How can we help?
1898.61 -> Please learn more, look at our OpenSearch Service free trial,
1903.92 -> and we would love to hear about what you do.
1908.12 -> Thanks very much for your time and attention today,
1910.22 -> really appreciate it.

Source: https://www.youtube.com/watch?v=1E-ffpHHC5g