Managing time series data with Amazon OpenSearch Service

Learn how to manage time series data with Amazon OpenSearch Service’s data streams to help simplify logs, metrics, and traces. The video includes a demo on the data streams functionality.

Learn more: https://go.aws/3Uv5LGa

Subscribe:
More AWS videos - http://bit.ly/2O3zS75
More AWS events videos - http://bit.ly/316g9t4

ABOUT AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers — including the fastest-growing startups, largest enterprises, and leading government agencies — are using AWS to lower costs, become more agile, and innovate faster.

#OpenSearch #AWS #AmazonWebServices #CloudComputing


Content

4.95 -> - Okay, so hello everyone.
6.72 -> My name is Prashant Agrawal and I am working
8.88 -> as an Analytics Specialist Solutions Architect at AWS,
12.33 -> where I primarily focus on Amazon OpenSearch Service.
15.6 -> And today I will be talking about managing time-series data
18.556 -> with data-streams using Amazon OpenSearch Service.
22.2 -> So, before getting into the data-streams,
24.81 -> let me explain about Amazon OpenSearch Service.
27.87 -> So, Amazon OpenSearch Service
29.4 -> is a fully managed service that makes it easy
31.8 -> to deploy, operate and scale OpenSearch
34.65 -> along with legacy Elasticsearch clusters in the AWS Cloud.
38.7 -> It also has tight integration with other AWS services
42.24 -> such as Amazon Kinesis Data Firehose,
44.94 -> AWS Lambda, CloudWatch, and so on.
47.66 -> So let's take a quick detour
50.1 -> and talk about what is OpenSearch.
52.38 -> So OpenSearch is a community driven,
54.36 -> open source search and analytics suite
56.43 -> which is derived from the Apache 2.0
58.53 -> licensed versions of the Elasticsearch 7.10.2
61.76 -> and Kibana 7.10.2 projects.
64.74 -> It consists of a search engine, which is OpenSearch
67.56 -> that provides you a way to perform search on your data.
71.1 -> as well as a way to analyze your logs
73.71 -> using a visualization interface called OpenSearch Dashboards.
77.52 -> So using OpenSearch,
78.87 -> you can easily index the data of any type
81.24 -> such as logs, metrics and traces,
84.57 -> and then you can use the visualization interface,
86.79 -> OpenSearch Dashboards, or even the API,
89.82 -> to perform aggregations on top of that data.
92.85 -> Next, we also have a set of plugins that OpenSearch provides
96.12 -> which add comprehensive functionality
98.13 -> such as generating alerts on the log data,
100.95 -> looking for anomalies using machine learning algorithms,
104.1 -> or maybe using the trace analytics
106.08 -> and the observability features
107.52 -> to capture logs, metrics and traces,
110.31 -> and then make correlations across
112.11 -> those logs, metrics and traces.
114.611 -> So let's jump ahead and see why we need data-streams.
119.43 -> So one of the common use cases with OpenSearch
122.39 -> is to index continuously generated time-series data
125.67 -> such as logs, metrics and traces.
128.55 -> Previously, automating index rollover
130.8 -> meant you had to first create a write index,
133.5 -> configure a rollover alias
135.42 -> and verify that indices were being rolled over as expected.
139.38 -> That made the bootstrapping process for an index
142.32 -> more cumbersome than it needed to be.
145.08 -> A data-stream, on the other hand, is optimized for time-series data
148.56 -> that is primarily append-only in nature
150.75 -> and simplifies the initial setup process for the data.
154.65 -> So let's see how the flow works
156.3 -> with and without data-stream.
158.31 -> So prior to data-stream,
159.81 -> this is how users set up their indices
161.97 -> for time-series data.
163.47 -> So they start with setting up an index
165.51 -> by specifying the index mapping and settings.
168.54 -> And, as their data grows,
170.7 -> they encounter scaling issues with their log data.
174.24 -> And at that time they start thinking
176.04 -> about creating daily rotating indices
178.26 -> on the basis of dates
179.79 -> or maybe you use the Rollover API
181.68 -> to create an index for each date.
183.93 -> At this point, the user has more than one index
186.99 -> to run searches on, so they can either use a wildcard
189.853 -> to specify the indices, such as logs-nginx-*,
194.55 -> or they can create an alias
198.54 -> for those time-series indices.
202.11 -> So here in the next one you can see,
204.45 -> they can define an index alias for all relevant indices
207.75 -> to bring them under one umbrella.
209.91 -> And here, searches can be performed
211.98 -> across all the underlying indices on the alias.
215.04 -> But how about performing writes
217.83 -> and indexing new data,
219.57 -> as in, how do we determine which index
221.88 -> would be getting the write requests, right?
224.58 -> So in this case, they can solve it
226.95 -> by marking the latest index as the write index
230.07 -> and by bringing the common settings
234.15 -> such as the mappings, index settings,
236.31 -> and index patterns into an index template
239.31 -> which can then be applied to all new indices
242.52 -> getting created as a part of the rollover process.
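
As a rough sketch of this legacy setup (the index and alias names here are illustrative assumptions, not taken from the demo), the alias with a dedicated write index might be configured like this:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx-000002",
        "alias": "logs-nginx",
        "is_write_index": true
      }
    }
  ]
}
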
245.58 -> So now, the user needs to call the rollover
248.13 -> and calling it manually on a daily basis
250.35 -> is not a practical solution,
251.82 -> and they have to manage it themselves.
254.22 -> So in that case,
255.51 -> they could use an Index State Management policy
258.93 -> to define the index pattern and rollover policy
261.54 -> to roll over the indices.
263.1 -> It could be done on the basis of either time or size.
266.91 -> With this legacy method,
268.32 -> the user has to be aware of many concepts
270.93 -> to manage and maintain those indices
273.36 -> and they need to set them up explicitly.
276.06 -> So if we consolidate this, we can see
280.35 -> what the typical challenges are
281.91 -> with time-series indices.
283.497 -> So, OpenSearch has basic support for time-series data
287.07 -> where one can name indices on the basis of a timestamp
290.25 -> by prefixing them with some name and then suffixing the time,
293.82 -> where you can have some indices
295.5 -> on the basis of day, week, month, etcetera.
299.49 -> If everything is working fine
300.9 -> then why do we really need data-stream?
303.24 -> So to manage your time-series indices,
304.53 -> you need to perform index rotation,
306.99 -> and your log ingestion tools
308.4 -> such as Beats and Logstash
309.78 -> can help you rotate those indices by date.
312.36 -> But what if you hit some event
315.39 -> such as Black Friday or Cyber Monday
318.12 -> where your data is significantly larger than usual?
321.12 -> It creates data skew for those days
323.55 -> where you are getting a large amount of traffic.
327.72 -> So once an index is rotated, you need to make sure
330.21 -> that the write index points to the current date's index
333.33 -> and that the index alias is rolled over as well.
336.06 -> So here the alias is nothing but a way to query
338.58 -> all the latest data which points to the current
340.98 -> or latest underlying index for the specific log.
344.79 -> This is why the data-stream
347.07 -> came into existence.
348.72 -> So, let's see how a data-stream works
351.66 -> at a very high level and solves some of these problems
354.45 -> as a first-class citizen for storing time-series data.
358.02 -> So a data-stream has one or more
359.82 -> auto-generated backing indices,
361.89 -> prefixed by .ds-, then the data-stream name
365.053 -> and the generation id.
366.547 -> So if you look here,
368.13 -> here the data-stream name is logs-nginx.
371.91 -> Then the backing indices are
374.13 -> .ds-logs-nginx-000001,
378.33 -> then -000002, -000003, and so on.
381.6 -> Now coming back to the write operation,
383.85 -> so for any write operation,
385.56 -> you send the request to the data-stream
387.69 -> and it will route it to the latest index.
390 -> Once an index is rolled over, it becomes read-only
392.7 -> and you cannot add any new documents
394.62 -> to the rotated index.
396.06 -> So typically in the log analytics scenario
399.03 -> there is rarely a modification.
401.04 -> And if needed, you can run update by query
403.64 -> or delete by query in the older backing indices.
407.64 -> Now let's talk about search,
409.26 -> as in, how do we perform read requests on those indices.
412.65 -> When running a search query, the data-stream routes
415.14 -> the request to all the backing indices,
417.33 -> and of course you can filter the data
419.25 -> by timestamp to perform an intelligent search
422.1 -> on those data sets.
423.42 -> You can also use queries such as aggregation queries,
427.53 -> or maybe use filters like the boolean filter
430.23 -> with must clauses, should clauses, etcetera, to filter out the data.
434.12 -> So this is how read and write works with the data-stream.
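
As a rough sketch of such a timestamp-filtered search (the data-stream name follows the talk; the time range is illustrative):

GET logs-nginx/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lte": "now"
      }
    }
  }
}
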
437.79 -> So, let's talk about
439.71 -> some of the data-stream caveats that we have.
442.35 -> So, as I mentioned, a data-stream
444.48 -> is primarily designed for append-only data.
447.24 -> So in order to create a data-stream,
449.19 -> you first need to create an index template
451.68 -> which configures a set of indices as a data-stream.
454.74 -> Then you need to ensure that each document
456.99 -> has a timestamp field;
458.25 -> if you do not configure one in the template, the field defaults to @timestamp.
460.86 -> Then, recent data is more relevant
463.29 -> which makes it suitable even for the hot-warm architecture
466.62 -> in Amazon OpenSearch Service.
469.38 -> Then there are certain operations which you cannot perform
472.183 -> that are termed as blocked operations,
474.96 -> like clone, delete, close, and freeze.
477.87 -> Those kinds of operations cannot be performed
480.12 -> because they could hinder indexing,
483.51 -> and that's why they are blocked.
486.45 -> Talking about the Ingestion API,
488.328 -> you can use the regular indexing API or the bulk API,
492.39 -> but if you're running the bulk API
494.88 -> you have to explicitly specify the op type as create;
498.63 -> that is the prerequisite for sending any bulk request
501.36 -> to a data-stream.
503.16 -> So let's jump into the workings of data-stream
505.47 -> and see it in action with a short demo.
509.13 -> Okay, so let's see how it works in real world.
512.7 -> So, I have logged into the OpenSearch Dashboard
515.94 -> and I'm going to create a data-stream
518.46 -> but as of now I do not have any index template
521.82 -> and if you create a data-stream
523.11 -> without creating an index template, it will error out.
525.84 -> So if you recall my data-stream caveat slide
528.6 -> so in order to create it,
530.07 -> you need to create an index template
531.6 -> which configures a set of indices as a data-stream,
534.474 -> and the data_stream object indicates that it's a data-stream
537.78 -> and not a regular index template.
539.67 -> And then the index pattern matches
541.38 -> with the name of the data-stream over here.
544.14 -> So in this case, if I try to create a data-stream,
547.819 -> it gives me an error
550.35 -> like no matching index template found
552.03 -> for this particular data-stream,
553.32 -> so now I'm going to create the index template
555.96 -> by specifying the index pattern as logs-nginx.
559.458 -> And then I have the number of shards as one
561.48 -> and the number of replicas as one.
563.1 -> Then I have the timestamp field as @timestamp.
566.52 -> Now I have created an index template.
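
Based on what the demo describes, that index template might look roughly like this (the exact body shown on screen may differ slightly):

PUT _index_template/logs-nginx
{
  "index_patterns": ["logs-nginx"],
  "data_stream": {
    "timestamp_field": {
      "name": "@timestamp"
    }
  },
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}
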
569.07 -> Let's verify
570.15 -> whether the index template was created or not.
572.58 -> So, on running GET _index_template/logs-nginx,
576 -> you can see the index template details
578.04 -> like the index patterns, name,
579.72 -> number of shards, and replicas that we have configured.
583.38 -> So let's move on and then create the data-stream now.
587.01 -> So basically, when you create the data-stream,
592.26 -> you can use the data-stream API to explicitly create it,
596.52 -> or the data-stream will initialize
599.25 -> as soon as you send the first document to an index
602.25 -> matching the index pattern name.
605.19 -> So here I'm creating the data-stream explicitly
608.43 -> by running this command, and it returns acknowledged: true,
611.1 -> which means the data-stream has been created.
613.68 -> You can verify the data-stream and the associated indices
616.71 -> by running GET _data_stream/logs-nginx.
620.43 -> In this case, it returns the name of the data-stream,
623.307 -> the timestamp field, the backing indices
626.37 -> for this particular data-stream,
628.41 -> and then the generation, the status,
630.96 -> and the template behind this particular data-stream.
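
A minimal sketch of those two calls, following the names used in the demo:

PUT _data_stream/logs-nginx

GET _data_stream/logs-nginx
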
634.05 -> Now let's start sending,
636.33 -> or writing, data into the data-stream.
639.24 -> So here,
641.52 -> in order to ingest data into a data-stream
643.32 -> you can use the regular indexing API,
646.08 -> for example over here,
648.54 -> I'm writing POST logs-nginx,
651.48 -> which is my data-stream name,
652.95 -> then I have a body with a couple of fields like message
656.13 -> and @timestamp.
657.99 -> So as I mentioned earlier
659.1 -> you need to have the timestamp field
660.447 -> otherwise it will give me an error.
662.52 -> So, after you run this particular command
665.97 -> your data is indexed
667.62 -> and your document has been created
669.39 -> on the data-stream with the backing index
672.18 -> .ds-logs-nginx-000001.
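
A sketch of that indexing call (the message is one of those used in the demo; the timestamp value here is illustrative):

POST logs-nginx/_doc
{
  "message": "login attempt",
  "@timestamp": "2024-03-01T12:00:00Z"
}
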
675.69 -> Apart from sending individual requests
677.91 -> you can also send bulk API requests
681.39 -> in order to ingest data into the data-stream.
683.76 -> The only thing is you need to have the operation type
686.49 -> as create when you are sending any bulk request.
689.52 -> Now if I run this command, it inserts two documents
692.88 -> which I have in my bulk API request,
697.14 -> one with the message 'login success'
698.94 -> and another with the message 'login failed'.
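
A sketch of that bulk request, using the two messages mentioned above (timestamp values are illustrative); note the create action before each document:

POST logs-nginx/_bulk
{ "create": {} }
{ "message": "login success", "@timestamp": "2024-03-01T12:01:00Z" }
{ "create": {} }
{ "message": "login failed", "@timestamp": "2024-03-01T12:02:00Z" }
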
702.63 -> Let's see how the search works
706.11 -> and how you can search the data-stream
709.86 -> and get all the documents from that stream.
712.5 -> So you can do a simple search request
716.1 -> like GET your data-stream name
719.19 -> followed by the underscore search API.
721.47 -> It will return you all the data
723.06 -> for that particular data-stream.
725.1 -> So I have inserted three documents on it
727.38 -> so you can see one is with 'login attempt',
729.66 -> one is the 'login success',
730.89 -> and another one is the 'login failed'.
732.72 -> So this way you can search the data from the data-stream
736.92 -> and you can further use the Query DSL as well
739.444 -> to run advanced queries such as the bool query with filters,
742.497 -> or run aggregations and so on using the data-stream.
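
A sketch of a plain search plus a slightly more advanced query with a bool filter and an aggregation (the keyword sub-field assumes default dynamic mapping; the time range is illustrative):

GET logs-nginx/_search

GET logs-nginx/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-7d/d" } } }
      ]
    }
  },
  "aggs": {
    "messages": {
      "terms": { "field": "message.keyword" }
    }
  }
}
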
746.31 -> Next, let's talk about how you can perform the rollover
749.16 -> on the data-stream.
750.21 -> So you can use an ISM policy
752.94 -> in order to define the rollover policy
754.71 -> on the basis of size or on the basis of time.
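
As a rough sketch of such an ISM policy (the policy name, thresholds, and index pattern are illustrative assumptions; on older Open Distro versions the path is _opendistro/_ism rather than _plugins/_ism):

PUT _plugins/_ism/policies/logs-nginx-rollover
{
  "policy": {
    "description": "Roll over data-stream backing indices by size or age",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_size": "30gb", "min_index_age": "1d" } }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["logs-nginx*"],
      "priority": 100
    }
  }
}
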
758.28 -> Apart from that you can also run the manual command
760.98 -> to roll over the data-stream
762.72 -> by running a POST request
765.42 -> with your data-stream name followed by _rollover.
768.93 -> Once you run this command, it will give you the details
771.57 -> about your old index and the new index.
774.09 -> So now your old index has gone into the read only mode,
777 -> you cannot perform any write operation on this,
779.46 -> and all of the data what you are going to send,
781.59 -> it will be sent to the new index
783.15 -> which is .ds-logs-nginx-000002.
787.83 -> You can verify the rollover operation as well
790.53 -> by running GET _data_stream/logs-nginx,
794.46 -> and here we have generation two,
796.59 -> which means it has been rolled over;
798.6 -> this is the second generation of the data-stream.
801.78 -> And then it has the details of your timestamp field
804.78 -> and then all the indices.
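
A sketch of the manual rollover and its verification, following the demo's data-stream name:

POST logs-nginx/_rollover

GET _data_stream/logs-nginx
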
807.21 -> Next, we can talk about
810 -> whether we can delete the write index or not.
812.22 -> So, if you run the command to delete the write index
815.91 -> which is the 02 as of now,
818.19 -> it will error out because you cannot perform
820.77 -> the delete operation on the write index.
823.62 -> So if you have to delete the write index
826.17 -> then you have to delete the index template
828.66 -> and then you have to delete the data-stream.
831.15 -> So before showing you the delete command
833.58 -> let me quickly go to the index management page
837.69 -> and there we can search for all the indices that we have.
841.92 -> So go to the indices tab, show the data-stream indices,
845.4 -> then you can select the data-stream.
847.5 -> So here I have two indices on the data-stream
850.68 -> logs-nginx, -000001 and -000002,
854.82 -> and you can see I have three documents on -000001
857.01 -> and there are no documents on -000002
858.81 -> because we haven't run any Ingestion API
861.668 -> to ingest the data after the rollover.
864.42 -> Now let's move to the command again
867.24 -> and how you can perform the delete.
870.06 -> So now if you go and try to delete the data-stream
873.845 -> it will delete everything, and if you go and
880.65 -> fetch the data-stream over here,
883.41 -> like performing any search,
885.09 -> it will say that no index was found
887.4 -> because we have deleted the data-stream.
889.92 -> So once you delete the data-stream
891.81 -> it will delete all the indices or all the backing indices
895.23 -> as well as the write index from that particular data-stream.
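
A sketch of that deletion, which removes the data-stream and all of its backing indices (deleting the index template is optional cleanup):

DELETE _data_stream/logs-nginx

DELETE _index_template/logs-nginx
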
899.34 -> So that was a quick demo of the data-stream,
902.43 -> and let's get back to the deck
904.44 -> and then we can talk about what we discussed
908.43 -> as a part of the demo.
909.54 -> So I will show all the scripts over there in the form of a deck,
912.93 -> followed by a couple of documents
916.14 -> like how you can get started with the data-stream
918.9 -> and then what should be the next step
921 -> or the next action on that.
922.59 -> Thank you.
924.36 -> So in the demo we have seen
926.16 -> how, if you try to create a data-stream
928.2 -> without creating an index template, it will error out.
930.69 -> So if you recall the data-stream caveats,
932.88 -> in order to create a data-stream
934.02 -> you first need to create an index template
936.3 -> and this shows how you can create the index template;
938.79 -> you can refer to this script for creating the index template.
942.03 -> Once we have created the index template,
944.82 -> you can verify those by running the GET API
947.76 -> to make sure your index template is created.
950.64 -> Then after you have created an index template,
952.86 -> you can create a data-stream
954.57 -> where you can use the data-stream API
956.517 -> to explicitly create a data-stream.
958.38 -> If you're not creating it explicitly
960.51 -> then as soon as you send the first document,
962.46 -> it will create the data-stream.
964.83 -> Talking about the ingestion,
966.36 -> you can send an individual request like this one
969.12 -> or you can refer to the demo which I showed just now
971.91 -> where you can send the bulk request as well
974.28 -> with the op type as create.
976.59 -> Then last but not the least,
977.94 -> you can run the search request
979.83 -> using the GET data-stream name slash underscore search.
983.43 -> It will give you the top ten documents
985.44 -> for that particular data-stream
987.48 -> and then you can further use the Query DSL
989.941 -> to run advanced queries
991.2 -> such as using the bool query with filters,
993.78 -> running aggregations, and so on.
996.24 -> Lastly we saw in the demo how you can manage those indices
999.325 -> and data-streams from OpenSearch Dashboards,
1001.52 -> where you can go to the index management,
1003.38 -> filter the data for the data-streams
1005.84 -> and then it will list down all the backing indices
1008.57 -> for a particular data-stream.
1010.73 -> So that's what we covered as part of demo.
1013.85 -> So coming back to the conclusion,
1016.1 -> so in a nutshell, data-stream helps you
1018.47 -> to intelligently manage the time-series data
1021.41 -> where developers or operations teams
1023.12 -> can focus on the development
1025.1 -> and growing their business
1026.63 -> rather than spending time on management tasks
1029 -> such as managing time-series indices,
1031.19 -> performing the rollover and so on.
1034.61 -> So this concludes a quick overview
1036.38 -> of data-stream along with a demo.
1038.36 -> If you would like to know more about data-stream,
1040.34 -> please check out our documentation
1042.17 -> which you can get from the QR code that's on here.
1044.99 -> So thank you for listening
1046.46 -> and feel free to reach out to us
1047.84 -> if you have any further questions.
1049.52 -> Thank you.

Source: https://www.youtube.com/watch?v=Mm3GFTt8wMA