Managing time series data with Amazon OpenSearch Service

Learn how to manage time series data with Amazon OpenSearch Service’s data streams to help simplify logs, metrics, and traces. The video includes a demo on the data streams functionality.

Learn more: https://go.aws/3Uv5LGa

Subscribe:
More AWS videos - http://bit.ly/2O3zS75
More AWS events videos - http://bit.ly/316g9t4

ABOUT AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers — including the fastest-growing startups, largest enterprises, and leading government agencies — are using AWS to lower costs, become more agile, and innovate faster.

#OpenSearch #AWS #AmazonWebServices #CloudComputing


Content

4.95 -> - Okay, so hello everyone.
6.72 -> My name is Prashant Agrawal and I am working
8.88 -> as an Analytics Specialist Solutions Architect at AWS,
12.33 -> where I primarily focus on Amazon OpenSearch Service.
15.6 -> And today I will be talking about managing time-series data
18.556 -> with data-streams using Amazon OpenSearch Service.
22.2 -> So, before getting into the data-streams,
24.81 -> let me explain about Amazon OpenSearch Service.
27.87 -> So, Amazon OpenSearch Service
29.4 -> is a fully managed service that makes it easy
31.8 -> to deploy, operate and scale OpenSearch
34.65 -> along with legacy Elasticsearch clusters in the AWS Cloud.
38.7 -> It also has tight integration with other AWS services
42.24 -> such as Amazon Kinesis Data Firehose,
44.94 -> AWS Lambda, CloudWatch, and so on.
47.66 -> So let's take a quick detour
50.1 -> and talk about what is OpenSearch.
52.38 -> So OpenSearch is a community driven,
54.36 -> open source search and analytics suite
56.43 -> which is derived from the Apache 2.0
58.53 -> licensed versions of the Elasticsearch 7.10.2
61.76 -> and Kibana 7.10.2 projects.
64.74 -> It consists of a search engine, which is OpenSearch
67.56 -> that provides you a way to perform search on your data.
71.1 -> as well as a way to analyze your logs
73.71 -> using a visualization interface called OpenSearch Dashboards.
77.52 -> So using OpenSearch,
78.87 -> you can easily index the data of any type
81.24 -> such as logs, metrics and traces,
84.57 -> and then you can use the visualization interface,
86.79 -> OpenSearch Dashboards, or even the API,
89.82 -> to perform aggregations on top of that data.
92.85 -> Next, we also have a set of plugins that OpenSearch provides
96.12 -> which add comprehensive functionality
98.13 -> such as generating alerts on the log data,
100.95 -> looking for anomalies using machine learning algorithms,
104.1 -> or maybe using the trace analytics
106.08 -> and the observability features
107.52 -> to capture logs, metrics and traces,
110.31 -> and then make correlations across
112.11 -> those logs, metrics and traces.
114.611 -> So let's jump ahead and see why we need data-streams.
119.43 -> So one of the common use cases with OpenSearch
122.39 -> is to index continuously generated time-series data
125.67 -> such as logs, metrics and traces.
128.55 -> Previously, automating index rollover
130.8 -> meant you had to first create a write index,
133.5 -> configure a rollover alias
135.42 -> and verify that indices were being rolled over as expected.
139.38 -> That made the bootstrapping process for an index
142.32 -> more cumbersome than it needed to be.
145.08 -> A data-stream, on the other hand, is optimized for time-series data
148.56 -> that is primarily append-only in nature
150.75 -> and simplifies the initial setup process for the data.
154.65 -> So let's see how the flow works
156.3 -> with and without data-stream.
158.31 -> So prior to data-stream,
159.81 -> this is how users set up their indices
161.97 -> for time-series data.
163.47 -> So they start with setting up an index
165.51 -> by specifying the index mapping and settings.
168.54 -> And, as their data grows,
170.7 -> they encounter scaling issues with their log data.
174.24 -> And at that time they start thinking
176.04 -> about creating daily rotating indices
178.26 -> on the basis of dates
179.79 -> or maybe you use the Rollover API
181.68 -> to create an index for each date.
183.93 -> At this point, the user has more than one index
186.99 -> to run searches on, so they can either use a wildcard
189.853 -> to specify the indices, such as logs-nginx-*,
194.55 -> or they can create an alias
198.54 -> for those time-series indices.
202.11 -> So here in the next one you can see,
204.45 -> they can define an index alias for all relevant indices
207.75 -> to bring them under one umbrella.
209.91 -> And here, searches can be performed
211.98 -> across all the underlying indices on the alias.
215.04 -> But how about performing writes
217.83 -> and indexing new data,
219.57 -> as in, how do we determine which index
221.88 -> would be getting the write requests, right?
224.58 -> So in this case, they can solve it
226.95 -> by marking the latest index as the write index
230.07 -> and by bringing the common settings
234.15 -> such as the mappings, index settings,
236.31 -> and index patterns into an index template
239.31 -> which can then be applied to all new indices
242.52 -> getting created as a part of the rollover process.
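
As a rough sketch of this legacy setup (the index and alias names here are illustrative assumptions, not taken from the demo), the alias with a dedicated write index might be configured like this:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx-000002",
        "alias": "logs-nginx",
        "is_write_index": true
      }
    }
  ]
}
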
245.58 -> So now, the user needs to call the rollover
248.13 -> and calling it manually on a daily basis
250.35 -> is not a practical solution,
251.82 -> and they have to manage it themselves.
254.22 -> So in that case,
255.51 -> they could use an Index State Management policy
258.93 -> to define the index pattern and rollover policy
261.54 -> to roll over the indices.
263.1 -> It could be done on the basis of either time or size.
266.91 -> With this legacy method,
268.32 -> the user has to be aware of many concepts
270.93 -> to manage and maintain those indices
273.36 -> and they need to set them up explicitly.
276.06 -> So if we consolidate this, we can see
280.35 -> what the typical challenges are
281.91 -> with time-series indices.
283.497 -> So, OpenSearch has basic support for time-series data
287.07 -> where one can name indices on the basis of a timestamp
290.25 -> by prefixing them with some name and then suffixing the time,
293.82 -> where you can have some indices
295.5 -> on the basis of day, week, month, etcetera.
299.49 -> If everything is working fine
300.9 -> then why do we really need data-stream?
303.24 -> So to manage your time-series indices,
304.53 -> you need to perform index rotation,
306.99 -> and your log ingestion tools
308.4 -> such as Beats and Logstash
309.78 -> can help you rotate those indices by date.
312.36 -> But what if you hit some event
315.39 -> such as Black Friday or Cyber Monday
318.12 -> where your data is significantly larger than usual?
321.12 -> It creates data skew for those days
323.55 -> where you are getting a large amount of traffic.
327.72 -> So once an index is rotated, you need to make sure
330.21 -> that the write index points to the current date's index
333.33 -> and that the index alias is rolled over as well.
336.06 -> So here the alias is nothing but a way to query
338.58 -> all the latest data which points to the current
340.98 -> or latest underlying index for the specific log.
344.79 -> This is why the data-stream
347.07 -> came into existence.
348.72 -> So, let's see how a data-stream works
351.66 -> at a very high level and solves some of these problems
354.45 -> as a first-class citizen for storing time-series data.
358.02 -> So a data-stream has one or more
359.82 -> auto-generated backing indices,
361.89 -> prefixed by .ds-, then the data-stream name
365.053 -> and the generation id.
366.547 -> So if you look here,
368.13 -> here the data-stream name is logs-nginx.
371.91 -> Then the backing indices are
374.13 -> .ds-logs-nginx-000001,
378.33 -> then -000002, -000003, and so on.
381.6 -> Now coming back to the write operation,
383.85 -> so for any write operation,
385.56 -> you send the request to the data-stream
387.69 -> and it will route it to the latest index.
390 -> Once an index is rolled over, it becomes read-only
392.7 -> and you cannot add any new documents
394.62 -> to the rotated index.
396.06 -> So typically in the log analytics scenario
399.03 -> there is rarely a modification.
401.04 -> And if needed, you can run update by query
403.64 -> or delete by query in the older backing indices.
407.64 -> Now let's talk about search,
409.26 -> as in, how do we perform read requests on those indices.
412.65 -> When running a search query, the data-stream routes
415.14 -> the request to all the backing indices,
417.33 -> and of course you can filter the data
419.25 -> by timestamp to perform an intelligent search
422.1 -> on those data sets.
423.42 -> You can also use queries such as aggregation queries,
427.53 -> or maybe use filters like the boolean filter
430.23 -> with must clauses, should clauses, etcetera, to filter out the data.
434.12 -> So this is how read and write works with the data-stream.
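
As a rough sketch of such a timestamp-filtered search (the data-stream name follows the talk; the time range is illustrative):

GET logs-nginx/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lte": "now"
      }
    }
  }
}
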
437.79 -> So, let's talk about
439.71 -> some of the data-stream caveats that we have.
442.35 -> So, as I mentioned, a data-stream
444.48 -> is primarily designed for append-only data.
447.24 -> So in order to create a data-stream,
449.19 -> you first need to create an index template
451.68 -> which configures a set of indices as a data-stream.
454.74 -> Then you need to ensure that each document
456.99 -> has a timestamp field;
458.25 -> if you do not configure one in the template, the field defaults to @timestamp.
460.86 -> Then, recent data is more relevant
463.29 -> which makes it suitable even for the hot-warm architecture
466.62 -> in Amazon OpenSearch Service.
469.38 -> Then there are certain operations which you cannot perform
472.183 -> that are termed as blocked operations,
474.96 -> like clone, delete, close, and freeze.
477.87 -> Those kinds of operations cannot be performed
480.12 -> because they could hinder indexing,
483.51 -> and that's why they are blocked.
486.45 -> Talking about the Ingestion API,
488.328 -> you can use the regular indexing API or the bulk API,
492.39 -> but if you're running the bulk API
494.88 -> you have to explicitly specify the op type as create;
498.63 -> that is the prerequisite for sending any bulk request
501.36 -> to a data-stream.
503.16 -> So let's jump into the workings of data-stream
505.47 -> and see it in action with a short demo.
509.13 -> Okay, so let's see how it works in real world.
512.7 -> So, I have logged into the OpenSearch Dashboard
515.94 -> and I'm going to create a data-stream
518.46 -> but as of now I do not have any index template
521.82 -> and if you create a data-stream
523.11 -> without creating an index template, it will error out.
525.84 -> So if you recall my data-stream caveat slide
528.6 -> so in order to create it,
530.07 -> you need to create an index template
531.6 -> which configures a set of indices as a data-stream,
534.474 -> and the data_stream object indicates that it's a data-stream
537.78 -> and not a regular index template.
539.67 -> And then the index pattern matches
541.38 -> with the name of the data-stream over here.
544.14 -> So in this case, if I try to create a data-stream,
547.819 -> it gives me an error
550.35 -> like no matching index template found
552.03 -> for this particular data-stream,
553.32 -> so now I'm going to create the index template
555.96 -> by specifying the index pattern as logs-nginx.
559.458 -> And then I have the number of shards as one
561.48 -> and the number of replicas as one.
563.1 -> Then I have the timestamp field as @timestamp.
566.52 -> Now I have created an index template.
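
Based on what the demo describes, that index template might look roughly like this (the exact body shown on screen may differ slightly):

PUT _index_template/logs-nginx
{
  "index_patterns": ["logs-nginx"],
  "data_stream": {
    "timestamp_field": {
      "name": "@timestamp"
    }
  },
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}
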
569.07 -> Let's verify
570.15 -> whether the index template was created or not.
572.58 -> So, on running GET _index_template/logs-nginx,
576 -> you can see the index template details
578.04 -> like the index patterns, name,
579.72 -> number of shards, and replicas that we have configured.
583.38 -> So let's move on and then create the data-stream now.
587.01 -> So basically, when you create the data-stream,
592.26 -> you can use the data-stream API to explicitly create it,
596.52 -> or the data-stream will initialize
599.25 -> as soon as you send the first document to an index
602.25 -> matching the index pattern name.
605.19 -> So here I'm creating the data-stream explicitly
608.43 -> by running this command, and it returns acknowledged: true,
611.1 -> which means the data-stream has been created.
613.68 -> You can verify the data-stream and the associated indices
616.71 -> by running GET _data_stream/logs-nginx.
620.43 -> In this case, it returns the name of the data-stream,
623.307 -> the timestamp field, the backing indices
626.37 -> for this particular data-stream,
628.41 -> and then the generation, the status,
630.96 -> and the template behind this particular data-stream.
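
A minimal sketch of those two calls, following the names used in the demo:

PUT _data_stream/logs-nginx

GET _data_stream/logs-nginx
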
634.05 -> Now let's start sending,
636.33 -> or writing, data into the data-stream.
639.24 -> So here,
641.52 -> in order to ingest data into a data-stream
643.32 -> you can use the regular indexing API,
646.08 -> for example over here,
648.54 -> I'm writing POST logs-nginx,
651.48 -> which is my data-stream name,
652.95 -> then I have a body with a couple of fields like message
656.13 -> and @timestamp.
657.99 -> So as I mentioned earlier
659.1 -> you need to have the timestamp field
660.447 -> otherwise it will give me an error.
662.52 -> So, after you run this particular command
665.97 -> your data is indexed
667.62 -> and your document has been created
669.39 -> on the data-stream with the backing index
672.18 -> .ds-logs-nginx-000001.
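
A sketch of that indexing call (the message is one of those used in the demo; the timestamp value here is illustrative):

POST logs-nginx/_doc
{
  "message": "login attempt",
  "@timestamp": "2024-03-01T12:00:00Z"
}
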
675.69 -> Apart from sending individual requests
677.91 -> you can also send bulk API requests
681.39 -> in order to ingest data into the data-stream.
683.76 -> The only thing is you need to have the operation type
686.49 -> as create when you are sending any bulk request.
689.52 -> Now if I run this command, it inserts two documents
692.88 -> which I have in my bulk API request,
697.14 -> one with the message 'login success'
698.94 -> and another with the message 'login failed'.
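
A sketch of that bulk request, using the two messages mentioned above (timestamp values are illustrative); note the create action before each document:

POST logs-nginx/_bulk
{ "create": {} }
{ "message": "login success", "@timestamp": "2024-03-01T12:01:00Z" }
{ "create": {} }
{ "message": "login failed", "@timestamp": "2024-03-01T12:02:00Z" }
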
702.63 -> Let's see how the search works
706.11 -> and how you can search the data-stream
709.86 -> and get all the documents from that stream.
712.5 -> So you can do a simple search request
716.1 -> like GET your data-stream name
719.19 -> followed by the underscore search API.
721.47 -> It will return you all the data
723.06 -> for that particular data-stream.
725.1 -> So I have inserted three documents on it
727.38 -> so you can see one is with 'login attempt',
729.66 -> one is the 'login success',
730.89 -> and another one is the 'login failed'.
732.72 -> So this way you can search the data from the data-stream
736.92 -> and you can further use the Query DSL as well
739.444 -> to run advanced queries such as the bool query with filters,
742.497 -> or run aggregations and so on using the data-stream.
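
A sketch of a plain search plus a slightly more advanced query with a bool filter and an aggregation (the keyword sub-field assumes default dynamic mapping; the time range is illustrative):

GET logs-nginx/_search

GET logs-nginx/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-7d/d" } } }
      ]
    }
  },
  "aggs": {
    "messages": {
      "terms": { "field": "message.keyword" }
    }
  }
}
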
746.31 -> Next, let's talk about how you can perform the rollover
749.16 -> on the data-stream.
750.21 -> So you can use an ISM policy
752.94 -> in order to define the rollover policy
754.71 -> on the basis of size or on the basis of time.
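
As a rough sketch of such an ISM policy (the policy name, thresholds, and index pattern are illustrative assumptions; on older Open Distro versions the path is _opendistro/_ism rather than _plugins/_ism):

PUT _plugins/_ism/policies/logs-nginx-rollover
{
  "policy": {
    "description": "Roll over data-stream backing indices by size or age",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_size": "30gb", "min_index_age": "1d" } }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["logs-nginx*"],
      "priority": 100
    }
  }
}
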
758.28 -> Apart from that you can also run the manual command
760.98 -> to roll over the data-stream
762.72 -> by running a POST request
765.42 -> with your data-stream name followed by _rollover.
768.93 -> Once you run this command, it will give you the details
771.57 -> about your old index and the new index.
774.09 -> So now your old index has gone into the read only mode,
777 -> you cannot perform any write operation on this,
779.46 -> and all of the data what you are going to send,
781.59 -> it will be sent to the new index
783.15 -> which is .ds-logs-nginx-000002.
787.83 -> You can verify the rollover operation as well
790.53 -> by running GET _data_stream/logs-nginx,
794.46 -> and here we have generation two,
796.59 -> which means it has been rolled over;
798.6 -> this is the second generation of the data-stream.
801.78 -> And then it has the details of your timestamp field
804.78 -> and then all the indices.
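
A sketch of the manual rollover and its verification, following the demo's data-stream name:

POST logs-nginx/_rollover

GET _data_stream/logs-nginx
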
807.21 -> Next, we can talk about
810 -> whether we can delete the write index or not.
812.22 -> So, if you run the command to delete the write index
815.91 -> which is the 02 as of now,
818.19 -> it will error out because you cannot perform
820.77 -> the delete operation on the write index.
823.62 -> So if you have to delete the write index
826.17 -> then you have to delete the index template
828.66 -> and then you have to delete the data-stream.
831.15 -> So before showing you the delete command
833.58 -> let me quickly go to the index management page
837.69 -> and there we can search for all the indices that we have.
841.92 -> So go to the indices tab, show the data-stream indices,
845.4 -> then you can select the data-stream.
847.5 -> So here I have two indices on the data-stream
850.68 -> logs-nginx, -000001 and -000002,
854.82 -> and you can see I have three documents on -000001
857.01 -> and there are no documents on -000002
858.81 -> because we haven't run any Ingestion API
861.668 -> to ingest the data after the rollover.
864.42 -> Now let's move to the command again
867.24 -> and how you can perform the delete.
870.06 -> So now if you go and try to delete the data-stream
873.845 -> it will delete everything, and if you go and
880.65 -> fetch the data-stream over here,
883.41 -> like performing any search,
885.09 -> it will say that no index was found
887.4 -> because we have deleted the data-stream.
889.92 -> So once you delete the data-stream
891.81 -> it will delete all the indices or all the backing indices
895.23 -> as well as the write index from that particular data-stream.
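
A sketch of that deletion, which removes the data-stream and all of its backing indices (deleting the index template is optional cleanup):

DELETE _data_stream/logs-nginx

DELETE _index_template/logs-nginx
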
899.34 -> So that was a quick demo of the data-stream,
902.43 -> and let's get back to the deck
904.44 -> and then we can talk about what we discussed
908.43 -> as a part of the demo.
909.54 -> So I will show all the scripts over there in the form of a deck,
912.93 -> followed by a couple of documents
916.14 -> like how you can get started with the data-stream
918.9 -> and then what should be the next step
921 -> or the next action on that.
922.59 -> Thank you.
924.36 -> So in the demo we have seen
926.16 -> how, if you try to create a data-stream
928.2 -> without creating an index template, it will error out.
930.69 -> So if you recall the data-stream caveats,
932.88 -> in order to create a data-stream
934.02 -> you first need to create an index template
936.3 -> and this shows how you can create the index template;
938.79 -> you can refer to this script for creating the index template.
942.03 -> Once we have created the index template,
944.82 -> you can verify those by running the GET API
947.76 -> to make sure your index template is created.
950.64 -> Then after you have created an index template,
952.86 -> you can create a data-stream
954.57 -> where you can use the data-stream API
956.517 -> to explicitly create a data-stream.
958.38 -> If you're not creating it explicitly
960.51 -> then as soon as you send the first document,
962.46 -> it will create the data-stream.
964.83 -> Talking about the ingestion,
966.36 -> you can send an individual request like this one
969.12 -> or you can refer to the demo which I showed just now
971.91 -> where you can send the bulk request as well
974.28 -> with the op type as create.
976.59 -> Then last but not the least,
977.94 -> you can run the search request
979.83 -> using the GET data-stream name slash underscore search.
983.43 -> It will give you the top ten documents
985.44 -> for that particular data-stream
987.48 -> and then you can further use the Query DSL
989.941 -> to run advanced queries
991.2 -> such as using the bool query with filters,
993.78 -> running aggregations, and so on.
996.24 -> Lastly we saw in the demo how you can manage those indices
999.325 -> and data-streams from OpenSearch Dashboards,
1001.52 -> where you can go to the index management,
1003.38 -> filter the data for the data-streams
1005.84 -> and then it will list down all the backing indices
1008.57 -> for a particular data-stream.
1010.73 -> So that's what we covered as part of demo.
1013.85 -> So coming back to the conclusion,
1016.1 -> so in a nutshell, data-stream helps you
1018.47 -> to intelligently manage the time-series data
1021.41 -> where developers or operations teams
1023.12 -> can focus on the development
1025.1 -> and growing their business
1026.63 -> rather than spending time on management tasks
1029 -> such as managing time-series indices,
1031.19 -> performing the rollover and so on.
1034.61 -> So this concludes a quick overview
1036.38 -> of data-stream along with a demo.
1038.36 -> If you would like to know more about data-stream,
1040.34 -> please check out our documentation
1042.17 -> which you can get from the QR code that's on here.
1044.99 -> So thank you for listening
1046.46 -> and feel free to reach out to us
1047.84 -> if you have any further questions.
1049.52 -> Thank you.

Source: https://www.youtube.com/watch?v=Mm3GFTt8wMA