Processing Streaming Data with AWS Lambda - AWS Online Tech Talks
Processing streaming data can be complex in traditional, server-based architectures, especially if you need to react in real-time. Amazon Kinesis makes it possible to collect, process, and analyze this data at scale, and AWS Lambda can make it easier to develop highly scalable, custom workloads to turn the data into useful insights. This tech talk explains common streaming data scenarios, when to use Kinesis or Kinesis Data Firehose, and how to use Lambda in a streaming architecture. Learn about the extract, transform, load (ETL) process using Lambda and how you can implement AWS services to build data analytics. This talk also discusses best practices to help you build efficient and effective streaming solutions.
Learning Objectives:
* Understand how to manage streaming data workloads
* Learn how to use AWS Lambda to process streaming data
* Use best practices in your Lambda architectures to reduce cost and improve scale
☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or watch on-demand tech talks at your own pace. Join us to fuel your learning journey with AWS.
#AWS
Content
0.88 -> thanks for joining me today for this
2.399 -> session which is about processing
4.16 -> streaming data
5.279 -> with aws lambda my name is james beswick
9.04 -> and i'm a principal developer advocate
11.2 -> here in the aws serverless team i'm a
14.4 -> self-confessed serverless geek and i've
16 -> built quite a few production systems
17.76 -> using serverless infrastructure
20 -> prior to being a da i was a software
21.92 -> developer for many years
23.519 -> and also a product manager for a long
25.199 -> time the most important thing on my
27.119 -> slide here is my twitter handle
29.279 -> and my email address so if you have any
30.88 -> questions about kinesis
32.399 -> or serverless in general feel free to
34.559 -> reach out to me and i'll do my very best
36.48 -> to help
38.8 -> today we're talking about processing
40.719 -> streaming data with aws lambda
43.6 -> in this session i'll cover five topics
46.879 -> first i'll talk about streaming use
48.719 -> cases and briefly explain
50.559 -> the different kinesis services then i'll
53.76 -> show how you can build zero
55.039 -> administration stream processing
57.199 -> with kinesis data firehose lambda is a
60.399 -> really important service for processing
62.239 -> streaming data
63.28 -> so i'll talk about how to use it for
65.04 -> on-demand compute
66.32 -> with kinesis data streams in production
69.36 -> systems it's important to know how to
70.799 -> scale
71.52 -> monitor and troubleshoot kinesis so i'll
73.6 -> provide some important guidance here
75.6 -> and finally i'll discuss how to optimize
78.08 -> your kinesis based application
80.159 -> and highlight some best practices
84.72 -> amazon kinesis makes it easier to
86.799 -> collect process
88.08 -> and analyze real-time streaming data for
91.28 -> applications
92.24 -> this can help you get timely insights
94.4 -> and react quickly to new information
97.28 -> kinesis provides cost-effective
99.04 -> streaming data processing at any scale
101.439 -> with the flexibility to choose the tools
103.36 -> that best suit the needs of your
104.72 -> workload
106.24 -> moving from traditional batch
107.759 -> architectures to streaming architectures
110.399 -> creates new capabilities you go from
113.439 -> hourly
114 -> server logs to real-time metrics or from
116.799 -> weekly and monthly bills
118.64 -> created by a job to a system that
120.56 -> monitors real-time spending alerts
122.719 -> and implements spending caps in
125.439 -> analyzing click streams
126.96 -> it can move from daily reports to a
128.959 -> real-time analysis
130.879 -> and for customers like financial
132.319 -> institutions who have fraud detection
134.239 -> systems
135.2 -> critically this information can drive
137.44 -> real-time detection
139.599 -> bringing real-time capabilities to
141.36 -> workloads fundamentally changes the
143.52 -> capabilities
144.56 -> and the problems that you can solve
148.8 -> there are a wide variety of use cases
150.8 -> for kinesis
151.92 -> here are just a few of the common ones
153.599 -> that we see from our customers
156 -> in video processing kinesis allows you
158.16 -> to stream video from connected devices
160.4 -> to aws
161.76 -> for analytics or machine learning the
164.16 -> video can then be processed by other
166.08 -> services
167.04 -> depending upon the needs of the workload
170.239 -> in industrial automation sensors collect
172.879 -> large amounts of data from thousands of
174.8 -> devices
176.16 -> this data can be ingested by kinesis and
178.8 -> analyzed by kinesis data analytics
181.44 -> in real time to power dashboards for
183.68 -> real-time monitoring
184.959 -> the data can also be stored and analyzed
187.36 -> historically
188.4 -> or used to train machine learning models
191.84 -> you can use kinesis to process streaming
193.84 -> data from iot devices such as consumer
196.48 -> appliances
197.599 -> embedded sensors and tv set-top boxes
201.12 -> you can then use the data to send
203.04 -> real-time alerts
204.239 -> or take actions programmatically when a
206.4 -> sensor exceeds certain operating
208.239 -> thresholds
210.56 -> data collected via kinesis can also be
213.2 -> filtered and
214 -> aggregated with kinesis data firehose
216.959 -> and then stored durably in s3 buckets to
219.28 -> create data lakes
220.799 -> this enables analytics and machine
222.56 -> learning for use cases like production
224.4 -> optimization
225.44 -> and predictive maintenance kinesis data
228.799 -> streams can be used to
230 -> collect log and event data from sources
232.48 -> such as servers
233.68 -> desktops and mobile devices you can then
236.879 -> build
237.28 -> kinesis applications to continuously
239.36 -> process the data
240.64 -> generate metrics power live dashboards
243.68 -> and emit aggregated data into stores
245.84 -> such as amazon s3
249.76 -> under the kinesis umbrella there are
251.36 -> four distinct services with different
253.439 -> capabilities
255.12 -> kinesis data streams is a highly
257.04 -> scalable and durable
258.959 -> real-time data streaming service that
261.04 -> can continuously capture gigabytes of
263.199 -> data per
263.84 -> second from hundreds of thousands of
266.08 -> sources
267.68 -> amazon kinesis video streams makes it
269.68 -> easy to securely stream video from
271.6 -> connected devices to aws
273.84 -> for analytics machine learning and other
276 -> processing i'm not covering this service
278.08 -> today
278.639 -> but it's an important part of the
279.919 -> kinesis suite
282 -> kinesis data firehose is a managed
284.08 -> streaming service
285.199 -> with minimal administration and it's the
287.199 -> easiest way to capture
288.88 -> transform and load data streams into aws
292 -> data stores
293.12 -> you can use the data collected here for
294.88 -> real-time analytics with existing
296.56 -> business intelligence tools
300 -> kinesis data analytics enables you to
301.919 -> process data streams
303.28 -> in real time with sql or apache flink
306.639 -> you can use familiar programming
308 -> languages like sql to build complex
310.08 -> queries that run continuously on
312.24 -> streaming data
315.68 -> generally this is the pattern for
317.28 -> implementing a streaming data solution
319.919 -> data producers continuously generate
322 -> data and write it to a stream
324.4 -> a data producer could be a web server
326.639 -> sending logs
327.84 -> or it could be an application server
329.6 -> sending metrics or it could be an
331.36 -> internet of things device
333.039 -> sending telemetry the streaming service
336.16 -> then durably stores the data
338.479 -> once it's been received it provides a
340.32 -> temporary buffer
341.52 -> to prepare the data and it's capable of
343.6 -> handling high throughput
346.56 -> the streaming service delivers the
348.08 -> records to data consumers
350.16 -> the consumer continuously processes the
352.16 -> data
353.36 -> in many cases this means cleaning
355.039 -> preparing and aggregating the records
357.759 -> transforming the raw data into
359.36 -> information
362.4 -> producers create records and records
364.72 -> contain three pieces of information
367.52 -> the partition key logically separates
369.919 -> sets of data
371.12 -> and is hashed to route the data to a
373.199 -> shard
374.56 -> the sequence number is unique per
377.84 -> partition key within the shard it's
380.08 -> assigned by kinesis
381.6 -> after the producer writes to the stream
384.479 -> and the data block
385.759 -> is a base64 encoded payload up to one
388.88 -> megabyte in size
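As a minimal sketch (not from the talk), here's how a producer might write one such record with the AWS SDK for Python (boto3); the stream name and payload are hypothetical:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

payload = {"device_id": "sensor-42", "temperature": 21.7}

response = kinesis.put_record(
    StreamName="my-stream",                    # hypothetical stream name
    Data=json.dumps(payload).encode("utf-8"),  # the data blob (base64-encoded on the wire by the SDK)
    PartitionKey="sensor-42",                  # hashed to route the record to a shard
)

# the sequence number is assigned by kinesis after the write
print(response["ShardId"], response["SequenceNumber"])
```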
392.8 -> using kinesis enables you to decouple
395.199 -> the data producer and consumer
397.36 -> which can help promote the development
399.12 -> of a microservices model
401.039 -> and make it easier to manage large
402.88 -> workloads
404.24 -> producers put records into streams and
407.039 -> consumers get
408.08 -> records from streams and then process
410.08 -> them
411.68 -> the delay between the time a record is
413.759 -> put into the stream
414.88 -> and the time it can be retrieved which is
416.8 -> called the put to get delay
418.639 -> is typically less than one second so
421.12 -> a kinesis data streams application can
423.28 -> start consuming the data from the stream
425.599 -> almost immediately after the data is
427.68 -> added
429.28 -> a kinesis data stream is a set of one or
432 -> more shards
433.039 -> each shard contains a sequence of data
435.12 -> records
436.08 -> and each data record has a sequence
438 -> number that's assigned by the service
441.039 -> if your data rate increases you can
443.039 -> increase or decrease
444.56 -> the number of shards allocated to your
446.4 -> stream
447.759 -> kinesis can automatically encrypt
449.52 -> sensitive information as a producer
451.52 -> enters it into the stream
453.039 -> it uses aws kms for encryption
457.199 -> record ordering is maintained within
458.96 -> each shard and you can configure record
461.36 -> retention
462.08 -> for up to one year
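A minimal sketch (stream name, shard counts, and KMS key alias are placeholders) of the stream-level settings just described, using boto3:

```python
import boto3

kinesis = boto3.client("kinesis")
wait_active = kinesis.get_waiter("stream_exists")  # polls until the stream status is ACTIVE

# create a stream with an initial shard count
kinesis.create_stream(StreamName="my-stream", ShardCount=2)
wait_active.wait(StreamName="my-stream")

# scale the stream up or down by changing the shard count
kinesis.update_shard_count(
    StreamName="my-stream",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)
wait_active.wait(StreamName="my-stream")

# encrypt incoming records server-side with aws kms
kinesis.start_stream_encryption(
    StreamName="my-stream",
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",
)
wait_active.wait(StreamName="my-stream")

# extend record retention (default is 24 hours, configurable up to one year)
kinesis.increase_stream_retention_period(
    StreamName="my-stream",
    RetentionPeriodHours=168,  # e.g. seven days
)
```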
465.599 -> a shard is a uniquely identified
467.599 -> sequence of data records in a stream
470.08 -> it provides a fixed unit of capacity
473.12 -> each shard can support up to 1 000
475.68 -> records
476.56 -> per second for writes up to a maximum
479.28 -> total data write rate
480.72 -> of one megabyte per second including
482.96 -> partition keys
484.879 -> each shard can support up to five
486.72 -> transactions per second for reads
489.199 -> up to a maximum total data read rate of
491.759 -> 2 megabytes per second
494.479 -> the data capacity of your entire stream
496.8 -> is a function of the number of shards
498.639 -> that you specify for the stream
500.8 -> and the total capacity of the stream is
502.879 -> the sum of the capacities of the shards
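As a quick worked example with hypothetical throughput numbers, you can apply the per-shard limits above to estimate how many shards a stream needs:

```python
import math

records_per_second = 4_000   # expected aggregate write rate
avg_record_size_kb = 0.5     # average record size in KB
read_mb_per_second = 3.0     # aggregate consumer read rate

write_shards_by_records = math.ceil(records_per_second / 1_000)                      # 1,000 records/s per shard
write_shards_by_bytes = math.ceil(records_per_second * avg_record_size_kb / 1_024)   # 1 MB/s write per shard
read_shards = math.ceil(read_mb_per_second / 2)                                      # 2 MB/s read per shard

shards_needed = max(write_shards_by_records, write_shards_by_bytes, read_shards)
print(shards_needed)  # -> 4 shards for this example workload
```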
506.479 -> each shard contains records ordered by
508.72 -> arrival time
510.319 -> in a shard records from the trim horizon
513.2 -> are all the available records since the
516.399 -> beginning whereas records from the tip
518.56 -> or the latest
519.519 -> are the most current records essentially
522.24 -> the difference is whether you want to
523.68 -> start from the oldest record the trim
525.519 -> horizon
526.399 -> or from right now the latest and skip
529.04 -> data between the latest checkpoint and
532.839 -> now
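A minimal sketch (stream name and shard ID are placeholders) showing the two starting positions when reading a shard directly with boto3:

```python
import boto3

kinesis = boto3.client("kinesis")

# TRIM_HORIZON: start from the oldest record still retained in the shard
oldest = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)

# LATEST: start from records written after this call (the tip of the shard)
newest = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)

# read a batch of records starting from the oldest available position
records = kinesis.get_records(ShardIterator=oldest["ShardIterator"], Limit=100)
```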
534.56 -> lambda is the on-demand compute
536.88 -> service
537.6 -> at the heart of the aws serverless
539.6 -> portfolio
540.64 -> and it can consume and process data from
542.72 -> kinesis streams
544.56 -> the lambda service polls kinesis
546.56 -> automatically and invokes your lambda
548.48 -> functions when records are available
551.2 -> the benefit of using lambda for
552.72 -> streaming data processing
554.399 -> is that lambda manages scaling and that
556.48 -> allows you to focus on the custom
558.16 -> business logic
559.2 -> to process data instead of the
561.04 -> underlying infrastructure
563.44 -> records are delivered as a payload to
565.44 -> the lambda function
566.48 -> and you can configure how many records
568.32 -> are in each batch
569.68 -> up to ten thousand or six megabytes as a
572.72 -> total payload limit
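A minimal sketch of a Lambda handler for a Kinesis event source; the event shape (Records with a base64-encoded data blob and partition key) is the standard Kinesis event format, while the processing logic is hypothetical:

```python
import base64
import json

def handler(event, context):
    for record in event["Records"]:
        # kinesis delivers the data blob base64-encoded inside the event payload
        payload = base64.b64decode(record["kinesis"]["data"])
        data = json.loads(payload)
        partition_key = record["kinesis"]["partitionKey"]

        # custom business logic goes here
        print(partition_key, data)

    return {"batchSize": len(event["Records"])}
```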
574.88 -> lambda supports a variety of
577.04 -> runtimes natively such as python node.js and .net
580.48 -> and you can also bring your own runtime
582.08 -> too
582.399 -> we have customers running even erlang
584.72 -> php
585.68 -> or cobol lambda is agnostic to your
588.16 -> choice of runtime
590.24 -> in the event source mapping you can also
592.08 -> configure a starting position for
593.68 -> records
594.48 -> you can process only new records all
596.72 -> existing records
598 -> or records created after a certain date
601.04 -> once the lambda function is finished
602.56 -> processing it returns the result to
604.32 -> kinesis
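A minimal sketch (function name, stream ARN, and values are placeholders) of creating the event source mapping described above, including batch size and starting position, with boto3:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    FunctionName="process-stream-records",
    BatchSize=1000,                    # up to 10,000 records per invocation
    MaximumBatchingWindowInSeconds=5,  # wait up to 5 seconds to fill a batch
    StartingPosition="TRIM_HORIZON",   # or "LATEST", or "AT_TIMESTAMP"
    # StartingPositionTimestamp=...    # only when StartingPosition="AT_TIMESTAMP"
)
```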
607.6 -> for each shard lambda configures a poller
609.92 -> that polls the shard every second
612 -> and invokes your lambda function when
614.079 -> records are available
615.6 -> the poller is managed internally by the
617.68 -> lambda service
620.32 -> internally a record processor polls the
622.88 -> kinesis shard
624.56 -> the batcher creates the batches to be
626.399 -> processed by the function
628.079 -> and the invoker gets batches and invokes
630.959 -> your lambda function
635.04 -> kinesis data firehose is a fully managed
637.76 -> service that automatically provisions
639.839 -> manages and scales the resources
641.6 -> required to process and load your
643.36 -> streaming data
646 -> configure data firehose within minutes
648.16 -> in the console
649.36 -> cli or cloud formation and it's then
651.68 -> immediately available to receive data
653.68 -> from thousands of data sources
656.399 -> firehose can optionally invoke a lambda
658.48 -> function to transform data before it
660.32 -> stores results
662.079 -> it's integrated with some aws services
664.959 -> like
665.279 -> s3 redshift and the elasticsearch
667.839 -> service
668.56 -> it can also deliver data to generic http
671.68 -> endpoints
672.64 -> and directly to service providers from
675.519 -> the console you point data firehose to
677.839 -> the destinations you need
679.519 -> and use your existing applications and
681.44 -> tools to analyze streaming data
684.64 -> once set up data firehose loads
686.8 -> streaming data
687.76 -> into your destination continuously as
690.399 -> it arrives
691.2 -> and there's no ongoing administration
693.2 -> and you don't need to manage shards
695.6 -> it's also a pay as you go service where
697.44 -> the cost is based upon usage
699.36 -> with no minimum fees
703.279 -> there are various sources for data
705.279 -> firehose the first is a direct put from the
707.519 -> data producer
709.04 -> you can send data using the kinesis
711.12 -> agent the data firehose api
714 -> or the aws sdk you can also configure
717.76 -> your data
718.72 -> firehose delivery stream to
720.24 -> automatically read data from an existing
722.32 -> kinesis data stream
724.24 -> and you can use cloudwatch logs
726.24 -> eventbridge or iot as a source
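A minimal sketch (delivery stream name and payload are placeholders) of a direct PUT to a Firehose delivery stream using the AWS SDK for Python:

```python
import json
import boto3

firehose = boto3.client("firehose")

# send one record directly to the delivery stream; a trailing newline keeps
# records separated once they are batched into files in the destination
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": (json.dumps({"event": "page_view", "user": "abc"}) + "\n").encode("utf-8")},
)
```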
729.76 -> apart from s3 redshift and elasticsearch
732.8 -> there's also a range of partners that
734.48 -> you can use as destinations for data
736.8 -> firehose
737.92 -> these include datadog dynatrace
740.959 -> logic monitor mongodb cloud new relic
744.72 -> splunk and sumo logic for http
748.72 -> endpoint delivery there is a defined
750.8 -> request and response format
753.12 -> endpoints have three minutes to respond
754.959 -> to a request before a timeout occurs
758.56 -> the frequency of data delivery to s3 is
761.279 -> determined by the buffer size
763.12 -> and buffer interval values that you
765.04 -> configure for your delivery stream
767.44 -> the service buffers incoming data before
769.6 -> delivering it to s3
772.16 -> you can configure the buffer size
774.48 -> between 1 and 128 megabytes
777.04 -> or buffer interval between 60 and 900
780.399 -> seconds
781.519 -> the condition that is satisfied first
783.519 -> will then trigger data delivery to s3
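A minimal sketch (names, ARNs, and values are placeholders) showing how those S3 buffering hints can be set when creating a delivery stream with boto3:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-delivery-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-data-lake-bucket",
        "BufferingHints": {
            "SizeInMBs": 64,           # configurable between 1 and 128 MB
            "IntervalInSeconds": 300,  # configurable between 60 and 900 seconds
        },
    },
)
```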
787.2 -> when data delivery falls behind data
789.44 -> writing to a stream
790.8 -> the service raises the buffer size
792.56 -> dynamically this allows it
794.8 -> to catch up
795.6 -> and ensure that the data is delivered to
797.68 -> the destination
799.519 -> firehose also buffers incoming data
801.839 -> before delivering it to splunk
803.92 -> the buffer size is 5 megabytes and the
806.32 -> buffer interval
807.279 -> is 60 seconds these aren't configurable
810.079 -> because they're optimized specifically
812 -> for the splunk integration
816 -> the overall operation of a data firehose
818.72 -> stream looks like this
820.72 -> the data source puts records onto the
823.12 -> stream
824.16 -> firehose invokes the data
825.68 -> transformation lambda function to
827.36 -> process the records
828.639 -> it uses the batch size value in the
830.48 -> configuration to determine the number
832.56 -> of records sent per invocation the
835.68 -> transformed records are returned from
837.44 -> lambda
838.24 -> back to the firehose stream those
841.12 -> records are then forwarded to the
842.48 -> destination
843.36 -> in this case the amazon elasticsearch
845.839 -> service
848 -> if your lambda function invocation fails
850.72 -> because of a network timeout or because
852.639 -> you've reached the lambda invocation
854.399 -> limit
855.199 -> firehose retries the invocation three
857.68 -> times by default
859.36 -> if the invocation does not succeed
861.68 -> firehose then skips that batch of
863.519 -> records
864.24 -> any skipped records are treated as
865.92 -> unsuccessfully processed records
868.72 -> if the status of a data transformation
870.56 -> of a record is processing failed
873.04 -> then firehose also treats the record as
875.76 -> unsuccessfully processed
878.079 -> any unsuccessfully processed records
880.16 -> are delivered to your s3 bucket
882.48 -> in a folder called processing-failed
885.839 -> this will include metadata indicating
888 -> the number of attempts made
889.68 -> the timestamp for the last attempt and
892.079 -> the lambda function's arn
894.88 -> firehose can back up all
897.04 -> untransformed records
898.32 -> to your s3 bucket concurrently while
900.72 -> delivering transformed records to your
902.56 -> destination
903.839 -> you can enable source record backup when
906.56 -> you create or update a delivery stream
911.04 -> when you enable data transformation the
913.44 -> service buffers
914.48 -> incoming data up to three megabytes
917.12 -> by default
918.16 -> you can adjust this buffering size by
920.079 -> using the processing configuration
922.24 -> api firehose then invokes the specified
926.32 -> lambda function
928.32 -> with each buffered batch using the
930.399 -> lambda synchronous invocation
932.48 -> mode the lambda function processes and
935.519 -> returns
936.24 -> a list of transformed records with a
938.56 -> status of each record
941.04 -> the status is an attribute called result
943.519 -> with the possible values of ok
945.92 -> dropped or processing failed in the
948.959 -> returned payload the data
951.12 -> attribute
951.92 -> must be base64 encoded
955.279 -> the transformed data is then sent back
957.199 -> from lambda to firehose and from there
959.519 -> it's sent to the destination once the
961.519 -> specified buffering size or buffering
963.6 -> interval is reached
964.72 -> whichever one happens first
968.639 -> the data transformation lambda function
970.72 -> can run for up to five minutes
972.48 -> these functions are commonly used to
974.24 -> filter enrich and convert data
977.44 -> in filtering the lambda function can
979.519 -> remove attributes from records
981.519 -> or remove entire records based upon your
984.079 -> business logic
985.6 -> some customers use filtering to remove
987.839 -> personally identifiable information from
989.759 -> records and streams for example
992.079 -> you can also enrich records by fetching
994.24 -> data from other aws services
996.88 -> or from external data sources one common
1000.079 -> process is to use a
1001.759 -> geoip service to look up the geographical
1004.639 -> location of ip addresses
1006.399 -> and append this data to records just
1008.88 -> remember that the response
1010.079 -> size in synchronously invoked lambda
1012.56 -> functions
1013.279 -> is six megabytes in converting data
1016.48 -> you have complete flexibility in
1018.32 -> modifying the record layout to match the
1020.24 -> needs of your data consumer
1022.16 -> there are lambda blueprints available as
1024.16 -> examples of data conversion
1026.079 -> using this process firehose passes a
1029.439 -> record id
1030.559 -> along with each record to lambda during
1032.48 -> the invocation
1033.6 -> each transformed record must be returned
1036.559 -> with the exact same record
1038 -> id
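A minimal sketch of a transformation function that follows the record contract just described (recordId, result, data); the filtering and enrichment rules are hypothetical:

```python
import base64
import json

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # hypothetical filter: drop records flagged as internal traffic
        if payload.get("internal"):
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        # hypothetical enrichment: add a field before re-encoding the record
        payload["processed"] = True
        data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8")).decode("utf-8")

        # each record must be returned with the exact same recordId
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})

    return {"records": output}
```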
1041.039 -> firehose invokes the data transformation
1043.36 -> lambda function
1044.48 -> and scales up the function if the number
1046.319 -> of records in the stream grows
1048.96 -> when the destination is s3 redshift or
1051.919 -> the elasticsearch service
1053.76 -> firehose allows up to five outstanding
1055.919 -> lambda invocations per
1057.36 -> shard when the destination is splunk the
1060.32 -> quota is 10 outstanding lambda
1062.24 -> invocations per shard
1065.679 -> you can monitor a data firehose stream