Processing Streaming Data with AWS Lambda - AWS Online Tech Talks

Processing streaming data can be complex in traditional, server-based architectures, especially if you need to react in real-time. Amazon Kinesis makes it possible to collect, process, and analyze this data at scale, and AWS Lambda can make it easier to develop highly scalable, custom workloads to turn the data into useful insights. This tech talk explains common streaming data scenarios, when to use Kinesis or Kinesis Data Firehose, and how to use Lambda in a streaming architecture. Learn about the extract, transform, load (ETL) process using Lambda and how you can implement AWS services to build data analytics. This talk also discusses best practices to help you build efficient and effective streaming solutions.

Learning Objectives:
*Understand how to manage streaming data workloads
*Learn how to use AWS Lambda to process streaming data
*Use best practices in your Lambda architectures to reduce cost and improve scale

***To learn more about the services featured in this talk, please visit: https://aws.amazon.com/serverless/

Subscribe to AWS Online Tech Talks On AWS:
https://www.youtube.com/@AWSOnlineTec

Follow Amazon Web Services:
Official Website: https://aws.amazon.com/what-is-aws
Twitch: https://twitch.tv/aws
Twitter: https://twitter.com/awsdevelopers
Facebook: https://facebook.com/amazonwebservices
Instagram: https://instagram.com/amazonwebservices

☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or watch on-demand tech talks at their own pace. Join us to fuel your learning journey with AWS.

#AWS


Content

0.88 -> thanks for joining me today for this
2.399 -> session which is about processing
4.16 -> streaming data
5.279 -> with aws lambda my name is james beswick
9.04 -> and i'm a principal developer advocate
11.2 -> here in the aws serverless team i'm a
14.4 -> self-confessed serverless geek and i've
16 -> built quite a few production systems
17.76 -> using serverless infrastructure
20 -> prior to being a da i was a software
21.92 -> developer for many years
23.519 -> and also a product manager for a long
25.199 -> time the most important thing on my
27.119 -> slide here is my twitter handle
29.279 -> and my email address so if you have any
30.88 -> questions about kinesis
32.399 -> or serverless in general feel free to
34.559 -> reach out to me and i'll do my very best
36.48 -> to help
38.8 -> today we're talking about processing
40.719 -> streaming data with aws lambda
43.6 -> in this session i'll cover five topics
46.879 -> first i'll talk about streaming use
48.719 -> cases and briefly explain
50.559 -> the different kinesis services then i'll
53.76 -> show how you can build zero
55.039 -> administration stream processing
57.199 -> with kinesis data fire hose lambda is a
60.399 -> really important service for processing
62.239 -> streaming data
63.28 -> so i'll talk about how to use it for
65.04 -> on-demand compute
66.32 -> with kinesis data streams in production
69.36 -> systems it's important to know how to
70.799 -> scale
71.52 -> monitor and troubleshoot kinesis so i'll
73.6 -> provide some important guidance here
75.6 -> and finally i'll discuss how to optimize
78.08 -> your kinesis based application
80.159 -> and highlight some best practices
84.72 -> amazon kinesis makes it easier to
86.799 -> collect process
88.08 -> and analyze real-time streaming data for
91.28 -> applications
92.24 -> this can help you get timely insights
94.4 -> and react quickly to new information
97.28 -> kinesis provides cost-effective
99.04 -> streaming data processing at any scale
101.439 -> with the flexibility to choose the tools
103.36 -> that best suit the needs of your
104.72 -> workload
106.24 -> moving from traditional batch
107.759 -> architectures to streaming architectures
110.399 -> creates new capabilities you go from
113.439 -> hourly
114 -> server logs to real-time metrics or from
116.799 -> weekly and monthly bills
118.64 -> created by a job to a system that
120.56 -> monitors real-time spending alerts
122.719 -> and implements spending caps in
125.439 -> analyzing click streams
126.96 -> it can move from daily reports to a
128.959 -> real-time analysis
130.879 -> and for customers like financial
132.319 -> institutions who have fraud detection
134.239 -> systems
135.2 -> critically this information can drive
137.44 -> real-time detection
139.599 -> bringing real-time capabilities to
141.36 -> workloads fundamentally changes the
143.52 -> capabilities
144.56 -> and the problems that you can solve
148.8 -> there are a wide variety of use cases
150.8 -> for kinesis
151.92 -> here are just a few of the common ones
153.599 -> that we see from our customers
156 -> in video processing kinesis allows you
158.16 -> to stream video from connected devices
160.4 -> to aws
161.76 -> for analytics or machine learning the
164.16 -> video can then be processed by other
166.08 -> services
167.04 -> depending upon the needs of the workload
170.239 -> in industrial automation sensors collect
172.879 -> large amounts of data from thousands of
174.8 -> devices
176.16 -> this data can be ingested by kinesis and
178.8 -> analyzed by kinesis data analytics
181.44 -> in real time to power dashboards for
183.68 -> real-time monitoring
184.959 -> the data can also be stored and analyzed
187.36 -> historically
188.4 -> or used to train machine learning models
191.84 -> you can use kinesis to process streaming
193.84 -> data from iot devices such as consumer
196.48 -> appliances
197.599 -> embedded sensors and tv set-top boxes
201.12 -> you can then use the data to send
203.04 -> real-time alerts
204.239 -> or take actions programmatically when a
206.4 -> sensor exceeds certain operating
208.239 -> thresholds
210.56 -> data collected via kinesis can also be
213.2 -> filtered and
214 -> aggregated with kinesis data fire hose
216.959 -> and then stored durably in s3 buckets to
219.28 -> create data lakes
220.799 -> this enables analytics and machine
222.56 -> learning for use cases like production
224.4 -> optimization
225.44 -> and predictive maintenance kinesis data
228.799 -> streams can be used to
230 -> collect log and event data from sources
232.48 -> such as servers
233.68 -> desktops and mobile devices you can then
236.879 -> build
237.28 -> kinesis applications to continuously
239.36 -> process the data
240.64 -> generate metrics power live dashboards
243.68 -> and emit aggregated data into stores
245.84 -> such as amazon s3
249.76 -> under the kinesis umbrella there are
251.36 -> four distinct services with different
253.439 -> capabilities
255.12 -> kinesis data streams is a highly
257.04 -> scalable and durable
258.959 -> real-time data streaming service that
261.04 -> can continuously capture gigabytes of
263.199 -> data per
263.84 -> second from hundreds of thousands of
266.08 -> sources
267.68 -> amazon kinesis video streams makes it
269.68 -> easy to securely stream video from
271.6 -> connected devices to aws
273.84 -> for analytics machine learning and other
276 -> processing i'm not covering this service
278.08 -> today
278.639 -> but it's an important part of the
279.919 -> kinesis suite
282 -> kinesis data firehose is a managed
284.08 -> streaming service
285.199 -> with minimal administration and it's the
287.199 -> easiest way to capture
288.88 -> transform and load data streams into aws
292 -> data stores
293.12 -> you can use the data collected here for
294.88 -> real-time analytics with existing
296.56 -> business intelligence tools
300 -> kinesis data analytics enables you to
301.919 -> process data streams
303.28 -> in real time with sql or apache flink
306.639 -> you can use familiar programming
308 -> languages like sql to build complex
310.08 -> queries that run continuously on
312.24 -> streaming data
315.68 -> generally this is the pattern for
317.28 -> implementing a streaming data solution
319.919 -> data producers continuously generate
322 -> data and write it to a stream
324.4 -> a data producer could be a web server
326.639 -> sending logs
327.84 -> or it could be an application server
329.6 -> sending metrics or it could be an
331.36 -> internet of things device
333.039 -> sending telemetry the streaming service
336.16 -> then durably stores the data
338.479 -> once it's been received it provides a
340.32 -> temporary buffer
341.52 -> to prepare the data and it's capable of
343.6 -> handling high throughput
346.56 -> the streaming service delivers the
348.08 -> records to data consumers
350.16 -> the consumer continuously processes the
352.16 -> data
353.36 -> in many cases this means cleaning
355.039 -> preparing and aggregating the records
357.759 -> transforming the raw data into
359.36 -> information
362.4 -> producers create records and records
364.72 -> contain three pieces of information
367.52 -> the partition key logically separates
369.919 -> sets of data
371.12 -> and is hashed to route the data to a
373.199 -> shard
374.56 -> the sequence number is unique per
377.84 -> partition key within the shard it's
380.08 -> assigned by kinesis
381.6 -> after the producer writes to the stream
384.479 -> and the data block
385.759 -> is a base64 encoded payload up to one
388.88 -> megabyte in size
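
(A rough sketch, not from the talk: putting a single record with the Python SDK. The stream name, partition key, and payload are placeholder values.)

```python
# A minimal producer sketch, assuming boto3 credentials are configured;
# "example-stream" and the payload are hypothetical values for illustration.
import json
import boto3

kinesis = boto3.client("kinesis")

response = kinesis.put_record(
    StreamName="example-stream",      # hypothetical stream name
    PartitionKey="sensor-42",         # hashed by Kinesis to pick the target shard
    Data=json.dumps({"temperature": 21.4}).encode("utf-8"),
)

# Kinesis assigns the sequence number only after the record is written
print(response["ShardId"], response["SequenceNumber"])
```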
392.8 -> using kinesis enables you to decouple
395.199 -> the data producer and consumer
397.36 -> which can help promote the development
399.12 -> of a microservices model
401.039 -> and make it easier to manage large
402.88 -> workloads
404.24 -> producers put records into streams and
407.039 -> consumers get
408.08 -> records from streams and then process
410.08 -> them
411.68 -> the delay between the time a record is
413.759 -> put into the stream
414.88 -> and the time it can be retrieved this is
416.8 -> called the put to get delay
418.639 -> is typically less than one second so
421.12 -> kinesis data streams application can
423.28 -> start consuming the data from the stream
425.599 -> almost immediately after the data is
427.68 -> added
429.28 -> a kinesis data stream is a set of one or
432 -> more shards
433.039 -> each shard contains a sequence of data
435.12 -> records
436.08 -> and each data record has a sequence
438 -> number that's assigned by the service
441.039 -> if your data rate increases you can
443.039 -> increase or decrease
444.56 -> the number of shards allocated to your
446.4 -> stream
447.759 -> kinesis can automatically encrypt
449.52 -> sensitive information as a producer
451.52 -> enters it into the stream
453.039 -> it uses aws kms for encryption
457.199 -> record ordering is maintained within
458.96 -> each shard and you can configure record
461.36 -> retention
462.08 -> for up to one year
465.599 -> a shard is a uniquely identified
467.599 -> sequence of data records in a stream
470.08 -> it provides a fixed unit of capacity
473.12 -> each shard can support up to 1 000
475.68 -> records
476.56 -> per second for writes up to a maximum
479.28 -> total data write rate
480.72 -> of one megabyte per second including
482.96 -> partition keys
484.879 -> each shard can support up to five
486.72 -> transactions per second for reads
489.199 -> up to a maximum total data read rate of
491.759 -> 2 megabytes per second
494.479 -> the data capacity of your entire stream
496.8 -> is a function of the number of shards
498.639 -> that you specify for the stream
500.8 -> and the total capacity of the stream is
502.879 -> the sum of the capacities of the shards
506.479 -> each shard contains records ordered by
508.72 -> arrival time
510.319 -> in a shard records from the trim horizon
513.2 -> are all available
514.56 -> or all the available records since the
516.399 -> beginning whereas records from the tip
518.56 -> or the latest
519.519 -> are the most current records essentially
522.24 -> the difference is whether you want to
523.68 -> start from the oldest record the trim
525.519 -> horizon
526.399 -> or from right now the latest and skip
529.04 -> data between the latest checkpoint and
532.839 -> now
534.56 -> lambda is the on-demand compute
536.88 -> service
537.6 -> at the heart of the aws serverless
539.6 -> portfolio
540.64 -> and it can consume and process data from
542.72 -> kinesis streams
544.56 -> the lambda service polls kinesis
546.56 -> automatically and invokes your lambda
548.48 -> functions when records are available
551.2 -> the benefit of using lambda for
552.72 -> streaming data processing
554.399 -> is that lambda manages scaling and that
556.48 -> allows you to focus on the custom
558.16 -> business logic
559.2 -> to process data instead of the
561.04 -> underlying infrastructure
563.44 -> records are delivered as a payload to
565.44 -> the lambda function
566.48 -> and you can configure how many records
568.32 -> are in each batch
569.68 -> up to ten thousand or six megabytes as a
572.72 -> total payload limit
574.88 -> lambda supports a variety of
577.04 -> runtimes natively such as python node and .net
580.48 -> and you can also bring your own runtime
582.08 -> too
582.399 -> we have customers running even erlang
584.72 -> php
585.68 -> or cobol lambda is agnostic to your
588.16 -> choice of runtime
590.24 -> in the event source mapping you can also
592.08 -> configure a starting position for
593.68 -> records
594.48 -> you can process only new records all
596.72 -> existing records
598 -> or records created after a certain date
601.04 -> once the lambda function is finished
602.56 -> processing it returns the result to
604.32 -> kinesis
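
(A hedged sketch of the event source mapping described above, using boto3; the stream ARN and function name are placeholders, and the batch size and starting position are example values.)

```python
# Sketch of creating the Kinesis event source mapping for a Lambda consumer.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",  # hypothetical ARN
    FunctionName="process-stream-records",  # hypothetical function name
    BatchSize=100,                          # up to 10,000 records or 6 MB per invocation
    StartingPosition="LATEST",              # or TRIM_HORIZON / AT_TIMESTAMP
)
```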
607.6 -> for each shard lambda configures a poller
609.92 -> that polls the shard every second
612 -> and invokes your lambda function when
614.079 -> records are available
615.6 -> the poller is managed internally by the
617.68 -> lambda service
620.32 -> internally a record processor polls the
622.88 -> kinesis shard
624.56 -> the batcher creates the batches to be
626.399 -> processed by the function
628.079 -> and the invoker gets batches and invokes
630.959 -> your lambda function
635.04 -> kinesis data firehose is a fully managed
637.76 -> service that automatically provisions
639.839 -> manages and scales the resources
641.6 -> required to process and load your
643.36 -> streaming data
646 -> configure data fire hose within minutes
648.16 -> in the console
649.36 -> cli or cloud formation and it's then
651.68 -> immediately available to receive data
653.68 -> from thousands of data sources
656.399 -> fire hose can optionally invoke a lambda
658.48 -> function to transform data before it
660.32 -> stores results
662.079 -> it's integrated with some aws services
664.959 -> like
665.279 -> s3 redshift and the elasticsearch
667.839 -> service
668.56 -> it can also deliver data to generic http
671.68 -> endpoints
672.64 -> and directly to service providers from
675.519 -> the console you point data fire hose to
677.839 -> the destinations you need
679.519 -> and use your existing applications and
681.44 -> tools to analyze streaming data
684.64 -> once setup data fire hose loads
686.8 -> streaming data
687.76 -> into your destination continuously as
690.399 -> it arrives
691.2 -> and there's no ongoing administration
693.2 -> and you don't need to manage shards
695.6 -> it's also a pay as you go service where
697.44 -> the cost is based upon usage
699.36 -> with no minimum fees
703.279 -> there are various sources for data fire
705.279 -> hose the first is a direct put from the
707.519 -> data producer
709.04 -> you can send data using the kinesis
711.12 -> agent the data fire hose api
714 -> or the aws sdk you can also configure
717.76 -> your data
718.72 -> firehose delivery stream to
720.24 -> automatically read data from an existing
722.32 -> kinesis data stream
724.24 -> and you can use cloudwatch logs
726.24 -> eventbridge or iot as a source
729.76 -> apart from s3 redshift and elasticsearch
732.8 -> there's also a range of partners that
734.48 -> you can use as destinations for data
736.8 -> firehose
737.92 -> these include datadog dynatrace
740.959 -> logic monitor mongodb cloud new relic
744.72 -> splunk and sumo logic for http
748.72 -> endpoint delivery there is a defined
750.8 -> request and response format
753.12 -> endpoints have three minutes to respond
754.959 -> to a request before a timeout occurs
758.56 -> the frequency of data delivery to s3 is
761.279 -> determined by the buffer size
763.12 -> and buffer interval values that you
765.04 -> configure for your delivery stream
767.44 -> the service buffers incoming data before
769.6 -> delivering it to s3
772.16 -> you can configure the buffer size
774.48 -> between 1 to 128 megabytes
777.04 -> or buffer interval between 60 and 900
780.399 -> seconds
781.519 -> the condition that is satisfied first
783.519 -> will then trigger data delivery to s3
787.2 -> when data delivery falls behind data
789.44 -> writing to a stream
790.8 -> the service raises the buffer size
792.56 -> dynamically this allows it
794.8 -> to catch up
795.6 -> and ensure that the data is delivered to
797.68 -> the destination
799.519 -> fire hose also buffers incoming data
801.839 -> before delivering it to splunk
803.92 -> the buffer size is 5 megabytes and the
806.32 -> buffer interval
807.279 -> is 60 seconds these aren't configurable
810.079 -> because they're optimized specifically
812 -> for the splunk integration
816 -> the overall operation of a data firehose
818.72 -> stream looks like this
820.72 -> the data source puts records onto the
823.12 -> stream
824.16 -> fire hose invokes the data
825.68 -> transformation lambda function to
827.36 -> process the records
828.639 -> it uses the batch size value in the
830.48 -> configuration to determine the number
832.56 -> of records sent per invocation the
835.68 -> transformed records are returned from
837.44 -> lambda
838.24 -> back to the firehose stream those
841.12 -> records are then forwarded to the
842.48 -> destination
843.36 -> in this case the amazon elasticsearch
845.839 -> service
848 -> if your lambda function invocation fails
850.72 -> because of a network timeout or because
852.639 -> you've reached the lambda invocation
854.399 -> limit
855.199 -> firehose retries the invocation three
857.68 -> times by default
859.36 -> if the invocation does not succeed
861.68 -> firehose then skips that batch of
863.519 -> records
864.24 -> any skipped records are treated as
865.92 -> unsuccessfully processed records
868.72 -> if the status of a data transformation
870.56 -> of a record is processing failed
873.04 -> then firehose also treats the record as
875.76 -> unsuccessfully processed
878.079 -> any unsuccessfully processed records
880.16 -> are delivered to your s3 bucket
882.48 -> in a folder called
884.959 -> processing-failed
885.839 -> this will include metadata indicating
888 -> the number of attempts made
889.68 -> the timestamp for the last attempt and
892.079 -> the lambda function's arn
894.88 -> firehose can back up all
897.04 -> untransformed records
898.32 -> to your s3 bucket concurrently while
900.72 -> delivering transformed records to your
902.56 -> destination
903.839 -> you can enable source record backup when
906.56 -> you create or update a delivery stream
911.04 -> when you enable data transformation the
913.44 -> service buffers
914.48 -> incoming data up to three megabytes
917.12 -> by default
918.16 -> you can adjust this buffering size by
920.079 -> using the processing configuration
922.24 -> api fire hose then invokes the specified
926.32 -> lambda function synchronously
928.32 -> with each buffered batch using the
930.399 -> lambda synchronous invocation
932.48 -> mode the lambda function processes and
935.519 -> returns
936.24 -> a list of transformed records with a
938.56 -> status of each record
941.04 -> the status is an attribute called result
943.519 -> with the possible values of okay
945.92 -> dropped or processing failed in the
948.959 -> result in the returned payload the data
951.12 -> attribute
951.92 -> must be base64 encoded
955.279 -> the transformed data is then sent back
957.199 -> from lambda to firehose and from there
959.519 -> it's sent to the destination once the
961.519 -> specified buffering size or buffering
963.6 -> interval is reached
964.72 -> whichever one happens first
968.639 -> the data transformation lambda function
970.72 -> can run for up to five minutes
972.48 -> these functions are commonly used to
974.24 -> filter enrich and convert data
977.44 -> in filtering the lambda function can
979.519 -> remove attributes from records
981.519 -> or remove entire records based upon your
984.079 -> business logic
985.6 -> some customers use filtering to remove
987.839 -> personally identifiable information from
989.759 -> records and streams for example
992.079 -> you can also enrich records by fetching
994.24 -> data from other aws services
996.88 -> or from external data sources one common
1000.079 -> process is to use a
1001.759 -> geoip service to look up the geographical
1004.639 -> location of ip addresses
1006.399 -> and append this data to records just
1008.88 -> remember that the response
1010.079 -> size in synchronously invoked lambda
1012.56 -> functions
1013.279 -> is six megabytes in converting data
1016.48 -> you have complete flexibility in
1018.32 -> modifying the record layout to match the
1020.24 -> needs of your data consumer
1022.16 -> there are lambda blueprints available as
1024.16 -> examples of data conversion
1026.079 -> using this process firehose passes a
1029.439 -> record id
1030.559 -> along with each record to lambda during
1032.48 -> the invocation
1033.6 -> each transformed record must be returned
1036.559 -> with the exact same record
1038 -> id
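
(A minimal sketch of the transformation contract described above; the uppercase transform is a stand-in for your own filtering or enrichment logic, not an example from the talk.)

```python
import base64

def handler(event, context):
    """Firehose data transformation handler: return every record with its
    original recordId, a result status, and base64 encoded data."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # placeholder transformation
        output.append({
            "recordId": record["recordId"],  # must match the incoming record id exactly
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```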
1041.039 -> firehose invokes the data transformation
1043.36 -> lambda function
1044.48 -> and scales up the function if the number
1046.319 -> of records in the stream grows
1048.96 -> when the destination is s3 redshift or
1051.919 -> the elasticsearch service
1053.76 -> firehose allows up to five outstanding
1055.919 -> lambda invocations per
1057.36 -> shard when the destination is splunk the
1060.32 -> quota is 10 outstanding lambda
1062.24 -> invocations per shard
1065.679 -> you can monitor a data fire hose stream
1067.919 -> in several ways
1069.679 -> firehose sends amazon cloudwatch custom
1072.16 -> metrics and logs
1073.36 -> with detailed monitoring for each
1074.96 -> delivery stream
1076.72 -> if you're using the kinesis agent this
1078.72 -> also publishes custom cloud watch
1080.559 -> metrics
1081.36 -> to help assess whether the agent is
1083.6 -> working as expected
1085.52 -> the service also uses cloudtrail to log
1088.24 -> api
1088.799 -> calls and store the data in an s3 bucket
1091.44 -> to maintain
1092.32 -> an api call history to monitor data
1095.76 -> transformations
1096.96 -> watch an alarm on the executeprocessing
1099.6 -> dot success metric for the
1101.52 -> delivery stream
1103.039 -> check the ratio of successful lambda
1105.12 -> functions to all lambda functions
1106.96 -> this should be nearly 100 percent constantly and
1109.679 -> if it's not
1110.4 -> check the lambda functions logs to
1111.919 -> investigate further
1113.52 -> and watch an alarm on the
1116.52 -> deliveryto*.success metric for your destination
1118.32 -> similarly these should also be nearly
1120.4 -> 100 percent all of the time
1124.96 -> next let's take a look at amazon kinesis
1127.2 -> data streams
1128.16 -> which is a massively scalable and
1129.919 -> durable real-time data streaming service
1132.4 -> and how it's used with lambda
1136.16 -> data streams makes your streaming data
1138 -> available to multiple real-time
1139.84 -> analytics applications
1141.44 -> within about a second of the data being
1143.52 -> collected
1144.96 -> producers send data to streams and you
1147.52 -> can use up to five consumers to process
1149.76 -> records if you use kinesis data
1152.48 -> analytics or data fire hose on a stream
1155.039 -> these count towards this subscriber
1156.88 -> limit lambda is one type of consumer you
1159.84 -> can use to process data from a stream
1162.559 -> the lambda service polls each shard in
1164.96 -> your kinesis stream for records
1167.28 -> the event source mapping shares read
1169.679 -> throughput with other consumers of
1172.799 -> the shard
1174.559 -> lambda reads records in batches from the
1176.799 -> data stream and invokes your function
1178.799 -> synchronously the batch of records is
1181.28 -> delivered in an event or payload
1185.84 -> so let's look at shard processing
1189.679 -> the lambda service polls the shard once
1191.84 -> per second
1192.72 -> for a set of records and then
1194.799 -> synchronously invokes the function
1196.64 -> with a batch of records if the
1199.36 -> processing is successful
1200.88 -> it moves on to the next batch if not the
1203.6 -> retry
1204.24 -> and error behavior will depend upon the
1206 -> event source configuration
1208.88 -> using the default settings lambda
1210.64 -> invokes the function again with the same
1212.32 -> batch of records
1213.52 -> it continues to do this until the batch
1215.28 -> is successful or the records expire out
1217.76 -> of the stream
1219.2 -> this may not be ideal for large batches
1221.2 -> so you can also use the bisect
1223.28 -> batch on function error feature a
1226.4 -> batch processing failure
1227.919 -> will often be caused by a single
1229.76 -> record and this feature helps find that
1231.76 -> record
1232.96 -> using this lambda splits the batch into
1234.96 -> two retrying the oldest half of the
1237.36 -> records first
1238.559 -> this splitting process is repeated
1240.64 -> recursively
1241.679 -> until it finds the one record that's
1243.44 -> causing the failure
1244.96 -> the other batches process successfully
1248.4 -> the batch with one record is retried
1250.48 -> until successful
1251.84 -> respecting the maximum number of retries
1254 -> and record age in the stream
1256.24 -> in this case earlier retries do not
1258.96 -> count towards the
1260 -> maximum retries for records that are
1263.2 -> retried
1264.08 -> the maximum number of times or they've
1265.84 -> aged out of a stream those will be
1267.2 -> discarded
1268.08 -> oftentimes you will want to save these
1269.76 -> messages so to do this configure a
1271.84 -> failure destination
1273.2 -> and instead those records will be sent
1275.12 -> there
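
(A hedged sketch of tuning the retry and failure behavior described above on an existing event source mapping; the mapping UUID and SQS queue ARN are placeholders.)

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",  # hypothetical mapping id
    BisectBatchOnFunctionError=True,              # split failing batches to isolate the bad record
    MaximumRetryAttempts=3,                       # cap retries instead of retrying until records expire
    MaximumRecordAgeInSeconds=3600,               # discard records older than one hour
    DestinationConfig={
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:failed-records"  # hypothetical queue
        }
    },
)
```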
1277.919 -> you can scale the throughput of kinesis
1279.76 -> data streams by adding shards to a
1281.679 -> stream
1282.72 -> the easiest way to do this is with the
1284.72 -> update shard count
1286 -> api to use this set a target shard count
1290.08 -> and the api will manage splitting and
1291.84 -> merging shards in the background
1294.24 -> this process is asynchronous and it can
1296.72 -> cause
1297.36 -> short-lived shards to be created in
1299.52 -> addition to the final shards
1301.76 -> the short-lived shards count towards the
1304.08 -> total limit for your account in the
1305.76 -> region
1307.6 -> when using this api call we recommend
1310.32 -> that you specify a target shard count
1312.64 -> that is a multiple of 25 percent so 25
1316.72 -> 50 75 percent and so on
1320.159 -> you can specify any target value within
1322.559 -> your shard limit
1323.84 -> however if you specify a target that
1325.679 -> isn't a multiple of 25 percent
1327.679 -> the scaling action might take longer to
1329.919 -> complete
1331.36 -> for advanced users you can use split
1333.76 -> shard and merge shard directly
1337.76 -> this is an example of the cli command to
1340.08 -> scale a stream
1341.2 -> from four to six shards if the action is
1344.559 -> successful
1345.52 -> the service returns a 200 response with
1348.159 -> a json payload containing the stream
1350.32 -> name
1350.96 -> current shard count and the target shard
1353.28 -> count
1354.48 -> this process is asynchronous so the
1356.559 -> response occurs before the scaling
1358.4 -> operation is complete
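
(The CLI example on the slide isn't captured in the transcript; this is a hedged boto3 equivalent for scaling a hypothetical stream from four to six shards.)

```python
import boto3

kinesis = boto3.client("kinesis")

response = kinesis.update_shard_count(
    StreamName="example-stream",     # hypothetical stream name
    TargetShardCount=6,              # scaling from 4 to 6 shards
    ScalingType="UNIFORM_SCALING",
)

# The call is asynchronous: the response returns before resharding completes
print(response["CurrentShardCount"], response["TargetShardCount"])
```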
1361.84 -> using this api there are some important
1363.919 -> limits
1364.88 -> you cannot scale more than 10 times
1367.44 -> per rolling
1368.559 -> 24-hour period per stream you cannot
1371.679 -> scale up to more than double your
1373.039 -> current shard count
1374 -> for a stream you cannot scale down to
1376.88 -> below half your current shard count for
1379.039 -> a stream
1380.24 -> you also cannot scale to more than ten
1382.32 -> thousand shards in a stream
1383.919 -> or scale down a stream that already has more than
1385.6 -> ten thousand shards
1387.52 -> unless the target is under ten thousand
1390.559 -> many of these quotas in place are soft
1392.4 -> limits so contact aws support
1394.96 -> if you need higher limits
1398.96 -> resharding enables you to increase or
1401.28 -> decrease the number of shards in a
1403.039 -> stream
1403.679 -> to adapt to changes in the rate of data
1406.08 -> flowing through the stream
1408.24 -> it's typically performed by an
1409.679 -> administrator or monitoring application
1412.48 -> in response to kinesis metrics
1415.76 -> you must specify the new starting hash
1417.919 -> key value when performing
1419.44 -> this command this value determines
1422.4 -> the point of the split within the parent
1424.24 -> shard hash key space
1426.24 -> in most cases you want to do an even
1428.24 -> split
1429.679 -> after issuing the api call the existing
1432.08 -> stream
1432.88 -> goes into updating status and the stream
1435.679 -> scales up by splitting shards
1438.08 -> this creates two new child shards that
1440.88 -> split the partition key space of the
1442.799 -> parent shard
1444.64 -> the parent shard is still available but
1446.96 -> it enters a state called
1448.4 -> closed the consuming lambda function
1450.72 -> does not start receiving
1452.08 -> records from the child shards until it's
1454.799 -> processed
1455.44 -> all the records from the parent shards
1459.76 -> once the last records are processed in
1461.84 -> the parent shard
1462.96 -> its status becomes expired and no more
1465.679 -> records
1466.4 -> will be processed by that shard whereas
1469.36 -> only one lambda function
1470.72 -> processed the shard before splitting
1472.799 -> once the child shards are open and
1474.72 -> active
1475.6 -> now there are two lambda functions
1480.159 -> kinesis data streams and cloudwatch are
1482.4 -> integrated
1483.44 -> the metrics that you configure for your
1485.12 -> streams are automatically collected
1487.52 -> and pushed to cloudwatch every minute
1490.24 -> there are several useful metrics to
1491.919 -> monitor
1492.72 -> your streams and shards the get
1495.84 -> records dot iterator age milliseconds
1498.559 -> metric
1499.279 -> measures the difference between the age
1500.96 -> of the last record consumed
1503.039 -> and the latest record put to a stream
1506 -> this is important to monitor
1507.6 -> since having too high of an iterator age
1509.919 -> in relation to your stream's retention
1511.76 -> period
1512.559 -> can cause you to lose data as records
1514.96 -> expire from the stream
1517.12 -> this value should generally not exceed
1519.279 -> 50 percent
1520.24 -> of your stream retention when you get to
1522.48 -> 100 percent
1523.44 -> of your stream retention data will be
1525.44 -> lost
1526.64 -> if you're getting behind a temporary
1528.799 -> stop gap is to increase the retention
1530.96 -> time
1531.44 -> of your stream the better solution is to
1534.08 -> add more consumers to keep up
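
(A hedged sketch of an alarm on iterator age with boto3; the stream name is a placeholder and the threshold assumes a 24-hour retention, alarming at 50 percent of it.)

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="example-stream-iterator-age",  # hypothetical alarm name
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "example-stream"}],  # hypothetical stream
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=12 * 60 * 60 * 1000,            # 12 hours in ms, 50% of a 24-hour retention
    ComparisonOperator="GreaterThanThreshold",
)
```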
1537.44 -> when your consumers exceed the read
1539.679 -> provisioned throughput tracked by the read provisioned throughput exceeded metric
1542.159 -> they will start being throttled and you
1543.919 -> won't be able to read from the stream
1545.84 -> this can start backing up your stream so
1548.159 -> monitor the average statistic for this
1550 -> metric and try to get this value as
1551.6 -> close to zero as possible
1554 -> the same is true with the write
1555.919 -> provisioned throughput exceeded metric
1558.64 -> when exceeded producers are throttled
1560.72 -> and you won't be able to put records to
1562.4 -> the stream
1563.679 -> monitoring the average for this
1565.039 -> statistic can help you determine
1567.279 -> if your producers are healthy
1570.559 -> the put record dot success and put
1573.64 -> records.success
1574.88 -> metrics are incremented whenever
1576.88 -> producers succeed
1578.32 -> in sending data to your stream
1580.72 -> monitoring for spikes or drops can help
1582.72 -> you monitor
1583.679 -> the health of producers and catch
1585.84 -> problems early
1587.6 -> you'll want to watch the average
1588.96 -> statistic for whichever of the
1590.72 -> two api
1591.6 -> calls you're using because cloudwatch
1593.76 -> splits the two apis
1595.36 -> into two different metrics
1599.08 -> getrecords.success is the consumer side
1601.52 -> equivalent
1602.24 -> of put records.success look for spikes
1605.44 -> or drops in this metric
1606.88 -> to ensure that your consumers are
1608.32 -> healthy the average is the most
1610.559 -> useful statistic for this purpose
1614.64 -> there are also several useful metrics
1616.64 -> for monitoring lambda function consumers
1619.679 -> set alarms on errors and throttles when
1622 -> these go over zero
1623.52 -> so you can investigate further also
1626.4 -> monitor spikes in duration which may
1628.24 -> indicate if the consumer is slowing down
1630.64 -> so you can take corrective action lambda
1634.08 -> emits the iterator age metric when the
1636.32 -> function
1636.799 -> finishes processing a batch of records
1639.2 -> the metric
1639.84 -> indicates how old the last record in
1641.76 -> the batch was
1643.36 -> when the processing finishes if your
1645.52 -> function is processing new events
1647.6 -> you can use the iterator age to estimate
1649.679 -> the latency
1650.72 -> between when a record is added and when
1653.039 -> the function processes it
1656.88 -> if the iterator age starts to grow
1658.799 -> rapidly here are some troubleshooting
1661.2 -> steps and questions that can help
1662.88 -> alleviate the problem
1664.64 -> how many lambda functions are subscribed
1666.64 -> to the stream can you add another
1668.72 -> consumer to help process records are any
1671.679 -> functions showing errors or throttles in
1673.679 -> their metrics
1674.88 -> are you seeing a large increase in the
1677.039 -> incoming records or incoming bytes
1678.96 -> metrics in the stream
1680.24 -> that indicates a growth in producer
1682.48 -> data
1683.76 -> instead of allowing a function to error
1685.679 -> out you could use try catch handling
1688.08 -> to log the error log the records
1690.48 -> that cause the errors
1691.76 -> and then return successfully this allows
1693.76 -> lambda to process the next batch
1696.64 -> you can also scale the lambda
1698 -> concurrency per shard by using the
1700 -> parallelization factor
1701.6 -> which i'll discuss in a few minutes
1704 -> increasing lambda memory can also
1705.6 -> increase the performance of lambda
1707.039 -> consumers
1708.08 -> since it increases the amount of virtual
1709.84 -> cpu and compute power
1712.159 -> available for processing
1715.76 -> the read provision throughput exceeded
1717.919 -> metric can warn you
1719.76 -> when you're reaching the five reads per
1721.919 -> second
1722.799 -> or two megabytes per second limit if you
1725.84 -> can remove subscribers
1727.279 -> this can alleviate the issue remember
1730 -> that kinesis data analytics
1731.84 -> and kinesis fire hose are also
1733.76 -> subscribers
1734.88 -> so you can remove these if they're not
1736.559 -> needed if you need all the subscribers
1739.44 -> on the stream
1740.24 -> one way to solve this is to use enhanced
1742.399 -> fan out
1743.36 -> this feature enables consumers to
1745.12 -> receive records from a stream
1746.96 -> with a throughput of up to two megabytes
1748.88 -> of data per second per shard
1751.44 -> this throughput is dedicated which means
1753.76 -> that consumers
1754.88 -> using enhanced fanout don't contend with
1757.12 -> other consumers
1758.08 -> that are receiving data from the stream
1760.64 -> kinesis pushes
1761.84 -> data records from the stream to
1763.679 -> consumers using enhanced fan
1766.08 -> out meaning these consumers don't need
1768.08 -> to poll for data
1769.84 -> to see how this works with lambda
1771.36 -> consumers see the compute blog post
1774.08 -> at s12d dot com forward slash enhanced
1777.039 -> fan out
1781.039 -> in this next section i'll talk about
1782.559 -> some features and configurations
1784.159 -> available for optimizing the performance
1786.32 -> of your kinesis data streams application
1790.159 -> lambda consumers can use enhanced fan
1792.48 -> out and http/2
1793.84 -> to minimize latency and maximize
1796.88 -> read throughput you can create a
1798.72 -> datastream consumer with enhanced fanout
1801.44 -> stream consumers can get a dedicated
1803.919 -> connection to each shard
1805.44 -> that doesn't impact other applications
1807.44 -> reading from the stream
1809.44 -> the dedicated throughput can help if you
1811.36 -> have many applications
1812.88 -> reading the same data or if you're
1814.88 -> reprocessing a stream with large records
1817.6 -> kinesis pushes records to lambda over
1820.08 -> http 2
1821.52 -> which increases performance by up to 65
1823.76 -> percent
1826.559 -> standard consumers use a pull model over
1828.96 -> http
1830.399 -> whereas efo consumers use a push model
1833.2 -> over http/2
1835.679 -> a standard consumer with five
1838 -> consumers
1839.039 -> would average 200 milliseconds of
1841.36 -> latency each
1842.32 -> it's up to one second for all five
1845.36 -> using enhanced fan out the consumers are
1847.36 -> completely independent and do not
1848.88 -> impact each other
1850.08 -> even with five consumers each consumer
1852.96 -> averages about 70 milliseconds of
1854.88 -> latency
1856.159 -> the read throughput is dedicated so this
1858.159 -> provides significantly faster throughput
1860.24 -> for many workloads
1861.76 -> but note there is an additional charge
1863.6 -> for using this feature
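
(A hedged sketch of using enhanced fan-out with a Lambda consumer, not the exact steps from the referenced blog post: register a stream consumer, then point the event source mapping at the consumer ARN instead of the stream ARN. Names and ARNs are placeholders.)

```python
import boto3

kinesis = boto3.client("kinesis")
lambda_client = boto3.client("lambda")

consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",  # hypothetical ARN
    ConsumerName="lambda-efo-consumer",
)

lambda_client.create_event_source_mapping(
    EventSourceArn=consumer["Consumer"]["ConsumerARN"],  # the consumer ARN, not the stream ARN
    FunctionName="process-stream-records",               # hypothetical function name
    StartingPosition="LATEST",
)
```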
1867.2 -> by default there is one instance of a
1869.12 -> lambda function per
1870.399 -> shard in a stream you can increase the
1872.88 -> number of concurrent lambda functions
1874.799 -> that are processing a shard by changing
1876.96 -> the parallelization
1878.08 -> factor the batches maintain in-order
1880.96 -> processing per partition key
1883.039 -> and this feature is available for both
1885.2 -> kinesis data streams
1886.72 -> and for dynamodb streams
1892.159 -> increasing the parallelization factor from one to
1894.32 -> two you can see in this diagram
1896.72 -> how each shard now has two instances of
1899.44 -> a consuming lambda function
1900.88 -> processing batches in parallel the
1903.6 -> result is that records are processed in
1905.519 -> half the
1906 -> time all things being equal you can also
1908.64 -> increase this value
1909.679 -> to a maximum of 10.
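
(A hedged sketch of raising the per-shard concurrency on an existing mapping via boto3; the UUID is a placeholder.)

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",  # hypothetical mapping id
    ParallelizationFactor=2,                      # default 1, maximum 10 concurrent batches per shard
)
```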
1914 -> kinesis has a per shard write limit of
1916.559 -> one megabyte per second
1918.32 -> or 1 000 messages a second if your data
1921.36 -> producers are creating many small
1923.12 -> messages
1924 -> you may reach the limit of messages
1926 -> while still being under the one megabyte
1928 -> per second limit
1929.6 -> having many small messages can lead to
1931.6 -> lower throughput per shard
1933.44 -> and it can increase the cost of a
1934.799 -> workload you can often resolve this
1937.44 -> issue
1938.08 -> by using aggregation to increase the
1940.159 -> payload size
1941.279 -> reduce the number of messages and
1943.12 -> improve the throughput
1945.2 -> there are two libraries commonly used to
1947.12 -> help with this the kinesis producer
1949.279 -> library
1949.919 -> kpl and the kinesis aggregation library
1953.6 -> the kpl provides a layer of abstraction
1955.919 -> for ingesting data
1957.44 -> and offers a synchronous and
1958.96 -> asynchronous interface
1961.039 -> it's recommended to use the asynchronous
1963.36 -> interface
1964.159 -> wherever possible the kpl handles
1967.2 -> batching and multi-threading
1968.96 -> and also emits metrics to cloud watch so
1971.519 -> you can monitor performance
1973.76 -> for java users the kinesis client
1976 -> library integrates seamlessly with the
1978.32 -> kpl
1979.12 -> and can help on the consumer side
1981.279 -> otherwise for consumers using other
1983.039 -> runtimes
1983.919 -> the kinesis aggregation library can help
1986 -> simplify the de-aggregation process if
1988.64 -> you use this technique
1991.84 -> for streams with low numbers of records
1993.76 -> you may find that the consuming lambda
1995.36 -> function
1996.08 -> is invoked with small batches which
1998.159 -> increases the processing cost per
2000 -> message
2001.36 -> if the latency sensitivity of the
2003.12 -> workload is less important for example
2005.519 -> in archiving workloads you can change
2007.919 -> this behavior
2008.799 -> to wait for more messages to arrive
2010.72 -> before invoking the lambda function
2014.399 -> by default lambda invokes your function
2016.72 -> as soon as records are available in the
2018.399 -> stream
2019.36 -> if the batch that lambda reads from the
2020.96 -> stream has only one record in it
2023.12 -> lambda sends only one record to the
2025.039 -> function to avoid this you can tell the
2027.36 -> event source to buffer records
2029.2 -> for up to five minutes by configuring a
2031.12 -> batch window
2032.96 -> before invoking the function lambda
2035.279 -> continues to read records from the
2036.88 -> stream until it's gathered a full batch
2038.799 -> or until the batch window expires
2042 -> this is an additional knob to tune the
2044.32 -> streaming trigger
2045.279 -> you can set a time to wait before
2047.12 -> triggering up to five minutes
2048.8 -> seven seconds the batch size is still
2051.44 -> respected
2052.399 -> and it will trigger on full batches
2053.919 -> before the batch window is up
2055.76 -> this works for both kinesis data streams
2058.159 -> and dynamo db streams triggers
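
(A hedged sketch of adding a batch window to an existing mapping via boto3; the UUID is a placeholder and 300 seconds is the five-minute maximum mentioned above.)

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",   # hypothetical mapping id
    BatchSize=1000,                                # a full batch still triggers before the window is up
    MaximumBatchingWindowInSeconds=300,            # wait up to five minutes for records to accumulate
)
```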
2063.839 -> kinesis data analytics is the easiest
2065.919 -> way to transform and analyze streaming
2068.079 -> data in real time
2069.52 -> you can interactively query streaming
2071.44 -> data using standard sql
2073.359 -> and you can build apache flink
2074.96 -> applications using java
2077.119 -> python and scala you can also build
2080 -> apache beam applications using java to
2082.72 -> analyze data streams
2086.159 -> you can use kinesis data analytics for
2088.159 -> many use cases to process data
2090.079 -> continuously
2091.28 -> and derive insights in near real time
2094.639 -> there are three common use cases for
2096.639 -> kinesis data analytics
2099.04 -> streaming etl you can prepare data
2101.2 -> before loading
2102.32 -> into a data lake or data warehouse
2104.72 -> normalizing data
2106 -> with a specified schema and reducing or
2108.48 -> eliminating
2109.68 -> batch etl steps for continuous metric
2113.359 -> generation kda calculates statistics and
2116.48 -> trends over time
2117.76 -> you can use this to create real-time
2119.52 -> leader boards in games
2120.96 -> or measure sensor averages in iot
2123.52 -> devices over a rolling time window
2126.56 -> finally for responsive real-time analytics
2129.119 -> you send real-time
2130 -> alarms or notifications when certain
2132 -> metrics reach pre-defined
2133.76 -> thresholds or when your application
2135.76 -> detects anomalies
2137.76 -> in all cases the process is the same you
2140.079 -> initially capture data
2141.599 -> via kinesis data fire hose or kinesis
2143.839 -> data streams
2145.04 -> and then kinesis data analytics is the
2146.96 -> consumer for those streams
2148.96 -> the output is then sent to downstream
2150.88 -> consumers and tools for alerting
2152.96 -> visualization
2154 -> and distribution kinesis data analytics
2158.32 -> implements the ansi 2008 sql standard
2161.359 -> with extensions
2162.88 -> these extensions enable you to process
2165.359 -> streaming data using sql
2167.839 -> if you're building a sql application
2169.68 -> there are a couple of important concepts
2171.44 -> i want to briefly introduce
2173.359 -> the first is a stream you map a
2175.52 -> streaming source
2176.64 -> to an in-application stream that is
2178.96 -> created using a sql statement like this
2182.24 -> data continuously flows from
2184.4 -> the streaming source
2185.599 -> into the in application stream this
2188.4 -> stream works like a table
2190 -> that you can query using sql statements
2192.32 -> but it's called a stream because it
2193.68 -> represents continuous data flow
2196.48 -> you can have multiple writers insert
2198.56 -> data into an in-application stream
2201.44 -> and there can be multiple readers
2202.88 -> selecting from the stream
2205.04 -> think of this as an in-application
2206.88 -> stream that implements
2208.4 -> a publish-subscribe messaging paradigm
2210.88 -> in this paradigm
2212 -> data can be processed interpreted and
2214.24 -> forwarded by a cascade
2215.839 -> of streaming sql statements without
2217.599 -> having to be stored in a traditional
2219.359 -> relational database
2222.56 -> once you've created an in-application
2224.32 -> stream then you can pump data into it
2226.64 -> using a pump which is defined
2228.079 -> using a statement like this a pump is a
2230.72 -> continuous
2231.44 -> insert query that's running to insert
2233.839 -> data
2234.48 -> from an in application stream into
2236.64 -> another in application stream
2241.2 -> sql queries in your application code
2243.359 -> execute continuously
2245.119 -> over in application streams these
2247.839 -> streams represent
2248.8 -> unbounded data that flows continuously
2251.04 -> through your application
2252.8 -> to get results from this continuously
2254.8 -> updating input
2256.079 -> you bound queries using a window defined
2258.88 -> in terms of time
2260.8 -> these are also called windowed sql
2264.16 -> for a time-based windowed query you
2266.4 -> specify a time-based window size
2269.44 -> you can specify a query to process
2271.359 -> records in a tumbling window
2273.68 -> a sliding window or a stagger window
2276.16 -> depending on your application
2278.48 -> sliding windows aggregate data
2280.32 -> continuously
2281.599 -> using a fixed time or a row count
2283.839 -> interval
2285.28 -> tumbling windows aggregate data using
2287.76 -> distinct time-based windows
2289.68 -> these open and close at regular
2291.44 -> intervals such as every 15 minutes
2294.72 -> stagger windows use keyed
2297.839 -> time-based windows
2299.119 -> these windows open as data arrives and
2301.68 -> the keys allow for multiple overlapping
2303.839 -> windows
2306.48 -> if you need to use tumbling windows in
2308.48 -> your aggregations there's a relatively
2310.48 -> new feature that can help
2312.48 -> lambda now supports streaming analytics
2314.64 -> calculations for kinesis
2316.96 -> this allows developers to calculate
2319.119 -> aggregates near real time
2321.04 -> and pass the state across multiple
2322.88 -> lambda invocations
2324.64 -> this provides an alternative way to
2326.48 -> build analytics solutions
2329.28 -> a tumbling window in lambda is a fixed
2331.839 -> size
2332.56 -> non-overlapping time interval of up to
2334.8 -> 15 minutes
2336.32 -> you specify the duration in the event
2338.24 -> source mapping between the stream
2340.079 -> and the lambda function when you apply a
2342.56 -> tumbling
2343.28 -> window to a stream items in the stream
2345.76 -> are grouped by window
2347.119 -> and sent to the processing lambda
2348.56 -> function the function
2350.56 -> returns a state value that is passed to
2352.64 -> the next tumbling window
2355.04 -> in the diagram shown a stream has four
2357.359 -> windows over 60 minutes
2359.599 -> the first window contains items one and
2362 -> five
2363.04 -> the function sums these and returns the
2365.44 -> value six
2367.119 -> the sum result is passed to the second
2369.119 -> tumbling window
2370.24 -> which adds 3, 2 and 7. that brings the
2373.28 -> total to 18
2374.8 -> and that's passed to the third window
2376.88 -> and so on
2379.68 -> in practice tumbling windows in lambda
2381.839 -> look like this
2383.359 -> the incoming event is the same as before
2385.92 -> kinesis provides an array of records
2388.48 -> what's different is the handful of new
2390.72 -> attributes
2391.92 -> there's a start and end time to the
2394 -> window and the state attribute
2396.4 -> the state is initially empty the first
2398.88 -> invocation returns a value in the state
2401.28 -> in this case
2402.16 -> the item count and sales total when
2404.96 -> kinesis invokes the second lambda
2406.72 -> function
2407.359 -> it passes the state it received from the
2409.28 -> first
2410.48 -> when the final invocation occurs there
2412.56 -> is a new attribute
2413.599 -> in the event payload indicating that
2415.839 -> it's the last one
2417.52 -> you can then choose to durably persist
2419.44 -> the calculation result
2420.88 -> in s3 dynamodb efs
2424.079 -> or another downstream service
2428.96 -> when you're building data streaming
2430.4 -> solutions that use lambda consumers
2432.8 -> you can do everything in the aws
2434.72 -> management console
2436.319 -> but once you get familiar with building
2437.92 -> these applications
2439.359 -> it's often easier to move to an
2441.2 -> infrastructure as code solution
2443.76 -> the aws serverless application model
2446.16 -> also known as sam
2447.52 -> lets you express the lambda function and
2449.76 -> the event source mapping
2451.28 -> in code templates using the sam cli
2454.64 -> you can then deploy these templates into
2456.4 -> your aws account
2458.079 -> these can help create repeatable
2460.079 -> versionable deployments
2461.52 -> that you can share across teams
2465.2 -> here's an example of a sam
2467.04 -> template that deploys a kinesis data
2469.04 -> stream
2469.76 -> a consumer a lambda function and the iam
2472.72 -> resources needed to run the application
2475.839 -> you can see and deploy the entire
2477.68 -> example in the url
2479.52 -> shown on the slide the first resource
2482.8 -> is the kinesis stream itself which is
2484.88 -> defined with a shard count of one
2488.24 -> the next resource defines the kinesis
2490.48 -> stream consumer
2492 -> it sets the stream arn that's the amazon
2494.4 -> resource name
2495.68 -> and the consumer name next the template
2499.52 -> defines the lambda function
2501.359 -> to consume the records from the kinesis
2503.599 -> stream
2505.04 -> the sam template specifies the code
2507.04 -> location run
2508.319 -> time and handler name the event source
2511.2 -> mapping references the application
2512.96 -> consumer for the kinesis stream
2515.04 -> it sets the starting position to
2516.96 -> latest it could have used the trim
2518.56 -> horizon instead
2519.92 -> and the batch size of 100 records
2522.96 -> the batch size specifies the number of
2524.8 -> records that the poller's
2526.319 -> batcher gathers before it invokes the
2528.4 -> function
2531.04 -> finally the output section of the
2533.04 -> template provides references to the
2534.88 -> resources that have been
2536 -> created this entire sam template can be
2539.119 -> deployed
2539.76 -> from the command line using the sam cli
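
(A hedged sketch of the kind of consuming function the SAM template could point at, not the code from the example repository; the payload shape is an assumption.)

```python
import base64
import json

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers each data blob base64 encoded inside the record
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        data = json.loads(payload)  # assumes producers send JSON
        print(record["kinesis"]["partitionKey"], data)
    # Returning without an error marks the whole batch as successfully processed
```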
2545.28 -> we've reviewed some of the capabilities
2547.2 -> of the kinesis suite of services
2549.359 -> i want to leave you with a few general
2551.04 -> best practices that can help you operate
2553.52 -> production streaming workloads
2555.92 -> it's important to understand the choice
2557.68 -> between kinesis data fire hose
2559.92 -> and kinesis data streams fire hose is a
2563.28 -> fully managed
2564 -> and scalable service that requires no
2565.92 -> administration so if your streaming
2567.599 -> workload
2568.48 -> is storing data in one of its targets
2570.48 -> like s3 or redshift
2572.079 -> this is often the best choice data
2574.88 -> streams provides more flexibility
2576.88 -> but you must monitor the shard activity
2579.119 -> and understand
2580.079 -> how to scale up and down as needed
2583.28 -> architects frequently advise customers
2585.28 -> to design their streaming applications
2587.52 -> with your data consumers in mind you may
2590.16 -> have a variety of different types of
2592.16 -> consumer with different needs
2594 -> and working backwards from their
2595.44 -> requirements can help you build a
2597.28 -> solution
2597.92 -> that provides the most value if the
2600.8 -> payload received by lambda functions
2603.04 -> then you may receive data blobs with
2605.44 -> inconsistent data structures and varying
2607.599 -> attributes
2608.8 -> generally it's best to transform this
2610.56 -> raw data into
2611.92 -> aggregation friendly records that
2614.24 -> downstream consumers
2615.68 -> can work with more easily before getting
2619.28 -> to production we recommend becoming
2620.96 -> familiar with the mechanisms for
2622.56 -> monitoring and scaling streams ahead of
2624.4 -> time
2625.52 -> you can build cloudwatch dashboards
2627.2 -> containing many of the important metrics
2629.04 -> to monitor performance
2630.48 -> so you're ready before going into
2631.92 -> production
2633.52 -> similarly understanding the
2635.68 -> troubleshooting steps to tackle common
2637.599 -> streaming issues
2638.8 -> can also save significant time and i
2641.28 -> covered a few of those
2642.72 -> common ones earlier
2646.079 -> for more information head over to
2648.44 -> serverlessland.com
2649.76 -> where there are more resources blogs
2652 -> videos workshops and learning paths
2654.24 -> to help you learn more about developing
2656.56 -> serverless solutions on aws
2660 -> my name is james beswick and i'm a
2661.68 -> principal developer advocate
2663.2 -> here in the aws serverless team if you
2666.079 -> have any questions at all about
2667.359 -> streaming data or serverless technology
2669.359 -> in general
2670.16 -> feel free to contact me thanks very much
2672.319 -> for your time today

Source: https://www.youtube.com/watch?v=kSppyMr6sXA