Best practices to maximize scale and performance with Amazon DynamoDB - AWS Virtual Workshop

Amazon DynamoDB is a fully managed, multi-region, multi-active database that delivers single-digit millisecond performance at any scale and powers many of the world’s fastest-growing businesses, such as Dropbox, Disney+, Samsung, and Netflix. In this virtual workshop, AWS Senior Practice Manager and DynamoDB expert Rick Houlihan goes deep into a few of DynamoDB’s key features and shares tips and best practices to help customers learn how to maximize scale and performance. Among these are using AWS Identity and Access Management (IAM) to restrict PartiQL from doing table scans, using region tagging to facilitate global tables, and exporting to Amazon S3 for audit trail workflows that do not require low-latency responses or ad hoc query support. Join Rick for this hands-on, advanced technical session to explore each of these features with detailed examples.

Learning Objectives:
* Learn about the foundation of NoSQL databases.
* Dive deep into key features of Amazon DynamoDB, including global tables, PartiQL support, and export to S3.
* Learn tips and best practices to maximize key features for scale and performance.

Subscribe to AWS Online Tech Talks On AWS:
https://www.youtube.com/@AWSOnlineTec

Follow Amazon Web Services:
Official Website: https://aws.amazon.com/what-is-aws
Twitch: https://twitch.tv/aws
Twitter: https://twitter.com/awsdevelopers
Facebook: https://facebook.com/amazonwebservices
Instagram: https://instagram.com/amazonwebservices

☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or on-demand tech talks to watch at their own pace. Join us to fuel your learning journey with AWS.

#AWS


Content

1.67 -> [Music]
7.359 -> welcome everybody and thanks for taking
9.2 -> the time out of your day to come join us
10.559 -> for this webinar we're going to be
12.08 -> talking today about amazon dynamodb some
14.4 -> of the key features and best practices
17.279 -> as well as some of the high level design
18.56 -> patterns and we're going to walk through
20.08 -> some of the technologies and uh
22.08 -> uh and actually demonstrate some of
23.6 -> these features hands-on that uh in in
26.16 -> live fire demos that you can actually
27.84 -> see this stuff we're talking about
29.359 -> actually working
30.48 -> my name is rick houlihan i'm a senior
32.239 -> practice manager for nosql services
34.88 -> at aws and i focus mostly on dynamodb
37.84 -> i'm sure
38.719 -> some of you are probably familiar with
39.92 -> my content others maybe not and that's
42.719 -> what we're here for right we're going to
43.84 -> talk today a little bit about a brief
45.44 -> history of data processing why are we
47.28 -> actually looking at no sql technologies
49.36 -> again for those of you who've heard this
50.719 -> before you know just ride along for a
52.399 -> few minutes i guess a lot of folks
53.92 -> obviously haven't and are new to this
55.84 -> content and we're getting into an
57.36 -> overview of dynamodb what is this what
59.199 -> is dynamodb uh talk a little bit about
61.76 -> the data modeling best practices
64.4 -> uh for nosql in general and what it
66.08 -> really means to manage your data as
67.52 -> normalized versus denormalized schema uh
70.159 -> talk a little bit about dynamodb in the
71.68 -> serverless ecosystem
73.439 -> and and and get into you know some of
75.2 -> the modeling and best practices for real
76.88 -> world applications and showing you some
78.479 -> actual real you know technology demos
80.799 -> and and how this stuff works right so
82.32 -> the first thing we want to talk about is
84 -> the history of data processing why do we
85.6 -> want to use this new technology and
87.52 -> really what it comes down to
89.439 -> when you look at database technology
90.88 -> over the years is there's been a series
92.079 -> of innovations that have been driven by
94 -> you know this thing we call data
95.2 -> pressure data pressure is the ability of
97.28 -> the system to process the amount of data
99.68 -> that we're trying to process at a
101.6 -> reasonable cost or reasonable time and
103.759 -> when we can't do one of those things or
105.439 -> the other we're going to invent
106.88 -> something that's it that's a technology
108.32 -> trigger and we've done this many times
109.759 -> through the years the first system that
111.6 -> we're actually born with the one between
113.439 -> our ears didn't last long we needed to
115.6 -> start writing things down storing
116.96 -> structured data in some way to be able
119.119 -> to build an enterprise build business on
121.04 -> top of and that worked really well
122.56 -> ledger accounting and just kind of
124.159 -> writing things down for many many years
126.64 -> until the 1880 u.s census came along and
129.039 -> a guy named herman hollerith was tasked
130.72 -> with collating all that data processing
133.04 -> everything and what it came down to was
134.959 -> it took him eight years of the 10-year
136.879 -> cycle for the census which was obviously
138.879 -> not going to work the next time around
140.319 -> so he invented the machine readable
141.76 -> punch card and the punch card sorting
143.76 -> machine and the era of modern data
145.76 -> processing was born so you know rapidly
147.76 -> we started to develop new you know
149.36 -> technologies new services uh and new
152.48 -> systems to be able to support those
153.92 -> services things like paper tape magnetic
156.16 -> tape distributed block storage random
157.92 -> access file systems and the data sprawl
160.4 -> in the 60s started to drive a team at
163.519 -> ibm led by a guy named edgar codd into
165.68 -> the development of the
167.12 -> relational database right system r was
168.959 -> the first system that came out did many
171.04 -> things right it made data more
172.4 -> consistent it was very difficult to
173.92 -> maintain all these copies and
175.36 -> denormalized data on disks at the time
178.239 -> and it reduced the sprawl of the data
180.08 -> the cost of the data
181.84 -> and the ability of the the enterprise to
184.64 -> be able to store more data by
185.92 -> deduplicating this data it was
187.36 -> tremendous value
189.12 -> and you know again 40 50 years ago the
190.959 -> most expensive resource in the data
192.4 -> center was storage and and so this was a
194.56 -> really effective technology and the
196.48 -> technology for maintaining the copies of
198.159 -> data were not really available but
200.159 -> today fast forward and you know 50 years
202.879 -> and now the most expensive
204.799 -> uh systems in the data center the most
206.64 -> expensive cost in the data center is
208.159 -> really cpu right it's pennies for
209.76 -> gigabytes to store data we talk about
211.68 -> terabytes petabytes exabytes of data
213.76 -> today so it's not about storing the data
216.239 -> anymore it's about processing it and
218 -> that's where the cost comes in so when
219.84 -> you look at nosql it's about you know
222.4 -> basically processing data at scale now
224.72 -> when you want to use new technologies
226.879 -> and nosql is a new technology you know
228.799 -> they don't work the same way and this is
230.4 -> one of the biggest problems i see teams
232 -> today they kind of get into this you
233.519 -> know idea of i have this big data
235.12 -> challenge you know my relational
236.64 -> databases are slow you know i'm kind of
238.64 -> sharding and partitioning my data to be
240.319 -> able to handle this scale and and i and
242.799 -> there's this thing called nosql that's
244.4 -> supposed to do all this stuff for me and
245.92 -> it's great let's go get it and they kind
247.36 -> of run down this path well you know the
249.2 -> reality is that the innovators have come
250.959 -> along and they've invented something
252.239 -> here's this data pressure that we have
254.239 -> this big data challenge and they've
256.32 -> they've
257.28 -> invented something but the skill set
258.799 -> isn't there yet so people kind of go
260.479 -> they use it they try to use it the same
262 -> way they use the old technology and they
263.68 -> have a miserable experience and this is
265.919 -> typical of the nosql kind of
268.56 -> path in almost every technology path as
270.96 -> it's introduced to market you can't use
273.04 -> it the same way you use the old
274.24 -> technologies you got to learn how to use
275.68 -> it first
276.88 -> as those skills kind of become commodity
278.639 -> in the market
280 -> the early majority lands you know
282 -> developers kind of become familiar with
283.52 -> how to use it they they
285.12 -> the experience becomes a better
286.56 -> experience right so if you find yourself
288.24 -> talking about how nosql is difficult to
290.4 -> use i i don't understand it well
293.52 -> that's good that you that you recognize
295.04 -> you don't understand it let's kind of
296.639 -> learn about it you didn't always
297.919 -> understand how to use that relational
299.36 -> database either right remember that at
301.52 -> some point uh you it was new to you so
304.4 -> let's get into the new technologies and
306 -> learn how they work we'll have a better
307.36 -> experience than trying to use these
309.199 -> things the same way we use the old ones
310.88 -> and and we'll explain a little bit about
312.4 -> what that means when we get into talking
314.08 -> about some of the best practices and and
316.16 -> things for nosql databases in general
318.8 -> so now why nosql
323.12 -> it's because sql is really
325.36 -> about that normalized relational view of
327.36 -> data you know that the data is you know
329.6 -> well structured for supporting any kind
331.6 -> of access pattern it's it's it's
333.199 -> agnostic to every pattern and what that
335.039 -> really means is it's really optimized
336.72 -> for none right i have to kind of join
338.479 -> indexes across tables and build these
340.24 -> views of data
341.68 -> you know and as such it kind of scales
343.52 -> vertically it works very well on a data
345.28 -> set that's stored in a single container
346.96 -> it works less
348.4 -> efficiently when data is kind of
349.6 -> distributed across multiple containers
351.28 -> and it's kind of good for those olap
353.28 -> workloads where i don't really
354.32 -> understand what the access patterns are
355.6 -> going to be right so that's kind of the
356.72 -> category that sql works well
359.199 -> in but you know no sql on the other hand
361.12 -> works really well for oltp applications
364 -> which are applications that have
365.52 -> well-defined access patterns if i
367.28 -> understand the access pattern then i can
368.88 -> kind of model the data to kind of
370.319 -> represent those instantiated views the
372.16 -> result sets of those queries more or
374.16 -> less and and be able to to build
376.8 -> hierarchical data storage structures
378.72 -> that are are more you know denormalized
382.16 -> they scale horizontally across multiple
384.479 -> nodes
385.44 -> and it's really built for you know
386.96 -> situations where i where the access
389.039 -> patterns are well understood now this is
390.8 -> great because most applications today
393.44 -> are pretty well understood patterns
395.6 -> right i mean this is what we're writing
397.039 -> code for because the code executes and
398.88 -> it doesn't ask a different question
400.08 -> every time it executes it asks the
401.759 -> actually the same questions every time
404.08 -> and so having a database that's
406 -> optimized to be able to provide the
407.759 -> result set for the questions that your
409.28 -> applications are asking is actually
410.88 -> quite efficient we'll get into looking
412.479 -> at what that really means when we talk
414.4 -> about amazon dynamodb it's a fully
416.8 -> managed nosql database right no sql
418.8 -> databases are great but scaling the
420.8 -> infrastructure behind them is not so
422.479 -> great it's not something that you want
424.24 -> to be involved with as a business it's
425.84 -> not really core
427.199 -> a core value that you want to develop
428.88 -> and this is really where i start working
430.24 -> with customers a lot is when they get
432.319 -> into the scale aspect of things now i
434.479 -> also work with a lot of customers that
435.84 -> are just getting started and and the
437.599 -> reason why they want to use it is
438.8 -> because again a fully managed system i
440.4 -> don't have to understand how to provide
441.919 -> five nines availability it's there out
443.599 -> of the box uh and for those of us that
445.84 -> tried to manage a five nines
447.12 -> availability service you know how hard
448.8 -> that is and dynamodb gives you that at
451.12 -> the click of a button and that's the
452.8 -> kind of functionality that that startups
454.56 -> want for mission critical services right
456.479 -> so again document wide column database
459.199 -> scales to any workload it's fast and
460.8 -> consistent a scale we have workloads
462.96 -> that execute in
464.72 -> you know tens of millions of
465.84 -> transactions per second and provide low
468.16 -> single digit millisecond latency we'll
469.84 -> show you some of that stuff in a minute
471.599 -> uh fine-grained access control over all
473.759 -> your data at the item level the table
475.84 -> level the the attribute level within the
477.68 -> items you can restrict processes to see
479.759 -> certain chunks of data that you want
481.28 -> them to and others to see other other
483.68 -> bits or the whole uh items and and it's
487.199 -> fully elastic and this is one of the
489.039 -> things that we'll talk about when we get
490.319 -> into the performance characteristics of
492.08 -> dynamo but
493.36 -> there's no other database that works
494.8 -> like this it's fully on demand if you
496.56 -> need it to be you get whatever you want
498.16 -> when you want it you pay for what you
499.919 -> use when you use it and you don't even
502.24 -> pay for storage right until you store
504.16 -> items once you delete the items then the
506.08 -> storage costs go away so there's no
508 -> pre-provisioning of resources with
509.599 -> dynamodb and it's what makes it really
511.28 -> nice for the serverless kind of paradigm
513.36 -> that event driven programming model
515.519 -> that's becoming so popular today
517.68 -> now when you look at nosql databases we
519.919 -> talked a little bit about how they work
521.2 -> differently
522.64 -> dynamodb works the same way as all the
524.399 -> nosql databases and that's one thing that
526.32 -> you can argue back and forth with other
527.839 -> people but the modeling of the data is
529.44 -> always the same no matter what the nosql
531.12 -> database platform is there's going to be
532.959 -> some sort of collection of objects right
534.56 -> there's in dynamodb we call these things
536.48 -> a table
537.6 -> tables have items uh items have
539.92 -> attributes and not all the items on the
542 -> table have to have the same attribute
543.6 -> okay so this is one of the
545.44 -> differentiators between a nosql database
547.76 -> and relational databases the the items
549.519 -> on the table they all
550.959 -> they're like the rows from all your
552.399 -> tables you push them you put them all
553.92 -> into one place in in mongodb they call
555.92 -> this a collection in cassandra it's
557.92 -> called a key space dynamodb calls it a
560.64 -> table regardless you're pushing items
563.04 -> into an object into a repository and all
565.92 -> of those objects have to have a unique
568 -> attribute to identify what this object
570 -> is that's the partition key in dynamodb
572.32 -> it's _id in mongodb or
574.32 -> documentdb and again it's a partition
576.8 -> key in cassandra
578.32 -> but this uniquely identifies that item
580.32 -> within the table so if i have a
581.6 -> partition key only table
583.6 -> in dynamodb or i have a standard
585.839 -> collection with no indexes in in mongodb
588.399 -> then then this is that key value access
590.399 -> pattern everyone talks about nosql is
592.16 -> great for and it is great for the key
593.839 -> value access pattern but the reality is
596.24 -> applications have a lot more complex
597.92 -> access patterns than that and if you
599.68 -> want to try and push all the relational
601.12 -> data that you have into these big blobs
602.88 -> that can be described as key
604.48 -> value data structures it's a very
606.24 -> inefficient way to model your data we'll
607.76 -> get into that in a minute but the
609.519 -> reality here is i want to create
610.88 -> collections of these objects that are
612.24 -> interesting to my application and i want
614.079 -> to treat these things more or less like
615.6 -> the rows of the tables from my
616.959 -> relational databases right you don't
618.64 -> query all the rows that are related to
620.32 -> each other all the time you query
621.92 -> subsets of those rows right you work
623.68 -> with little chunks of the relational
625.519 -> hierarchies that represent your data and
627.68 -> we kind of want to do the same thing
628.8 -> with nosql databases right to be able to
630.8 -> work with the data efficiently we got to
632.72 -> be able to get the data in and out kind
634.72 -> of the same way
636.16 -> if i'm pulling big blobs of data to
637.92 -> update small integers i'm burning a lot
639.68 -> of iops to do that and that's a very
641.68 -> common you know
645.04 -> anti-pattern in nosql so when you hear
647.36 -> people talk about i'm using this
648.56 -> database and such and such because it
650 -> has large document support for like 16
651.92 -> megabytes you got to look at them and
653.2 -> say well why are you doing that you know
654.64 -> i mean you shouldn't be storing data in
656.56 -> big giant documents like that and if you
658.24 -> are then let's let's talk about object
660.64 -> stores like s3 for that data because you
662.48 -> shouldn't be
664.079 -> anyway the anti-pattern is using
666.079 -> large documents what we want is
668.32 -> to create collections of objects that
670 -> are interesting the application so we
672 -> have partition keys in
673.76 -> dynamodb we can create an additional key
677.279 -> a compound key called a sort key which
680.56 -> gives us the ability to create
681.6 -> collections of objects within these
682.959 -> partitions right now these partitions
684.72 -> become
686.48 -> more or less groups of items that are
689.279 -> interesting to the application in
690.959 -> mongodb i'd start adding indexes to do
693.04 -> this but
694.079 -> in other nosql databases like cassandra
696.32 -> we also have the ability to add sort
697.92 -> keys
698.959 -> some even have much more complex key
701.44 -> structures than dynamodb but regardless
703.6 -> what we're trying to do is create a
704.88 -> collection of objects and so if you
706.48 -> think about it we can start to model
708.32 -> these one-to-many and many-to-many
710.079 -> relationships by creating you know
711.839 -> containers full of things if i have
713.68 -> let's say a partition key which is the
715.92 -> customer id and i have a sort key which
718.48 -> might be a customer's interaction date
721.04 -> concatenated by the interaction id
723.839 -> you know customers do what customers
725.2 -> make orders they they make payments they
727.6 -> they get shipments
729.04 -> they make returns they open tickets
730.959 -> right but i want sometimes customer logs
732.959 -> into a portal and says you know what's
734.32 -> my current you know state of my
736.32 -> interactions right with my with my
738.16 -> service i want to get my you know
739.839 -> landing page for the customer portal
742.24 -> i can query this table now and say give
744.079 -> me everything for this partition key for
746 -> customer id x with the sort key greater
748.48 -> than date y and it brings back
750.399 -> everything right all of those rows from
752.399 -> all of those tables all in one query and
754.72 -> this is where nosql databases start to
757.12 -> shine
758.56 -> against
759.6 -> the the relational database by creating
761.279 -> these grouped collections either on the
763.04 -> table or on indexes right and then we
764.88 -> can start to use range queries on those
766.959 -> sort key operators to get filtered
769.519 -> objects from those collections
771.44 -> essentially
772.399 -> recreating the relational models and
774.24 -> we'll talk about this again in more
775.6 -> detail
778.079 -> that you actually model in your
779.68 -> relational databases
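A minimal sketch of the customer-portal lookup just described, assuming boto3 and made-up table, key, and value names (the talk does not show this code): one query against the item collection pulls back the customer's orders, payments, shipments, and tickets since a given date, because they all share one partition key.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical single-table layout: pk = customer id, sk = interaction date
# concatenated with an interaction id. Names and values are placeholders.
table = boto3.resource("dynamodb").Table("data")

response = table.query(
    KeyConditionExpression=(
        Key("pk").eq("CUSTOMER#x")      # partition key equality condition
        & Key("sk").gt("2021-01-01")    # sort key range: everything after date y
    )
)
for item in response["Items"]:
    print(item)   # orders, payments, shipments, tickets, etc. in one request
```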
781.6 -> so that's what we'll do on the table in a nosql
783.04 -> database we'll add additional indexes
785.68 -> mongodb uses indexes for almost all of
787.839 -> this stuff but again it's always the
789.44 -> same thing i want to create you know
791.76 -> decorate these objects with attributes
793.76 -> that will i will group using indexes or
795.76 -> partition key structures in interesting
797.92 -> ways right in dynamodb we we support
800.8 -> secondary indexes on those attributes to
802.88 -> essentially create additional partitions
804.959 -> or collections of objects on secondary
807.04 -> tables right derivative tables you can
808.8 -> think of these things like that are
810.24 -> maintained when you update the objects
812.399 -> on the table all right so you make a
814.48 -> write to the table if you have indexed
816.32 -> attributes on that item it'll be
818.16 -> projected onto these uh
820.16 -> secondary indexes and you can choose how
822.079 -> much of that item you actually want to
823.519 -> project and sometimes your access
825.36 -> pattern doesn't need all the object data
827.12 -> it only needs the keys to identify
828.8 -> certain conditions of these items maybe
831.04 -> you generate a list of items that have a
832.72 -> key condition
834 -> that violates a particular state
836.639 -> and then i'm going to go get those
837.839 -> objects one by one and process them or
840.16 -> what not it could be the access pattern
842.32 -> needs all of the attributes it could be
843.839 -> it only needs some of the attributes
845.36 -> again it depends on how much you need to
848.079 -> uh
848.88 -> get from the access pattern that you're
850.639 -> indexing for and that's a good way to
852.8 -> control your cost because you got to
854.079 -> remember these indexes just like all you
856.24 -> know databases when you write to the
857.6 -> table you write to the index in dynamodb
859.519 -> you project additional data and that
861.68 -> adds to the cost right so if i project
864.639 -> all i'm essentially doubling my write capacity cost
866.88 -> every time i update the item so just
868.48 -> remember that when you manage
870.8 -> your projections you
872.399 -> want to project only the data you need
874.88 -> to your gsis
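As a sketch of what managing that projection looks like through the control plane, assuming boto3 and placeholder table, index, and attribute names (a provisioned-mode table would also need a ProvisionedThroughput entry in the Create block):

```python
import boto3

# Hypothetical sketch: add a GSI that flips the table's keys and projects only
# the keys, keeping the extra write cost per item update as low as possible.
dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="data",
    AttributeDefinitions=[
        {"AttributeName": "pk", "AttributeType": "S"},
        {"AttributeName": "sk", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "gsi1",
            "KeySchema": [
                {"AttributeName": "sk", "KeyType": "HASH"},   # inverted keys
                {"AttributeName": "pk", "KeyType": "RANGE"},
            ],
            # KEYS_ONLY is the cheapest projection; ALL roughly doubles the
            # write cost of every item update, so project only what you need.
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    }],
)
```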
876.72 -> scaling dynamodb is just like all nosql
878.959 -> databases
880.24 -> except that it actually works
882 -> and douglas adams if you're familiar
883.6 -> with hitchhiker's guide to the galaxy one of
885.04 -> my favorite authors i love this quote
887.199 -> right
888.16 -> we all want stuff that works we get
889.6 -> technology instead
891.04 -> but dynamodb is great technology right
892.88 -> and nosql databases in general they all
894.959 -> scale the same way dynamodb gives you a
897.68 -> fully managed service and a fully
899.199 -> distributed backplane service that
901.44 -> that does this for you
903.04 -> but essentially what we're going to do
904.32 -> is we're going to position these items
905.68 -> across some arbitrary key space
907.839 -> in dynamodb we use the partition key
909.6 -> attribute to do that we're going to hash
911.04 -> the value within that attribute and
912.48 -> that's going to generate some you know
914 -> hash value that we that we lay out
916.32 -> across this logical key space and as we
918.399 -> scale the system what i'll do is chop
920.16 -> that key space up and start to
922 -> distribute it out across multiple nodes
924.16 -> now when i query the key space i'm going
926 -> to ask for a partition key equality
928.639 -> condition right and that tells me
930.48 -> exactly which one of these partitions i
932.24 -> need to go to
933.44 -> you know or i'm going to have to scan
934.72 -> the table to find those items right so
936.8 -> this is how nosql databases maintain
938.8 -> that kind of fast and consistent
940.16 -> performance at any scale
942.079 -> is by you know essentially chunking up
944.24 -> the the workload so to speak across an
946.56 -> arbitrary number of physical storage
949.44 -> devices dynamodb is no different and one
952.32 -> of the things dynamodb does for you
953.759 -> though is it manages all of that for you
955.6 -> you don't have to you don't have to
956.88 -> create shards clusters replica sets all
959.759 -> you have to do is tell us how much
960.959 -> capacity you need and we manage that
963.199 -> under the hood for you so if you look
964.959 -> at you know some of the some of the kind
967.04 -> of performance characteristics of a
968.72 -> dynamodb table
970.56 -> you know we have
971.839 -> a billing mode called auto scaling which
973.759 -> gives us that
974.959 -> kind of just in time capacity
976.959 -> provisioning for your standard workloads
979.279 -> right most workloads kind of follow the
980.959 -> sun or you know they have some sort of
982.639 -> curve during the day in which the demand
984.48 -> will increase gradually to a peak and
986.24 -> then kind of ebb out to a valley and
988.8 -> and when you have those types of
990.16 -> workloads this is the net effect of a
992.16 -> system like dynamodb over a legacy
994.16 -> technology legacy nosql provider or a
997.199 -> relational database where i have to kind
999.279 -> of provision for that peak you know so
1001.04 -> if you think about this workload that
1002.56 -> escalates over time and eventually hits
1005.12 -> a high peak and maybe ebbs off
1007.759 -> you know in a relational database
1009.36 -> platform all that area underneath the
1010.88 -> white line or the red line there is just
1012.88 -> wasted dollars that's going that's you
1014.88 -> know hook it up to hook the fire hose up
1016.72 -> to my wallet and let it go uh with
1019.04 -> dynamodb much tighter uh provisioning
1021.759 -> allocation of course because we can kind
1023.68 -> of see the workload scaling we have
1025.439 -> cloud watch metrics we'll react to
1027.76 -> we have some really nice tools out there
1029.36 -> for you to look at
1031.12 -> that one of my fellow sas rob
1033.12 -> mccauley actually demonstrated recently
1035.679 -> and we'll show you in a second when we
1036.959 -> get into the workshop
1038.4 -> on how to kind of look at those those
1040.24 -> auto scaling
1041.52 -> configurations but regardless it's about
1044 -> getting
1044.88 -> using what you need when you need it and
1046.88 -> and that's where dynamodb just shines
1048.559 -> over any other database you don't get
1049.919 -> that the other thing dynamodb gives you
1051.84 -> is performance at scale things that
1054 -> are counter-intuitive as we
1055.6 -> increase the workload against the
1057.28 -> dynamodb database we're going to
1058.72 -> actually perform better and that's what
1060.08 -> this is a synthetic workload that's what
1061.76 -> this demonstrates is that actually as we
1064.08 -> peak the workload here and reads and
1065.84 -> writes you know i don't know which ones
1067.6 -> which blue's reads orange is writes and
1070.4 -> we're actually looking at the get
1071.52 -> latency here and we're seeing how that
1073.039 -> drops as we start to peak into the
1074.64 -> millions of transactions per minute per
1076.88 -> second or over a million per second here
1079.919 -> and the reason why that happens is
1081.28 -> because dynamodb is a fully distributed
1083.679 -> backplane service and and we have you
1086.08 -> know millions of request routers out
1087.6 -> there i don't know millions thousands of
1088.88 -> request routers in a region for sure but
1090.88 -> as you start to hit the table with
1092.24 -> millions of requests per second all of
1094.559 -> those request routers become aware of
1096.08 -> your configuration data for your table
1097.919 -> your security iam permissions there's a
1099.76 -> very short-lived cache you know they'll
1101.12 -> look that data up every second or two
1102.88 -> just to make sure it's not too stale but
1104.32 -> if you're hitting me with millions of
1105.44 -> requests per second i'm not having to go
1106.96 -> look it up every time
1108.64 -> and this starts to
1110.64 -> have an impact right the impact is lower
1112.48 -> latency overall for all your requests
1114.24 -> and you can see that happening here as
1115.52 -> we drop from the four millisecond range
1117.36 -> down into the two 2.5 millisecond range
1120.48 -> as we hit millions of reads per second
1122.559 -> this is just not something you expect
1124.24 -> from a regular database a regular
1125.679 -> database as we start to peak out the
1127.2 -> workload 90 95 cpu you're going to see
1130 -> the opposite right you'll see the
1131.28 -> latency spike you'll see the server
1133.039 -> start to go sideways eventually it just
1134.72 -> goes offline
1136.48 -> and dynamodb doesn't actually go offline
1139.039 -> when you actually ask
1141.039 -> for more capacity than you have
1142.559 -> provisioned in dynamodb dynamodb has a
1145.28 -> this thing we call a burst bucket which
1146.72 -> is five minutes of unused capacity and
1148.799 -> this is snapchat a well-known
1150.48 -> dynamodb customer super bowl sunday this
1152.64 -> is 2019 super bowl uh you know they hit
1155.6 -> about 13 or 12 11 12 million requests
1158.08 -> per second i think they they peaked out
1159.76 -> somewhere around there
1161.12 -> the provision capacity at the during the
1163.039 -> day was about 5.5 million steady because
1165.76 -> that was the floor of their auto scaling
1167.36 -> algorithm we typically recommend
1168.88 -> customers do this on event days they
1170.88 -> expected a lot of traffic if there's a
1172.72 -> big event day like a prime day a super
1174.64 -> bowl something like that then go ahead
1176.48 -> and set your autoscaling floor up to
1178.32 -> what you would expect the daily peak to
1179.919 -> be right through the event and then drop
1181.919 -> it back down to your normal operating
1183.679 -> levels you know there's no reason to to
1185.44 -> have to challenge the system
1187.36 -> if you know that it's coming and maybe
1189.2 -> set your ceilings up above that so
1190.799 -> you'll scale up if you get too much
1192.24 -> traffic in this case
1193.679 -> you can see what happened the table just
1195.12 -> flew up above its auto scaling floor
1197.919 -> but for the five minutes preceding it
1199.52 -> wasn't using its full table allocation
1201.44 -> so you know we'll take all that unused
1203.28 -> capacity and apply it to your spikes to
1205.52 -> your peak loads to keep your system
1207.36 -> running even though you're asking for
1209.12 -> you know a transient spike in workload
1211.52 -> that exceeds your provision throughput
1214 -> what happens with a relational database
1215.6 -> when you do this right it's dead it's
1217.6 -> going to be on the floor it might
1218.72 -> actually be sideways and need life
1220.88 -> support at that point when you run into
1222.72 -> this type of workload who knows what
1224.24 -> happens right when those types of
1225.76 -> systems run out of
1227.44 -> of infrastructure
1229.36 -> capacity dynamodb has more capacity than
1231.679 -> you need whenever
1233.2 -> so there's nothing to worry about that
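A hedged sketch of the event-day recommendation above, using the Application Auto Scaling API via boto3; the table name, capacity numbers, and target value are illustrative assumptions, not figures from the talk.

```python
import boto3

# Hypothetical sketch: raise the auto scaling floor (MinCapacity) to the
# expected peak ahead of a known event day, then drop it back afterwards.
autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/data",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=40000,     # floor set to the expected event-day peak
    MaxCapacity=100000,    # ceiling left above the peak for surprises
)

autoscaling.put_scaling_policy(
    PolicyName="rcu-target-tracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/data",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,   # keep consumption around 70% of provisioned
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```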
1234.72 -> now on demand that's another workload uh
1237.2 -> capacity mode that you can look at for
1238.799 -> your workloads
1240.08 -> this gives us a you know kind of uh
1243.52 -> you know whatever you need when you need
1245.039 -> it within some limits you'll always be
1246.559 -> able to allocate at least twice as much
1248.24 -> throughput as you've previously peaked
1249.919 -> to on the table but you're going to get
1251.2 -> it instantaneously right auto scaling
1252.96 -> takes you know 10 to 12 minutes got to
1255.36 -> read those cloud watch metrics we've got
1257.44 -> to respond and react
1259.12 -> to what's happening
1260.72 -> and then apply new allocations to the
1262.559 -> table so on demand happens right away
1264.159 -> but it's more expensive right so uh it's
1266.4 -> good for workloads that are very
1267.6 -> transient right spiky workloads where
1269.52 -> you don't really know when it's coming
1270.72 -> but you need to have the capacity when
1272.08 -> it's there we have lots of customers
1273.6 -> that were you know provisioning for
1275.039 -> that peak workload because they couldn't
1276.799 -> you know throttle anything while they
1278.559 -> waited uh so they were paying a lot uh
1281.919 -> and that's not something you have to do
1283.2 -> anymore with on demand so depending on
1284.72 -> the workload can make a lot of sense for
1286.32 -> you
1287.36 -> and i always recommend people kind of
1288.72 -> start there and and see what it looks
1290.799 -> like after a week or so after a couple
1292.559 -> weeks and then you can use
1294.559 -> those previous
1296.72 -> metrics from cloudwatch to configure
1298.88 -> your auto scaling
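A minimal sketch of that "start on demand, then tune" advice, assuming boto3 and a placeholder table name; the throughput numbers are made up and billing-mode switches are limited to one per 24 hours.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Run the table on demand (pay per request) while learning the traffic pattern.
dynamodb.update_table(TableName="data", BillingMode="PAY_PER_REQUEST")

# Later, once the CloudWatch history shows the daily curve, switch back to
# provisioned capacity with explicit throughput and auto scaling on top.
dynamodb.update_table(
    TableName="data",
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 1000, "WriteCapacityUnits": 500},
)
```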
1300.48 -> the other thing you get with dynamodb we
1302.159 -> mentioned a little earlier is five nines
1303.919 -> availability right out of the box you
1305.44 -> know if you drop a dynamodb table
1307.44 -> into a single region you get four nines
1309.039 -> availability if you click the button
1310.88 -> enable global tables and replicate your
1312.72 -> data to another region we're going to
1314 -> give you five nines availability and
1315.28 -> that's an sla guarantee you know i know
1317.52 -> a lot of services out there and i've
1318.88 -> actually participated in all these
1320.159 -> services over the years where i've had
1321.44 -> the question you know do we really have
1323.6 -> five nines even though you know of
1325.28 -> course the the rfis and rfps that we're
1328.159 -> responding to are all saying you know
1330.08 -> you must be five nines what do the
1331.52 -> vendors say they all say we're five
1332.96 -> nines are they five nines you know how
1334.72 -> do they measure five nines we measure it
1336.88 -> it's up there on the console for you uh
1339.2 -> and it's an sla guarantee so you know
1341.6 -> those service credits are
1342.88 -> automatic and so it's not something
1344.48 -> that we have to
1346.159 -> dole out manually
1348.24 -> certainly you know we maintain this
1349.679 -> service as a five nines service
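A sketch of what "click the button, enable global tables" looks like through the API, assuming boto3, placeholder region and table names, and the 2019.11.21 global tables version (the table needs DynamoDB Streams with NEW_AND_OLD_IMAGES enabled first).

```python
import boto3

# Hypothetical sketch: add a replica region to an existing table, turning it
# into a global table that replicates to a second region.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.update_table(
    TableName="data",
    ReplicaUpdates=[
        {"Create": {"RegionName": "us-west-2"}},   # add the replica region
    ],
)
```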
1352.4 -> all right so before we get into the
1353.919 -> demos i'm gonna talk a little bit about
1355.76 -> data modeling
1357.36 -> uh in nosql we talked about relational
1359.6 -> data a bit and how you lay that data out
1361.76 -> into the
1362.72 -> uh into the tables but you know it
1364.72 -> really is about relational data when we
1366.4 -> talk about the data that your
1367.44 -> applications are using there is not an
1369.039 -> application service that i've ever built
1370.64 -> that hasn't
1372.24 -> involved using relational data it
1374.4 -> doesn't matter if it's social networking
1375.84 -> i.t systems management document
1377.52 -> management any anything you're trying to
1380.24 -> develop for the data has relationships
1382.799 -> if we can't model those relationships in
1384.559 -> the database then the database is kind
1386.08 -> of useless it really is and and when you
1388.08 -> start looking at nosql databases it's
1389.84 -> all about how do i model those
1390.96 -> relationships in the data we explained a
1392.64 -> little bit about you know this this
1393.919 -> indexing and we'll talk about and
1396.48 -> demonstrate how powerful that can be
1398.32 -> when we just break down a simple product
1399.919 -> catalog right this is uh kind of that
1401.679 -> relational data structure we're very
1402.96 -> familiar with we have the one-to-one
1404.72 -> relationship between products books
1406.48 -> albums videos one-to-many between albums
1408.4 -> and tracks many to many between videos
1410.4 -> and actors and this gives us you know
1412.799 -> the ability to run any query you pull up
1414.88 -> any list of products using you know
1417.44 -> multiple queries uh in a nosql database
1420.88 -> maybe a different approach maybe a more
1422.799 -> naive approach i wouldn't necessarily
1424.24 -> recommend doing this but this is
1425.84 -> actually for this data structure
1427.039 -> probably not such a bad idea is to
1429.52 -> create these kind of hierarchical data
1431.12 -> structures as documents
1432.96 -> um not necessarily you know individual
1435.039 -> rows we'll look at what happens when we
1436.4 -> treat all this data as individual rows
1437.84 -> in a second
1438.96 -> but this is obviously a better data
1440.559 -> structure to retrieve all my products i
1442.24 -> mean for that access pattern or get any
1443.84 -> given product because i always need the
1445.36 -> hierarchy of data so i would create
1447.039 -> these types of hierarchical documents
1448.48 -> and say get me this product by id give
1450.32 -> me all the products just scan the table
1452.32 -> i don't have to write queries what
1454 -> happens on the tables when we load the
1455.76 -> data in and look at the time
1457.36 -> complexity of those queries it becomes
1458.799 -> apparent why no sql databases start to
1460.799 -> shine right so this is treating all the
1462.4 -> data separate rows we're gonna put all
1463.84 -> the data on different tables and we're
1465.6 -> gonna start to look at that time
1466.64 -> complexity you can start to see as we
1468.64 -> join more and more tables into these
1470.64 -> queries the time complexity starts to
1472.4 -> increase
1473.36 -> exponentially right and this is not
1475.679 -> something that gets better i mean this
1477.279 -> sql is pretty rudimentary stuff if you
1479.6 -> get into production databases on a lot
1481.36 -> of complex schemas you know they're
1482.799 -> going to look a lot worse than this and
1484.48 -> it's and on large data sets that really
1486.48 -> starts to chew you up right now let's
1488.64 -> take all those rows and we're going to
1490.32 -> stick them in again another naive
1491.679 -> approach here i wouldn't necessarily
1493.279 -> recommend modeling your data this way
1494.72 -> but to emphasize the point we're just
1496.159 -> going to take all those rows and stick
1497.44 -> them on the same table
1499.039 -> what ends up happening here is apparent
1500.64 -> now all of a sudden we have partitions
1502.159 -> partitions are groupings of objects you
1503.76 -> know right the one-to-one joins i just
1505.279 -> kind of created one object because
1506.799 -> they're the same they're very small
1508.64 -> objects a single item is not a big deal
1511.12 -> as we start to you know look at these
1512.88 -> one-to-many joins now hey guess what
1515.279 -> it's pre-joined all those rows exist in
1517.12 -> the same table you know if these data
1518.64 -> these are all small objects but if they
1520.159 -> were different objects with different
1521.44 -> access patterns you can start to see how
1522.96 -> the efficiency drives here if i need to
1524.559 -> update a single song item i don't have
1526.159 -> to update the entire album i only update
1528.64 -> the song item right within the album and
1531.039 -> this gives me a lot of flexibility when
1532.64 -> i'm accessing the data right but i mean
1534.32 -> for the bulk access pattern and get me
1536 -> this the album there's no join right the
1538.559 -> data is kind of pre-joined it's all on
1540.559 -> the same index essentially that's what
1542.08 -> a join does right it joins indexes if
1544 -> you're joining on attributes
1545.84 -> that aren't indexed then that's
1547.76 -> probably not a very good query so if i
1549.84 -> put all the objects on the same index i
1551.76 -> get a lot of flexibility right and so
1553.279 -> this is demonstrating that right select
1554.88 -> the movie by movie title this is our
1556.88 -> many to many if you look at that kind of
1558.64 -> graph relationship that a many many
1560.24 -> construct gives us with that mapping
1561.84 -> table those are kind of the edges i kind
1563.919 -> of loaded the mapping table into the
1565.919 -> movie partition and on those mapping
1568.559 -> edges i added a little bit of the data
1570.559 -> from the actors row their
1573.2 -> gender their birth date right things
1575.2 -> that are kind of you know that that's
1576.64 -> data that doesn't change right
1578.88 -> i don't have to worry about
1580.64 -> uh you know that data updating
1582 -> frequently and and you know even if the
1583.76 -> data updates intermittently then we've
1585.36 -> got some systems to be able to handle
1586.96 -> that now streams and lambda can
1588.32 -> guarantee those updates but this is that
1590.559 -> a kind of a directed graph right if i
1592.799 -> look at the movie it knows about its
1594.159 -> actors the actor doesn't know much about
1595.679 -> the movie
1596.96 -> but if i go and i flip the keys on the
1599.44 -> table into an index right so now the
1601.279 -> index has a
1602.88 -> partition key which is actually the sort
1604.4 -> key of the table and
1605.84 -> and the sort key is the partition
1607.6 -> key from the table
1609.2 -> and kind of see how we can see a whole
1610.72 -> bunch of different relationships
1612 -> expressed here like i want to
1614 -> get all the books for a given author i
1615.52 -> want to get all the
1617.039 -> albums a song has been on i
1618.96 -> want to get
1620.96 -> all the
1622.72 -> movies that an actor's been in this is
1624.32 -> the other side of the many to many and i
1626 -> you know i haven't denormalized any of
1627.52 -> the movie data here but you easily could
1630.08 -> you know depending on your access
1631.36 -> patterns uh and if that's like imdb i
1634.159 -> might want to see that i might want to
1635.36 -> see hey here's the movies he's been in
1636.799 -> the roles he's played the summary of the
1638.32 -> movie and i'll click here for more right
1640.88 -> and so this is kind of traversing the
1642.72 -> graph right if i want to go and see all
1644.64 -> the movies that a director
1646.399 -> directed i can see all the albums that a
1648.799 -> musician has produced
1650.64 -> and again as i start to extend this
1652.48 -> concept into the extended attributes
1654.399 -> right these things these whole items
1656.24 -> can have state give me everything in a
1657.52 -> given state or every given state in
1659.2 -> a given you know
1661.279 -> area or whatnot you can start to see how
1663.679 -> these relational access patterns can be
1666.24 -> expressed without having to
1668.559 -> denormalize the data very much right we
1670.72 -> get into many many relationships
1672.24 -> depending on the nature of the
1673.2 -> relationship we probably want to do a
1674.48 -> little bit of denormalization but again
1677.2 -> if you have things like you know a
1679.6 -> standard access pattern standard
1681.279 -> relational queries we can absolutely you
1683.6 -> know express those relationships in
1685.36 -> a nosql database it's kind of like there
1687.2 -> is no such thing as non-relational data
1689.2 -> that's what i like to say if you hear me
1690.799 -> if you've kind of listened to me for a
1691.919 -> while you know i don't say that much
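A minimal sketch of traversing the "other side" of that many-to-many by querying the inverted index, reusing the hypothetical gsi1, table, and key layout from the earlier sketches; the actor id is a placeholder.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Query the GSI whose keys are the table keys flipped: all the movies a given
# actor appears in come back as one collection.
table = boto3.resource("dynamodb").Table("data")

response = table.query(
    IndexName="gsi1",
    KeyConditionExpression=Key("sk").eq("ACTOR#tom-hanks"),
)
for item in response["Items"]:
    print(item["pk"])   # each pk identifies a movie the actor is linked to
```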
1694.48 -> all right let's talk a lot about complex
1696.159 -> queries complex queries of things like
1698.24 -> counts sums aggregations the types of
1700.72 -> things that you know are hard to do with
1702.96 -> nosql right because they're kind of
1704.32 -> computed on the fly right and one of the
1706.399 -> things we learned at amazon we started
1708 -> doing these types of queries right give
1709.36 -> me all the downloads for a given song
1711.44 -> well i don't want to go count all the
1712.72 -> download events every time someone comes
1714.48 -> in and says give me the download you
1715.84 -> know somebody goes to their ui and it
1717.279 -> shows a counter for downloads for a
1718.72 -> given you know track in amazon music of
1720.72 -> course not so we have some summary data
1722.799 -> that gets kind of you know
1724.559 -> scanned and compiled and then
1727.2 -> updated on a regular interval and that's
1729.12 -> what you see and that's kind of the same
1730.799 -> way we do things with dynamodb with
1733.36 -> dynamodb we have a built-in change data
1735.52 -> capture pipeline with which we can
1737.919 -> automate these types of activities and
1740.08 -> that's what a lot of people do and
1741.679 -> there's a lot of really good use cases
1743.12 -> for this all the all the data events all
1744.88 -> the writes the updates the inserts the
1746.72 -> deletes any type of data change event
1749.52 -> hits dynamodb stream triggers a lambda
1751.919 -> function and we can do those things like
1753.919 -> maintain
1754.96 -> you know summary aggregation reports
1757.36 -> just write back to an item on the
1759.36 -> table that contains all those summary
1760.88 -> metrics now every time someone wants to
1762.399 -> come in and say hey you know what's my
1764.399 -> you know last 30 day running total for x
1767.84 -> that's what your lambda maintains so i
1769.52 -> just select the item i don't have to
1771.039 -> compute the result this is a really
1773.36 -> really fantastic way to do this at the
1775.279 -> same time we can start pushing that data
1777.36 -> out to third-party systems or external
1780.159 -> systems like elasticsearch for you know
1782.399 -> higher level indexing functions if i
1783.919 -> need full text geospatial or index
1785.76 -> intersections things like this i can
1787.279 -> push subsets of my data into
1788.559 -> elasticsearch and to support those
1790.559 -> access patterns
1792.08 -> we can put the data into kinesis fire
1793.679 -> hose roll it up into parquet files and
1795.44 -> shove it you know into s3 this gives us
1797.76 -> the ability to query the data from
1799.12 -> athena i like to do this a lot with
1800.88 -> these running aggregation workflows as
1802.64 -> i'd like to kind of maintain these
1803.84 -> running aggregations and then at the
1805.679 -> same time you know have this audit trail
1807.679 -> event flow because i can always query
1809.52 -> the athena
1810.64 -> you know data at the end of the day and
1812.48 -> i can make sure that my summary
1813.679 -> aggregations are completely accurate
1815.52 -> because one thing about lambda is it's
1816.88 -> guaranteed at least once execution so
1818.96 -> if the container does fail the lambda
1820.48 -> function might you know re-execute on
1822.159 -> the same data once or you know very
1824.159 -> infrequently but it can happen more than
1826.32 -> once so it's nice to have the audit
1828.64 -> trail because i could just write the
1829.84 -> queries for the
1831.84 -> summary against the data let that
1833.919 -> execute at the end of the day it's going
1835.679 -> to take a while to run against that s3
1837.44 -> data but when it comes back i can just
1839.6 -> you know sanity check my summary
1841.52 -> aggregation reports to make sure that
1843.12 -> nothing's you know off if i if i need a
1845.52 -> fine degree of accuracy that's a good
1847.2 -> pattern
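A hedged sketch of that stream-driven aggregation: for every new download event written to the table, a stream-triggered Lambda bumps a counter on a summary item so readers fetch one item instead of counting events. The table layout, attribute names, and function shape are assumptions; because Lambda delivery is at least once, the end-of-day Athena query over the S3 audit trail is what keeps the totals honest.

```python
import boto3

dynamodb = boto3.client("dynamodb")

def handler(event, context):
    # Assumes the stream is configured with NEW_IMAGE or NEW_AND_OLD_IMAGES.
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        track_id = new_image["pk"]["S"]          # assumed key layout

        # Increment the running total on a summary item for this track.
        dynamodb.update_item(
            TableName="data",
            Key={"pk": {"S": track_id}, "sk": {"S": "SUMMARY"}},
            UpdateExpression="ADD downloadCount :one",
            ExpressionAttributeValues={":one": {"N": "1"}},
        )
```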
1848.399 -> notify third parties of change execute
1851.36 -> downstream workflows this is again it's
1853.52 -> code so you can do anything with code
1855.279 -> and it's all up to you
1856.96 -> uh dynamodb streams and lambda is
1859.039 -> completely managed behind the scenes
1860.64 -> it's nothing you need to worry about as
1862.24 -> your table expands so do the stream
1864.08 -> processors
1865.279 -> as the workload on the table expands the
1867.2 -> stream processors will scale up and down
1869.12 -> behind the table but there is one thing
1871.2 -> you need to worry about is the uh
1874.08 -> lambda execution timeline or timing
1877.6 -> right if the work that
1879.6 -> you're doing within your lambda
1880.72 -> functions is too much then they start
1882.559 -> falling behind the stream and if the
1884.08 -> stream activity is constant and steady
1886.159 -> then eventually you're gonna run out of
1888.399 -> stream buffer and after 24 hours you
1890.96 -> start to lose you know data off the edge
1893.039 -> of the stream so you know
1894.72 -> that's what they call iterator age we want
1897.2 -> to make sure that the iterator age in
1898.72 -> your lambda functions is is relatively
1901.039 -> low or manageable and that if it's
1903.279 -> increasing it's through bursts and not
1905.039 -> through straight you know steady
1906.399 -> activity now you can increase the
1908.279 -> parallelization of your lambda functions
1911.2 -> so that if i have a function that's
1913.039 -> a little heavy i can run more
1915.2 -> concurrency and execute more batches
1917.76 -> as concurrent operations up
1920.08 -> to five uh but honestly my advice is if
1923.36 -> you have work that's slowing down your
1924.72 -> lambda functions then let's go ahead and
1926.32 -> offload that and pass it off to a step
1928.64 -> function or push it into sqs or
1930.559 -> something like that and let it let it
1932.399 -> execute asynchronously not in line
1935.76 -> with the lambda function because again
1937.2 -> that iterator age you just don't want it
1938.72 -> to get too old otherwise the processing
1940.96 -> you're trying to do is stale right and
1942.64 -> that's no good
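A sketch of raising that parallelization on the stream's event source mapping, assuming boto3; the ARNs, function name, and batch settings are placeholders.

```python
import boto3

# Hypothetical sketch: let a heavier function process several batches per
# shard concurrently so the iterator age stays low.
lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:dynamodb:us-east-1:123456789012:table/data/stream/2021-01-01T00:00:00.000",
    FunctionName="aggregate-downloads",
    StartingPosition="LATEST",
    BatchSize=100,
    MaximumBatchingWindowInSeconds=1,
    ParallelizationFactor=5,   # concurrent batches per shard
)
```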
1943.76 -> so
1944.48 -> this is kind of the overview right the
1945.84 -> serverless app the front end
1947.76 -> right and we come into
1950.159 -> the system dynamodb is the backbone it's
1952.159 -> the it's where all the data stays you
1954.159 -> know gets stored for a serverless app
1955.76 -> but on the back end of dynamodb is
1957.519 -> another ecosystem of tooling that gives
1959.2 -> you an amazing collection of
1960.84 -> functionality that people can use
1963.84 -> to extend the application so to speak so
1966.32 -> you know it's like dynamodb at this
1967.919 -> point it's
1969.84 -> more
1973.039 -> than a database it's a data processing
1974.799 -> engine there's a whole set of operations
1977.12 -> that can occur behind the data because
1978.64 -> of this change data capture processing
1980.559 -> pipeline and that at least once
1982 -> execution guarantee from lambda and
1983.6 -> we're going to demonstrate some of that
1985.2 -> as we get into the next portion of our
1987.519 -> presentation all right so that's what
1989.44 -> i've got for you from the presentation
1990.88 -> today we're going to talk a little bit
1992 -> about some of the functionality now i
1994.08 -> get into some demonstrations some
1995.44 -> hands-on i'm going to show you guys some
1996.799 -> code i'm going to show you guys some
1998.399 -> workflows some exciting new
2000.48 -> functionality around partiql and the
2001.919 -> new apis and and actually demonstrate
2004.08 -> why why is it that this single table
2006.08 -> design that we just talked about is so
2007.919 -> much more efficient than than doing
2009.519 -> things the way we used to relationally
2011.12 -> right why shouldn't i put all my data on
2012.48 -> many tables right we're going to
2013.519 -> demonstrate some of that stuff for you
2015.36 -> again let's we're going to talk about
2016.799 -> that in a minute and let's get into the
2018.559 -> demos
2023.2 -> all right so for our first demo we're
2024.64 -> going to take a look at the partiql api
2026.88 -> partiql is a sql-like construct that is
2030.399 -> provided for users to access their
2031.84 -> dynamodb tables and gives you a familiar
2034.24 -> syntax to work with your data
2036.799 -> it doesn't necessarily support all the
2038.32 -> functions of sql right like a relational
2040.88 -> database does you know nosql does not
2043.12 -> join tables but it supports all the
2045.279 -> relevant operators
2046.96 -> and it and it removes the requirement of
2049.359 -> the of the developer to understand the
2051.28 -> intricacies so to speak of the dynamodb
2053.679 -> query api right with the low level api
2056.32 -> or the document api
2058.159 -> users are responsible for determining
2059.76 -> which actions to run whether it's
2061.599 -> a get item a put item or a query or a
2064.159 -> table scan and with the partiql api it
2067.2 -> works a little differently you give us
2068.72 -> some query criteria and based on those
2071.679 -> criteria we kind of look at it and say
2073.2 -> hey you gave me enough i understand what
2074.96 -> i need to do whether it's going to be a
2076.24 -> query or a get item or a scan depending
2078.639 -> on what you you give us
2080.879 -> the the request router is going to make
2082.56 -> a decision about what operation to run
2084.48 -> right so with this comes a couple of
2086.56 -> caveats right sometimes maybe you know
2088.72 -> you don't necessarily want things to
2090.399 -> happen right i don't necessarily want
2092.079 -> table scans to occur if i don't give it
2093.76 -> enough data in my query i'd rather just
2095.599 -> fail and so we're going to show you how
2097.52 -> to do that so let's take a quick look at
2099.119 -> that table that we're going to be
2100.079 -> working with this is a data table that
2102.56 -> contains synthetic stock trades i
2105.28 -> might use this for a lot of my demos
2106.8 -> right we can actually use it a little
2108 -> later when we talk about global tables
2109.68 -> and export s3 and whatnot we're going to
2111.359 -> push this raw data out there and take a
2113.52 -> look at how we can get at it
2115.359 -> but for this uh demo we're really just
2117.2 -> going to be querying the items on the
2118.4 -> table we're going to be looking for you
2119.76 -> know what is partiql doing when we make
2121.76 -> these queries what type
2123.76 -> of operations are running uh so the
2126.079 -> first thing we want to do is set up a
2127.76 -> user policy you know for our partiql
2130.8 -> process right whatever's
2132.56 -> running we want of course least
2134.16 -> privileged access and in this case we're
2136.48 -> going to give it uh you know access to
2138.16 -> the partiql api only insert select
2140.4 -> delete so we're going to assume this is
2141.68 -> a maybe a workload that's running in our
2144.88 -> production environment and these are the
2146.8 -> operations that we need access to
2148.8 -> and so let's go ahead and switch over to
2150.64 -> that client view and see what happens
2153.839 -> when we actually execute some of these
2155.2 -> queries right so
2156.8 -> as you can see here we're we're now a
2158.64 -> different user i'm the demo user with
2160.64 -> our policy that we just defined and i
2163.2 -> can't even see the tables right i don't
2164.96 -> have list tables permissions or privileges i
2167.52 -> only have you know partiql privileges
2169.599 -> right so
2170.72 -> let's go ahead and run a simple partiql
2172.56 -> query we're going to select star from
2174.64 -> the data table
2176.32 -> where
2177.359 -> our pk
2178.96 -> is equal to some string value
2181.76 -> and our sk is equal to some other string
2185.359 -> value and we'll grab those in a second
2187.04 -> here one thing about particle remember
2190.079 -> uh syntax is important you might and
2192.48 -> statement there or it's going to fail
2194.64 -> uh quotes go around entities things like
2196.72 -> tables attributes what not double quotes
2199.2 -> things like values single quotes and
2201.359 -> very important because it'll crash uh if
2203.76 -> you don't uh you know it won't parse the
2206.16 -> queries properly if you're not
2207.76 -> surrounding like your string values with
2209.2 -> those single quotes and your entities
2211.04 -> with those doubles all right so let's
2212.72 -> grab the data we need
2214.4 -> Here's our partition key value, and here's our sort key value. So we should have a transaction on the table with that particular partition key and that particular sort key, and with these query conditions I'm giving PartiQL enough to know that it should run a GetItem, because I'm giving it both a partition key equality condition and a sort key equality condition. That tells PartiQL: give me that item.
2255.04 -> Select star from... we need to add our FROM clause. There we go; we add the FROM and we can run this now. Okay, we got our item back, and it's exactly the item we expected to see, with the partition key and sort key we asked for, and again this ran as a GetItem. If I remove the sort key condition, I force it to run a Query. In this particular case, if there were more than one item in this partition it would bring back more than one item, but there's only one item here, so it only brings back the one.
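To make that key-condition behavior concrete, here is a small Python (boto3) sketch of the same two statements; the table name "data", the attribute names PK and SK, and the key values are placeholders for illustration:

import boto3

ddb = boto3.client("dynamodb")

# Partition key AND sort key equality: PartiQL can satisfy this with a GetItem.
ddb.execute_statement(
    Statement='SELECT * FROM "data" WHERE PK = ? AND SK = ?',
    Parameters=[{"S": "some-partition-key"}, {"S": "some-sort-key"}],
)

# Partition key equality only: PartiQL runs a Query against that one partition.
ddb.execute_statement(
    Statement='SELECT * FROM "data" WHERE PK = ?',
    Parameters=[{"S": "some-partition-key"}],
)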
2292.48 -> However, if I provide only the sort key condition, I'm not giving it enough information to know whether it can run a Query or a GetItem, so it defaults to a table Scan. Let's see what that looks like. It actually brings back the same item, but we also get a little message saying the execution did not finish searching all the items. What does that mean? It means it read a megabyte's worth of data; the item we were looking for happened to be in that megabyte, but as you can see this is a tiny item, and I just burned a megabyte's worth of RCU to get it. I could keep scanning the rest of the table and it wouldn't bring back any other items. This is not the operation we want to execute; I would rather this query fail than burn that kind of RCU.
2334.56 -> So how do we make that happen? If we go back to the policy editor, we can see what kind of tweaks we need to make to the policy to ensure that scans don't happen. Let's edit the policy one more time. I'm going to go to the JSON view and add a deny statement. The deny is on PartiQL select, which looks odd at first because I'm also allowing PartiQL select, but what I'm denying inside of PartiQL select is one very specific case: the full table scan. When the full-table-scan condition is true, that operation gets denied. Let's review and save the policy change; I've got a couple of old versions we need to step on, and the new policy takes a second to propagate. Essentially I'm still allowing this process to insert, select, and delete, but select is only going to work when a full table scan is not required; if the full-table-scan condition is true, the operation is denied.
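The added statement follows the pattern AWS documents for this: PartiQLSelect stays allowed in general, but is denied whenever the dynamodb:FullTableScan condition key evaluates to true. The resource ARN is again a placeholder:

{
  "Sid": "DenyPartiQLFullTableScans",
  "Effect": "Deny",
  "Action": ["dynamodb:PartiQLSelect"],
  "Resource": ["arn:aws:dynamodb:us-west-2:111122223333:table/data"],
  "Condition": {
    "Bool": {"dynamodb:FullTableScan": ["true"]}
  }
}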
2398.8 -> Okay, so let's see what happens when I go back over to the console and run our query. This is the same scan-backed query we completed before, the one that brought the item back and paginated across the table. If I run it now, it comes back with an error executing the command: access denied. We don't have authorization to run that scan. This is how we keep PartiQL operations from doing things we don't want them to: deny the table scan, allow the insert, select, and delete operations specifically, and put conditions on those operations to control what the PartiQL API is doing. So don't be afraid of it. Actually, the more I use PartiQL the more I'm embracing it; it has a whole bunch of functionality that's not necessarily available in the document API.
2450.72 -> Batch execute statements with multiple conditions is probably the number one thing. In the document API, when I do a batch get, I can't add multiple conditions and I can't run multiple queries as a batch. In PartiQL you can, so there's a lot of functionality there for you.
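For example, in Python (boto3) a single BatchExecuteStatement call can carry several statements, each with its own key values, which the classic BatchGetItem request shape does not let you express as conditions; the table and attribute names here are illustrative:

import boto3

ddb = boto3.client("dynamodb")

# Each read statement in the batch targets one item by its full primary key.
ddb.batch_execute_statement(
    Statements=[
        {"Statement": 'SELECT * FROM "data" WHERE PK = ? AND SK = ?',
         "Parameters": [{"S": "pk-1"}, {"S": "sk-1"}]},
        {"Statement": 'SELECT * FROM "data" WHERE PK = ? AND SK = ?',
         "Parameters": [{"S": "pk-2"}, {"S": "sk-2"}]},
    ]
)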
2470.319 -> And with this fine-grained access control you can govern what's happening at a very granular level. Great technology. Let's take a look at our next demo.
2489.599 -> For our next demo we're going to look at a Streams and Lambda aggregation, and we're actually going to be doing two things. We'll be aggregating the items coming into our trades table into a summary table, by security, and we'll also be pushing that raw data out through a Kinesis Data Firehose stream into an S3 bucket, where we'll be able to run Athena queries for audit-trail purposes and for validating our aggregation function.
2517.599 -> Take a look at the data: we're using the same data table we used last time. The items coming into this table are individual trades containing information about the trade: the security, whether it was a buy or a sell, how many shares, the timestamp, the region it was generated in, and the price. That's the information we want to capture as raw data, and the way we're going to do it is by exporting the data with Kinesis Data Firehose, using a Glue catalog schema definition to format the data into Parquet files, dropping those into S3, and then querying that data with Athena using straightforward SQL to support the audit-trail workflows. It's a really good workflow: any of the change data on your table can be exported into an S3 bucket and made available this way for those audit-trail queries.
2571.2 -> All right, so let's take a quick look at the Glue catalog definition. AWS Glue is really a schema definition tool: you can crawl existing data repositories and build schemas from them, or you can define schemas manually. In this particular case the data doesn't exist yet, it won't exist until we actually push it through, so we need to define the schema up front. That's what we're going to do: we have the stream_proc_db database, and inside that database a single table, our trades table. If you look at the definition of the trades table, it contains all the fields we just talked about: the security, the type of the trade, the shares, price, region, timestamp, and so on. Again, all of this is just a schema definition; it tells whatever system references the Glue data catalog what the raw data it's processing should look like.
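A sketch of defining that schema programmatically with Python (boto3); the database and table names follow the ones mentioned in the demo, while the exact column names and types are assumptions, and a full definition for Athena would also carry the S3 location and Parquet format settings:

import boto3

glue = boto3.client("glue")

glue.create_database(DatabaseInput={"Name": "stream_proc_db"})

# Define the trades table up front, since there is no data yet to crawl.
glue.create_table(
    DatabaseName="stream_proc_db",
    TableInput={
        "Name": "trades",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "security", "Type": "string"},
                {"Name": "type", "Type": "string"},      # buy or sell
                {"Name": "shares", "Type": "int"},
                {"Name": "price", "Type": "double"},
                {"Name": "region", "Type": "string"},
                {"Name": "trade_ts", "Type": "timestamp"},
            ]
            # "Location", InputFormat/OutputFormat, and SerdeInfo would go here
            # as well if Athena is going to query the bucket directly.
        },
    },
)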
2622.88 -> The next thing we want to do is configure the Kinesis Data Firehose. Firehose is all about delivery streams: it pushes the data it receives into a destination, batching things up and dropping those batches into your target location, in this case an S3 bucket. I've got multiple delivery streams defined because my account uses the default Firehose limits, and rather than go through the hassle of raising those limits I just spread the data out across multiple streams to increase the overall throughput. These limits can be increased on a per-stream basis, so you don't necessarily have to run multiple streams, but in my case I went ahead and did that.
2663.04 -> It's a pretty straightforward process to configure a delivery stream: essentially all you're doing is telling it where to go and what catalog to use. This one says use the Glue catalog from the Oregon region, transform the records into Apache Parquet files, use the stream_proc_db database, and use the trades table defined within that schema as the table format. Then you tell it what location to deliver into; we've got an S3 bucket defined for stream_proc_db, and we're good to go.
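Roughly, creating one of those delivery streams with Parquet conversion looks like this in Python (boto3); the stream, role, and bucket names are placeholders:

import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="stream-proc-0",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::stream-proc-db-bucket",
        # Parquet conversion requires a buffer size of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            "SchemaConfiguration": {
                "RoleARN": "arn:aws:iam::111122223333:role/firehose-delivery-role",
                "Region": "us-west-2",          # the Oregon-region Glue catalog
                "DatabaseName": "stream_proc_db",
                "TableName": "trades",
            },
        },
    },
)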
2702.319 -> At this point I've got a whole bunch of delivery streams defined that we'll be spraying the data across, and the Firehose side is set up for pushing the data into those Parquet files in S3. Now let's take a look at the Lambda function we're going to be running, which is not terribly complicated but does quite a bit of work.
2729.119 -> If we look at the functions defined in the system right now, the stream processor function is the one we're going to be working with. It's triggered off of the DynamoDB table, and if we look at that trigger definition it's coming off the data table. Every single time a write operation occurs on that table, whether it's an insert, an update, or a delete, it fires this stream processor function.
2755.359 -> The stream processor function, again, is not complicated code, but it does quite a bit, so let's break it down bit by bit. The first thing we do is instantiate an instance of the DynamoDB document client and set some configuration parameters; in particular, we want to know what region we're in, because we don't want to process records that originated in any other region. This is a pattern I use constantly with global tables: in a multi-region, multi-active configuration you always want to tag your items with a source region. Why? Because the stream doesn't distinguish writes that arrive as replication from other regions; it fires the same process on any write to the table. So if you don't want to process items that were created or updated in other regions, you tag each item with its source region so you know to drop it, and that's what this logic does: if an item isn't from the region we expect, we just drop it.
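The writer side of that pattern is simply stamping every item with the region it was created in, so the stream processor in each region can recognize replicated writes and skip them. A minimal Python sketch, assuming an attribute named "region" and the same data table:

import os
import boto3

table = boto3.resource("dynamodb").Table("data")

def write_trade(trade):
    # Tag the item with the region that originated it; each region's stream
    # processor only acts on items whose tag matches its own region.
    trade["region"] = os.environ.get("AWS_REGION", "us-west-2")
    table.put_item(Item=trade)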
2811.119 -> The next thing we do is instantiate our Firehose client for sending that data into S3. There are a couple of things we'll eventually do with the Firehose client, like telling it which delivery stream to use, but that happens inside the handler function, which runs every time a batch of records comes off the stream.
2830.96 -> We declare some variables, and one of the important ones tells us which delivery stream to use. Remember the Firehose configuration: I've got ten delivery streams, stream processors zero through nine, and on each of these Lambda invocations I'm going to use a different one, so as the system scales we spread the data out across those ten Firehose streams. Then the only other thing I need to do is build the container for the write: it says which stream to send to and holds the array of records to push onto that stream, and that's the parameters object we pass into the Firehose put.
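The repo's function is Node.js; a rough Python-flavored equivalent of picking one of the ten delivery streams and building that parameters container might look like this (the stream naming scheme and the random selection are assumptions):

import json
import random
import boto3

firehose = boto3.client("firehose")

# Spread load across the ten delivery streams (stream-proc-0 .. stream-proc-9).
stream_name = "stream-proc-%d" % random.randint(0, 9)

firehose_params = {
    "DeliveryStreamName": stream_name,
    "Records": [],   # filled with {"Data": b"...json..."} entries as the batch is processed
}

def add_record(item):
    firehose_params["Records"].append({"Data": json.dumps(item).encode("utf-8")})

# later, once the batch is built:
# firehose.put_record_batch(**firehose_params)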
2871.52 -> As we get into actually processing the records, this is where we start to look at the things we care about. If a record isn't an insert, or it isn't coming from my source region, I don't care about it; this processor just drops those items. If it is an insert coming from the source region I expect, then I also skip anything that isn't actually a transaction. I know that on this table all I'm inserting today is transactions, so that's not strictly necessary, but it future-proofs the function in case I start adding additional item types down the road; I want to make sure I'm only processing the records I mean to process. Once I know the record really is a stock transaction, I grab all of the top-level attributes from it: the price, the shares, the region, the type, the security, the timestamp, all of that.
2927.119 -> We push that into a per-security counter. Lambda processes records in batches, and if there are high-velocity trades happening on a single security, I'm going to end up with dozens or maybe hundreds of them in a single batch. To reduce the pressure on the summary table, I summarize all of those items inside the Lambda function before I push a single update to the summary table. That's what's happening here: the first time I see a trade for a security in this batch of items, I create a new counter for that security, and then I just accumulate the buy shares, buy orders, sell shares, and sell orders for each security. And for every item we process, we also push it onto the Firehose container so it can go through the Firehose stream into S3 for our historical queries.
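Putting the filtering and the in-batch summarization together, here is a condensed Python sketch of the handler's loop; the real code is the Node.js function in the repo, the attribute names are illustrative, and it assumes the stream is configured to deliver the new image of each item:

import json
from collections import defaultdict

MY_REGION = "us-west-2"   # the region this stream processor belongs to

def handler(event, context):
    counters = defaultdict(lambda: {"buyShares": 0, "sellShares": 0,
                                    "buyOrders": 0, "sellOrders": 0})
    firehose_records = []
    for record in event["Records"]:
        # Only brand-new items written in this region are interesting.
        if record["eventName"] != "INSERT":
            continue
        item = record["dynamodb"]["NewImage"]
        if item.get("region", {}).get("S") != MY_REGION:
            continue  # replicated write from another region, skip it
        # Summarize per security inside the batch before touching the summary table.
        sec = item["security"]["S"]
        shares = int(item["shares"]["N"])
        if item["type"]["S"] == "buy":
            counters[sec]["buyShares"] += shares
            counters[sec]["buyOrders"] += 1
        else:
            counters[sec]["sellShares"] += shares
            counters[sec]["sellOrders"] += 1
        # Queue the raw trade for the Firehose put as well.
        firehose_records.append({"Data": json.dumps(item).encode("utf-8")})
    # ... one UpdateItem per security and one PutRecordBatch follow from here ...
    return counters, firehose_records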
2983.52 -> So this is the method that does the heavy lifting. Again, all this code is going to be available for you; I'll post a link to the GitHub repo where you can pull it down and play with it yourself. This handler function is where all the work happens: inside the forEach loop over the event records is where the summarization happens, and once we've summarized the records we batch them up and wrap that Firehose put in a promise.
3008.88 -> Then we start to build the update expressions for the summary items: we update the buy and sell totals for each individual security. They all use the same update expression, just with different parameters. We also initialize the counters for the next day. This is important: if I'm maintaining a summary item with 30 days worth of trailing totals, then tomorrow I need those counters to exist so I can start incrementing them, so on every update during the day we set tomorrow's counters to zero. That way we know they'll be there when we start processing tomorrow's trades. Then it's just a matter of getting a promise for each of those updates, pushing all of those promises onto an array, and awaiting them; when everything comes back, hopefully successfully, we move on and we're good to go.
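A hedged sketch of what one of those per-security updates could look like in Python (boto3); the table, key, and attribute names are assumptions, and if_not_exists is one way to pre-create tomorrow's counters at zero as described:

import boto3
from datetime import date, timedelta

summary_table = boto3.resource("dynamodb").Table("data")

def update_summary(security, counts):
    today = date.today().strftime("%Y%m%d")
    tomorrow = (date.today() + timedelta(days=1)).strftime("%Y%m%d")
    summary_table.update_item(
        Key={"PK": security, "SK": "summary"},
        # Increment today's counters, and create tomorrow's at zero so they are
        # already in place when tomorrow's first batch of trades arrives.
        UpdateExpression=(
            "ADD #buyShares :bs, #sellShares :ss, #buyOrders :bo, #sellOrders :so "
            "SET #buySharesNext = if_not_exists(#buySharesNext, :zero), "
            "#sellSharesNext = if_not_exists(#sellSharesNext, :zero)"
        ),
        ExpressionAttributeNames={
            "#buyShares": "buyShares" + today,
            "#sellShares": "sellShares" + today,
            "#buyOrders": "buyOrders" + today,
            "#sellOrders": "sellOrders" + today,
            "#buySharesNext": "buyShares" + tomorrow,
            "#sellSharesNext": "sellShares" + tomorrow,
        },
        ExpressionAttributeValues={
            ":bs": counts["buyShares"], ":ss": counts["sellShares"],
            ":bo": counts["buyOrders"], ":so": counts["sellOrders"],
            ":zero": 0,
        },
    )

One UpdateItem per security per batch, rather than one per trade, is what keeps the write pressure on each summary item low.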
3060.319 -> So that's the workflow we just went through, doing two things: summary aggregation, where we push all those stock trades into summary counters, and historical audit-trail queries, where we push all the raw trade data out into an S3 bucket that we can query later. Let's take a look at what that actually looks like in the Athena console.
3082.8 -> I can see what's been happening here as I've run this scenario over and over again. Over the months we've stored all of that historical data from those synthetic trade transactions. I'm going to go into the AWS Data Catalog, which is our Glue data catalog, pull up stream_proc_db, and run a query against the trades table. You can see it takes a moment to come back; it's not the snappiest query performance, because these are Parquet files in S3 and Athena is essentially a distributed query engine running over them. It took a couple of seconds to get the data, we scanned through about 29 megabytes of trade data, and we've got a little over three million trades recorded.
3131.599 -> What's nice about this now is that I can do all kinds of things with it; I can run whatever SQL I want. For example, I can run a top-N query on trade volume by security, which gives me my top securities by buy and sell volume.
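One possible shape for that kind of top-N query over the exported trades, using column names that match the Glue schema sketched earlier (the names are assumptions):

SELECT security,
       SUM(CASE WHEN type = 'buy'  THEN shares ELSE 0 END) AS buy_volume,
       SUM(CASE WHEN type = 'sell' THEN shares ELSE 0 END) AS sell_volume
FROM stream_proc_db.trades
GROUP BY security
ORDER BY buy_volume + sell_volume DESC
LIMIT 10;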
3146.88 -> Again, there are some nice ways to go out and slice and dice the data in interesting ways that you don't need to run at high velocity; I don't need low-latency queries against this data set. It's an extremely useful way to run the audit-trail queries we all need: I want the historical data, but I don't want to pay to keep it online. Most audit-trail queries don't run at high frequency; they might run a couple of times a month, or maybe a couple of times a day when somebody needs to do some troubleshooting, and it's not really a real-time workflow if it takes a second or two to get the data. In this particular case it took about three seconds to scan through roughly 100 megabytes of data to produce those top-N aggregations, and that's just fine for an analyst at the end of the day who wants to pull some numbers. This is not something that needs to run in milliseconds.
3204.8 -> So that's a really good workflow we just demonstrated: take the change data through the Streams CDC pipeline and do two things with it. First, aggregate the data back into the table. If we go back and look at the summary items, this is the same data table containing the summary information we built: items partitioned on the security id, holding those daily, 30-day trailing counters. In this case we've only built up a couple of days worth of trade data, but as you can see, it's exactly what we build in our stream processor, in our Lambda function, as we update the buy shares, sell shares, buy totals, buy orders, and so on; when we go back to that summary data in DynamoDB we see exactly that data, aggregated and pushed in on a daily basis by security. And second, push the raw data out to S3 through the change data capture pipeline for audit-trail queries. Really good workflows for your Lambda functions.
3273.68 -> And just so you have the code available: there's a nice repo that was put together by one of my peers, Rob McCauley, and this code lives up there. We'll post the link in the chat so folks can go take a look. There are a couple of things in there we didn't really go over, like cost modeling for table scans versus index queries and an auto scaling calculator. The code I was running today is in the table loader, and it includes the Lambda function we just walked through, so if you want to pull that down and play with the code yourself, it's all up there in the repo.
3318.4 -> I hope you got something out of today's session. I'm going to be available here for the next 20 or 30 minutes if you have questions. Thanks a lot for taking the time to watch; if you want to hit me up on Twitter, my name is Rick Houlihan and this is my Twitter handle. Again, thanks so much for joining this session.

Source: https://www.youtube.com/watch?v=LJ4R5fnY45c