Best practices to maximize scale and performance with Amazon DynamoDB - AWS Virtual Workshop
Amazon DynamoDB is a fully managed, multi-region, multi-active database that delivers single-digit millisecond performance at any scale and powers many of the world’s fastest growing businesses such as Dropbox, Disney+, Samsung, and Netflix. In this virtual workshop, AWS Senior Practice Manager and DynamoDB expert Rick Houlihan will go deep into a few of DynamoDB’s key features and share tips and best practices to help customers learn how to maximize scale and performance. Among these are using AWS Identity and Access Management (IAM) to restrict PartiQL from doing scans, using region tagging to facilitate global tables, and exporting to Amazon S3 for audit trail workflows that do not require low-latency responses or need to support ad hoc queries. Join Rick for this hands-on advanced technical session to explore each of these features, with detailed examples.
Learning Objectives: * Learn about the foundation of NoSQL databases. * Dive deep into key features of Amazon DynamoDB, including global tables, PartiQL support, and export to S3. * Learn tips and best practices to maximize key features for scale and performance.
☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or on-demand tech talks to watch at their own pace. Join us to fuel your learning journey with AWS.
#AWS
Content
1.67 -> [Music]
7.359 -> welcome everybody and thanks for taking
9.2 -> the time out of your day to come join us
10.559 -> for this webinar we're going to be
12.08 -> talking today about amazon dynamodb some
14.4 -> of the key features and best practices
17.279 -> as well as some of the high level design
18.56 -> patterns and we're going to walk through
20.08 -> some of the technologies and
22.08 -> actually demonstrate some of
23.6 -> these features hands-on in
26.16 -> live fire demos so that you can actually
27.84 -> see this stuff we're talking about
29.359 -> actually working
30.48 -> my name is rick houlihan i'm a senior
32.239 -> practice manager for nosql services
34.88 -> at aws and i focus mostly on dynamodb
37.84 -> i'm sure
38.719 -> some of you are probably familiar with
39.92 -> my content others maybe not and that's
42.719 -> what we're here for right we're going to
43.84 -> talk today a little bit about a brief
45.44 -> history of data processing why are we
47.28 -> actually looking at no sql technologies
49.36 -> again for those of you who've heard this
50.719 -> before you know just ride along for a
52.399 -> few minutes i guess a lot of folks
53.92 -> obviously haven't and are new to this
55.84 -> content and we're going to get into an
57.36 -> overview of dynamodb what is dynamodb
59.199 -> and talk a little bit about
61.76 -> the data modeling best practices
64.4 -> for nosql in general and what it
66.08 -> really means to manage your data as
67.52 -> normalized versus denormalized schema
70.159 -> talk a little bit about dynamodb in the
71.68 -> serverless ecosystem
73.439 -> and get into some of
75.2 -> the modeling and best practices for real
76.88 -> world applications and showing you some
78.479 -> actual real technology demos
80.799 -> and how this stuff works right so
82.32 -> the first thing we want to talk about is
84 -> the history of data processing why do we
85.6 -> want to use this new technology and
87.52 -> really what it comes down to
89.439 -> when you look at database technology
90.88 -> over the years is there's been a series
92.079 -> of innovations that have been driven by
94 -> you know this thing we call data
95.2 -> pressure data pressure is the ability of
97.28 -> the system to process the amount of data
99.68 -> that we're trying to process at a
101.6 -> reasonable cost or in a reasonable time and
103.759 -> when we can't do one of those things or
105.439 -> the other we're going to invent
106.88 -> something that's a technology
108.32 -> trigger and we've done this many times
109.759 -> through the years the first system that
111.6 -> we're actually born with the one between
113.439 -> our ears didn't last long we needed to
115.6 -> start writing things down storing
116.96 -> structured data in some way to be able
119.119 -> to build an enterprise and a business on
121.04 -> top of and that worked really well
122.56 -> ledger accounting and just kind of
124.159 -> writing things down for many many years
126.64 -> until the 1880 u.s. census came along and
129.039 -> a guy named herman hollerith was tasked
130.72 -> with collating all that data processing
133.04 -> everything and what it came down to was
134.959 -> it took him eight years of the 10-year
136.879 -> cycle for the census which was obviously
138.879 -> not going to work the next time around
140.319 -> so he invented the machine readable
141.76 -> punch card and the punch card sorting
143.76 -> machine and the era of modern data
145.76 -> processing was born so rapidly
147.76 -> we started to develop new
149.36 -> technologies new services and new
152.48 -> systems to be able to support those
153.92 -> services things like paper tape magnetic
156.16 -> tape distributed block storage random
157.92 -> access file systems and the data sprawl
160.4 -> in the 60s started to drive a team at
163.519 -> ibm led by a guy named edgar codd into
165.68 -> the development of the
167.12 -> relational database right system r was
168.959 -> the first system that came out did many
171.04 -> things right it made data more
172.4 -> consistent it was very difficult to
173.92 -> maintain all these copies and
175.36 -> denormalized data on disks at the time
178.239 -> and it reduced the sprawl of the data
180.08 -> the cost of the data
181.84 -> and the ability of the enterprise to
184.64 -> be able to store more data by
185.92 -> deduplicating this data it was
187.36 -> tremendous value
189.12 -> and you know again 40 or 50 years ago the
190.959 -> most expensive resource in the data
192.4 -> center was storage and and so this was a
194.56 -> really effective technology and the
196.48 -> technology for maintaining the copies of
198.159 -> data were not really available but
200.159 -> today fast forward 50 years
202.879 -> and now the most expensive
204.799 -> resource in the data center the most
206.64 -> expensive cost in the data center is
208.159 -> really cpu right it's pennies per
209.76 -> gigabyte to store data we talk about
211.68 -> terabytes petabytes exabytes of data
213.76 -> today so it's not about storing the data
216.239 -> anymore it's about processing it and
218 -> that's where the cost comes in so when
219.84 -> you look at nosql it's about you know
222.4 -> basically processing data at scale now
224.72 -> when you want to use new technologies
226.879 -> and nosql is a new technology you know
228.799 -> they don't work the same way and this is
230.4 -> one of the biggest problems i see teams
232 -> today they kind of get into this you
233.519 -> know idea of i have this big data
235.12 -> challenge you know my relational
236.64 -> databases are slow you know i'm kind of
238.64 -> sharding and partitioning my data to be
240.319 -> able to handle this scale and
242.799 -> there's this thing called nosql that's
244.4 -> supposed to do all this stuff for me and
245.92 -> it's great let's go get it and they kind
247.36 -> of run down this path well you know the
249.2 -> reality is that the innovators have come
250.959 -> along and they've invented something
252.239 -> here's this data pressure that we have
254.239 -> this big data challenge and
256.32 -> they've
257.28 -> invented something but the skill set
258.799 -> isn't there yet so people go and
260.479 -> they try to use it the same
262 -> way they use the old technology and they
263.68 -> have a miserable experience and this is
265.919 -> typical of the nosql kind of
268.56 -> path as in almost every technology as
270.96 -> it's introduced to market you can't use
273.04 -> it the same way you use the old
274.24 -> technologies you got to learn how to use
275.68 -> it first
276.88 -> as those skills kind of become commodity
278.639 -> in the market
280 -> the early majority lands you know
282 -> developers kind of become familiar with
283.52 -> how to use it and
285.12 -> the experience becomes a better
286.56 -> experience right so if you find yourself
288.24 -> talking about how nosql is difficult to
290.4 -> use i don't understand it well
293.52 -> that's good that you recognize
295.04 -> you don't understand it let's kind of
296.639 -> learn about it you didn't always
297.919 -> understand how to use that relational
299.36 -> database either right remember that at
301.52 -> some point it was new to you so
304.4 -> let's get into the new technologies and
306 -> learn how they work we'll have a better
307.36 -> experience than trying to use these
309.199 -> things the same way we use the old ones
310.88 -> and we'll explain a little bit about
312.4 -> what that means when we get into talking
314.08 -> about some of the best practices and
316.16 -> things for nosql databases in general
318.8 -> so now why nosql
321.36 -> it's because sql is really about
323.12 -> that normalized relational view of
325.36 -> data you know the data is
329.6 -> well structured for supporting any kind
331.6 -> of access pattern it's
333.199 -> agnostic to every pattern and what that
335.039 -> really means is it's really optimized
336.72 -> for none right i have to kind of join
338.479 -> indexes across tables and build these
340.24 -> views of data
341.68 -> you know and as such it kind of scales
343.52 -> vertically it works very well on a data
345.28 -> set that's stored in a single container
346.96 -> it works less
348.4 -> efficiently when data is kind of
349.6 -> distributed across multiple containers
351.28 -> and it's kind of good for those olap
353.28 -> workloads where i don't really
354.32 -> understand what the access patterns are
355.6 -> going to be right so that's kind of the
356.72 -> category that sql works well
359.199 -> in but nosql on the other hand
361.12 -> works really well for oltp applications
364 -> which are applications that have
365.52 -> well-defined access patterns if i
367.28 -> understand the access pattern then i can
368.88 -> kind of model the data to kind of
370.319 -> represent those instantiated views the
372.16 -> result sets of those queries more or
374.16 -> less and be able to build
376.8 -> hierarchical data storage structures
378.72 -> that are more denormalized
382.16 -> they scale horizontally across multiple
384.479 -> nodes
385.44 -> and it's really built for you know
386.96 -> situations where the access
389.039 -> patterns are well understood now this is
390.8 -> great because most applications today
393.44 -> have pretty well understood patterns
395.6 -> right i mean this is what we're writing
397.039 -> code for because the code executes and
398.88 -> it doesn't ask a different question
400.08 -> every time it executes it actually asks
401.759 -> the same questions every time
404.08 -> and so having a database that's
406 -> optimized to be able to provide the
407.759 -> result set for the questions that your
409.28 -> applications are asking is actually
410.88 -> quite efficient we'll get into looking
412.479 -> at what that really means when we talk
414.4 -> about amazon dynamodb it's a fully
416.8 -> managed nosql database right no sql
418.8 -> databases are great but scaling the
420.8 -> infrastructure behind them is not so
422.479 -> great it's not something that you want
424.24 -> to be involved with as a business it's
425.84 -> not really
427.199 -> a core value that you want to develop
428.88 -> and this is really where i start working
430.24 -> with customers a lot is when they get
432.319 -> into the scale aspect of things now i
434.479 -> also work with a lot of customers that
435.84 -> are just getting started and and the
437.599 -> reason why they want to use it is
438.8 -> because again a fully managed system i
440.4 -> don't have to understand how to provide
441.919 -> five nines availability it's there out
443.599 -> of the box and for those of us that
445.84 -> tried to manage a five nines
447.12 -> availability service you know how hard
448.8 -> that is and dynamodb gives you that at
451.12 -> the click of a button and that's the
452.8 -> kind of functionality that that startups
454.56 -> want for mission critical services right
456.479 -> so again it's a document and wide-column database that
459.199 -> scales to any workload it's fast and
460.8 -> consistent at scale we have workloads
462.96 -> that execute
464.72 -> tens of millions of
465.84 -> transactions per second and provide low
468.16 -> single digit millisecond latency we'll
469.84 -> show you some of that stuff in a minute
471.599 -> fine-grained access control over all
473.759 -> your data at the item level the table
475.84 -> level the attribute level within the
477.68 -> items you can restrict processes to see
479.759 -> certain chunks of data that you want
481.28 -> them to and others to see other
483.68 -> bits or the whole items and it's
487.199 -> fully elastic and this is one of the
489.039 -> things that we'll talk about when we get
490.319 -> into the performance characteristics of
492.08 -> dynamo but
493.36 -> there's no other database that works
494.8 -> like this it's fully on demand if you
496.56 -> need it to be you get whatever you want
498.16 -> when you want it you pay for what you
499.919 -> use when you use it and you don't even
502.24 -> pay for storage right until you store
504.16 -> items once you delete the items then the
506.08 -> storage costs go away so there's no
508 -> pre-provisioning of resources with
509.599 -> dynamodb and it's what makes it really
511.28 -> nice for the serverless kind of paradigm
513.36 -> that event driven programming model
515.519 -> that's becoming so popular today
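As a concrete illustration of that on-demand, pay-per-request model (not code from the talk; the table and key names here are hypothetical), a minimal boto3 sketch of creating a DynamoDB table with no pre-provisioned capacity might look like this:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# On-demand (PAY_PER_REQUEST) mode: no read/write capacity to
# pre-provision; you pay per request, and for storage only while
# items actually exist.
dynamodb.create_table(
    TableName="CustomerData",  # hypothetical table name
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],  # partition key
    BillingMode="PAY_PER_REQUEST",
)
```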
517.68 -> now when you look at nosql databases we
519.919 -> talked a little bit about how they work
521.2 -> differently
522.64 -> dynamodb works the same way as all the
524.399 -> nosql databases and that's one thing that
526.32 -> you can argue back and forth with other
527.839 -> people but the modeling of the data is
529.44 -> always the same no matter what the nosql
531.12 -> database platform is there's going to be
532.959 -> some sort of collection of objects right
534.56 -> in dynamodb we call these things
536.48 -> a table
537.6 -> tables have items and items have
539.92 -> attributes and not all the items on the
542 -> table have to have the same attribute
543.6 -> okay so this is one of the
545.44 -> differentiators between a nosql database
547.76 -> and relational databases the items
549.519 -> on the table
550.959 -> they're like the rows from all your
552.399 -> tables you put them all
553.92 -> into one place in mongodb they call
555.92 -> this a collection in cassandra it's
557.92 -> called a key space dynamodb calls it a
560.64 -> table regardless you're pushing items
563.04 -> into an object into a repository and all
565.92 -> of those objects have to have a unique
568 -> attribute to identify what this object
570 -> is that's the partition key in dynamodb
572.32 -> it's _id in mongodb or
574.32 -> documentdb and again it's a partition
576.8 -> key in cassandra
578.32 -> but this uniquely identifies that item
580.32 -> within the table so if i have a
581.6 -> partition key only table
583.6 -> in dynamodb or i have a standard
585.839 -> collection with no indexes in mongodb
588.399 -> then this is that key value access
590.399 -> pattern everyone talks about nosql being
592.16 -> great for and it is great for the
593.839 -> key value access pattern
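To make that key-value pattern concrete, here's a minimal boto3 sketch (hypothetical table and key names, assuming the partition-key-only table from the earlier example):

```python
import boto3

# Hypothetical partition-key-only table.
table = boto3.resource("dynamodb").Table("CustomerData")

# Classic key-value access: fetch exactly one item by its partition key.
resp = table.get_item(Key={"pk": "customer#1234"})
item = resp.get("Item")  # key "Item" is absent if nothing matches
```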
596.24 -> but the reality is applications have
597.92 -> a lot more complex access patterns than that and if you
599.68 -> want to try and push all the relational
601.12 -> data that you have into these big blobs
602.88 -> that can be described as key
604.48 -> value data structures it's a very
606.24 -> inefficient way to model your data we'll
607.76 -> get into that in a minute but the
609.519 -> reality here is i want to create
610.88 -> collections of these objects that are
612.24 -> interesting to my application and i want
614.079 -> to treat these things more or less like
615.6 -> the rows of the tables from my
616.959 -> relational databases right you don't
618.64 -> query all the rows that are related to
620.32 -> each other all the time you query
621.92 -> subsets of those rows right you work
623.68 -> with little chunks of the relational
625.519 -> hierarchies that represent your data and
627.68 -> we kind of want to do the same thing
628.8 -> with nosql databases right to be able to
630.8 -> work with the data efficiently we got to
632.72 -> be able to get the data in and out kind
634.72 -> of the same way
636.16 -> if i'm pulling big blobs of data to
637.92 -> update small integers i'm burning a lot
639.68 -> of iops to do that and that's a very
641.68 -> common
643.6 -> anti-pattern in nosql
647.36 -> so when you hear people talk about using this
648.56 -> database and such and such because it
650 -> has large document support for like 16
651.92 -> megabytes you got to look at them and
653.2 -> say well why are you doing that you know
654.64 -> i mean you shouldn't be storing data in
656.56 -> big giant documents like that and if you
658.24 -> are then let's talk about object
660.64 -> stores like s3 for that data
664.079 -> the anti-pattern is using
666.079 -> large documents
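One common way to apply that advice (my illustration, not spelled out in the talk; all names hypothetical) is to park the large payload in S3 and keep only a small pointer item in DynamoDB:

```python
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("CustomerData")  # hypothetical

big_document = b"...multi-megabyte payload..."  # placeholder blob

# Keep the large blob in S3...
s3.put_object(Bucket="my-blob-bucket", Key="docs/1234.json", Body=big_document)

# ...and store only a lightweight pointer item in DynamoDB.
table.put_item(
    Item={
        "pk": "customer#1234",
        "sk": "doc#1234",
        "s3Bucket": "my-blob-bucket",
        "s3Key": "docs/1234.json",
    }
)
```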
668.32 -> what we want is to create collections of objects that
670 -> are interesting to the application so we
672 -> have partition keys in
673.76 -> dynamodb we can create an additional key
677.279 -> a compound key called a sort key which
680.56 -> gives us the ability to create
681.6 -> collections of objects within these
682.959 -> partitions right now these partitions
684.72 -> become
686.48 -> more or less groups of items that are
689.279 -> interesting to the application in
690.959 -> mongodb i'd start adding indexes to do
693.04 -> this but
694.079 -> in other nosql databases like cassandra
696.32 -> we also have the ability to add sort
697.92 -> keys
698.959 -> some even have much more complex sort key
701.44 -> structures than dynamodb but regardless
703.6 -> what we're trying to do is create a
704.88 -> collection of objects and so if you
706.48 -> think about it we can start to model
708.32 -> these one-to-many and many-to-many
710.079 -> relationships by creating
711.839 -> containers full of things if i have
713.68 -> let's say a partition key which is the
715.92 -> customer id and i have a sort key which
718.48 -> might be a customer's interaction date
721.04 -> concatenated with the interaction id
723.839 -> you know customers do what customers do
725.2 -> they make orders they make payments
727.6 -> they get shipments
729.04 -> they make returns they open tickets
730.959 -> right but sometimes a customer logs
732.959 -> into a portal and says what's
734.32 -> the current state of my
736.32 -> interactions with my
738.16 -> service i want to get my
739.839 -> landing page for the customer portal
742.24 -> i can query this table now and say give
744.079 -> me everything for this partition key for
746 -> customer id x with the sort key greater
748.48 -> than date y and it brings back
750.399 -> everything right all of those rows from
752.399 -> all of those tables all in one query
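Here is roughly what that customer-portal query looks like with boto3's Key conditions (a sketch of the pattern being described, with hypothetical table and key names):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("CustomerData")  # hypothetical

# One query returns the whole item collection for customer X whose
# sort key (interaction date concatenated with interaction id) is
# greater than date Y: orders, payments, shipments, returns, tickets.
resp = table.query(
    KeyConditionExpression=(
        Key("pk").eq("customer#1234") & Key("sk").gt("2021-01-01")
    )
)
items = resp["Items"]  # all of those "rows" in a single round trip
```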
754.72 -> this is where nosql databases start to
757.12 -> shine
758.56 -> against
759.6 -> the relational database by creating
761.279 -> these grouped collections either on the
763.04 -> table or on indexes right and then we
764.88 -> can start to use range queries on those
766.959 -> sort key operators to get filtered
769.519 -> objects from those collections
771.44 -> essentially
772.399 -> recreating the relational models and
774.24 -> we'll talk about this again in more
775.6 -> detail
778.079 -> that you actually model in your