Best practices to maximize scale and performance with Amazon DynamoDB - AWS Virtual Workshop
Amazon DynamoDB is a fully managed, multi-region, multi-active database that delivers single-digit millisecond performance at any scale and powers many of the world’s fastest growing businesses such as Dropbox, Disney+, Samsung, and Netflix. In this virtual workshop, AWS Senior Practice Manager and DynamoDB expert Rick Houlihan will go deep into a few of DynamoDB’s key features and share tips and best practices to help customers learn how to maximize scale and performance. Among these are using AWS Identity and Access Management (IAM) to restrict PartiQL from doing scans, using region tagging to facilitate global tables, and exporting to Amazon S3 for audit trail workflows that do not require low-latency responses or need to support ad hoc queries. Join Rick for this hands-on advanced technical session to explore each of these features, with detailed examples.
Learning Objectives: * Learn about the foundation of NoSQL databases. * Dive deep into key features of Amazon DynamoDB, including global tables, PartiQL support, and export to S3. * Learn tips and best practices to maximize key features for scale and performance.
☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or on-demand tech talks to watch at their own pace. Join us to fuel your learning journey with AWS.
#AWS
Content
1.67 -> [Music]
7.359 -> welcome everybody and thanks for taking
9.2 -> the time out of your day to come join us
10.559 -> for this webinar we're going to be
12.08 -> talking today about amazon dynamodb some
14.4 -> of the key features and best practices
17.279 -> as well as some of the high level design
18.56 -> patterns and we're going to walk through
20.08 -> some of the technologies and
22.08 -> actually demonstrate some of
23.6 -> these features hands-on in
26.16 -> live fire demos so that you can actually
27.84 -> see this stuff we're talking about
29.359 -> actually working
30.48 -> my name is rick houlihan i'm a senior
32.239 -> practice manager for nosql services
34.88 -> at aws and i focus mostly on dynamodb
37.84 -> i'm sure
38.719 -> some of you are probably familiar with
39.92 -> my content others maybe not and that's
42.719 -> what we're here for right we're going to
43.84 -> talk today a little bit about a brief
45.44 -> history of data processing why are we
47.28 -> actually looking at no sql technologies
49.36 -> again for those of you who've heard this
50.719 -> before you know just ride along for a
52.399 -> few minutes i guess a lot of folks
53.92 -> obviously haven't and are new to this
55.84 -> content and we're going to get into an
57.36 -> overview of dynamodb what is dynamodb
59.199 -> and talk a little bit about
61.76 -> the data modeling best practices
64.4 -> for nosql in general and what it
66.08 -> really means to manage your data as
67.52 -> normalized versus denormalized schema
70.159 -> talk a little bit about dynamodb in the
71.68 -> serverless ecosystem
73.439 -> and get into some of
75.2 -> the modeling and best practices for real
76.88 -> world applications and showing you some
78.479 -> actual real technology demos
80.799 -> and how this stuff works right so
82.32 -> the first thing we want to talk about is
84 -> the history of data processing why do we
85.6 -> want to use this new technology and
87.52 -> really what it comes down to
89.439 -> when you look at database technology
90.88 -> over the years is there's been a series
92.079 -> of innovations that have been driven by
94 -> you know this thing we call data
95.2 -> pressure data pressure is the ability of
97.28 -> the system to process the amount of data
99.68 -> that we're trying to process at a
101.6 -> reasonable cost or in a reasonable time and
103.759 -> when we can't do one of those things or
105.439 -> the other we're going to invent
106.88 -> something that's a technology
108.32 -> trigger and we've done this many times
109.759 -> through the years the first system that
111.6 -> we're actually born with the one between
113.439 -> our ears didn't last long we needed to
115.6 -> start writing things down storing
116.96 -> structured data in some way to be able
119.119 -> to build an enterprise and a business on
121.04 -> top of and that worked really well
122.56 -> ledger accounting and just kind of
124.159 -> writing things down for many many years
126.64 -> until the 1880 u.s. census came along and
129.039 -> a guy named herman hollerith was tasked
130.72 -> with collating all that data processing
133.04 -> everything and what it came down to was
134.959 -> it took him eight years of the 10-year
136.879 -> cycle for the census which was obviously
138.879 -> not going to work the next time around
140.319 -> so he invented the machine readable
141.76 -> punch card and the punch card sorting
143.76 -> machine and the era of modern data
145.76 -> processing was born so rapidly
147.76 -> we started to develop new
149.36 -> technologies new services and new
152.48 -> systems to be able to support those
153.92 -> services things like paper tape magnetic
156.16 -> tape distributed block storage random
157.92 -> access file systems and the data sprawl
160.4 -> in the 60s started to drive a team at
163.519 -> ibm led by a guy named edgar codd into
165.68 -> the development of the
167.12 -> relational database right system r was
168.959 -> the first system that came out did many
171.04 -> things right it made data more
172.4 -> consistent it was very difficult to
173.92 -> maintain all these copies and
175.36 -> denormalized data on disks at the time
178.239 -> and it reduced the sprawl of the data
180.08 -> the cost of the data
181.84 -> and the ability of the enterprise to
184.64 -> be able to store more data by
185.92 -> deduplicating this data it was
187.36 -> tremendous value
189.12 -> and you know again 40 or 50 years ago the
190.959 -> most expensive resource in the data
192.4 -> center was storage and and so this was a
194.56 -> really effective technology and the
196.48 -> technology for maintaining the copies of
198.159 -> data were not really available but
200.159 -> today fast forward 50 years
202.879 -> and now the most expensive
204.799 -> resource in the data center the most
206.64 -> expensive cost in the data center is
208.159 -> really cpu right it's pennies per
209.76 -> gigabyte to store data we talk about
211.68 -> terabytes petabytes exabytes of data
213.76 -> today so it's not about storing the data
216.239 -> anymore it's about processing it and
218 -> that's where the cost comes in so when
219.84 -> you look at nosql it's about you know
222.4 -> basically processing data at scale now
224.72 -> when you want to use new technologies
226.879 -> and nosql is a new technology you know
228.799 -> they don't work the same way and this is
230.4 -> one of the biggest problems i see teams
232 -> today they kind of get into this you
233.519 -> know idea of i have this big data
235.12 -> challenge you know my relational
236.64 -> databases are slow you know i'm kind of
238.64 -> sharding and partitioning my data to be
240.319 -> able to handle this scale and
242.799 -> there's this thing called nosql that's
244.4 -> supposed to do all this stuff for me and
245.92 -> it's great let's go get it and they kind
247.36 -> of run down this path well you know the
249.2 -> reality is that the innovators have come
250.959 -> along and they've invented something
252.239 -> here's this data pressure that we have
254.239 -> this big data challenge and
256.32 -> they've
257.28 -> invented something but the skill set
258.799 -> isn't there yet so people go and
260.479 -> they try to use it the same
262 -> way they use the old technology and they
263.68 -> have a miserable experience and this is
265.919 -> typical of the nosql kind of
268.56 -> path as in almost every technology as
270.96 -> it's introduced to market you can't use
273.04 -> it the same way you use the old
274.24 -> technologies you got to learn how to use
275.68 -> it first
276.88 -> as those skills kind of become commodity
278.639 -> in the market
280 -> the early majority lands you know
282 -> developers kind of become familiar with
283.52 -> how to use it and
285.12 -> the experience becomes a better
286.56 -> experience right so if you find yourself
288.24 -> talking about how nosql is difficult to
290.4 -> use i don't understand it well
293.52 -> that's good that you recognize
295.04 -> you don't understand it let's kind of
296.639 -> learn about it you didn't always
297.919 -> understand how to use that relational
299.36 -> database either right remember that at
301.52 -> some point it was new to you so
304.4 -> let's get into the new technologies and
306 -> learn how they work we'll have a better
307.36 -> experience than trying to use these
309.199 -> things the same way we use the old ones
310.88 -> and we'll explain a little bit about
312.4 -> what that means when we get into talking
314.08 -> about some of the best practices and
316.16 -> things for nosql databases in general
318.8 -> so now why nosql
321.36 -> it's because sql is really about
323.12 -> that normalized relational view of
325.36 -> data you know the data is
329.6 -> well structured for supporting any kind
331.6 -> of access pattern it's
333.199 -> agnostic to every pattern and what that
335.039 -> really means is it's really optimized
336.72 -> for none right i have to kind of join
338.479 -> indexes across tables and build these
340.24 -> views of data
341.68 -> you know and as such it kind of scales
343.52 -> vertically it works very well on a data
345.28 -> set that's stored in a single container
346.96 -> it works less
348.4 -> efficiently when data is kind of
349.6 -> distributed across multiple containers
351.28 -> and it's kind of good for those olap
353.28 -> workloads where i don't really
354.32 -> understand what the access patterns are
355.6 -> going to be right so that's kind of the
356.72 -> category that sql works well
359.199 -> in but nosql on the other hand
361.12 -> works really well for oltp applications
364 -> which are applications that have
365.52 -> well-defined access patterns if i
367.28 -> understand the access pattern then i can
368.88 -> kind of model the data to kind of
370.319 -> represent those instantiated views the
372.16 -> result sets of those queries more or
374.16 -> less and be able to build
376.8 -> hierarchical data storage structures
378.72 -> that are more denormalized
382.16 -> they scale horizontally across multiple
384.479 -> nodes
385.44 -> and it's really built for you know
386.96 -> situations where the access
389.039 -> patterns are well understood now this is
390.8 -> great because most applications today
393.44 -> have pretty well understood patterns
395.6 -> right i mean this is what we're writing
397.039 -> code for because the code executes and
398.88 -> it doesn't ask a different question
400.08 -> every time it executes it actually asks
401.759 -> the same questions every time
404.08 -> and so having a database that's
406 -> optimized to be able to provide the
407.759 -> result set for the questions that your
409.28 -> applications are asking is actually
410.88 -> quite efficient we'll get into looking
412.479 -> at what that really means when we talk
414.4 -> about amazon dynamodb it's a fully
416.8 -> managed nosql database right no sql
418.8 -> databases are great but scaling the
420.8 -> infrastructure behind them is not so
422.479 -> great it's not something that you want
424.24 -> to be involved with as a business it's
425.84 -> not really
427.199 -> a core value that you want to develop
428.88 -> and this is really where i start working
430.24 -> with customers a lot is when they get
432.319 -> into the scale aspect of things now i
434.479 -> also work with a lot of customers that
435.84 -> are just getting started and and the
437.599 -> reason why they want to use it is
438.8 -> because again a fully managed system i
440.4 -> don't have to understand how to provide
441.919 -> five nines availability it's there out
443.599 -> of the box and for those of us that
445.84 -> tried to manage a five nines
447.12 -> availability service you know how hard
448.8 -> that is and dynamodb gives you that at
451.12 -> the click of a button and that's the
452.8 -> kind of functionality that that startups
454.56 -> want for mission critical services right
456.479 -> so again it's a document and wide-column database that
459.199 -> scales to any workload it's fast and
460.8 -> consistent at scale we have workloads
462.96 -> that execute
464.72 -> tens of millions of
465.84 -> transactions per second and provide low
468.16 -> single digit millisecond latency we'll
469.84 -> show you some of that stuff in a minute
471.599 -> fine-grained access control over all
473.759 -> your data at the item level the table
475.84 -> level the attribute level within the
477.68 -> items you can restrict processes to see
479.759 -> certain chunks of data that you want
481.28 -> them to and others to see other
483.68 -> bits or the whole items and it's
487.199 -> fully elastic and this is one of the
489.039 -> things that we'll talk about when we get
490.319 -> into the performance characteristics of
492.08 -> dynamo but
493.36 -> there's no other database that works
494.8 -> like this it's fully on demand if you
496.56 -> need it to be you get whatever you want
498.16 -> when you want it you pay for what you
499.919 -> use when you use it and you don't even
502.24 -> pay for storage right until you store
504.16 -> items once you delete the items then the
506.08 -> storage costs go away so there's no
508 -> pre-provisioning of resources with
509.599 -> dynamodb and it's what makes it really
511.28 -> nice for the serverless kind of paradigm
513.36 -> that event driven programming model
515.519 -> that's becoming so popular today
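As a concrete illustration of that on-demand, pay-per-request model (not code from the talk; the table and key names here are hypothetical), a minimal boto3 sketch of creating a DynamoDB table with no pre-provisioned capacity might look like this:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# On-demand (PAY_PER_REQUEST) mode: no read/write capacity to
# pre-provision; you pay per request, and for storage only while
# items actually exist.
dynamodb.create_table(
    TableName="CustomerData",  # hypothetical table name
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],  # partition key
    BillingMode="PAY_PER_REQUEST",
)
```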
517.68 -> now when you look at nosql databases we
519.919 -> talked a little bit about how they work
521.2 -> differently
522.64 -> dynamodb works the same way as all the
524.399 -> nosql databases and that's one thing that
526.32 -> you can argue back and forth with other
527.839 -> people but the modeling of the data is
529.44 -> always the same no matter what the nosql
531.12 -> database platform is there's going to be
532.959 -> some sort of collection of objects right
534.56 -> in dynamodb we call these things
536.48 -> a table
537.6 -> tables have items and items have
539.92 -> attributes and not all the items on the
542 -> table have to have the same attribute
543.6 -> okay so this is one of the
545.44 -> differentiators between a nosql database
547.76 -> and relational databases the items
549.519 -> on the table
550.959 -> they're like the rows from all your
552.399 -> tables you put them all
553.92 -> into one place in mongodb they call
555.92 -> this a collection in cassandra it's
557.92 -> called a key space dynamodb calls it a
560.64 -> table regardless you're pushing items
563.04 -> into an object into a repository and all
565.92 -> of those objects have to have a unique
568 -> attribute to identify what this object
570 -> is that's the partition key in dynamodb
572.32 -> it's _id in mongodb or
574.32 -> documentdb and again it's a partition
576.8 -> key in cassandra
578.32 -> but this uniquely identifies that item
580.32 -> within the table so if i have a
581.6 -> partition key only table
583.6 -> in dynamodb or i have a standard
585.839 -> collection with no indexes in mongodb
588.399 -> then this is that key value access
590.399 -> pattern everyone talks about nosql being
592.16 -> great for and it is great for the
593.839 -> key value access pattern
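To make that key-value pattern concrete, here's a minimal boto3 sketch (hypothetical table and key names, assuming the partition-key-only table from the earlier example):

```python
import boto3

# Hypothetical partition-key-only table.
table = boto3.resource("dynamodb").Table("CustomerData")

# Classic key-value access: fetch exactly one item by its partition key.
resp = table.get_item(Key={"pk": "customer#1234"})
item = resp.get("Item")  # key "Item" is absent if nothing matches
```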
596.24 -> but the reality is applications have
597.92 -> a lot more complex access patterns than that and if you
599.68 -> want to try and push all the relational
601.12 -> data that you have into these big blobs
602.88 -> that can be described as key
604.48 -> value data structures it's a very
606.24 -> inefficient way to model your data we'll
607.76 -> get into that in a minute but the
609.519 -> reality here is i want to create
610.88 -> collections of these objects that are
612.24 -> interesting to my application and i want
614.079 -> to treat these things more or less like
615.6 -> the rows of the tables from my
616.959 -> relational databases right you don't
618.64 -> query all the rows that are related to
620.32 -> each other all the time you query
621.92 -> subsets of those rows right you work
623.68 -> with little chunks of the relational
625.519 -> hierarchies that represent your data and
627.68 -> we kind of want to do the same thing
628.8 -> with nosql databases right to be able to
630.8 -> work with the data efficiently we got to
632.72 -> be able to get the data in and out kind
634.72 -> of the same way
636.16 -> if i'm pulling big blobs of data to
637.92 -> update small integers i'm burning a lot
639.68 -> of iops to do that and that's a very
641.68 -> common
643.6 -> anti-pattern in nosql
647.36 -> so when you hear people talk about using this
648.56 -> database and such and such because it
650 -> has large document support for like 16
651.92 -> megabytes you got to look at them and
653.2 -> say well why are you doing that you know
654.64 -> i mean you shouldn't be storing data in
656.56 -> big giant documents like that and if you
658.24 -> are then let's talk about object
660.64 -> stores like s3 for that data
664.079 -> the anti-pattern is using
666.079 -> large documents
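One common way to apply that advice (my illustration, not spelled out in the talk; all names hypothetical) is to park the large payload in S3 and keep only a small pointer item in DynamoDB:

```python
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("CustomerData")  # hypothetical

big_document = b"...multi-megabyte payload..."  # placeholder blob

# Keep the large blob in S3...
s3.put_object(Bucket="my-blob-bucket", Key="docs/1234.json", Body=big_document)

# ...and store only a lightweight pointer item in DynamoDB.
table.put_item(
    Item={
        "pk": "customer#1234",
        "sk": "doc#1234",
        "s3Bucket": "my-blob-bucket",
        "s3Key": "docs/1234.json",
    }
)
```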
668.32 -> what we want is to create collections of objects that
670 -> are interesting to the application so we
672 -> have partition keys in
673.76 -> dynamodb we can create an additional key
677.279 -> a compound key called a sort key which
680.56 -> gives us the ability to create
681.6 -> collections of objects within these
682.959 -> partitions right now these partitions
684.72 -> become
686.48 -> more or less groups of items that are
689.279 -> interesting to the application in
690.959 -> mongodb i'd start adding indexes to do
693.04 -> this but
694.079 -> in other nosql databases like cassandra
696.32 -> we also have the ability to add sort
697.92 -> keys
698.959 -> some even have much more complex sort key
701.44 -> structures than dynamodb but regardless
703.6 -> what we're trying to do is create a
704.88 -> collection of objects and so if you
706.48 -> think about it we can start to model
708.32 -> these one-to-many and many-to-many
710.079 -> relationships by creating
711.839 -> containers full of things if i have
713.68 -> let's say a partition key which is the
715.92 -> customer id and i have a sort key which
718.48 -> might be a customer's interaction date
721.04 -> concatenated with the interaction id
723.839 -> you know customers do what customers do
725.2 -> they make orders they make payments
727.6 -> they get shipments
729.04 -> they make returns they open tickets
730.959 -> right but sometimes a customer logs
732.959 -> into a portal and says what's
734.32 -> the current state of my
736.32 -> interactions with my
738.16 -> service i want to get my
739.839 -> landing page for the customer portal
742.24 -> i can query this table now and say give
744.079 -> me everything for this partition key for
746 -> customer id x with the sort key greater
748.48 -> than date y and it brings back
750.399 -> everything right all of those rows from
752.399 -> all of those tables all in one query
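Here is roughly what that customer-portal query looks like with boto3's Key conditions (a sketch of the pattern being described, with hypothetical table and key names):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("CustomerData")  # hypothetical

# One query returns the whole item collection for customer X whose
# sort key (interaction date concatenated with interaction id) is
# greater than date Y: orders, payments, shipments, returns, tickets.
resp = table.query(
    KeyConditionExpression=(
        Key("pk").eq("customer#1234") & Key("sk").gt("2021-01-01")
    )
)
items = resp["Items"]  # all of those "rows" in a single round trip
```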
754.72 -> this is where nosql databases start to
757.12 -> shine
758.56 -> against
759.6 -> the relational database by creating
761.279 -> these grouped collections either on the
763.04 -> table or on indexes right and then we
764.88 -> can start to use range queries on those
766.959 -> sort key operators to get filtered
769.519 -> objects from those collections
771.44 -> essentially
772.399 -> recreating the relational models and
774.24 -> we'll talk about this again in more
775.6 -> detail
778.079 -> that you actually model in your