Graph Database introduction, deep-dive and demo with Amazon Neptune - AWS Virtual Workshop

Aug 16, 2023

With Amazon Neptune you can build and run identity, knowledge, fraud graph, and other applications with performance that scales to more than 100,000 queries per second. Neptune allows you to deploy graph applications using open-source APIs such as Gremlin, openCypher and SPARQL. Since Neptune is a fully managed database service, there is no need to worry about hardware provisioning, software patching, setup or backups. Many people are not familiar with graph databases which is why this workshop will introduce the cutting-edge use cases for graph databases that span fraud detection to personalization. This workshop will cover the architecture and all of the key features of Neptune. We will also use this time to do a demo of Neptune in action.

Learning Objectives:
* Objective 1: Understand the benefits of Amazon Neptune by going over use cases including fraud detection, personalization and advertising targeting.
* Objective 2: Dive deep into how open-source APIs can be used to deploy applications in Neptune.
* Objective 3: See a demo of Neptune in action.

To learn more about the services featured in this talk, please visit: https://aws.amazon.com/neptune/

Content

3.06 -> [Music]

7.279 -> hello everyone

8.559 -> my name is dave bechberger i am a senior

10.8 -> graph architect on the amazon neptune

12.799 -> service team and i'm here today to talk

14.799 -> to you a little bit about graph

15.679 -> databases and then i'm going to go into

17.68 -> a little bit of a deep dive and a demo on

19.84 -> amazon neptune which is aws's

21.92 -> purpose-built graph database offering

24.48 -> so let's jump right into it

26.88 -> so today when we work with customers we

29.76 -> see graphs being used for all types of

31.599 -> applications why is this

34 -> because graphs are very good at modeling

35.84 -> relationships that are not necessarily

37.84 -> easily represented or retrieved with

40.399 -> other types of databases out there today

43.12 -> if we look at this simple graph

44.399 -> representation it's pretty easy without

46 -> me having to tell you any other

47.2 -> additional information that you can

48.559 -> figure out that alice lives in anytown

50.8 -> and that she works with bob

53.12 -> this is because graphs you know this

55.039 -> represents how graphs and the graph the

57.68 -> graph way of looking at problems is very

60 -> intuitive to people

61.6 -> because they rep it does a very good job

63.28 -> of representing the natural way that we

65.519 -> think about data and connections in this

67.84 -> example you know

70 -> we're looking at these and we have what

71.76 -> we would consider from a graph

72.96 -> perspective a couple of nodes those

74.799 -> nodes representing the entities or the

76.56 -> real world objects here in this case

78.08 -> being alice bob and anytown

80.32 -> and we have these these lines or these

82.32 -> connections between things which

83.759 -> represent the relationships between

85.439 -> these real world objects in this case

87.2 -> lives in and works with and when we look

89.52 -> at this and because of the way that

91.2 -> graphs and graph databases store this

93.28 -> data it really they really allow you to

96.079 -> explore these relationships and patterns

98.24 -> and this type of connected data in ways

100.479 -> that other can't uh in ways that

102.72 -> other data stores and data structures

104.56 -> can't
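
The alice, bob and anytown picture described above can be sketched in a few lines of Python. This is only an illustration, assuming a plain list of (source, relationship, target) triples; the names `edges` and `outgoing` are hypothetical and are not part of Neptune or any graph library.

```python
# Hypothetical in-memory model of the example graph (not Neptune's storage format).
# Each edge is a (source node, relationship, target node) triple.
edges = [
    ("alice", "lives_in", "anytown"),
    ("alice", "works_with", "bob"),
]

def outgoing(node, edges):
    """Return (relationship, target) pairs for every edge that starts at node."""
    return [(label, dst) for src, label, dst in edges if src == node]

print(outgoing("alice", edges))
```

The nodes are the entities and the triples are the relationships, so asking what alice is connected to is a matter of reading her edges.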

106.88 -> customers are also very excited about

108.479 -> graphs and they're really especially

109.92 -> excited about managed graph services

112 -> since neptune was released in may of

114.079 -> 2018 customers have built many types of

116.64 -> applications on top of neptune but when

118.88 -> we think about it uh when we when we're

120.88 -> working with customers we kind of

122.159 -> broadly general generalize excuse me

124.88 -> these these

127.119 -> applications into a couple of common use

129.44 -> cases we see

130.72 -> the first common use case we see with

132.48 -> neptune is fraud detection and this is

134.72 -> exactly what you think it is we're

136.08 -> trying to use graphs to help find the

137.84 -> bad guys graphs are uniquely helpful for

140.48 -> fraud detection as they really enable us

143.44 -> as developers and as users of the system

145.68 -> to find the deep links and deep

147.28 -> connections and patterns of connections

149.52 -> in the data that really don't aren't

152 -> easily uh easily found using other sorts

155.04 -> of uh systems out there

157.92 -> the second use case we we generally see

159.92 -> is what we call identity graphs and

161.84 -> identity graphs are really based on the

164.16 -> concept of you know something like a

166.64 -> user is going to come to your website

168.4 -> for multiple different uh

170.56 -> different areas maybe they are going to

172.08 -> connect to it from their phone and their

173.599 -> work computer and their home computer

175.599 -> and their tablet and we really want to

177.28 -> be able to connect these these

179.599 -> these disparate interactions with the

181.84 -> user together in such a way that we can

184.8 -> kind of general uh generate a golden

187.04 -> record of kind of the the canonical form

189.68 -> of that data that a user can actually

192.72 -> so we can actually use that to help

194.159 -> provide things like personalized

195.599 -> recommendations or marketing ad

197.84 -> segmentation things like that

200.239 -> third use case we we often see with

202.239 -> customers are one of knowledge graphs or

204.56 -> knowledge organizations

206.319 -> and this is a really about connecting

208.159 -> disparate data silos inside of a company

210.64 -> together in such a way that you can

211.92 -> really get a holistic view of a specif

214.64 -> of of pieces of information and things

217.2 -> that are connected to that piece of

218.4 -> information

219.92 -> let's say you're something like an

221.36 -> e-commerce website and you have a

223.36 -> database that contains all of your

225.04 -> information about the products you sell

226.72 -> and one that contains all this

228.08 -> information about your customers and one

230.239 -> that contains all the information about

231.599 -> your inventory inside your warehouse

233.92 -> maybe you have another one that's

234.799 -> shipping and you want to be able to

235.76 -> connect all of these together in such a

237.439 -> way that you can look at all of the

239.76 -> information around a specific product or

241.76 -> customer or user something like that

244.159 -> and that would kind of fall into what we

245.439 -> call a knowledge graph type use case

248.64 -> the last one we have is the last common

250.4 -> use case we've seen is security graphs

252.239 -> and we've recently seen a real uptick in

254.239 -> customers interested in using graphs for

256.639 -> security based systems

258.32 -> security in general is sort of a graph

260.479 -> problem because it's really you know

262.24 -> security for anything be it physical

264.72 -> or logical or application security or

268 -> your cloud infrastructure security is

270.4 -> really about layers about multiple

272.4 -> different layers of security all being

274.08 -> connected together

275.68 -> in such a way that you want to be able

277.84 -> to look at that to be able to find

279.28 -> potential uh paths of of potential

283.6 -> um you know malfeasance or paths through

286.56 -> the graph and how things may or may not

289.199 -> be exposed to the internet per se when

291.52 -> they should or shouldn't be um so that's

293.68 -> sort of kind of what us you know one one

296.08 -> way we can look at security graphs and

298.4 -> this is really kind of a really

299.6 -> interesting and fast growing or

301.52 -> upcoming segment for uh graph based

304 -> solutions

305.28 -> so what are some other common business

306.96 -> problems that we think uh that we

308.8 -> work with customers when we think about

310.32 -> graphs and graph type problems problems

312.16 -> really needing those connect highly

313.44 -> connected data well as i kind of

315.68 -> mentioned already the first one is

316.96 -> people come to us saying they need to do

318.32 -> things like be better at detecting fraud

320.24 -> and fraudulent transactions inside their

322.08 -> system

324.16 -> maybe they want their customers to have

326.08 -> a better or more personalized

327.759 -> recommendation experience than they're

329.44 -> able to provide today

331.68 -> there's that knowledge graph use case if

333.199 -> we want to connect together those siloed

335.12 -> data sources inside our enterprise to

336.88 -> really kind of build out a you know an

338.4 -> entire

339.44 -> an entire platform that contains all of

341.84 -> the knowledge inside of our system

344.32 -> maybe you have those multiple websites

346.24 -> or you have multiple applications and

347.919 -> you need to link together disparate

349.919 -> customer identities in these systems to

351.759 -> kind of get that canonical or that

353.52 -> golden record

355.199 -> you know that

356.639 -> maybe you have machine learning

358.319 -> algorithms and you want to be able to

359.44 -> use the connections in your data to

361.44 -> improve those algorithms to give you

363.039 -> better sorts of answers

366.319 -> these sorts of questions are ones that

368 -> kind of are if you went out and you you

370.08 -> looked online you did some searching

371.68 -> around you would probably come back

373.039 -> these sorts of questions are really good

375.84 -> common business use cases for graph type

378.479 -> problems but there's also a wide array

381.12 -> of not so easily recognized graph

383.039 -> problems ones that when you

384.56 -> you know at first glance may not make

386.08 -> sense or you may not think of them as

387.44 -> graph problems but they really do lend

389.68 -> themselves very well to being solved

391.44 -> using graphs

392.8 -> first one example is you know what are

394.72 -> the risks in your i.t infrastructure or your

396.639 -> supply chain you know any sort of i.t

399.28 -> infrastructure or supply chain it tends

401.759 -> to get very complicated very quickly

403.6 -> there's a lot of you know in the case of

404.96 -> a supply chain you have a lot of people

407.039 -> that are you know you have a lot of

408.4 -> products each of those products has its

409.919 -> own bill of materials each of the bill

411.759 -> of the items in those bill of materials

413.919 -> has one or more suppliers and those

415.759 -> suppliers have suppliers and those

417.52 -> suppliers have suppliers have suppliers

419.199 -> and being able to kind of look at the

421.12 -> overall risk portfolio of your supply

423.759 -> chain or your i.t

425.199 -> infrastructure is a great use for a

427.039 -> graph

428.16 -> where did this data come from this is

430 -> this sort of question of being able to

431.599 -> track the lineage or the provenance of

433.84 -> data is one that we often talk with data

435.759 -> engineering teams about uh

438.4 -> this lends itself very well to a graph

440.4 -> because if you think if you can think

442.08 -> about it when you're sort of working

443.759 -> with you know any sort of data

445.199 -> engineering problem it's really about

446.72 -> taking data from one source doing some

448.72 -> sort of extract transform and load

451.84 -> type process and usually loading it into

453.52 -> something else maybe you're combining or

455.599 -> aggregating that data but you probably

457.68 -> but when you aggregate that data you

459.199 -> still want to be able to track where did

460.639 -> this data come from maybe you need to be

463.039 -> able to track this to be able to comply

464.479 -> with privacy regulations like ccpa or

466.879 -> gdpr and you want to be able to track

470 -> not only where this data is currently

471.919 -> but where where all the places it was

473.36 -> used intermittent

475.12 -> intermediately to be able to you know

478.24 -> clean up those things to be able to

479.84 -> understand the accuracy and the

482.96 -> efficacy of the data that you're working

484.56 -> with
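
The lineage question above (where did this data come from) can be sketched as a walk over "derived from" connections. This is a minimal Python illustration under stated assumptions: the dataset names and the `derived_from` mapping are invented for the example, and a real pipeline would record these edges in its own metadata.

```python
# Hypothetical lineage edges: each dataset maps to the datasets it was built from.
derived_from = {
    "sales_report": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
    "orders_raw": [],
    "crm_export": [],
}

def upstream_sources(dataset, derived_from):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in derived_from.get(current, []):
            if parent not in seen:  # skip anything already visited
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(upstream_sources("sales_report", derived_from)))
```

The walk does not need to know in advance how many transformation steps sit between a report and its raw sources, which is what makes the question a natural fit for a graph.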

486.479 -> why don't your search results relate to

488.56 -> the specific question i was asking this is

490.4 -> a common use case we we see with

492.16 -> customers is they have search results

493.759 -> but their search results are a little

495.12 -> bit lackluster and a little

497.12 -> less clear than they would like them to

498.96 -> be so being able to use a graph to be

501.12 -> able to build that you know

503.759 -> those sorts of knowledge graphs where

505.039 -> you're connecting these data together to

506.4 -> be able to give you more relevant and

508.72 -> related answers to the types of

510.4 -> questions people are asking

512.959 -> how does person x have access to

515.36 -> information y

517.68 -> this is a security another security

519.919 -> graph type use case where you really

521.279 -> want to be able to you know maybe map

522.88 -> out the permissions that folders and

525.519 -> files have by active directory groups i

528.16 -> was working with one customer where

529.839 -> their specific use case was they were

531.519 -> you know looking they wanted to

533.2 -> specifically look for

534.88 -> uh access or people that

536.959 -> had access to certain files and folders

539.519 -> through a very very long set of

541.2 -> connections because if they had a you

542.72 -> know if

544.399 -> i was given direct access to this file

546.72 -> or folder it probably means i was

548.16 -> intended to have it but if i had to do

549.76 -> that if i had access to this file and

551.839 -> folder through multiple sets of groups

553.6 -> and permissions maybe it was an

555.2 -> unintended consequence of giving me a

557.519 -> permission was giving me access to this

559.36 -> this critical business information and

561.44 -> they want to be able to kind of track

562.64 -> that down and look for that sort of

564.399 -> thing
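
The customer scenario above, separating direct grants from access that arrives through a long chain of groups, can be sketched as a path search. This Python example is only an assumption-laden illustration: the `access` mapping, the user and group names, and the `max_hops` cutoff are all hypothetical.

```python
# Hypothetical edges: a user or group points at the groups or folders it reaches.
access = {
    "alice": ["finance_folder"],          # direct grant
    "dave": ["engineering"],              # group membership
    "engineering": ["all_staff"],         # nested group
    "all_staff": ["legacy_admins"],       # nested group
    "legacy_admins": ["finance_folder"],  # group holds the permission
}

def access_paths(start, target, access, path=None):
    """Yield every cycle-free path from start to target."""
    path = (path or []) + [start]
    if start == target:
        yield path
        return
    for nxt in access.get(start, []):
        if nxt not in path:  # do not revisit a node already on this path
            yield from access_paths(nxt, target, access, path)

def indirect_paths(user, resource, access, max_hops=1):
    """Return paths with more than max_hops edges, which may be unintended."""
    return [p for p in access_paths(user, resource, access) if len(p) - 1 > max_hops]

print(indirect_paths("dave", "finance_folder", access))
```

Here alice reaches the folder in one hop, so nothing is flagged for her, while dave reaches it only through three nested groups and his path is reported for review.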

566.24 -> and you know things like

568 -> about your cloud infrastructure your

569.44 -> cloud infrastructure is a very good

571.76 -> example of a security graph type use

573.6 -> cases

574.8 -> being able to look at how different

576.48 -> things are used inside of it through a

578.64 -> wide array and a very large number of

581.36 -> variably connected data if you think

582.88 -> about your cloud infrastructure you're

584.16 -> going to have things like iam policies

586.399 -> which are connected to roles which are

587.92 -> then going to be connected to one of any

589.6 -> number of different types of entities be

591.44 -> they lambda functions or databases or

594.08 -> ec2 instances and you have a very

596.32 -> variably connected set of

598.8 -> entities that you're trying to look at

600.64 -> but when i really kind of sit down if i

602.16 -> wanted to kind of boil down graph

603.76 -> questions the types of questions that

605.2 -> graphs are good at answering

607.12 -> really for me it comes down to the where

609.04 -> why and how questions you know where are

610.56 -> the risks where did the data come from

612.959 -> why don't the search results do

614.399 -> something how about this person how is

616.72 -> this role being used things like that

618.88 -> and these tend to be good uh graph

621.12 -> questions because they have a few things

622.56 -> in common first they tend to navigate

625.279 -> variably connected structures of data

627.44 -> you know especially if you wanted to

628.48 -> think about this

629.68 -> in terms of looking at your cloud

631.519 -> infrastructure

632.64 -> the cloud infrastructure as we kind of

635.2 -> discussed a minute ago is really it's a

637.6 -> highly variable set of different

639.04 -> entities you have you know vpcs you have

641.92 -> enis you have iam roles iam policies all

645.12 -> of these are connected together against

646.64 -> other ec2 instances or databases and

648.959 -> being able to look at that and easily

651.44 -> move through that sort of information is

653.68 -> an area where graphs tend to excel

657.279 -> they also tend to excel at questions

659.36 -> where you need to filter or compute a

661.279 -> result based on the strength weight or

663.2 -> quality of a relationship so in the case

664.959 -> of something like a supply chain risk

666.88 -> management being able to look at not

668.8 -> only the fact that these two things are

670.32 -> connected but how important is this

672.32 -> supplier to this person what other sorts

675.279 -> of

676.32 -> info what other sorts of backup

678.16 -> suppliers may they have for specific

679.839 -> things to be able to use that to help

681.12 -> calculate an overall risk of your supply

683.12 -> chain or where the risk or to find the

684.959 -> riskiest parts of your supply chain

687.04 -> is an example where using you know where

689.279 -> using the the connections in your data

691.76 -> is extremely important to be able to get

693.36 -> that answer

695.44 -> and finally recursing or requiring

698 -> traversing unknown numbers of

699.68 -> connections and this is really an area

701.6 -> where graphs really do excel

704.16 -> um and this is where you have questions

706 -> that are a bit open-ended you know let's

708 -> take a look at the example of how does

709.519 -> person x have access to information y

711.839 -> you know they may have been given direct

713.279 -> access to this information or they may

715.36 -> have this information or access to this

717.44 -> information through a wide array of

719.68 -> different connections through maybe

721.04 -> different active directory groups and

722.48 -> things of that nature

724.72 -> but you won't know exactly the number of

726.72 -> connections or how they're connected at

728.72 -> the time you're initially or you're

730.399 -> you're originally looking at the query

732.56 -> all you know is you want to find out how

733.839 -> two people or two entities are

736.56 -> related inside this so this is really

738.72 -> sort of when i think about graphs and

740.399 -> graph type problems these are the sorts

741.839 -> of problems i really look for

744.88 -> where graphs benefit the you know the

747.36 -> the end use case quite

749.44 -> significantly

753.12 -> and why is this well there's a few

754.639 -> challenges around using

756.8 -> many other technologies with highly

758.959 -> connected data

760.32 -> first they tend to be a little

762.079 -> unnatural for querying that data

764.56 -> and this tends to lead to

766.48 -> an inefficient processing of that sort

769.279 -> of information

771.36 -> and most other databases out there or

773.2 -> other data technologies out there tend

774.56 -> to have a rigid schema that's really

776.32 -> inflexible for rapidly changing data

779.92 -> that we that most of the types of use

782.639 -> cases we've talked about today be they

784.32 -> fraud graphs or knowledge graphs or

786.24 -> security graphs or identity graphs

788.639 -> really tend to require

790.8 -> so let's dive a little bit into that

792.24 -> what is it about graphs that actually

794.8 -> make them better to handle this sort of

796.88 -> highly connected data

798.639 -> well the first aspect here is the query

800.88 -> languages the query languages that we

802.639 -> use with graphs are really optimized to

805.04 -> use the connections

807.04 -> to move through through the network of

809.2 -> data that you're looking at and this

810.8 -> comes down to the fact that graph

812.399 -> databases and managed graph services are

814.399 -> based on graph theory and one of the

815.839 -> kind of key pieces of graph theory is

817.68 -> this concept of traversing your data or

819.76 -> moving from point a to point b so if we

822 -> looked at this specific example

824.32 -> we're looking at this and it says dave

826.079 -> works at amazon um the little gremlin

828.639 -> guy is sort of uh representing that's

831.44 -> that's the logo for the apache tinkerpop

833.12 -> gremlin project and that's

834.56 -> representing kind of where we are in our

836.8 -> information today when i write queries

839.199 -> in graph query languages

841.36 -> as opposed to kind of uh you know when

844 -> the way you we work with them we'll see

845.6 -> an example of this later it really

847.76 -> taught you know they really work by

849.279 -> moving data from point a to point b so

851.519 -> you're i'm moving through my my graph or

853.519 -> my network from dave to amazon if we

856.16 -> want to contrast this with something

857.6 -> like a relational database relational

860 -> databases work on relational algebra set

862.72 -> set based algebra and they work by

864.8 -> combining sets of data so if we wanted

866.399 -> to kind of look at this in the same you

868.56 -> know the same thing uh the same example

870.48 -> here i would probably have a table

872.56 -> called something like person and a table

874.399 -> called something like company i would

876.48 -> perform a join on this in order to get

878.24 -> the fact that dave is a person at a

880.16 -> company um

881.92 -> that i work at that company through some

883.68 -> sort of foreign key between those tables

886.24 -> and

887.04 -> the way you know at its very core kind

888.72 -> of the way that that relational

891.04 -> databases work as opposed to moving from

892.88 -> point a to point b in a graph you know

895.12 -> when i move it from point a to point b i

896.639 -> don't necessarily mean unless i

897.839 -> explicitly ask for it i don't

899.199 -> necessarily maintain the history of

900.959 -> everywhere i've been in relational

902.88 -> databases as i'm joining these tables

904.8 -> together i'm building a bigger and

906.32 -> bigger table in memory

908.639 -> in theory in memory that basically is

911.04 -> containing all of the information and

912.56 -> all of the history of where i've been

915.279 -> this is why when you start running large

917.44 -> queries that have to traverse or move

919.6 -> through a lot of data in order to get

921.279 -> there

922.32 -> you know graph databases because i'm

924.56 -> moving point a to point b as opposed to

926.48 -> building a bigger and bigger in-memory

928.16 -> table are more efficient uh from a memory

930.32 -> perspective and a speed perspective

932.32 -> we're gonna do it
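
The contrast drawn above between joining sets and moving from point a to point b can be sketched in Python. This is a simplified illustration, not how any particular engine works: the `people` and `companies` tables, the `company_id` foreign key, and the `out_edges` mapping are hypothetical, and real relational databases optimize joins well beyond this naive version.

```python
# Relational style: two tables linked by a foreign key (company_id).
people = [{"id": 1, "name": "dave", "company_id": 10}]
companies = [{"id": 10, "name": "amazon"}]

def employer_via_join(person_name):
    """Build the combined person/company rows first, then filter them."""
    joined = [
        {**p, "company_name": c["name"]}
        for p in people
        for c in companies
        if p["company_id"] == c["id"]
    ]
    return [row["company_name"] for row in joined if row["name"] == person_name]

# Graph style: the works_at connection is stored as data next to the vertex.
out_edges = {"dave": {"works_at": ["amazon"]}}

def employer_via_traversal(person_name):
    """Start at one vertex and follow its works_at edges."""
    return out_edges.get(person_name, {}).get("works_at", [])

print(employer_via_join("dave"), employer_via_traversal("dave"))
```

Both functions give the same answer here. The difference is that the join builds an intermediate combined table, while the traversal only touches the edges of the starting vertex. In Gremlin, the traversal half might be written roughly as `g.V().has('person', 'name', 'dave').out('works_at').values('name')`, assuming those labels and property keys.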

933.68 -> the other aspect there is graph

935.6 -> databases are really optimized for

937.839 -> processing connected data at the kind of

940.48 -> engine level let's you know because

942.399 -> graph databases store not just the

944.959 -> entities but the connections

947.04 -> this really gives them the you know uh

949.36 -> the advantage of the fact that the

950.56 -> connections that you're working on are

952 -> data itself this means that they're

953.6 -> physically saved to disk so when i need

956.24 -> to actually retrieve this data in order

958.24 -> to know i want to move from point a to

959.68 -> point b across the connection i'm really

962.079 -> just reading data again i'm reading data

964.639 -> off of disk to retrieve that information

967.12 -> if we contrast this again with something

968.48 -> like a relational database

970.32 -> the works at connection in this example

972.48 -> is really metadata it would be

973.68 -> represented through something like a

975.12 -> foreign key between those per that

976.639 -> person and company table

978.639 -> so when i want to find out what company

980.8 -> somebody works at i need to actually

982.48 -> calculate that at runtime i need to

984.32 -> to

984.959 -> i need to run that relational algebra to

986.88 -> calculate that as opposed to being able

988.72 -> to retrieve from disk so when i start

991.279 -> needing to process

992.639 -> hundreds or thousands or hundreds of

994.72 -> thousands or millions or billions of

996.16 -> these sorts of relationships the fact

998 -> that i can retrieve them from disk

999.279 -> versus calculate them at runtime really

1001.36 -> does lead to a much more efficient

1002.88 -> processing of that sort of information

1007.6 -> and the last kind of uh item i wanted to

1009.44 -> touch on here is a little bit about

1010.72 -> schema flexibility

1012.8 -> when we're looking at these two uh

1015.199 -> these two examples up here we see that

1017.68 -> we have you know these are both

1019.04 -> representations of family trees

1021.68 -> with a graph and with with amazon

1023.759 -> neptune and with most graphs they they

1025.6 -> tend they're they're they're what's

1026.72 -> known as a schema-less database i

1028.48 -> personally not my favorite terminology

1030.4 -> because if you have data you have schema

1032.799 -> so i like to think of it more in terms

1034.4 -> of explicit versus implicit schema so in

1037.199 -> the case of a graph when i start

1039.919 -> with its schema less nature or its

1041.839 -> implicit nature of schema i can just

1044.799 -> start writing information to my system i

1047.36 -> can write a person and i can start

1049.44 -> writing a property of a first name or a

1050.88 -> last name

1052.4 -> i don't have to declare these ahead of

1053.919 -> time i don't have to set up tables or

1056.16 -> keys or constraints around that i can

1058.4 -> just start writing that information to

1060.16 -> my system as i as the data that's coming

1063.039 -> in or the maybe the attributes of that

1064.48 -> data changes or we add new

1066.64 -> uh maybe we add a new type of data a new

1068.72 -> new set of entities to my graph i can

1070.559 -> just start writing those and they will

1071.76 -> be automatically included into the

1073.44 -> schema of my graph
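
The implicit schema idea above can be sketched in Python: nothing is declared up front, and the schema is whatever the written data implies. The `vertices` store, the `add_vertex` and `implicit_schema` helpers, and the sample people are hypothetical names chosen for this illustration, not a Neptune API.

```python
# Hypothetical in-memory vertex store; no tables, keys, or constraints are declared.
vertices = {}

def add_vertex(vertex_id, label, **properties):
    """Write a vertex with whatever properties the caller supplies."""
    vertices[vertex_id] = {"label": label, **properties}

def implicit_schema():
    """Derive label -> set of property names from the data written so far."""
    schema = {}
    for vertex in vertices.values():
        keys = schema.setdefault(vertex["label"], set())
        keys.update(k for k in vertex if k != "label")
    return schema

add_vertex("p1", "person", first_name="alice", last_name="smith")
add_vertex("p2", "person", first_name="bob", birth_year=1950)  # new property, no migration
add_vertex("t1", "town", name="anytown")                       # new entity type

print(implicit_schema())
```

The second person introduces a property that the first one lacks, and the town introduces a new kind of entity. Both simply become part of the derived schema once they are written.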

1075.919 -> this provides a lot of flexibility

1077.52 -> especially as data evolves over time if

1079.919 -> we want yet again want to compare that

1081.679 -> and contrast that a little bit to a

1082.88 -> relational database relational databases

1085.28 -> have uh explicit schema i need to

1087.28 -> declare ahead of time that i have an

1088.88 -> individual and that individual has a

1090.559 -> first and a last name

1092.24 -> um

1093.36 -> you know you do this in in in pretty

1095.36 -> much any relational database today

1097.679 -> and this is very powerful until you want

1099.6 -> to start evolving this at scale and

1101.36 -> speed where you have to you know every

1103.039 -> time you go and do this you're gonna

1104.16 -> have to run some sort of

1105.919 -> um you know schema migration or schema

1108.559 -> diff process in order to basically bring

1110.48 -> things up to speed versus being able to

1112.08 -> just start writing the new entities as

1113.84 -> they come in

1115.679 -> there's also a bonus here which is the

1117.2 -> fact that graphs tend to be easier to

1120 -> understand by new people or

1121.36 -> non-technical people that aren't

1122.64 -> familiar maybe with the domain this is

1124.799 -> because when we look at these

1126.32 -> representations you know the

1127.6 -> representation of the graph is a lot

1129.84 -> more natural to the way we already are

1131.36 -> looking at the domain than necessarily

1133.919 -> looking at some erd diagram that you

1135.679 -> have to kind of dissect and put back

1137.84 -> together in order to figure out how

1139.44 -> everything's going to be related between

1141.12 -> them

1143.919 -> so it's a little bit about graphs and

1146 -> and why graphs are better for some sorts

1148 -> of problems than other technologies but

1150.4 -> why is it customers come to us why do

1152.32 -> they want to use a graph database

1153.84 -> service you know as we kind of mentioned

1155.84 -> uh other data traditional database

1157.6 -> technologies out there really aren't

1159.2 -> built to scale when you want to do like

1161.679 -> deep link querying or deep link analysis

1163.6 -> across billions of interconnected

1165.28 -> entities either you know hundreds of

1166.96 -> thousands millions or billions of

1168.4 -> interconnected entities for that matter

1171.2 -> and because of this they really

1172.4 -> challenge it really challenges those

1174.08 -> sorts of engines to deliver the low

1175.84 -> latency required for real-time

1178 -> inspection for of

1180 -> you know things like fraudulent you know

1181.679 -> looking for things like fraudulent

1182.88 -> activity or personalized recommendations

1185.36 -> or other potential you know potential

1187.2 -> malicious activities or things like that

1190.24 -> self-managed solutions also tend to be

1192 -> complex expensive and inflexible

1194.96 -> especially if you want to try and

1196.16 -> optimize them and scale them to the

1197.679 -> global nature of many of today's

1199.44 -> applications they require a lot of

1201.44 -> hardware management provisional

1204.24 -> provisioning you have to manually scale

1206.4 -> up and down these sorts of items and

1208.64 -> being able to do this in a compliant way

1211.36 -> as schema changes and evolves over time

1213.76 -> is very uh you know complex this really

1216.64 -> leads to a lot of these solutions not

1218.24 -> being able to evolve at the speed and

1220.72 -> scale that the landscape of today's data

1222.72 -> is changing

1223.919 -> and this is really why we went and built

1225.76 -> amazon neptune

1227.28 -> amazon neptune is a fully managed

1230.08 -> purpose built graph database

1232.559 -> built for the cloud it was released in

1234.72 -> may of 2018

1236.24 -> and since then we have been continuously

1238.08 -> improving it with new features and

1239.84 -> functionality based on feedback from

1242.24 -> customers

1243.6 -> as i said it's a fully managed service

1245.36 -> so aws takes care of all the hardware

1247.44 -> management we manage the os the database

1250 -> server and we uh we manage all of this

1252.64 -> through

1253.52 -> uh you know

1255.28 -> we manage all of this through a couple

1256.72 -> of ways you know all it takes is you as

1258.72 -> a user a few clicks in the management

1260.96 -> console or a few calls to the api to

1263.44 -> provision a new neptune cluster and

1265.6 -> it'll be up and running in a matter of

1267.2 -> minutes

1270 -> you know one of the core needs of any of

1271.52 -> these sorts of systems is the ability to

1273.28 -> scale based on demand so it's critical

1275.28 -> for most businesses to be able to

1277.44 -> support bursty traffic that

1280.72 -> fluctuates over time maybe it's

1283.28 -> seasonal traffic that bursts on black

1285.44 -> friday or thanksgiving

1288 -> or maybe your

1290.559 -> application gets more traffic in the

1292.32 -> evening versus the day and being able to

1294.08 -> dynamically scale is kind of a critical

1297.039 -> aspect of the way we built neptune from

1299.919 -> the get-go

1302.799 -> and the last aspect here i

1304.799 -> want to talk about is cost

1306.159 -> reduction

1307.2 -> with many

1308.72 -> other database offerings

1310.799 -> because they don't scale very linearly

1313.76 -> you tend to over provision hardware

1316.32 -> and database servers and v cpus things

1319.039 -> like that

1320.48 -> to meet your peak demands because of the

1322.96 -> scalable nature of neptune

1325.52 -> we really allow you to

1327.919 -> reduce the cost overall

1329.919 -> and add a lot more predictability to

1332 -> your cost by being able to scale

1334.08 -> the hardware that you're running

1336.159 -> your database on up and down as demand

1338.08 -> changes

1339.2 -> neptune uses a pay-as-you-go model so

1341.039 -> you're going to only pay for the amount

1342.32 -> of time your database server is actually

1343.919 -> running so this helps reduce costs

1346 -> further by not over provisioning

1348.24 -> overspending and underutilizing it

1351.6 -> as we mentioned neptune at its very

1353.6 -> core is a purpose-built graph database

1355.2 -> that's optimized to store and map

1357.2 -> billions of relationships between

1359.36 -> entities it does this to enable

1361.44 -> real-time connections with millisecond

1363.84 -> query response times it does this

1366.159 -> through

1367.12 -> support for

1368.64 -> three open standard or

1372.24 -> open specification

1373.84 -> query languages those being openCypher

1376.08 -> Gremlin and SPARQL

1378.4 -> it also does this by

1380.72 -> supporting the two leading graph models

1383.36 -> out there the first graph model out

1384.799 -> there is property graph property graph

1387.28 -> at its very core represents data through

1389.6 -> the use of nodes representing real world

1392.32 -> entities edges representing connections

1394.72 -> between those real world entities and

1396.559 -> attributes representing properties of

1398.72 -> either a node or an edge i think that's

1400.96 -> kind of a key piece to

1402.64 -> add in there one of the unique aspects

1404.96 -> of property graphs is the ability to

1406.96 -> associate

1408.559 -> properties not only with

1411.2 -> the entities themselves as you can with

1413.28 -> many other database technologies but

1415.2 -> also the ability to associate those

1417.6 -> properties specifically with the

1419.6 -> connection between entities so maybe you

1422 -> have a person and maybe they have a

1423.679 -> product and you have an edge that says

1425.6 -> they bought it well

1427.52 -> besides just storing the

1429.52 -> fact that person a bought product b

1432.159 -> you can also add properties or

1434.64 -> metadata to that edge maybe you want to

1436.4 -> add how they bought

1438.32 -> it or the date they bought it as a

1440.88 -> property of that connection itself so

1443.12 -> it's kind of a unique aspect of property

1445.12 -> graphs if you choose the property graph

1447.36 -> model in neptune you have the ability to

1449.12 -> query that data through one of two open

1452 -> specification query languages one

1454.32 -> of those being the openCypher query

1456.24 -> language which

1457.6 -> provides a SQL-inspired syntax for

1459.44 -> customers to use the second being the

1461.279 -> Apache TinkerPop Gremlin query language

1463.6 -> which is a very powerful query

1465.2 -> language

1467.279 -> we'll look at both of these here in

1469.039 -> a little bit and compare

1471.2 -> them but gremlin is an almost stream

1473.84 -> processing oriented query language
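To make the property graph model concrete, here is a minimal Python sketch of nodes, edges, and properties held on the edge itself. The class names, the sample person and product, and the `channel` and `date` values are all hypothetical illustrations and are not Neptune APIs. The comment shows roughly how the same question might be phrased in openCypher.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    # Properties live on the connection itself, not only on the entities.
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: person A bought product B.
nodes = {
    "p1": Node("p1", "Person", {"name": "A"}),
    "prod1": Node("prod1", "Product", {"name": "B"}),
}
edges = [
    Edge("p1", "prod1", "BOUGHT", {"channel": "online", "date": "2022-11-25"}),
]


def purchases_of(person_id):
    """Return (product name, edge properties) for each BOUGHT edge leaving person_id."""
    return [
        (nodes[e.target].properties["name"], e.properties)
        for e in edges
        if e.source == person_id and e.label == "BOUGHT"
    ]


# Roughly equivalent openCypher, for illustration only:
#   MATCH (p:Person)-[b:BOUGHT]->(prod:Product)
#   RETURN prod.name, b.channel, b.date
print(purchases_of("p1"))
```

The point of the sketch is only that the purchase date and channel belong to the `BOUGHT` edge rather than to the person or the product.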

1477.039 -> the second graph model that we store in

1479.2 -> neptune or that you can choose to use in

1481.279 -> neptune is the resource description

1483.6 -> framework or rdf

1485.2 -> you might have heard this referred to as

1486.88 -> the semantic web construct but rdf

1489.52 -> represents data inside a graph as a set

1491.919 -> of triples each of those triples

1493.84 -> containing a subject a predicate and an

1496.32 -> object so if we go back to the earlier

1498.48 -> example of dave works at amazon

1500.48 -> if we wanted to look at this

1504.559 -> with an rdf data model we would

1505.919 -> represent that data as the subject would

1507.679 -> be dave the predicate would be worksAt and

1510.4 -> the object would be amazon

1513.679 -> if you choose to use the rdf data model

1515.76 -> inside neptune we support querying that

1517.84 -> data using SPARQL 1.1

1520.799 -> which is a w3c standard query language

1523.6 -> for

1524.48 -> querying rdf data
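As a rough illustration of the triple model, the sketch below stores subject, predicate, object tuples in plain Python and matches them against a pattern, which is loosely what a SPARQL basic graph pattern does. The `livesIn` and `locatedIn` triples are made-up sample data, and the `match` helper is a hypothetical stand-in, not a SPARQL engine.

```python
# Each triple is (subject, predicate, object).
triples = {
    ("dave", "worksAt", "amazon"),
    ("dave", "livesIn", "seattle"),      # hypothetical sample data
    ("amazon", "locatedIn", "seattle"),  # hypothetical sample data
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Roughly analogous SPARQL, for illustration only:
#   SELECT ?employer WHERE { :dave :worksAt ?employer }
print(match(subject="dave", predicate="worksAt"))
```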

1530.96 -> we have thousands of customers

1532.48 -> using neptune today in production

1535.039 -> this is just kind of a really quick

1537.2 -> short list of some of our neptune

1539.039 -> customers across different verticals and

1540.799 -> different use cases

1543.279 -> so

1544.24 -> you know

1545.36 -> what is neptune how does neptune

1547.12 -> work

1550 -> let's take a moment and look under the

1551.84 -> hood of neptune and how neptune is built

1553.919 -> what its architecture looks like and

1555.919 -> then talk a little bit about some of the

1557.679 -> features and functionality

1560 -> of neptune

1561.76 -> so when i think about neptune at a

1563.36 -> high level

1565.679 -> when i think about the architecture of

1566.64 -> neptune i think of it in these

1568.32 -> five

1570.32 -> basic areas the application layer the

1573.12 -> compute layer the shared storage layer

1575.36 -> a set of service features and service

1576.799 -> integrations one of the unique aspects

1579.039 -> of the general architecture of neptune

1582 -> is that with neptune

1585.12 -> we were able to break out the compute

1587.36 -> layer from the shared storage layer it's

1589.12 -> a very key piece of the

1591.44 -> architecture this really enables us to

1593.84 -> provide a lot of these features

1595.6 -> that your applications work with so let's

1597.52 -> dive in

1599.279 -> and take a moment to look at some of

1600.72 -> these

1602.48 -> the first layer here we wanted to talk

1603.76 -> about is that application layer this is

1605.52 -> where you as the developer really live

1607.679 -> and work this is where you're building

1609.12 -> your social networking application or

1610.799 -> your fraud detection application maybe

1612.799 -> it's a knowledge graph or a security

1614.32 -> graph or one of the other use cases we

1615.679 -> talked about but this is where you're

1617.52 -> interacting with it and you do this

1619.279 -> through any of our three query languages

1622.32 -> as we mentioned

1626.24 -> for property graph we basically have a

1627.919 -> set of endpoints exposed via http

1631.84 -> or

1632.96 -> a websocket or bolt

1634.96 -> connection

1639.039 -> to be able to read and write data to and from

1640.88 -> the system so you're using a set

1642.48 -> of open source drivers or

1646.32 -> rest api calls to

1649.2 -> interact with this system

1652.08 -> these applications are going to interact

1653.919 -> with this compute layer as i mentioned

1655.84 -> the compute layer in in neptune is

1657.84 -> separated out from the storage layer

1660.72 -> in a kind of unique way for a

1665.279 -> cloud-based graph database and

1668.08 -> the compute instances are built to allow

1670.64 -> you to dynamically

1672.96 -> scale as your application requires it's

1675.279 -> an instance based database so your

1678.32 -> compute layer can have up to 16

1680.799 -> different instances there's always one

1682.399 -> writer instance or one primary instance

1684.96 -> that primary instance can be scaled

1686.96 -> anywhere from our smallest being a t3.medium

1690.48 -> up to an r5.24xlarge

1693.2 -> or

1694.559 -> if you have high

1696.08 -> memory demands on your application we

1697.679 -> also recently released support for the

1699.44 -> x2g family of instances so you can scale

1701.919 -> that writer up

1703.52 -> to any of the sizes available

1705.2 -> within there

1706.399 -> in a vertical manner for reads

1709.2 -> most graph applications tend

1710.88 -> to be very read heavy

1712.72 -> so we allow you to scale out to up to 15

1715.76 -> different read replicas on top of

1717.679 -> that same data you're storing that data

1719.84 -> one time each of these read replicas

1721.6 -> basically will read that same data and

1723.44 -> those read replicas allow you to

1725.679 -> scale

1726.64 -> vertically from a t3.medium

1729.279 -> all the way up to an r5.24xlarge or x2g

1732.88 -> as well as scale horizontally so

1735.84 -> from an instance perspective the writer

1737.919 -> instance can scale vertically and the

1739.679 -> read instances can scale both

1741.12 -> horizontally and vertically

1743.52 -> all of these compute instances are

1745.36 -> separated out from our shared storage

1747.279 -> layer uh we'll jump into a lot more of

1749.039 -> the details about this in a few minutes

1752 -> but at a high level

1755.12 -> a strong feature of the cloud

1757.36 -> native aspect of neptune is this shared

1759.84 -> storage layer that's separated from the

1761.52 -> compute layer

1762.799 -> when you write data into neptune that

1764.48 -> data is automatically going to be stored

1766.32 -> six times twice in each of three

1768 -> availability zones

1769.84 -> the data scales independent of your

1771.76 -> compute but it scales automatically for

1773.52 -> you so when you start a neptune

1775.279 -> cluster it'll start with provisioned

1777.44 -> space for 10 gigabytes of data and

1779.6 -> as you add more and more data

1781.44 -> that provisioned space will grow

1783.679 -> up to 128 terabytes

1786.159 -> because it's a fully managed service we

1788.399 -> have a lot of features around automated

1790.32 -> backup and restore functionality for you

1793.679 -> and we have this ability we call it the

1795.279 -> database fast clone which is really

1797.36 -> about enabling you as a user to quickly

1800.48 -> make a copy of your database to maybe do

1802.399 -> something like try out

1804.32 -> a new

1805.44 -> engine version or to try something in

1807.36 -> your application without having to take

1809.6 -> down your production cluster in order to

1811.52 -> make this happen

1813.52 -> neptune also comes with a set of service

1815.52 -> features

1817.12 -> that are part of the actual native

1818.88 -> implementation itself the first one

1820.48 -> being the ability to bulk load if you're

1822.72 -> going to bulk load data into neptune we

1825.039 -> provide an end point inside of the

1828.08 -> system that allows you to trigger this

1830.08 -> bulk load from s3 files

1833.679 -> the s3 files have to be in one of a few

1837.76 -> specified formats if you're doing

1839.36 -> property graph there are two different csv

1841.2 -> formats you can use if you're doing rdf

1843.84 -> there are several different rdf data

1845.52 -> formats you can use once your data is

1847.52 -> formatted into that you trigger this

1849.039 -> bulk load this bulk load then reads in

1851.12 -> all of that data optimizes and

1852.88 -> parallelizes the load to give you the

1854.559 -> maximum performance for getting that

1856.559 -> bulk data into your system
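As a hedged sketch of what triggering a bulk load might look like, the code below only builds the JSON body for a loader request and prints it; it does not contact any cluster. The field names (`source`, `format`, `iamRoleArn`, `region`, `failOnError`) and the two property graph format values reflect my understanding of the Neptune loader request and should be checked against the current documentation. The bucket name, role ARN, and region are hypothetical placeholders.

```python
import json

# Property graph formats only; RDF formats are deliberately left out of this sketch.
PROPERTY_GRAPH_FORMATS = {"csv", "opencypher"}


def build_loader_request(s3_uri, data_format, iam_role_arn, region):
    """Build the JSON body for a bulk load request (nothing is sent)."""
    if data_format not in PROPERTY_GRAPH_FORMATS:
        raise ValueError(f"unsupported format for this sketch: {data_format}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("source must be an s3:// URI")
    return {
        "source": s3_uri,
        "format": data_format,
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "TRUE",
    }


# All values below are hypothetical placeholders.
request_body = build_loader_request(
    "s3://example-bucket/graph-data/",
    "csv",
    "arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",
    "us-east-1",
)
print(json.dumps(request_body, indent=2))
```

In practice a body like this would be sent as an HTTP POST to the cluster's loader endpoint from inside the VPC.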

1859.2 -> we also have neptune streams neptune

1861.039 -> streams is a change data capture (cdc) type functionality on

1864.08 -> top of neptune that enables

1866.24 -> you to get a

1868.88 -> continuous stream of the

1871.84 -> data changes that are happening inside

1873.279 -> of your graph this can be used to

1875.279 -> trigger some sort of

1876.799 -> downstream workflows or to

1881.12 -> propagate those changes to

1882.88 -> other external systems
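The idea of propagating changes to an external system can be sketched as a small consumer that replays change records, in commit order, onto a downstream copy. The record shape used here (`commit`, `op`, `id`, `key`, `value`) is a simplified, hypothetical stand-in and not the actual Neptune streams record format; the returned commit number plays the role of a checkpoint.

```python
def apply_changes(replica, records):
    """Apply change records to `replica` in commit order.

    Each record is a dict with hypothetical fields:
    {"commit": int, "op": "ADD" or "REMOVE", "id": str, "key": str, "value": object}
    Returns the last commit number applied, or None if there were no records.
    """
    last_commit = None
    for rec in sorted(records, key=lambda r: r["commit"]):
        props = replica.setdefault(rec["id"], {})
        if rec["op"] == "ADD":
            props[rec["key"]] = rec["value"]
        elif rec["op"] == "REMOVE":
            props.pop(rec["key"], None)
        if not props:
            # Drop entities that no longer have any properties.
            replica.pop(rec["id"], None)
        last_commit = rec["commit"]
    return last_commit


downstream = {}
changes = [
    {"commit": 2, "op": "ADD", "id": "v1", "key": "city", "value": "seattle"},
    {"commit": 1, "op": "ADD", "id": "v1", "key": "name", "value": "dave"},
    {"commit": 3, "op": "REMOVE", "id": "v1", "key": "city", "value": "seattle"},
]
checkpoint = apply_changes(downstream, changes)
print(downstream, checkpoint)
```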

1885.2 -> we also have a status endpoint on the

1887.36 -> cluster this cluster status endpoint

1889.519 -> basically tells you just basic

1891.519 -> information about the health of your

1892.799 -> cluster

1894.399 -> we have query profiling and explain

1896.24 -> endpoints that allow you to take SPARQL

1898.72 -> Gremlin or openCypher queries run them

1901.2 -> through these and get a very detailed

1903.2 -> look about how the engine is processing

1905.44 -> that data this allows you to then

1907.6 -> go a step further and be able to tune

1909.2 -> and optimize those queries based on the

1911.2 -> information being returned to you from

1913.679 -> those endpoints

1915.279 -> we also have a feature that allows you

1917.919 -> to auto scale read replicas we'll talk a

1920.399 -> bit about this in a moment but

1923.12 -> at a high level this feature enables you

1925.919 -> as the user to specify a set of

1928.72 -> thresholds at which point your

1931.36 -> cluster

1932.72 -> will scale up and down to match the

1934.96 -> workload demands based on those

1936.559 -> thresholds
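A threshold-based scaling decision can be sketched in a few lines. This is only an illustration of the idea: the CPU thresholds and the one-replica step are hypothetical defaults, and the real feature is configured through the service rather than written as application code. The cap of 15 read replicas comes from the talk.

```python
MAX_READ_REPLICAS = 15  # upper bound on read replicas mentioned in the talk


def desired_replicas(current, avg_cpu_percent,
                     scale_out_at=70.0, scale_in_at=30.0, min_replicas=1):
    """Return the replica count after one scaling decision.

    The threshold values are hypothetical; the function adds or removes one
    replica at a time and clamps the result to the allowed range.
    """
    if avg_cpu_percent > scale_out_at:
        target = current + 1
    elif avg_cpu_percent < scale_in_at:
        target = current - 1
    else:
        target = current
    return max(min_replicas, min(MAX_READ_REPLICAS, target))


print(desired_replicas(2, 85.0))
```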

1938.72 -> neptune also comes with a few sets of

1940.559 -> service integrations the first

1942.48 -> service integration is an integration

1944.64 -> with the aws backup service so if

1946.399 -> in your enterprise or in your account

1948.72 -> you use aws backup

1951.2 -> you can now back up your

1953.2 -> neptune data using that same service

1955.679 -> we also have a feature called neptune ml

1957.919 -> we'll talk a bit about that here in

1959.76 -> a moment but at a high level what

1961.12 -> neptune ml is is it's an integration

1963.84 -> between neptune and sagemaker to enable

1967.039 -> customers of

1968.64 -> the neptune graph database to build and

1970.96 -> automate the process of building graph

1972.96 -> neural network-based machine learning

1974.48 -> models and using those models to run

1976.72 -> real-time predictions inside of

1979.12 -> queries

1980.72 -> we also have an integration with Amazon

1982.64 -> OpenSearch and our integration with

1984.399 -> Amazon OpenSearch enables you

1987.12 -> as a user to query

1989.44 -> OpenSearch with

1993.44 -> full text search type queries and get

1995.679 -> that information

1997.679 -> propagated back into your query

1999.919 -> itself

2002.64 -> as i mentioned one of the key

2004.559 -> pieces of the architecture is

2006.96 -> this cloud native storage layer that we

2009.2 -> have inside neptune itself

2012.159 -> this is a piece of really

2013.84 -> battle-hardened technology inside

2015.919 -> amazon where when you write data as i

2018.559 -> mentioned previously that data is

2020.399 -> automatically written in a

2022.159 -> highly available and secure manner this

2024.64 -> data is replicated six times as i

2027.12 -> said two times in each of three

2028.32 -> availability zones

2030.399 -> it's also continuously backed up to

2032.08 -> amazon s3 this storage layer itself was

2034.72 -> built for 11 nines of durability

2037.6 -> and it's based on

2039.679 -> 10 gigabyte segments

2042.399 -> as i said when

2044.48 -> you start a cluster you get one

2047.2 -> 10 gigabyte data

2050.32 -> segment as you add more and more data to

2052.48 -> your cluster we automatically provision

2054.56 -> these additional data segments for

2057.04 -> you

2057.76 -> and this data segment is really the

2059.679 -> unit of repair that this

2062.48 -> system automatically uses to heal itself

2064.8 -> and to rebalance hot spots

2067.76 -> inside of the storage layer things

2070 -> like that

2070.879 -> a quorum system is used for reads and

2072.72 -> writes so it's very latency tolerant and

2075.28 -> as i mentioned this storage

2077.359 -> volume will automatically grow

2081.919 -> up to 128 terabytes
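The storage numbers above lend themselves to a quick back-of-the-envelope calculation. The sketch below is a simplification under stated assumptions: it treats a volume as whole 10 gigabyte segments, takes 1 terabyte as 1024 gigabytes, and counts six copies per segment (two in each of three availability zones). It illustrates the arithmetic only and says nothing about how the storage layer is actually implemented.

```python
import math

SEGMENT_GB = 10          # segment size mentioned in the talk
COPIES_PER_AZ = 2        # two copies in each availability zone
AVAILABILITY_ZONES = 3
MAX_VOLUME_GB = 128 * 1024  # assumes 1 terabyte = 1024 gigabytes


def storage_layout(data_gb):
    """Estimate segments and replicated copies for a given logical data size."""
    if data_gb < 0 or data_gb > MAX_VOLUME_GB:
        raise ValueError("data size outside the 0 to 128 terabyte range")
    # A new cluster starts with one segment even when it holds no data.
    segments = max(1, math.ceil(data_gb / SEGMENT_GB))
    copies = COPIES_PER_AZ * AVAILABILITY_ZONES
    return {
        "segments": segments,
        "copies_per_segment": copies,
        "total_segment_copies": segments * copies,
    }


print(storage_layout(25))
```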

2086.159 -> read replicas beyond just

2090.48 -> scaling the number of

2092 -> transactions you can run there are a

2093.28 -> couple of really key aspects about

2095.359 -> read replicas

2096.72 -> as i mentioned neptune is an

2098.32 -> instance-based database and you can run

2100.88 -> it with as few as a single instance in

2102.72 -> which case you know that instance will

2104.16 -> be a writer

2105.52 -> if you're running

2110.4 -> with only a single instance

2112.96 -> and for some reason

2116.32 -> that writer node fails it will

2118.88 -> automatically be rebooted by the system

2120.72 -> but that can take several minutes to

2122.48 -> come up so during that time you

2124.72 -> won't be able to serve any traffic

2126.24 -> from your cluster

2128.48 -> to any sort of applications

2130.24 -> using it

2131.599 -> if you set up a read replica that read

2133.44 -> replica will basically be used as a

2135.76 -> failover so if that primary writer node

2138.32 -> goes down for some reason that read

2140.72 -> replica will be promoted to the writer

2142.88 -> and what was

2144.88 -> the writer node will be restarted as a

2146.72 -> read replica itself and

2149.68 -> this automatic failure

2151.359 -> detection and things like that will

2153.28 -> really

2156 -> minimize the amount of time that your

2157.68 -> application won't be able to serve

2158.96 -> traffic

2160.64 -> the other reason customers tend to use a

2162.56 -> lot of read replicas is to scale out the

2164.48 -> read traffic as i mentioned you can

2166.079 -> scale out the read traffic across

2169.52 -> up to 15 different read

2171.119 -> replicas and

2172.4 -> balance

2174.96 -> the read requests and responses

2177.119 -> across them using the reader endpoint

2179.92 -> so that brings us to

2183.44 -> how do you work

2185.28 -> with neptune how do you actually

2186.48 -> read and write data to it

2190.56 -> when you create a

2191.68 -> neptune cluster you're given two

2193.52 -> different endpoints you're given a

2195.2 -> writer endpoint and a reader endpoint

2197.04 -> the writer endpoint also referred to as

2199.2 -> the cluster endpoint always points to

2201.2 -> the current writer

2203.52 -> the current primary instance this

2205.839 -> is the one you would use to do any sort

2207.28 -> of mutation queries on top of your data

2210.32 -> if you have read heavy queries as many

2212.48 -> applications do we also provide you a

2214.4 -> reader endpoint

2216.4 -> and this reader endpoint distributes those

2218.8 -> requests as they come in to different

2220.72 -> read replicas so if you have three

2223.2 -> different read replicas this reader

2225.119 -> endpoint

2232.24 -> will for a specified

2234.48 -> period of time point to

2236 -> different replicas so your requests will be

2237.44 -> distributed across your different

2239.04 -> instances that being said this reader

2241.44 -> endpoint

2243.2 -> is time-based there's no fairness or

2245.119 -> round-robin guarantees but this is often

2247.599 -> sufficient for many customers uh one

2250.96 -> point to note here is if you're using

2252.8 -> gremlin with websockets or using

2254.64 -> the openCypher bolt connections

2256.4 -> those

2257.28 -> connections once created are sticky so

2259.599 -> they'll stick to a specific read replica

2262.4 -> so something to be aware of

2264.32 -> that you may end up with hot spots in

2266.24 -> your applications if you're using web

2268.64 -> sockets or bolt

2270.96 -> customers also can build custom request

2273.44 -> distribution strategies we've seen

2275.28 -> customers we've had customers that have

2276.64 -> built them around fairness or to

2278.24 -> optimize cache locality these have to

2280.88 -> be done at the application level

2285.04 -> if you want to build these sorts of

2286.079 -> custom distribution strategies

2288.16 -> and you also need to be aware of

2289.76 -> potential failover scenarios and how you

2292.16 -> are going to handle that
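One possible shape for an application-level distribution strategy is a small router that cycles through replica endpoints and skips any that have been marked unhealthy, for example after a failover. This is a minimal sketch, not a Neptune client feature: the class, its methods, and the host names are hypothetical, and a real implementation would also need to discover replicas and detect role changes.

```python
class ReplicaRouter:
    """Round-robin over read replica endpoints, skipping unhealthy ones."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._unhealthy = set()
        self._index = 0

    def mark_unhealthy(self, endpoint):
        self._unhealthy.add(endpoint)

    def mark_healthy(self, endpoint):
        self._unhealthy.discard(endpoint)

    def next_endpoint(self):
        """Return the next healthy endpoint, or raise if none are available."""
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[self._index % len(self._endpoints)]
            self._index += 1
            if endpoint not in self._unhealthy:
                return endpoint
        raise RuntimeError("no healthy read replicas available")


# Hypothetical host names, for illustration only.
router = ReplicaRouter(["replica-a.example", "replica-b.example", "replica-c.example"])
first_four = [router.next_endpoint() for _ in range(4)]
router.mark_unhealthy("replica-b.example")
after_failure = [router.next_endpoint() for _ in range(2)]
print(first_four, after_failure)
```

Because each request picks its own endpoint, a strategy like this avoids the stickiness of long-lived connections, at the cost of managing connections per replica in the application.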

2295.2 -> another key aspect of the neptune

2297.68 -> architecture is caching within neptune

2300.32 -> there are three different types of cache

2301.839 -> that are

2302.96 -> used

2303.92 -> the first being a buffer cache this

2306.079 -> is always on as soon as you spin up any

2308.24 -> instance this buffer cache is enabled

2310.24 -> and it's used to store pages of

2313.28 -> graph data from that shared storage

2315.2 -> layer in local memory this really

2317.52 -> enables much faster queries by being

2320 -> able to pull frequently used data from

2322.8 -> the local instance memory as opposed to

2325.44 -> having to go across

2327.2 -> and fetch that from the actual storage

2329.119 -> layer itself as i said this is always on

2331.44 -> it's enabled by default

2333.44 -> there's nothing you as the user needs to

2334.96 -> do here

2335.92 -> one thing i would recommend is

2338.8 -> to monitor the cloudwatch

2340.48 -> metric called buffer cache hit ratio

2342.96 -> you really want to make sure

2344.16 -> that's up in the

2346.72 -> high 90 percents

2351.839 -> and you can figure that out by monitoring

2353.2 -> that metric inside cloudwatch

2355.68 -> the second type of cache we have is an

2357.76 -> optional cache it's a lookup cache

2360.4 -> this lookup

2362 -> cache is only available

2364.16 -> if you have r5d instances because what

2366.64 -> it's doing is it's using the nvme

2369.52 -> based ssd

2371.119 -> disks to store

2374.079 -> property values and literals for use

2376.32 -> cases where you're

2378.079 -> frequently returning large numbers of

2380 -> these property values to the users this

2382.8 -> is something you can set up on a per

2384.72 -> instance basis and

2390.88 -> that data will automatically be populated

2392.32 -> into those nvme

2394.079 -> disks as it's being used

2396.96 -> the third one is the query results

2398.8 -> cache the query results cache uses the

2400.64 -> instance memory it's also an optional

2402.72 -> opt-in cache and it's

2405.44 -> exactly what you expect it to be it

2407.68 -> stores the query results so on a per

2410 -> query basis you can specify that you

2412.64 -> want to use this query results cache to

2417.52 -> cache those query results for

2420.56 -> a specified amount of time

2422.48 -> this is really used for cases where

2424.72 -> the same query is highly

2426.96 -> repeated but the data being

2428.88 -> returned doesn't change

2431.04 -> the other and probably most common

2433.52 -> use case we've seen with customers is

2434.88 -> for pagination if you want to be able to

2437.2 -> run a query one time that basically goes

2440 -> out pulls a lot of data orders it and

2442 -> returns the first 10 results and then

2443.68 -> you want to page the second 10 results

2445.2 -> and the third 10 results this sort of query

2448.4 -> results cache is

2450.48 -> very helpful there because

2452.4 -> you run that query one time

2454.24 -> you can cache

2456.319 -> those results for a ttl and the next

2459.52 -> time you go to pull the second page

2460.8 -> it'll be a very quick

2463.359 -> few milliseconds sort of response

2465.92 -> time to get that back because you're

2467.2 -> pulling it directly from this query

2468.56 -> results cache
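The pagination pattern described here can be sketched as a generic time-to-live cache in front of a query function. This is an application-side illustration of the idea only, not how the Neptune query results cache is implemented or invoked. The class name, the `run_query` callable, and the injectable `clock` are hypothetical choices made so the behavior is easy to follow.

```python
import time


class QueryResultsCache:
    """Cache full query results for a TTL and serve pages from the cached copy."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query text -> (expiry time, results)

    def get_page(self, query, run_query, page, page_size=10):
        """Return one page of results, running the query only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(query)
        if entry is None or now >= entry[0]:
            entry = (now + self._ttl, run_query(query))
            self._entries[query] = entry
        start = page * page_size
        return entry[1][start:start + page_size]


# Usage with a fake clock and a stand-in query function.
fake_now = [0.0]
executions = []


def run_expensive_query(query):
    executions.append(query)
    return list(range(30))  # pretend this is an ordered result set


cache = QueryResultsCache(ttl_seconds=60, clock=lambda: fake_now[0])
page_one = cache.get_page("ordered-query", run_expensive_query, page=0)
page_two = cache.get_page("ordered-query", run_expensive_query, page=1)
print(page_one, page_two, len(executions))
```

The expensive ordered query runs once, and the second and third pages are sliced from the cached result until the TTL expires.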

2471.44 -> security as with all

2473.92 -> databases is a very key aspect of the

2476.64 -> neptune architecture

2478.64 -> neptune is built with network isolation

2481.119 -> it is a vpc only database so

2484.319 -> there is no public

2486.48 -> endpoint so any applications

2488.48 -> or

2489.44 -> users that want to use it have to have

2490.88 -> access inside that vpc in order to talk

2493.28 -> to neptune

2495.04 -> data is encrypted with neptune

2497.44 -> it is encrypted at rest using aws kms you

2500.56 -> can either specify a specific key or use

2502.72 -> the default key for your application

2505.359 -> encryption is also handled in transit

2507.76 -> through ssl

2510.4 -> from a user perspective

2512.64 -> neptune is integrated with iam

2514.64 -> authentication so you can use iam

2516.56 -> policies in order to manage access into

2519.359 -> your neptune

2522.4 -> database

2523.44 -> any neptune cluster can be iam enabled

2525.92 -> so every request to that database

2528.319 -> will require iam authentication in order to

2531.04 -> make it happen and we have a

2534.16 -> fine-grained set of action-based access

2536 -> controls to really granularly

2539.04 -> grant people access to different actions

2541.2 -> and to different data plane type actions

2543.2 -> inside the system

2545.839 -> as i mentioned neptune is a fully

2548.64 -> managed system so we have a variety of

2550.88 -> automated backup and restore

2552.24 -> functionalities

2553.76 -> daily automated backups will occur

2556.16 -> during a window that you can specify in

2557.839 -> which a full storage volume

2559.599 -> snapshot is taken

2561.04 -> the retention period on these automated

2562.8 -> backups can be configured by

2565.599 -> the customer to be between 1 and 35 days

2568.72 -> you can also manually create snapshots

2571.2 -> of your data to back up an entire

2572.56 -> database instance manual snapshots

2575.44 -> can also be shared across accounts so if

2577.52 -> you have a use case where you need to be

2579.2 -> able to take a snapshot of your data and

2581.28 -> move it to maybe a prod account or from

2583.76 -> a prod account to a qa account to a test

2586.4 -> account things like that you can do that

2588.079 -> and they can be copied across regions

2590.319 -> not only do you have

2592.16 -> the ability to take a snapshot but you

2593.52 -> can restore those database snapshots

2596 -> to a new database instance this allows

2597.839 -> you to also do things like if you want

2599.28 -> to change the parameter or security groups

2601.119 -> around it to test out new settings you

2602.96 -> can do that

2604.079 -> and neptune also supports point in time

2605.92 -> restore so you can restore the database

2608.079 -> instance to any specific time down to a

2610.24 -> one second granularity

2612.88 -> not listed here but mentioned briefly

2614.72 -> earlier is also a fast clone capability

2616.56 -> which basically

2618.72 -> allows you to quickly make an entire

2620.4 -> clone of your database to be able to

2622 -> test out new and different features

2626.48 -> another key aspect is the ability to

2628.4 -> monitor your cluster as it's running

2630.319 -> neptune integrates with the cloudwatch and

2632.16 -> cloudtrail

2634.72 -> services to

2636.24 -> basically

2637.359 -> log all the api calls that neptune

2639.76 -> makes as well as look at different

2641.28 -> metrics around the database if we want to

2643.119 -> look at the summary of the cpu and

2644.72 -> memory utilization of all of the

2646.24 -> instances in our system we can look at

2648.079 -> things like what is the query throughput

2650.079 -> what are the success and

2651.599 -> error rates of that what is my read

2653.68 -> and write throughput how much storage is

2656 -> my system using

2657.92 -> there's also an optional

2660.88 -> mechanism to

2662.48 -> enable audit logs audit logs will

2664.8 -> contain a very granular set of

2666.96 -> information about

2668.4 -> every query that's being

2670.56 -> processed by the

2672.96 -> engine it's going to have information

2675.2 -> like time stamps what are the server

2677.28 -> and client hosts what message was

2680.16 -> sent across and that's all going to be

2682.8 -> stored inside of your cloudwatch logs

2686.96 -> so that's a little bit about the

2687.839 -> architecture let's talk a little bit

2689.68 -> about some of the new features and

2691.599 -> functionality that neptune has and then

2693.599 -> we'll jump into a bit of a deep dive and

2695.52 -> demo on how that works

2698.72 -> one of the newest features we have for

2701.04 -> neptune is a python integration

2703.599 -> specifically targeted at

2704.8 -> running some graph analytics

2706.079 -> this is an open source

2709.04 -> python integration

2710.56 -> that allows you to easily read and write

2712.8 -> data stored in neptune by removing the

2714.76 -> undifferentiated heavy work

2716.96 -> of managing connections and

2720.079 -> getting data to

2722.64 -> and from the format that graph query

2724.88 -> languages return to something

2727.119 -> that's more usable for most customers

2729.599 -> in this case we're using

2731.44 -> pandas data frames as sort of the lingua

2733.68 -> franca between these two

2735.68 -> this library basically enables you to

2738.72 -> read and write data from neptune and to

2740.8 -> pull that data into python where you

2742.64 -> can use

2743.68 -> any of the popular open source

2745.68 -> python tools out there to do further

2747.44 -> analysis so maybe you wanted to run

2750 -> some sort of analytics or algorithm on

2751.839 -> top of it you could pull data down using

2753.52 -> this in a pandas data frame and then

2755.359 -> use a network analysis library like

2757.119 -> igraph or networkx to run some sort of

2759.839 -> small scale analysis on top

2762 -> of your data this also comes with a set

2764.079 -> of sample application notebooks on how

2766.72 -> you can actually use it
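
The "pull the data down as rows, then analyze it" idea can be sketched in plain Python. This is only an illustration under stated assumptions: the rows, airport codes and distances are made up, and the `degree_counts` helper is hypothetical, standing in for what a network analysis library would do on a real data frame.

```python
from collections import Counter

# Hypothetical query results, one row per route, shaped the way a
# tabular (data frame style) result would look. Values are invented.
rows = [
    {"src": "ANC", "dst": "SEA", "dist": 1448},
    {"src": "ANC", "dst": "ORD", "dist": 2846},
    {"src": "ANC", "dst": "FAI", "dist": 261},
    {"src": "SEA", "dst": "AUS", "dist": 1770},
    {"src": "ORD", "dst": "AUS", "dist": 977},
]


def degree_counts(edge_rows):
    """Count how many routes touch each airport (undirected degree)."""
    degree = Counter()
    for row in edge_rows:
        degree[row["src"]] += 1
        degree[row["dst"]] += 1
    return degree


degrees = degree_counts(rows)
busiest = degrees.most_common(1)[0]
print(busiest)
```

With pandas and networkx installed, the same rows could be wrapped as `pandas.DataFrame(rows)` and turned into a graph with `networkx.from_pandas_edgelist(df, "src", "dst")` for richer algorithms.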

2770.079 -> the next feature i wanted to highlight

2771.52 -> here was released in the spring of

2775.28 -> 2022

2776.96 -> and that is our support for opencypher

2779.28 -> so opencypher is a widely adopted open

2782.079 -> query language for property graphs it

2784.079 -> provides an intuitive way to work with

2785.76 -> property graphs by providing developers

2788.72 -> business analysts or data scientists a

2791.359 -> sql inspired syntax that has a

2795.28 -> familiar structure that you can use to

2797.44 -> compose queries for graph applications

2800.4 -> this really enables customers

2802.8 -> that come from a relational background

2804.48 -> or are familiar with sql to have

2807.2 -> a very smooth on-ramp onto working with

2809.76 -> graph databases

2811.44 -> one of the unique aspects of our

2814.56 -> support for opencypher

2817.119 -> is we have built this in such a way

2820.079 -> that you can load your

2821.44 -> property graph data into neptune once

2823.44 -> and then you could use opencypher or

2825.28 -> the gremlin query language on top of

2826.96 -> that same data this really allows

2829.2 -> customers that either are migrating from

2830.88 -> other systems

2832.48 -> or have data and an application already in

2834.64 -> neptune to start using

2836.64 -> opencypher very easily

2838.64 -> neptune also supports

2841.68 -> the

2842.8 -> opencypher bolt protocol this one

2845.92 -> basically allows customers that are

2847.44 -> running current workloads

2849.28 -> to migrate those workloads to neptune

2852.16 -> with a minimal amount of changes to

2854.24 -> their

2856.839 -> application the next feature i wanted to

2858.88 -> talk about here is neptune ml as i

2861.04 -> alluded to earlier neptune ml is an

2863.52 -> integration between amazon neptune and

2865.599 -> sagemaker to enable graph developers to

2869.2 -> make machine learning

2870.64 -> based predictions on graph data without

2872.4 -> having a ton of machine learning

2873.92 -> expertise it does this by automating a

2876.4 -> lot of the choices you would need to

2878 -> make in order to build your machine

2879.359 -> learning model such as which model is the

2881.52 -> best one to use

2884 -> what training instance sizes i need

2886 -> what processing instance sizes things

2887.599 -> like that this is all based on

2889.599 -> state-of-the-art machine learning

2891.52 -> techniques specifically using gnns or

2894 -> graph neural network based machine

2895.839 -> learning techniques and these have been

2897.839 -> shown in some

2899.599 -> external studies to be up to 15 percent more

2902.559 -> accurate than some other

2905.359 -> machine learning based

2907.119 -> paradigms out there

2909.119 -> neptune ml is really built to scale to

2911.359 -> the large data sets that we have

2913.839 -> our customers of neptune using

2915.52 -> today because you know when you're

2917.52 -> working with knowledge graphs or fraud

2918.96 -> detection or product recommendations

2920.24 -> you're talking about extremely large

2922.559 -> amounts of entities you're working with

2924 -> maybe up to billions of relationships

2926.96 -> neptune ml went

2928.64 -> ga about a year ago and

2931.44 -> recently we have added support for

2932.88 -> custom models which allows customers

2935.839 -> that have expertise already in gnns to

2938.24 -> build their own custom model

2939.52 -> implementations in python but still be

2941.68 -> able to use that with the rest of the

2943.119 -> neptune ml framework

2945.44 -> and we added support for sparql so now

2947.52 -> you can whether you're using property

2949.2 -> graph or you're using rdf you can use

2952.319 -> neptune ml to enable machine learning

2954.8 -> based predictions on that graph data

2958.4 -> we also recently announced fine-grained

2960.319 -> access control for data plane

2962.079 -> actions this is really about

2965.04 -> allowing

2966.24 -> the customers to specify at a very

2968.559 -> granular level which sorts of actions a

2971.68 -> specific iam role can take so this

2974.319 -> allows you to specify roles that

2976.079 -> maybe only have access to read or only

2978.559 -> have access to write or only have access

2980.8 -> to trigger bulk loads things of that

2983.04 -> nature this really

2987.28 -> allows you to set up least privileged

2988.559 -> access for applications that are using

2990.16 -> neptune to give them only the specific

2992.96 -> permissions that they need this is now

2994.96 -> the default when you turn on iam

2996.96 -> authentication as of the 1.2.0.0 release

3000.8 -> and it really allows you to create those

3003.04 -> separate policies for any data plane api
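
As a rough illustration of a least-privilege policy, here is a minimal sketch that builds a read-only IAM policy document in Python. The action name `neptune-db:ReadDataViaQuery` and the resource ARN layout are written from memory of the Neptune IAM documentation and should be verified there; the region, account id and cluster resource id are placeholders, and `read_only_policy` is a hypothetical helper.

```python
import json


def read_only_policy(region, account_id, cluster_resource_id):
    """Build a policy document that only allows read queries on one cluster."""
    resource = (
        f"arn:aws:neptune-db:{region}:{account_id}:{cluster_resource_id}/*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed data plane action name for read-only queries.
                "Action": ["neptune-db:ReadDataViaQuery"],
                "Resource": resource,
            }
        ],
    }


# Placeholder identifiers, not real resources.
policy = read_only_policy("us-east-1", "123456789012", "cluster-EXAMPLE")
print(json.dumps(policy, indent=2))
```

A write-only or bulk-load-only role would follow the same shape with a different action list, which is what lets each application get only the permissions it needs.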

3008.24 -> another

3009.2 -> feature we recently added to simplify

3011.2 -> some of the operational headaches is the

3013.44 -> auto scaling read replicas as we talked

3015.92 -> about earlier neptune is an instance

3017.52 -> based database it allows you to scale

3019.76 -> read replicas up to 15 of them and auto

3022.48 -> scaling read replicas allows you to

3024.4 -> specify a minimum and a maximum capacity

3026.88 -> for that a scaling threshold based on

3029.44 -> some of the cloudwatch metrics and then

3031.2 -> it will automate the scaling activities

3033.04 -> as your workload demands so as your

3035.2 -> workload ramps up it will scale out

3037.44 -> additional read replicas

3039.359 -> as your workload demand

3041.839 -> eases it will scale those back so it

3044 -> really helps

3046.319 -> automate the process of doing that as

3049.28 -> well as provide you some cost

3050.8 -> optimization by not having to run

3052.48 -> additional read replicas when they're

3054.079 -> not needed
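
The scale-out and scale-in behaviour can be pictured with a small sketch. This is a simplified proportional rule written for illustration, not the actual algorithm the service uses; `desired_replicas` and its parameters are hypothetical names.

```python
import math


def desired_replicas(current, metric_value, target_value,
                     min_replicas, max_replicas):
    """Propose a replica count proportional to load, clamped to [min, max]."""
    current = max(current, 1)  # avoid proposing zero from an empty fleet
    proposed = math.ceil(current * metric_value / target_value)
    return max(min_replicas, min(max_replicas, proposed))


# Load above the target: scale out.
print(desired_replicas(2, 90, 60, 1, 15))
# Load well below the target: scale back in.
print(desired_replicas(4, 15, 60, 1, 15))
# Never exceed the configured maximum.
print(desired_replicas(10, 120, 60, 1, 15))
```

The minimum and maximum arguments mirror the capacity bounds mentioned in the talk, and the target value plays the role of the scaling threshold on a CloudWatch metric such as CPU utilization.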

3056.48 -> another operational simplicity

3058.72 -> feature that we recently released is one

3061.04 -> i'm very excited about and that's

3062.24 -> neptune global databases neptune global

3064.72 -> databases allows you to deploy neptune

3066.96 -> clusters across multiple aws regions for

3070.24 -> fast cross region disaster recovery or

3072.48 -> low latency queries

3075.119 -> at its core

3077.599 -> neptune global databases allows you to set up

3079.2 -> one primary cluster and up to five

3082 -> different

3083.04 -> secondary clusters that are read only in

3085.52 -> different aws regions there are two main

3088.319 -> reasons why the customers we've talked with

3090 -> are interested in this feature

3091.599 -> the first is disaster recovery in a

3093.92 -> scenario where you may have

3096.559 -> a regional outage customers want to be

3098.96 -> able to easily

3102.48 -> maintain business

3103.839 -> continuity of their data

3106.16 -> when a region is out by failing over to

3108.16 -> another region the second common use

3110.72 -> case we have talked with customers

3112.96 -> about is

3114.24 -> for global applications being able

3116.8 -> to co-locate your data closer to where

3119.52 -> your user is to enable lower

3123.2 -> latency reads as part of a kind of a

3125.119 -> global data distribution strategy

3128.96 -> as i mentioned this is a

3131.2 -> managed feature

3133.44 -> it also allows for things like

3135.359 -> fast cross region migrations and it's

3138 -> got a low replica lag between

3140.079 -> these regions so the data is going to

3141.44 -> automatically be written into the

3142.96 -> primary region and then that will be

3144.72 -> automatically replicated out with a low

3146.8 -> latency to the other aws regions that

3148.96 -> you have clusters configured for

3154.079 -> beyond just

3156.88 -> new features and optimizations

3159.44 -> for operations we actually have

3161.68 -> several new features related to cost

3164.16 -> optimization for users of the system the

3167.04 -> first being our support for graviton 2

3170.559 -> graviton 2 in our testing has shown to

3173.2 -> improve query latency and lower the cost

3175.2 -> compared to some x86 instance sizes

3179.68 -> so

3180.4 -> not only do you get faster queries you

3182 -> get to pay less for them so that's

3184.319 -> always a real benefit to customers and

3186.72 -> there's a lot of

3188.64 -> reasons behind this but part of this is

3190.079 -> that we inherit some of the benefits of

3192 -> the aws nitro system for private

3194.72 -> networking and fast local storage

3198.079 -> as of today we support the t4g series

3201.44 -> for low cost development and

3204.72 -> test type workloads we also support the

3206.96 -> r6g series and recently we added

3209.44 -> support for the x2g series

3214.72 -> for any sorts of use cases

3217.52 -> where being able to have a lot of memory

3219.52 -> and a lot of buffer cache is

3222.16 -> beneficial

3225.2 -> as of april we also now offer a

3228.319 -> free trial for neptune

3231.359 -> this free trial offer does not limit the

3232.96 -> features of neptune this means if

3234.48 -> you're in an organization that has not

3235.839 -> created a neptune cluster you can get

3237.599 -> started trying out

3240.079 -> neptune for free and this means you can

3241.76 -> use any of our

3243.52 -> two graph models or three query

3244.88 -> languages to do it for eligible

3246.64 -> customers you're going to get up to 750

3248.8 -> hours of t3 medium instances 10 million

3252 -> i/os a gigabyte of storage a gigabyte of

3255.28 -> backup and this is going to be free for

3256.96 -> 30 days with no restrictions after that

3259.76 -> 30 days it's going to

3261.839 -> revert to a pay-as-you-go model

3264.079 -> so you're only going to pay for the

3265.2 -> resources you consumed no up-front

3267.44 -> licensing costs

3269.76 -> so

3270.72 -> now that we've kind of walked

3272.559 -> through a bit about what graphs

3274.319 -> are what is amazon neptune some of the

3276.24 -> features of it

3277.44 -> let's do a deep dive and demo and take a

3279.28 -> look at how you can set up a neptune

3280.88 -> cluster and then why you might want to

3282.24 -> use this

3283.68 -> for some of the common use cases we

3285.119 -> discussed

3288.48 -> so what we're looking at here is we're

3290.559 -> looking at the neptune console what i'm

3292.799 -> going to walk through is you know if

3294.319 -> you're coming in here you're new to

3295.839 -> neptune and you want to create a

3296.88 -> database let's take a look at what you

3298.4 -> need to do in order to make that happen

3300.48 -> we're going to start here we're going to

3301.599 -> click on create database

3303.839 -> it's going to provide you a list of

3305.92 -> options you can specify the version

3309.44 -> that you want to have

3311.2 -> of neptune especially if you're testing

3313.599 -> it out i almost always recommend you try

3315.52 -> out the newest version of the system you

3318.4 -> can specify a specific identifier for

3320.96 -> this in this case it defaults to a name

3323.44 -> of database one you also specify the

3325.839 -> template and the template

3327.359 -> here basically

3329.839 -> specifies the set of instance sizes and

3332.559 -> some basic configuration parameters that

3334.48 -> you have when you select the dev and

3337.28 -> test template

3339.28 -> it provides you access to the t3 medium

3342.24 -> burstable class of instances these are

3344.48 -> really only good for development and

3346 -> testing type scenarios so if you want to

3347.599 -> create a production type cluster you

3349.68 -> want to select the production template

3351.04 -> which is what i'm going to do here

3353.92 -> from there i can select

3355.68 -> the instance sizes i have

3357.839 -> as you can see we have quite a few

3359.28 -> instance sizes to choose from i'm just

3360.799 -> going to leave it at the default one

3362.48 -> right now of r6g extra large

3365.599 -> when you

3368.319 -> select the production template you also

3370 -> get the option to create a read replica

3372.559 -> in a different availability zone right

3374.079 -> out of the box

3377.2 -> you can specify if you have very

3379.04 -> specific vpcs you want it in you can

3380.799 -> specify that sort of thing we also have

3382.96 -> the ability to create a notebook i

3385.359 -> didn't mention this until now but one of

3387.839 -> the features that comes with neptune is

3389.92 -> this construct of a neptune notebook a

3392.559 -> neptune notebook is a free open source

3394.88 -> package we'll see it here in just a

3396.96 -> moment and it runs on top of the jupyter

3400.64 -> web-based ide and provides the ability

3403.52 -> to run things and

3405.839 -> interact with your cluster as sort of

3407.2 -> the ide for neptune

3410.24 -> so we can automatically create one

3411.92 -> here i already have a database

3414.24 -> up so i'm not going to do it here

3415.839 -> but if i clicked create database at this

3418.319 -> point what would happen is it would

3419.92 -> go in it would provision out that

3422.16 -> storage it would provision out the

3423.68 -> instances you had and it would spin all

3425.2 -> of this up and within a few minutes you

3427.359 -> would end up with a

3430.319 -> database that shows up here it would

3432.4 -> show up as available in this case as you

3434.16 -> can see i have two

3436.799 -> clusters one called

3439.2 -> air routes one called altimeter

3441.359 -> they both have a single writer instance

3443.44 -> and they're both currently available so

3445.599 -> let's jump into one of these and

3448 -> start taking a look at what it looks

3449.44 -> like to actually work and interact

3451.44 -> with neptune

3453.599 -> as i mentioned

3455.44 -> we're going to

3457.2 -> be using what is known as

3460.64 -> neptune notebooks this is a jupyter

3462.48 -> notebook so this is a free open source

3464.16 -> package

3468.559 -> if i come in here to notebooks you can

3470.24 -> see i have one notebook running i can

3472.96 -> open it from there

3475.68 -> this ide

3477.68 -> is the ide we

3480.96 -> provide for neptune you can run

3483.119 -> this

3484.079 -> either as a hosted piece of

3486.16 -> infrastructure or not as you can see

3488.48 -> when i click on this it pops up a

3491.44 -> hosted jupyter notebook this

3493.76 -> is hosted as part of sagemaker notebooks

3496.559 -> you can either use this open source

3499.04 -> neptune notebooks package through these

3500.96 -> hosted instances as i am here or if you

3503.52 -> have your own jupyter server you can run

3505.359 -> and install this and as long as you have

3506.799 -> connectivity into

3510.72 -> the vpc where

3513.44 -> neptune exists you'll be able to run

3515.44 -> these sorts of things

3516.88 -> when we come in here we'll see

3518.88 -> there's a set of

3520.88 -> notebooks in here automatically we have

3523.839 -> a few getting started notebooks that

3525.92 -> really go into talking

3528.4 -> a little bit

3529.68 -> about neptune notebooks how you can use

3531.76 -> that to access the graphs a few

3533.839 -> different examples

3535.2 -> we have some

3537.599 -> example notebooks that go through the

3538.88 -> different types of visualizations that

3540.559 -> you can actually build and run with

3543.68 -> neptune notebooks

3545.2 -> we'll see those here in a moment we

3546.64 -> also have some sample applications

3548.16 -> specifically around fraud graphs

3549.599 -> knowledge graphs and identity graphs

3551.44 -> that

3554.319 -> give you a set of predefined

3557.359 -> data that you

3559.839 -> can load into your cluster as well as a

3561.76 -> set of predefined queries that you can

3563.28 -> start to see how you might go about

3564.96 -> building one of these sorts of use case

3566.64 -> applications

3569.04 -> and we have a set on machine learning if

3570.72 -> you're interested in using neptune ml

3572.72 -> you can come in here and look at

3574.96 -> this 04 machine learning folder and this

3576.72 -> will give you a very detailed

3578 -> walkthrough of the different features

3579.599 -> and functionality of

3582.96 -> that feature of neptune

3584.72 -> for this example i'm going to jump into

3586.079 -> this notebook that i have

3587.68 -> called oc gremlin examples and let's

3590.48 -> take a little bit of a look at what some

3591.839 -> of the things we can do here

3593.76 -> as i mentioned

3596.079 -> we have that status endpoint on our

3597.839 -> cluster one of the things

3601.04 -> i should say about neptune notebooks

3603.2 -> is it provides a set of what are known

3605.04 -> in jupyter as magics anything that

3606.48 -> starts with this percent sign

3608.24 -> is a magic and it basically allows

3610.88 -> you to do a specific

3612.24 -> piece of functionality here

3614.559 -> in this case this status one is going

3616.88 -> out querying that endpoint that status

3618.88 -> endpoint and bringing back the data you

3620.48 -> can see the different types of

3621.44 -> information here you can see when my

3623.119 -> cluster was started what's the role of

3625.119 -> it

3625.76 -> what's the engine version i'm working on

3627.52 -> what versions of gremlin sparql and

3629.2 -> opencypher are supported

3631.2 -> any of the lab mode or beta features i'm

3634.559 -> looking at here any of the other

3636.079 -> features such as the result cache that i

3637.76 -> may have enabled
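
To make the status output more concrete, here is a minimal sketch that summarizes a status document in Python. The JSON below is an invented sample: the field names follow my recollection of what a status response looks like and every value is a placeholder, so treat the shape as an assumption to check against a real cluster. `summarize` is a hypothetical helper.

```python
import json

# Invented sample shaped like a status response; not captured from a cluster.
SAMPLE_STATUS = """
{
  "status": "healthy",
  "startTime": "Tue Aug 15 10:00:00 UTC 2023",
  "dbEngineVersion": "1.2.0.0.R2",
  "role": "writer",
  "gremlin": {"version": "tinkerpop-3.5.2"},
  "sparql": {"version": "sparql-1.1"},
  "opencypher": {"version": "Neptune-9.0.20190305-1.0"},
  "labMode": {},
  "features": {"ResultCache": {"status": "disabled"}}
}
"""


def summarize(status_json):
    """Pull out the fields the talk calls out: health, role, engine, languages."""
    doc = json.loads(status_json)
    languages = [k for k in ("gremlin", "opencypher", "sparql") if k in doc]
    return {
        "healthy": doc.get("status") == "healthy",
        "role": doc.get("role"),
        "engine": doc.get("dbEngineVersion"),
        "languages": languages,
    }


summary = summarize(SAMPLE_STATUS)
print(summary)
```

In a notebook the same information comes back from the status magic directly, so a helper like this is only useful when checking cluster health from a script.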

3641.28 -> beyond that we can also do things in

3643.92 -> our cluster like bulk load data so if

3645.839 -> you have

3646.96 -> data stored in an s3 bucket you can use

3649.76 -> neptune notebooks to give you

3652 -> a graphical walkthrough of being able to

3654.72 -> set up the call that basically

3657.28 -> triggers that sort of thing i'm not

3659.119 -> going to do that here

3660.559 -> we also have a fast reset functionality

3662.799 -> so if you're in the stage of building

3665.44 -> your application where you need to be

3667.28 -> able to clear out your database quickly

3669.44 -> you can use the fast reset in

3672 -> this case through the db reset

3674.24 -> widget to be able to basically wipe all

3675.76 -> the data out of your cluster and start

3677.119 -> over from scratch

3679.44 -> i'm not going to do that here

3681.119 -> because i have some data loaded in

3682.799 -> here

3684.64 -> the other command that's very useful

3686.24 -> here is as i mentioned we have some seed

3688.48 -> data sets for very

3690.799 -> specific types of use cases

3693.04 -> that we provide to the user

3695.92 -> through this notebook that you can

3697.44 -> use to get started here and

3699.599 -> i've already

3701.119 -> loaded this data but one of these is a

3703.44 -> set of air routes data so this is

3705.52 -> airports and flights between airports

3707.76 -> and we're going to use that to kind of

3708.96 -> walk through a little bit about what

3710.88 -> going and querying these graphs sort of

3713.04 -> looks like

3714.16 -> so in this case we're

3716.319 -> going to be working with property graphs

3719.28 -> and i'm going to be

3721.2 -> talking a little bit about

3723.359 -> some of the basics of graph query

3724.799 -> languages and how you can use those to

3726.559 -> do things the first one we're looking at

3728.319 -> here is opencypher opencypher as

3732.16 -> we mentioned has a sql inspired syntax

3734.24 -> so if you're looking at this you can

3735.76 -> probably right away

3738 -> even without me telling you anything

3739.599 -> probably have a reasonably good idea of

3741.599 -> what's going on here

3744.64 -> opencypher is based on

3747.119 -> kind of a pattern matching syntax

3749.119 -> anywhere you see parentheses that

3752.24 -> represents a node and we will see it here

3755.599 -> in a minute but anywhere that you see

3757.68 -> an arrow and a line that represents

3759.599 -> connections and you can use this to

3761.44 -> build more complex types of queries in

3764.799 -> this case it's kind of the most basic

3766.4 -> query you can look for i'm just going to

3768.24 -> be looking i want to match any nodes

3771.359 -> that node is going to be referred to as n

3773.359 -> and i want to match any nodes where the

3775.039 -> code

3775.92 -> of the airport

3777.44 -> is anc for anchorage airport i can come

3780.48 -> in here

3781.599 -> i can run this uh you'll see it'll give

3783.92 -> me back basically the information i had

3786.16 -> i can view this in the json format

3788.72 -> that's natively returned from neptune

3791.76 -> i can also view this as a graph

3794.4 -> this is one of kind of the key features

3796.079 -> of of neptune notebooks is this ability

3798.48 -> to come in here and make this bigger we

3800.72 -> can come in here and look at this

3802.96 -> thing in this case it's not as

3804.48 -> interesting as it would be in

3806.24 -> some other cases

3807.52 -> because it's just a single entity but i

3809.119 -> can come in here i can also look at all

3811.44 -> of the properties that were returned as

3813.039 -> associated here so we can see these are

3815.359 -> this is that property

3817.52 -> graph

3818.72 -> model where you're looking at nodes the

3820.64 -> node in this case being an airport as

3822.64 -> well as

3824.24 -> the properties associated with it
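
The query on screen is not captured in the transcript, so the following is a hedged reconstruction of what a lookup like this could look like in openCypher, wrapped in a small Python helper. The property name `code` comes from the talk; the exact query text and the `airport_by_code_query` function are assumptions for illustration.

```python
def airport_by_code_query(code):
    """Return an openCypher query string that matches one airport by code.

    (n {...}) is the node pattern: parentheses mark a node, n is the
    variable the node is bound to, and the braces filter on a property.
    """
    if not (code.isalnum() and code.isupper()):
        raise ValueError("expected an upper-case alphanumeric airport code")
    return f"MATCH (n {{code: '{code}'}}) RETURN n"


query = airport_by_code_query("ANC")
print(query)
```

In a Neptune notebook a query like this would typically be placed in a cell under the openCypher cell magic rather than built as a Python string.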

3831.76 -> the other query

3833.68 -> language we support for property graphs

3836.079 -> is gremlin as i mentioned

3839.92 -> compared to the sql inspired syntax of

3844.079 -> opencypher gremlin is more of a stream

3847.28 -> processing type language where data is

3850.079 -> pulled in from kind of one side of a

3851.92 -> step some process is

3854.24 -> done on it and it's sent on to the next

3856.24 -> one so in this case

3859.52 -> g is just a convention for kind of

3862.96 -> starting with a graph

3864.88 -> in this case you can sort of read this

3866.96 -> query as starting with the graph i want

3869.2 -> to find all of my vertices

3872.079 -> and then i want to filter this down to

3873.68 -> only having ones where the

3876.799 -> property code matches anc

3880.16 -> for the anchorage airport and then

3881.92 -> element map basically says bring me back

3883.92 -> all of the properties associated with

3885.599 -> that if i go in and i run this

3888.559 -> we'll see

3890.88 -> i get back a similar set of

3892.799 -> information the formatting is

3894.4 -> different between them

3896.72 -> but i can also view this one as a graph

3898.96 -> as well so this is the sort of

3902.319 -> functionality there
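
The step-by-step, stream-style reading of a Gremlin traversal can be mimicked with Python generators. This is a toy model for intuition only, not how the engine works: the vertex data is invented, and `V`, `has` and `element_map` are hypothetical functions named after the steps in a traversal such as `g.V().has('code', 'ANC').elementMap()`.

```python
# Invented vertices standing in for the airports in the graph.
VERTICES = [
    {"id": "1", "label": "airport", "code": "ANC", "city": "Anchorage"},
    {"id": "2", "label": "airport", "code": "AUS", "city": "Austin"},
    {"id": "3", "label": "airport", "code": "SYD", "city": "Sydney"},
]


def V(graph):
    """Start step: emit every vertex in the graph, one at a time."""
    yield from graph


def has(stream, key, value):
    """Filter step: pass along only vertices whose property matches."""
    return (v for v in stream if v.get(key) == value)


def element_map(stream):
    """Map step: emit a plain dict of each vertex's id, label and properties."""
    return (dict(v) for v in stream)


# Each step consumes the output of the previous one, left to right.
result = list(element_map(has(V(VERTICES), "code", "ANC")))
print(result)
```

Reading the nested calls from the inside out gives the same order as reading the Gremlin traversal from left to right.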

3905.28 -> that's the very

3907.359 -> basic way to use both

3909.359 -> neptune notebooks and opencypher let's take a

3911.039 -> bit of a

3913.28 -> more interesting look at some of the

3915.359 -> more complex things let's start using

3917.76 -> some of that pattern-based syntax so in

3919.92 -> this case not only am i starting

3922.16 -> at the location of the anchorage airport

3924.96 -> i have added this additional

3926.64 -> functionality with these dashes and

3928.88 -> these square brackets to basically say

3930.88 -> i want to now traverse out from this

3933.52 -> anchorage airport i want to traverse out

3935.28 -> any edges that are specified as a route

3938.319 -> to a destination airport so basically i

3940.559 -> want to find anywhere i can fly to or

3942.48 -> from anchorage

3944.16 -> i go and i run this we see that comes

3946.559 -> back very quickly

3949.28 -> and i'm able to graph this out in such a

3951.28 -> way that i can see

3953.92 -> all of the

3957.76 -> anchorage airport itself

3959.599 -> if i wanted yet again i could come in

3961.76 -> here and start going through and looking

3964.559 -> at different

3966.16 -> properties associated with

3967.92 -> these items as well as properties

3969.68 -> associated with the edges as i mentioned

3971.28 -> earlier

3975.68 -> edges can have attributes representing

3977.359 -> properties of those edges in this case i

3979.52 -> have this property called dist which

3981.039 -> represents the distance that i need to

3982.799 -> fly in order to get there

3984.72 -> this is very powerful to be able to

3986.48 -> create these sorts of patterns

3988.64 -> that work to filter very efficiently and

3991.359 -> effectively

3994 -> i'm not going to go through all of the

3995.2 -> examples here with gremlin and exactly

3997.119 -> what's going on but

3999.2 -> i'm able to run the same sort of queries

4001.359 -> in both opencypher and gremlin

4003.52 -> and get back

4005.599 -> the same sort of answers

4008.96 -> i'm going to jump next into

4011.44 -> one of the things i didn't mention

4013.039 -> about property graphs which is when i'm

4014.88 -> creating

4016.96 -> relationships inside of a property graph

4018.64 -> those relationships have directions

4020.4 -> associated with them

4022.48 -> there's a directionality aspect to it

4024.24 -> you have you know this relationship is

4026 -> going from some place to some place

4029.119 -> you can represent that inside of your

4031.76 -> graph query languages in the case of

4033.839 -> opencypher

4035.839 -> you represent this using arrowheads kind

4037.92 -> of pointing in the direction that you're

4039.599 -> looking for

4040.72 -> so in this case as opposed to finding

4044.96 -> all of the airports that i could fly to

4047.28 -> or from anchorage i am now only

4050.4 -> finding those places i can fly from

4052.48 -> anchorage to so anchorage is going to be

4054 -> my start location and these are the

4056.24 -> airports i can fly to

4058.079 -> we can represent the same thing inside

4060.4 -> of gremlin um using a different set of

4063.28 -> steps that specifies the directionality

4065.359 -> here but

4066.799 -> the key piece to know here is you know

4068.559 -> edges inside graphs have directions

4071.2 -> and you can use that directionality as a

4073.2 -> filtering criteria

4075.119 -> you can sort of think about this if you

4076.48 -> want to think about it from the

4078.16 -> construct of

4080.799 -> something like a social network if

4082.24 -> you're on twitter you might follow

4083.44 -> somebody but that person may not follow

4085.2 -> you back something like that so using

4087.76 -> that directionality as an aspect

4089.359 -> of your data modeling is kind of a

4090.64 -> really strong feature of graph

4093.359 -> databases
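
Direction as a filter can be shown with a tiny edge list. The routes below are invented and the two helper functions are hypothetical. The pattern strings are a hedged reconstruction of the openCypher shown in the demo, and the Gremlin counterparts would use the `both()` and `out()` steps.

```python
# Each tuple is one directed edge: (source airport, destination airport).
ROUTES = [("ANC", "SEA"), ("SEA", "ANC"), ("ANC", "FAI"), ("ORD", "ANC")]

# Assumed openCypher patterns: a dash alone ignores direction,
# an arrowhead follows the edge direction.
EITHER_DIRECTION = "MATCH (a {code: 'ANC'})-[:route]-(d) RETURN d"
OUTGOING_ONLY = "MATCH (a {code: 'ANC'})-[:route]->(d) RETURN d"


def fly_to(routes, airport):
    """Airports reachable by following edges out of the given airport."""
    return sorted({dst for src, dst in routes if src == airport})


def fly_to_or_from(routes, airport):
    """Airports connected to the given airport in either direction."""
    neighbours = set()
    for src, dst in routes:
        if src == airport:
            neighbours.add(dst)
        elif dst == airport:
            neighbours.add(src)
    return sorted(neighbours)


print(fly_to(ROUTES, "ANC"))
print(fly_to_or_from(ROUTES, "ANC"))
```

ORD only appears in the second result because its single edge points into ANC, which is the same asymmetry as following someone who does not follow back.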

4095.52 -> and you can keep extending this out you

4096.96 -> can extend this out in this case i'm not

4098.64 -> just finding everywhere i can fly to but

4100.64 -> i'm finding

4101.839 -> everywhere

4102.96 -> that i can fly to from anchorage that i

4105.04 -> can then fly to austin so basically

4107.6 -> i want to find in this case

4109.359 -> two hops i'm specifying the exact number

4111.12 -> of hops so if you want to think yet

4112.96 -> again in a social network sort of

4114.319 -> sense this is like the friends of my

4116 -> friends sort of idea

4119.279 -> but when it really comes down to it

4121.199 -> these you know even these sorts of

4122.48 -> queries are ones that other databases

4124.08 -> can probably handle you can probably

4125.6 -> write a query inside of

4128.64 -> sql uh you've probably written similar

4130.64 -> queries where you'll be able to join

4131.839 -> tables together multiple times really

4134 -> the power of graph query language starts

4136.159 -> to happen when you have these variable

4138.319 -> length queries

4140 -> and these are represented naturally

4141.759 -> inside both gremlin and the opencypher

4144 -> query languages which have really native

4146.319 -> constructs of the language to support

4147.759 -> this sort of thing so in this case

4150 -> my query

4151.92 -> is finding all of the flights from

4153.6 -> anchorage to sydney that are within four

4156.08 -> hops so this is where this

4159.759 -> variable length query

4161.52 -> syntax comes in in the case of

4163.279 -> opencypher you do this using this star

4166.52 -> 1..4 in this example and it's going to

4169.759 -> find me those and i put a limit on this

4172 -> because otherwise it's going to return

4173.199 -> quite a lot of data but this is going to

4175.44 -> you know

4176.319 -> you can start to see how you can use

4177.679 -> this to

4179.199 -> very quickly and efficiently

4181.12 -> find these sorts of variable length

4182.88 -> connections in ways that if you

4184.799 -> were going to do this in other

4186.48 -> technologies you'd have to write like

4188.56 -> recursive functions or some sort of cte

4191.52 -> common table expression in order to be

4192.96 -> able to return it
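
To show what a bounded variable-length match has to do under the hood, here is a minimal breadth-first sketch in Python. It is an illustration under stated assumptions: the routes are invented, `paths_within` is a hypothetical helper, and it refuses to revisit an airport within a path, which is stricter than what a graph query engine necessarily enforces. A query of the kind described might read `MATCH p = (a {code: 'ANC'})-[:route*1..4]->(d {code: 'SYD'}) RETURN p LIMIT 10`, though the exact text from the demo is not in the transcript.

```python
from collections import deque

# Invented directed routes: (source airport, destination airport).
ROUTES = [
    ("ANC", "SEA"), ("ANC", "LAX"), ("SEA", "LAX"),
    ("LAX", "SYD"), ("SEA", "HNL"), ("HNL", "SYD"),
]


def paths_within(routes, start, goal, max_hops, limit=None):
    """List paths from start to goal that use between 1 and max_hops edges."""
    adjacency = {}
    for src, dst in routes:
        adjacency.setdefault(src, []).append(dst)

    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # this path has already used its hop budget
        for nxt in adjacency.get(path[-1], []):
            if nxt in path:
                continue  # do not revisit an airport within one path
            new_path = path + [nxt]
            if nxt == goal:
                found.append(new_path)
                if limit is not None and len(found) >= limit:
                    return found
            else:
                queue.append(new_path)
    return found


all_paths = paths_within(ROUTES, "ANC", "SYD", max_hops=4)
short_paths = paths_within(ROUTES, "ANC", "SYD", max_hops=2)
print(all_paths)
print(short_paths)
```

The `limit` argument plays the same role as the limit in the demo query: without it the number of paths grows quickly as the hop bound increases.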

4194.32 -> as i mentioned this is supported

4195.92 -> natively both in opencypher and inside of

4199.6 -> the gremlin query language so

4202.48 -> that's a little bit of

4206.239 -> the basics of graph query languages how

4208.8 -> you sort of write these let's jump

4210.8 -> right in and maybe take a moment to

4213.04 -> actually look at what this looks like

4214.32 -> for a real use case and in this case

4216.4 -> we're going to look at security graphs

4218.08 -> we're going to look at a security graph

4219.52 -> use case and why you might want to use a

4221.36 -> graph to do this

4222.96 -> and we're going to do this by graphing

4224.56 -> our aws resources

4226.4 -> you know we're all sitting here working

4228.48 -> on top of aws let's take a look at what

4230.88 -> our resources look like and what it

4232.719 -> looks like as a graph you know

4235.12 -> your resources in aws kind of

4237.36 -> naturally lend themselves to graph

4239.36 -> representations because there's a lot of

4241.84 -> connections between these items and

4243.36 -> these connections are really in many

4245.199 -> cases the key things that you want to

4247.28 -> look at let's start by just taking a

4249.199 -> look let's see if we have any policies

4251.679 -> out there that are potentially insecure

4253.52 -> maybe these are anywhere where you have

4255.12 -> an administrator the administrator

4257.12 -> access policy

4258.4 -> or you have created any policies where

4261.12 -> you have a star as part of the

4264.159 -> document text we can sit there and

4266.08 -> basically take a look very quickly and

4267.76 -> see that we have two of these policies

4269.44 -> out there we have an administrator

4270.719 -> access policy as does everybody also

4273.679 -> somebody has created this this policy

4275.6 -> called um open s3

4278.4 -> not sure why somebody decided that we

4279.92 -> should you know create an open s3 policy

4281.84 -> but we should probably investigate that

4283.28 -> a bit further

4284.56 -> and see exactly how that's being

4287.04 -> used you know the first part here is

4288.8 -> let's take a

4290.48 -> moment and see if there are any roles

4291.92 -> using this because you might have a

4293.12 -> policy but if it's not being used may

4295.04 -> not be a big deal um if we take a look

4297.36 -> at this we can see that we have you know

4299.44 -> a few different uh roles using this

4302 -> we have this role called altimeter

4303.36 -> that's using both this open s3 policy as

4305.6 -> well as the administrator policy so

4307.52 -> we're going to probably want to take a

4309.28 -> moment and dive a

4311.52 -> little deeper

4313.12 -> into some of these

4314.48 -> uh

4315.28 -> how these are being used to see if we

4316.719 -> have any potential security issues so

4319.28 -> you know let's let's jump to the next

4320.8 -> step and basically say okay we have

4322.4 -> these roles we have these access what

4325.76 -> resources inside my system are actually

4327.76 -> using it so what we can see

4329.76 -> is that there's an ec2 instance that's

4332.08 -> that is using this open s3

4335.04 -> role we can also come in here and see

4337.28 -> that there's another ec2 instance that's

4339.84 -> using this i need admin access role

4342.8 -> probably you know do they really need

4344.56 -> admin access i don't know we probably

4345.92 -> need to do a bit further investigation

4348.159 -> into that as well and take a look at

4350.159 -> what's going on there
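
The iterative walk above (permissive policies, then the roles that attach them, then the instances that use those roles) can be sketched on a small in-memory model. Everything here is hypothetical: the policy documents, role names, and instance ids are invented to echo the demo, and a real deployment would run graph queries against neptune rather than filter Python dictionaries.

```python
# Hypothetical policy name -> policy document text.
POLICIES = {
    "AdministratorAccess": '{"Effect": "Allow", "Action": "*", "Resource": "*"}',
    "OpenS3": '{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}',
    "ReadOnlyLogs": '{"Effect": "Allow", "Action": "logs:GetLogEvents", '
                    '"Resource": "arn:aws:logs:us-east-1:111122223333:log-group:app"}',
}

# Hypothetical role -> attached policies, and instance -> role.
ROLE_POLICIES = {
    "altimeter": ["OpenS3", "AdministratorAccess"],
    "OpenS3Role": ["OpenS3"],
    "INeedAdminAccess": ["AdministratorAccess"],
    "LogReader": ["ReadOnlyLogs"],
}
INSTANCE_ROLES = {
    "i-0aaa": "OpenS3Role",
    "i-0bbb": "INeedAdminAccess",
    "i-0ccc": "LogReader",
}


def permissive_policies(policies):
    """Step 1: administrator access, or a star anywhere in the document."""
    return sorted(
        name for name, doc in policies.items()
        if name == "AdministratorAccess" or "*" in doc
    )


def roles_using(role_policies, flagged):
    """Step 2: roles that attach at least one flagged policy."""
    flagged = set(flagged)
    return {
        role: sorted(flagged & set(attached))
        for role, attached in role_policies.items()
        if flagged & set(attached)
    }


def instances_using(instance_roles, roles):
    """Step 3: instances whose role is one of the flagged roles."""
    return {inst: role for inst, role in instance_roles.items() if role in roles}


if __name__ == "__main__":
    flagged = permissive_policies(POLICIES)
    risky_roles = roles_using(ROLE_POLICIES, flagged)
    print(flagged)
    print(risky_roles)
    print(instances_using(INSTANCE_ROLES, risky_roles))
```

Each function takes the previous step's output as input, which mirrors how the query in the demo is extended one hop at a time.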

4353.44 -> so

4354.64 -> what all connects to these resources as

4356.239 -> we can see we can start to we're

4357.6 -> starting to build out kind of a a much

4360.88 -> uh more robust query to basically find

4364.56 -> everything all these policies now we're

4366.4 -> finding roles now we're looking at all

4368.32 -> of the things that are connected to this

4369.76 -> and we can kind of see that we're really

4371.199 -> building upon this query in an iterative

4373.12 -> nature

4374.159 -> um and we're using that to start to see

4376.239 -> more and more information data about our

4378.4 -> system

4379.44 -> based on the connections and the

4380.719 -> connectivity of these systems we can

4383.12 -> start to see there's a lot of things

4384.48 -> connected to this there's a lot of

4385.92 -> different vpcs and internet gateways uh

4389.04 -> looks like there's a database uh

4390.88 -> connected to this over here so this

4392.96 -> fact that we have these very permissive

4395.12 -> policies is something we probably want

4397.199 -> to take a look at uh from a security

4399.199 -> perspective inside of our own system

4401.199 -> here

4402.32 -> um

4403.36 -> you can then take this and build it out

4405.76 -> further and further in this uh

4407.84 -> and as you're doing this you can also

4409.44 -> filter it down to being able to only

4411.76 -> show things that are more interesting uh

4414.239 -> to the specific question you're trying to

4415.52 -> answer you know for example

4417.44 -> i don't necessarily need to look at the

4419.84 -> vpcs associated with this because i'm

4422.239 -> really more interested in what ec2

4424 -> instances are using these very

4426.239 -> permissive

4427.679 -> policies and roles inside of my aws

4430 -> architecture

4431.52 -> um

4434.32 -> and then finally i want to be able to

4436.56 -> you know potentially look at potential

4438.88 -> security threat vectors associated

4440.88 -> with this i want to see not just that

4442.159 -> these ec2

4443.44 -> instances are using these roles but

4445.28 -> are any of these actually exposed to the

4446.88 -> internet and in this case as we can kind

4449.36 -> of see both of these are actually

4451.36 -> exposed to the internet as well so we

4453.6 -> would definitely want to go

4455.12 -> look at locking these sorts of things

4456.8 -> down

4458.719 -> to be able to find you know to be able

4460.48 -> to make sure that we're securing our

4461.92 -> infrastructure appropriately
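
The final check, combining "uses a permissive role" with "is exposed to the internet", can be sketched as a filter over labeled edges. This is an assumption-heavy illustration: the edge labels (`uses_role`, `in_subnet`, `routes_to`), the ids, and the rule that exposure means the instance's subnet routes to an internet gateway are all hypothetical and are not taken from the demo's actual graph model.

```python
# Hypothetical (source, label, destination) edges.
EDGES = [
    ("i-0aaa", "uses_role", "OpenS3Role"),
    ("i-0bbb", "uses_role", "INeedAdminAccess"),
    ("i-0ccc", "uses_role", "LogReader"),
    ("i-0ddd", "uses_role", "OpenS3Role"),
    ("i-0aaa", "in_subnet", "subnet-1"),
    ("i-0bbb", "in_subnet", "subnet-1"),
    ("i-0ccc", "in_subnet", "subnet-1"),
    ("i-0ddd", "in_subnet", "subnet-2"),
    ("subnet-1", "routes_to", "igw-1"),  # only subnet-1 reaches a gateway
]
PERMISSIVE_ROLES = {"OpenS3Role", "INeedAdminAccess"}


def neighbors(edges, node, label):
    """Destinations reachable from node over edges with the given label."""
    return {dst for src, lbl, dst in edges if src == node and lbl == label}


def exposed_permissive_instances(edges, permissive_roles):
    """Instances that use a permissive role AND sit in a subnet that
    routes to an internet gateway (assumed definition of 'exposed')."""
    instances = {src for src, lbl, _ in edges if lbl == "uses_role"}
    result = []
    for inst in sorted(instances):
        uses_permissive = bool(neighbors(edges, inst, "uses_role") & permissive_roles)
        reaches_internet = any(
            neighbors(edges, subnet, "routes_to")
            for subnet in neighbors(edges, inst, "in_subnet")
        )
        if uses_permissive and reaches_internet:
            result.append(inst)
    return result


if __name__ == "__main__":
    print(exposed_permissive_instances(EDGES, PERMISSIVE_ROLES))
```

In this made-up data, one instance is exposed but uses a harmless role and another uses a permissive role but is not exposed, so only the instances meeting both conditions are flagged for lockdown.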

4463.92 -> so this is just kind of a quick

4466.48 -> use case a quick example of how you

4468.4 -> might want to use uh you know a real

4471.28 -> world type of use case for a security

4473.679 -> graph on top of

4475.52 -> the amazon neptune graph database

4479.04 -> if you're interested in trying out that

4481.12 -> demo yourself uh we have a

4483.76 -> couple of resources for you to start

4485.36 -> with

4486.56 -> the tool that i use to map out all of my

4489.44 -> aws infrastructure is an open source

4491.6 -> project from tableau called uh called

4493.679 -> altimeter excuse me

4495.28 -> and under the covers what it does is it

4497.52 -> goes out uh with sufficient permissions

4500.08 -> it goes and reads the different

4501.92 -> configurations of your aws resources and

4504.08 -> generates a graph which can be stored in

4506.48 -> amazon neptune

4508 -> if you're interested in how to do this

4509.28 -> uh the second link i have there actually

4511.36 -> is a blog post uh on how to exactly do

4514.8 -> that on top of neptune how to install it

4517.12 -> correctly uh and then how

4519.12 -> to work with it inside of

4522.4 -> neptune

4525.92 -> so as we're kind of wrapping up here i

4527.44 -> wanted to give you a few additional

4528.8 -> resources on how you can kind of get

4530.56 -> started uh with using neptune um

4534.08 -> first and and the first link you see up

4536.719 -> here is the link to the neptune

4538.32 -> notebooks or the graph notebook project

4539.92 -> the one that i used for this demo that

4541.76 -> provides that ide type application in or

4544.88 -> the ide uh interface for neptune

4548 -> if you're interested in how you can use

4550.239 -> neptune with some reference

4551.44 -> architectures for it you can follow

4553.04 -> the second link here

4554.56 -> we also have a set of full stack or a

4556.719 -> set of applications from full stack

4558.56 -> applications to partial applications

4560.56 -> sample applications that the third link

4562.88 -> here will show

4564.719 -> and lastly if you're just interested in

4566.4 -> how other customers are using this

4568.64 -> what blogs we have code samples videos

4571.28 -> you can use uh the last link here will

4573.44 -> take you to those areas

4575.76 -> specifically uh for amazon neptune

4580.719 -> thank you for uh listening to me today

4583.04 -> uh once again my name is dave bechberger

4585.28 -> and you can find me on twitter at

4587.48 -> bechbd

4589.36 -> thank you


Source: https://www.youtube.com/watch?v=Y6jbFC8tvVw