Graph Database introduction, deep-dive and demo with Amazon Neptune  - AWS Virtual Workshop

With Amazon Neptune you can build and run identity, knowledge, fraud, and other graph applications with performance that scales to more than 100,000 queries per second. Neptune allows you to deploy graph applications using open-source APIs such as Gremlin, openCypher and SPARQL. Since Neptune is a fully managed database service, there is no need to worry about hardware provisioning, software patching, setup or backups. Many people are not familiar with graph databases, which is why this workshop will introduce cutting-edge use cases for graph databases spanning fraud detection to personalization. This workshop will cover the architecture and the key features of Neptune, and we will also use this time to do a demo of Neptune in action.

Learning Objectives:
* Objective 1: Understand the benefits of Amazon Neptune by going over use cases including fraud detection, personalization and advertising targeting.
* Objective 2: Dive deep into how open-source APIs can be used to deploy applications in Neptune.
* Objective 3: See a demo of Neptune in action.

To learn more about the services featured in this talk, please visit: https://aws.amazon.com/neptune/

Subscribe to AWS Online Tech Talks on AWS:
https://www.youtube.com/@AWSOnlineTec

Follow Amazon Web Services:
Official Website: https://aws.amazon.com/what-is-aws
Twitch: https://twitch.tv/aws
Twitter: https://twitter.com/awsdevelopers
Facebook: https://facebook.com/amazonwebservices
Instagram: https://instagram.com/amazonwebservices

☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or on-demand tech talks to watch at their own pace. Join us to fuel your learning journey with AWS.

#AWS


Content

3.06 -> [Music]
7.279 -> hello everyone
8.559 -> my name is dave bechberger i am a senior
10.8 -> graph architect on the amazon neptune
12.799 -> service team and i'm here today to talk
14.799 -> to you a little bit about graph
15.679 -> databases and then we're going to go into
17.68 -> a little bit of a deep dive and a demo on
19.84 -> amazon neptune which is aws's
21.92 -> purpose-built graph database offering
24.48 -> so let's jump right into it
26.88 -> so today when we work with customers we
29.76 -> see graphs being used for all types of
31.599 -> applications why is this
34 -> because graphs are very good at modeling
35.84 -> relationships that are not necessarily
37.84 -> easily represented or retrieved with
40.399 -> other types of databases out there today
43.12 -> if we look at this simple graph
44.399 -> representation it's pretty easy without
46 -> me having to tell you any other
47.2 -> additional information that you can
48.559 -> figure out that alice lives in anytown
50.8 -> and that she works with bob
53.12 -> this is because this
55.039 -> represents how the graph
57.68 -> way of looking at problems is very
60 -> intuitive to people
61.6 -> because it does a very good job
63.28 -> of representing the natural way that we
65.519 -> think about data and connections in this
67.84 -> example you know
70 -> we're looking at these and we have what
71.76 -> we would consider from a graph
72.96 -> perspective a couple of nodes those
74.799 -> nodes representing the entities or the
76.56 -> real world objects here in this case
78.08 -> being alice, bob and anytown
80.32 -> and we have these lines or these
82.32 -> connections between things which
83.759 -> represent the relationships between
85.439 -> these real world objects in this case
87.2 -> lives in and works with and when we look
89.52 -> at this and because of the way that
91.2 -> graphs and graph databases store this
93.28 -> data they really allow you to
96.079 -> explore these relationships and patterns
98.24 -> and this type of connected data in ways
100.479 -> that other
102.72 -> data stores and data structures
104.56 -> can't
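The Alice/Bob/Anytown picture described above can be sketched as a tiny property graph in plain Python. This is an illustrative model only, not Neptune's storage format; the node and edge labels are taken from the example in the talk.

```python
# Nodes: the real-world entities, each with a label and properties.
nodes = {
    "alice":   {"label": "Person", "name": "Alice"},
    "bob":     {"label": "Person", "name": "Bob"},
    "anytown": {"label": "City",   "name": "Anytown"},
}

# Edges: named, directed connections between those entities.
edges = [
    ("alice", "LIVES_IN",   "anytown"),
    ("alice", "WORKS_WITH", "bob"),
]

def neighbors(node_id, edge_label):
    """Follow edges with a given label out of a node."""
    return [dst for src, lbl, dst in edges
            if src == node_id and lbl == edge_label]

# "Where does Alice live?" is answered by following one connection.
print(neighbors("alice", "LIVES_IN"))   # ['anytown']
```

Answering the question is just following a labeled connection, which is the intuition behind graph traversal.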
106.88 -> customers are also very excited about
108.479 -> graphs and they're really especially
109.92 -> excited about managed graph services
112 -> since neptune was released in may of
114.079 -> 2018 customers have built many types of
116.64 -> applications on top of neptune but when
118.88 -> we think about it when
120.88 -> we're working with customers we kind of
122.159 -> broadly generalize
124.88 -> these
127.119 -> applications into a couple of common use
129.44 -> cases we see
130.72 -> the first common use case we see with
132.48 -> neptune is fraud detection and this is
134.72 -> exactly what you think it is we're
136.08 -> trying to use graphs to help find the
137.84 -> bad guys graphs are uniquely helpful for
140.48 -> fraud detection as they really enable us
143.44 -> as developers and as users of the system
145.68 -> to find the deep links and deep
147.28 -> connections and patterns of connections
149.52 -> in the data that really aren't
152 -> easily found using other sorts
155.04 -> of systems out there
157.92 -> the second use case we we generally see
159.92 -> is what we call identity graphs and
161.84 -> identity graphs are really based on the
164.16 -> concept of you know something like a
166.64 -> user is going to come to your website
168.4 -> from multiple different
170.56 -> devices maybe they are going to
172.08 -> connect to it from their phone and their
173.599 -> work computer and their home computer
175.599 -> and their tablet and we really want to
177.28 -> be able to connect these
179.599 -> disparate interactions with the
181.84 -> user together in such a way that we can
184.8 -> kind of generate a golden
187.04 -> record of kind of the canonical form
189.68 -> of that data
192.72 -> so we can actually use that to help
194.159 -> provide things like personalized
195.599 -> recommendations or marketing ad
197.84 -> segmentation things like that
200.239 -> third use case we we often see with
202.239 -> customers are one of knowledge graphs or
204.56 -> knowledge organizations
206.319 -> and this is a really about connecting
208.159 -> disparate data silos inside of a company
210.64 -> together in such a way that you can
211.92 -> really get a holistic view of
214.64 -> pieces of information and things
217.2 -> that are connected to that piece of
218.4 -> information
219.92 -> let's say you're something like an
221.36 -> e-commerce website and you have a
223.36 -> database that contains all of your
225.04 -> information about the products you sell
226.72 -> and one that contains all this
228.08 -> information about your customers and one
230.239 -> that contains all the information about
231.599 -> your inventory inside your warehouse
233.92 -> maybe you have another one that's
234.799 -> shipping and you want to be able to
235.76 -> connect all of these together in such a
237.439 -> way that you can look at all of the
239.76 -> information around a specific product or
241.76 -> customer or user something like that
244.159 -> and that would kind of fall into what we
245.439 -> call a knowledge graph type use case
248.64 -> the last one we have is the last common
250.4 -> use case we've seen is security graphs
252.239 -> and we've recently seen a real uptick in
254.239 -> customers interested in using graphs for
256.639 -> security based systems
258.32 -> security in general is sort of a graph
260.479 -> problem because it's really you know
262.24 -> security for anything be it physical
264.72 -> or logical or application security or
268 -> your cloud infrastructure security is
270.4 -> really about layers about multiple
272.4 -> different layers of security all being
274.08 -> connected together
275.68 -> in such a way that you want to be able
277.84 -> to look at that to be able to find
279.28 -> potential paths of
283.6 -> malfeasance or paths through
286.56 -> the graph and how things may or may not
289.199 -> be exposed to the internet when
291.52 -> they should or shouldn't be so that's
293.68 -> one
296.08 -> way we can look at security graphs and
298.4 -> this is a really
299.6 -> interesting and fast-growing
301.52 -> segment for graph based
304 -> solutions
305.28 -> so what are some other common business
306.96 -> problems that we see when we
308.8 -> work with customers when we think about
310.32 -> graphs and graph type problems problems
312.16 -> really needing that highly
313.44 -> connected data well as i kind of
315.68 -> mentioned already the first one is
316.96 -> people come to us saying they need to do
318.32 -> things like be better at detecting fraud
320.24 -> and fraudulent transactions inside their
322.08 -> system
324.16 -> maybe they want their customers to have
326.08 -> a better or more personalized
327.759 -> recommendation experience than they're
329.44 -> able to provide today
331.68 -> there's that knowledge graph use case if
333.199 -> we want to connect together those siloed
335.12 -> data sources inside our enterprise to
336.88 -> really kind of build out a you know an
338.4 -> entire
339.44 -> an entire platform that contains all of
341.84 -> the knowledge inside of our system
344.32 -> maybe you have those multiple websites
346.24 -> or you have multiple applications and
347.919 -> you need to link together disparate
349.919 -> customer identities in these systems to
351.759 -> kind of get that canonical or that
353.52 -> golden record
355.199 -> you know that
356.639 -> maybe you have machine learning
358.319 -> algorithms and you want to be able to
359.44 -> use the connections in your data to
361.44 -> improve those algorithms to give you
363.039 -> better sorts of answers
366.319 -> these sorts of questions are ones that
368 -> if you went out and you
370.08 -> looked online and did some searching
371.68 -> around you would find that
373.039 -> these sorts of questions are really good
375.84 -> common business use cases for graph type
378.479 -> problems but there's also a wide array
381.12 -> of not so easily recognized graph
383.039 -> problems ones that when you
384.56 -> you know at first glance may not make
386.08 -> sense or you may not think of them as
387.44 -> graph problems but they really do lend
389.68 -> themselves very well to being solved
391.44 -> using graphs
392.8 -> the first example is you know what are
394.72 -> the risks in your it infrastructure your
396.639 -> supply chain you know any sort of i.t
399.28 -> infrastructure or supply chain it tends
401.759 -> to get very complicated very quickly
403.6 -> in the case of
404.96 -> a supply chain
407.039 -> you have a lot of
408.4 -> products each of those products has its
409.919 -> own bill of materials each of the bill
411.759 -> of the items in those bill of materials
413.919 -> has one or more suppliers and those
415.759 -> suppliers have suppliers and those
417.52 -> suppliers have suppliers have suppliers
419.199 -> and being able to kind of look at the
421.12 -> overall risk portfolio of your supply
423.759 -> chain or your it
425.199 -> infrastructure is a great use for a
427.039 -> graph
428.16 -> where did this data come from this is
430 -> this sort of question of being able to
431.599 -> track the lineage or the provenance of
433.84 -> data is one that we often talk with data
435.759 -> engineering teams about
438.4 -> this lends itself very well to a graph
440.4 -> because if you think
442.08 -> about it when you're working
443.759 -> with any sort of data
445.199 -> engineering problem it's really about
446.72 -> taking data from one source doing some
448.72 -> sort of extract transformation and load
451.84 -> type process and usually loading it into
453.52 -> something else maybe you're combining or
455.599 -> aggregating that data but
457.68 -> when you aggregate that data you
459.199 -> still want to be able to track where did
460.639 -> this data come from maybe you need to be
463.039 -> able to track this to be able to comply
464.479 -> with privacy regulations like ccpa or
466.879 -> gdpr and you want to be able to track
470 -> not only where this data is currently
471.919 -> but all the places it was
473.36 -> used
475.12 -> intermediately to be able to
478.24 -> clean up those things and to be able to
479.84 -> understand the accuracy and the
482.96 -> efficacy of the data that you're working
484.56 -> with
486.479 -> why don't your search results relate to
488.56 -> the specific question being asked this is
490.4 -> a common use case we see with
492.16 -> customers is they have search results
493.759 -> but their search results are a little
495.12 -> bit lackluster and a little
497.12 -> less clear than they would like them to
498.96 -> be so being able to use a graph to be
501.12 -> able to build that you know
503.759 -> those sorts of knowledge graphs where
505.039 -> you're connecting these data together to
506.4 -> be able to give you more relevant and
508.72 -> related answers to the types of
510.4 -> questions people are asking
512.959 -> how does person x have access to
515.36 -> information why
517.68 -> this is a security another security
519.919 -> graph type use case where you really
521.279 -> want to be able to you know maybe map
522.88 -> out the permissions that folders and
525.519 -> files have by active directory groups i
528.16 -> was working with one customer where
529.839 -> their specific use case was they
531.519 -> wanted to
533.2 -> specifically look for
534.88 -> people that
536.959 -> had access to certain files and folders
539.519 -> through a very very long set of
541.2 -> connections because
542.72 -> if
544.399 -> i was given direct access to this file
546.72 -> or folder it probably means i was
548.16 -> intended to have it but if i
549.76 -> had access to this file and
551.839 -> folder through multiple sets of groups
553.6 -> and permissions maybe it was an
555.2 -> unintended consequence that giving me a
557.519 -> permission gave me access to
559.36 -> this critical business information and
561.44 -> they want to be able to kind of track
562.64 -> that down and look for that sort of
564.399 -> thing
566.24 -> and things like questions
568 -> about your cloud infrastructure your
569.44 -> cloud infrastructure is a very good
571.76 -> example of a security graph type use
573.6 -> case
574.8 -> being able to look at how different
576.48 -> things are used inside of it through a
578.64 -> wide array and a very large number of
581.36 -> variably connected data if you think
582.88 -> about your cloud infrastructure you're
584.16 -> going to have things like iam policies
586.399 -> which are connected to roles which are
587.92 -> then going to be connected to one of any
589.6 -> number of different types of entities be
591.44 -> they lambda functions or databases or
594.08 -> ec2 instances and you have a very
596.32 -> variably connected set of
598.8 -> entities that you're trying to look at
600.64 -> but when i really kind of sit down if i
602.16 -> wanted to kind of boil down graph
603.76 -> questions the types of questions that
605.2 -> graphs are good at answering
607.12 -> really for me it comes down to the where
609.04 -> why and how questions you know where are
610.56 -> the risks where did the data come from
612.959 -> why don't the search results
614.399 -> relate how does this person have access how is
616.72 -> this role being used things like that
618.88 -> and these tend to be good uh graph
621.12 -> questions because they have a few things
622.56 -> in common first they tend to navigate
625.279 -> variably connected structures of data
627.44 -> you know especially if you wanted to
628.48 -> think about this
629.68 -> in terms of looking at your cloud
631.519 -> infrastructure
632.64 -> the cloud infrastructure as we kind of
635.2 -> discussed a minute ago is really it's a
637.6 -> highly variable set of different
639.04 -> entities you have you know vpcs you have
641.92 -> enis you have iam rules iam policy all
645.12 -> of these are connected together against
646.64 -> other ec2 instances or databases and
648.959 -> being able to look at that and easily
651.44 -> move through that sort of information is
653.68 -> an area where graphs tend to excel
657.279 -> they also tend to excel at questions
659.36 -> where you need to filter or compute a
661.279 -> result based on the strength weight or
663.2 -> quality of a relationship so in the case
664.959 -> of something like a supply chain risk
666.88 -> management being able to look at not
668.8 -> only the fact that these two things are
670.32 -> connected but how important is this
672.32 -> supplier to this person what other
675.279 -> sorts of backup
676.32 -> suppliers may they have for specific
678.16 -> things to be able to use that to help
679.839 -> things to be able to use that to help
681.12 -> calculate an overall risk of your supply
683.12 -> chain or to find the
684.959 -> riskiest parts of your supply chain
687.04 -> is an example where
689.279 -> using the connections in your data
691.76 -> is extremely important to be able to get
693.36 -> that answer
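The idea of scoring results by the strength or weight of a relationship, as in the supply-chain example above, can be sketched very simply. The supplier names, supply shares, and the "risk equals maximum single-supplier share" scoring rule are all invented for illustration, not a real risk methodology.

```python
# supplies edges: (supplier, product, share_of_supply) -- weighted connections.
supplies = [
    ("acme",    "widget", 0.9),   # near-sole supplier: concentrated risk
    ("globex",  "widget", 0.1),
    ("acme",    "gadget", 0.5),
    ("initech", "gadget", 0.5),
]

def product_risk(product):
    """Toy rule: risk grows with dependence on a single supplier."""
    shares = [w for s, p, w in supplies if p == product]
    return max(shares) if shares else 0.0

# Find the riskiest part of this tiny supply chain.
riskiest = max({p for _, p, _ in supplies}, key=product_risk)
print(riskiest, product_risk(riskiest))   # widget 0.9
```

The point is that the answer depends on edge weights, not just on which nodes are connected.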
695.44 -> and finally recursion or
698 -> traversing unknown numbers of
699.68 -> connections and this is really an area
701.6 -> where graphs really do excel
704.16 -> um and this is where you have questions
706 -> that are a bit open-ended you know let's
708 -> take a look at the example of how does
709.519 -> person x have access to information why
711.839 -> you know they may have been given direct
713.279 -> access to this information or they may
715.36 -> have this information or access to this
717.44 -> information through a wide array of
719.68 -> different connections through maybe
721.04 -> different active directory groups and
722.48 -> things of that nature
724.72 -> but you won't know exactly the number of
726.72 -> connections or how they're connected at
728.72 -> the time you're originally
730.399 -> looking at the query
732.56 -> all you know is you want to find out how
733.839 -> two people or two entities are
736.56 -> related inside this so this is really
738.72 -> sort of when i think about graphs and
740.399 -> graph type problems these are the sorts
741.839 -> of problems i really look for
744.88 -> where graphs benefit the
747.36 -> end use case quite
749.44 -> significantly
753.12 -> and why is this well there's a few
754.639 -> challenges around using
756.8 -> many other technologies with highly
758.959 -> connected data
760.32 -> first they tend to be a little
762.079 -> unnatural for querying that data
764.56 -> and this tends to lead to
766.48 -> an inefficient processing of that sort
769.279 -> of information
771.36 -> and most other databases out there or
773.2 -> other data technologies out there tend
774.56 -> to have a rigid schema that's really
776.32 -> inflexible for rapidly changing data
779.92 -> that most of the types of use
782.639 -> cases we've talked about today be they
784.32 -> fraud graphs or knowledge graphs or
786.24 -> security graphs or identity graphs
788.639 -> really tend to require
790.8 -> so let's dive a little bit into that
792.24 -> what is it about graphs that actually
794.8 -> make them better to handle this sort of
796.88 -> highly connected data
798.639 -> well the first aspect here is the query
800.88 -> languages the query languages that we
802.639 -> use with graphs are really optimized to
805.04 -> use the connections
807.04 -> to move through through the network of
809.2 -> data that you're looking at and this
810.8 -> comes down to the fact that graph
812.399 -> databases and managed graph services are
814.399 -> based on graph theory and one of the
815.839 -> kind of key pieces of graph theory is
817.68 -> this concept of traversing your data or
819.76 -> moving from point a to point b so if we
822 -> looked at this specific example
824.32 -> we're looking at this and it says dave
826.079 -> works at amazon the little gremlin
828.639 -> guy is representing
831.44 -> that's the logo for the apache
833.12 -> tinkerpop gremlin project and that's
834.56 -> representing kind of where we are in our
836.8 -> information today when i write queries
839.199 -> and graph query languages
841.36 -> as opposed to
844 -> the way we work with them we'll see
845.6 -> an example of this later
847.76 -> they really work by
849.279 -> moving from point a to point b so
851.519 -> i'm moving through my graph or
853.519 -> my network from dave to amazon if we
856.16 -> want to contrast this with something
857.6 -> like a relational database relational
860 -> databases work on relational algebra
862.72 -> set algebra and they work by
864.8 -> combining sets of data so if we wanted
866.399 -> to look at
868.56 -> the same example
870.48 -> here i would probably have a table
872.56 -> called something like person and a table
874.399 -> called something like company i would
876.48 -> perform a join on this in order to get
878.24 -> the fact that dave is a person at a
880.16 -> company and
881.92 -> that i work at that company through some
883.68 -> sort of foreign key between those tables
886.24 -> and
887.04 -> at its very core the way
888.72 -> that relational
891.04 -> databases work as opposed to moving from
892.88 -> point a to point b in a graph
895.12 -> when i move from point a to point b
896.639 -> unless i
897.839 -> explicitly ask for it i don't
899.199 -> necessarily maintain the history of
900.959 -> everywhere i've been in relational
902.88 -> databases as i'm joining these tables
904.8 -> together i'm building a bigger and
906.32 -> bigger table in memory
908.639 -> in theory in memory that basically is
911.04 -> containing all of the information and
912.56 -> all of the history of where i've been
915.279 -> this is why when you start running large
917.44 -> queries that have to traverse or move
919.6 -> through a lot of data in order to get
921.279 -> there
922.32 -> graph databases because i'm
924.56 -> moving from point a to point b as
926.48 -> opposed to building a bigger and bigger
928.16 -> in-memory table are more efficient from
930.32 -> a memory perspective and a speed
932.32 -> perspective
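The traversal-versus-join contrast described above can be sketched in a few lines: a traversal only carries its current position, while a join-style evaluation keeps widening its intermediate rows with the full history. The chain graph and step count are invented for the illustration.

```python
# A simple chain graph: 0 -> 1 -> 2 -> 3 -> 4 -> 5.
edges = {i: [i + 1] for i in range(5)}

# Traversal: move from point a to point b, holding only where we are now.
frontier = {0}
for _ in range(5):
    frontier = {dst for n in frontier for dst in edges.get(n, [])}
print(frontier)   # {5} -- one value held, regardless of path length

# Join-style: each step widens the rows, keeping everywhere we've been.
rows = [(0,)]
for _ in range(5):
    rows = [r + (dst,) for r in rows for dst in edges.get(r[-1], [])]
print(rows)       # [(0, 1, 2, 3, 4, 5)] -- row width grows with every hop
```

On a chain this looks harmless, but on branching data the join-style rows multiply in both count and width, which is the memory cost the talk alludes to.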
933.68 -> the other aspect there is graph
935.6 -> databases are really optimized for
937.839 -> processing connected data at the
940.48 -> engine level because
942.399 -> graph databases store not just the
944.959 -> entities but the connections
947.04 -> this really gives them
949.36 -> the advantage of the fact that the
950.56 -> connections that you're working on are
952 -> data itself this means that they're
953.6 -> physically saved to disk so when i need
956.24 -> to actually retrieve this data when
958.24 -> i want to move from point a to
959.68 -> point b across the connection i'm really
962.079 -> just reading data
964.639 -> off of disk to retrieve that information
967.12 -> if we contrast this again with something
968.48 -> like a relational database
970.32 -> the works-at connection in this example
972.48 -> is really metadata it would be
973.68 -> represented through something like a
975.12 -> foreign key between that
976.639 -> person and company table
978.639 -> so when i want to find out what company
980.8 -> somebody works at i need to actually
982.48 -> calculate that at runtime i need
984.32 -> to
984.959 -> run that relational algebra to
986.88 -> calculate that as opposed to being able
988.72 -> to retrieve from disk so when i start
991.279 -> needing to process
992.639 -> hundreds or thousands or hundreds of
994.72 -> thousands or millions or billions of
996.16 -> these sorts of relationships the fact
998 -> that i can retrieve them from disk
999.279 -> versus calculate them at runtime really
1001.36 -> does lead to a much more efficient
1002.88 -> processing of that sort of information
1007.6 -> and the last kind of uh item i wanted to
1009.44 -> touch on here is a little bit about
1010.72 -> schema flexibility
1012.8 -> when we're looking at these two uh
1015.199 -> these two examples up here we see that
1017.68 -> we have you know these are both
1019.04 -> representations of family trees
1021.68 -> with a graph and with with amazon
1023.759 -> neptune and with most graphs they
1025.6 -> are what's
1026.72 -> known as a schema-less database which is
1028.48 -> personally not my favorite terminology
1030.4 -> because if you have data you have schema
1032.799 -> so i like to think of it more in terms
1034.4 -> of explicit versus implicit schema so in
1037.199 -> the case of a graph when i start
1039.919 -> with its schema less nature or its
1041.839 -> implicit nature of schema i can just
1044.799 -> start writing information to my system i
1047.36 -> can write a person and i can start
1049.44 -> writing a property of a first name or a
1050.88 -> last name
1052.4 -> i don't have to declare these ahead of
1053.919 -> time i don't have to set up tables or
1056.16 -> keys or constraints around that i can
1058.4 -> just start writing that information to
1060.16 -> my system as the data that's coming
1063.039 -> in changes or the attributes of that
1064.48 -> data change or we add
1066.64 -> a new type of data a new
1068.72 -> set of entities to my graph i can
1070.559 -> just start writing those and they will
1071.76 -> be automatically included into the
1073.44 -> schema of my graph
1075.919 -> this provides a lot of flexibility
1077.52 -> especially as data evolves over time if
1079.919 -> we want yet again want to compare that
1081.679 -> and contrast that a little bit to a
1082.88 -> relational database relational databases
1085.28 -> have uh explicit schema i need to
1087.28 -> declare ahead of time that i have an
1088.88 -> individual and that individual has a
1090.559 -> first and a last name
1093.36 -> you do this in pretty
1095.36 -> much any relational database today
1097.679 -> and this is very powerful until you want
1099.6 -> to start evolving this at scale and
1101.36 -> speed where every
1103.039 -> time you go and do this you're gonna
1104.16 -> have to run some sort of
1105.919 -> schema migration or schema
1108.559 -> diff process in order to basically bring
1110.48 -> things up to speed versus being able to
1112.08 -> just start writing the new entities as
1113.84 -> they come in
1115.679 -> there's also a bonus here which is the
1117.2 -> fact that graphs tend to be easier to
1120 -> understand by new people or
1121.36 -> non-technical people that aren't
1122.64 -> familiar maybe with the domain this is
1124.799 -> because when we look at these
1126.32 -> representations you know the
1127.6 -> representation of the graph is a lot
1129.84 -> more natural to the way we already are
1131.36 -> looking at the domain than necessarily
1133.919 -> looking at some erd diagram that you
1135.679 -> have to kind of dissect and put back
1137.84 -> together in order to figure out how
1139.44 -> everything's going to be related between
1141.12 -> them
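The implicit-schema behavior described above can be sketched with plain Python records: new properties and new entity types are simply written, and the "schema" is whatever the data actually carries. The property names and labels are invented for the example.

```python
graph = []   # each element: a node with a label and arbitrary properties

# Start writing people with a first and last name -- nothing declared first.
graph.append({"label": "Person", "first": "Alice", "last": "Smith"})

# Later the data evolves: a new property appears on new records only.
graph.append({"label": "Person", "first": "Bob", "last": "Jones",
              "nickname": "Bobby"})

# An entirely new entity type can be added the same way, no migration step.
graph.append({"label": "Company", "name": "AnyCo"})

# The observed schema is just the set of keys each label actually uses.
observed_schema = {n["label"]: set() for n in graph}
for n in graph:
    observed_schema[n["label"]] |= set(n) - {"label"}
print(observed_schema)
```

Contrast this with a relational table, where adding `nickname` or a `Company` table would require an explicit `ALTER TABLE` or `CREATE TABLE` before any row could be written.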
1143.919 -> so it's a little bit about graphs and
1146 -> and why graphs are better for some sorts
1148 -> of problems than other technologies but
1150.4 -> why is it customers come to us why do
1152.32 -> they want to use a graph database
1153.84 -> service as we kind of mentioned
1155.84 -> other traditional database
1157.6 -> technologies out there really aren't
1159.2 -> built to scale when you want to do
1161.679 -> deep link querying or deep link analysis
1163.6 -> across billions of interconnected
1165.28 -> entities or hundreds of
1166.96 -> thousands millions or billions of
1168.4 -> interconnected entities for that matter
1171.2 -> and because of this it really
1172.4 -> challenges those
1174.08 -> sorts of engines to deliver the low
1175.84 -> latency required for real-time
1178 -> inspection of
1180 -> things like fraudulent
1181.679 -> activity or personalized
1182.88 -> recommendations
1185.36 -> or other potential
1187.2 -> malicious activities or things like that
1190.24 -> self-managed solutions also tend to be
1192 -> complex expensive and inflexible
1194.96 -> especially if you want to try and
1196.16 -> optimize them and scale them to the
1197.679 -> global nature of many of today's
1199.44 -> applications they require a lot of
1201.44 -> hardware management and
1204.24 -> provisioning you have to manually scale
1206.4 -> up and down these sorts of items and
1208.64 -> being able to do this in a compliant way
1211.36 -> as schema changes and evolves over time
1213.76 -> is very complex this really
1216.64 -> leads to a lot of these solutions not
1218.24 -> being able to evolve at the speed and
1220.72 -> scale that the landscape of today's data
1222.72 -> is changing
1223.919 -> and this is really why we went and built
1225.76 -> amazon neptune
1227.28 -> amazon neptune is a fully managed
1230.08 -> purpose built graph database
1232.559 -> built for the cloud it was released in
1234.72 -> may of 2018
1236.24 -> and since then we have been continuously
1238.08 -> improving it with new features and
1239.84 -> functionality based on feedback from
1242.24 -> customers
1243.6 -> as i said it's a fully managed service
1245.36 -> so aws takes care of all the hardware
1247.44 -> management we manage the os the database
1250 -> server and we manage all of this
1253.52 -> through a couple of ways
1256.72 -> all it takes is you as
1258.72 -> a user a few clicks in the management
1260.96 -> console or a few calls to the api to
1263.44 -> provision a new neptune cluster and
1265.6 -> it'll be up and running in a matter of
1267.2 -> minutes
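The "few calls to the API" mentioned above can be sketched with boto3's Neptune client, which exposes `create_db_cluster` and `create_db_instance`. The identifiers and instance class below are example values, and the actual calls are shown only in comments since they require AWS credentials and incur cost.

```python
# Example request parameters for provisioning a Neptune cluster.
# Names and instance class are illustrative assumptions, not recommendations.
cluster_params = {
    "DBClusterIdentifier": "my-neptune-cluster",    # example name
    "Engine": "neptune",
}
instance_params = {
    "DBInstanceIdentifier": "my-neptune-instance",  # example name
    "DBInstanceClass": "db.r5.large",               # example instance class
    "Engine": "neptune",
    "DBClusterIdentifier": cluster_params["DBClusterIdentifier"],
}

# With credentials configured, the provisioning calls would look like:
#   import boto3
#   neptune = boto3.client("neptune")
#   neptune.create_db_cluster(**cluster_params)
#   neptune.create_db_instance(**instance_params)
print(instance_params["DBClusterIdentifier"])   # my-neptune-cluster
```

The instance joins the cluster by referencing the same `DBClusterIdentifier`, matching the cluster-plus-instances model the talk describes.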
1270 -> you know one of the core needs of any of
1271.52 -> these sorts of systems is the ability to
1273.28 -> scale based on demand so it's critical
1275.28 -> for most businesses to be able to
1277.44 -> support bursty traffic that
1280.72 -> fluctuates over time maybe it's
1283.28 -> seasonal traffic that bursts on black
1285.44 -> friday or thanksgiving maybe
1288 -> your
1290.559 -> application gets more traffic in the
1292.32 -> evening versus the day and being able to
1294.08 -> dynamically scale is kind of a critical
1297.039 -> aspect of the way we built neptune from
1299.919 -> the get-go
1302.799 -> and the last aspect here i
1304.799 -> want to talk about is cost
1306.159 -> reduction
1307.2 -> in many
1308.72 -> other database offerings
1310.799 -> because they don't scale very linearly
1313.76 -> you tend to over provision hardware
1316.32 -> database servers and vcpus things
1319.039 -> like that
1320.48 -> to meet your peak demands because of the
1322.96 -> scalable nature of neptune
1325.52 -> we really allow you to reduce the cost
1329.919 -> overall and add a lot more predictability to
1332 -> your cost by being able to scale
1334.08 -> the hardware that you're running
1336.159 -> your database on up and down as demand
1338.08 -> happens
1339.2 -> neptune uses a pay-as-you-go model so
1341.039 -> you're going to only pay for the amount
1342.32 -> of time your database server is actually
1343.919 -> running so this helps reduce costs
1346 -> further by not over provisioning
1348.24 -> overspending and underutilizing
1351.6 -> as we mentioned neptune at its very
1353.6 -> core is a purpose-built graph database
1355.2 -> that's optimized to store and map
1357.2 -> billions of relationships between
1359.36 -> entities it does this to enable
1361.44 -> real-time connections with millisecond
1363.84 -> query response times it does this
1366.159 -> through
1367.12 -> support for
1368.64 -> three open standard
1373.84 -> query languages those being opencypher
1376.08 -> gremlin and sparql
1378.4 -> and it does this by
1380.72 -> supporting the two leading graph models
1383.36 -> out there the first graph model out
1384.799 -> there is property graph property graph
1387.28 -> at its very core represents data through
1389.6 -> the use of nodes representing real world
1392.32 -> entities edges representing connections
1394.72 -> between those real world entities and
1396.559 -> attributes representing properties of
1398.72 -> either a node or an edge
1402.64 -> one of the unique aspects
1404.96 -> of property graphs is the ability to
1406.96 -> associate properties not only with
1411.2 -> the entities themselves as you can with
1413.28 -> many other database technologies but
1415.2 -> also the ability to associate those
1417.6 -> properties specifically with the
1419.6 -> connection between entities so maybe you
1422 -> have a person and a
1423.679 -> product and an edge that says
1425.6 -> they bought it well beyond just
1429.52 -> recording the fact that person a bought product b
1432.159 -> you can also add properties or
1434.64 -> metadata to that edge maybe you want to
1436.4 -> add how they bought
1438.32 -> it or the date they bought it as a
1440.88 -> property of that connection itself so
1443.12 -> it's a unique aspect of property
1445.12 -> graphs if you choose the property graph
1447.36 -> model in neptune you have the ability to
1449.12 -> query that data through one of two open
1452 -> specification query languages one
1454.32 -> of those being the opencypher query
1456.24 -> language which
1457.6 -> provides a sql inspired syntax for
1459.44 -> customers to use the second being the
1461.279 -> apache tinkerpop gremlin query language
1463.6 -> which is a very powerful query
1465.2 -> language
1467.279 -> we'll look at and compare both of these here in
1469.039 -> a little bit
1471.2 -> but gremlin is an almost stream
1473.84 -> processing oriented query language
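The property-graph model described here (nodes, edges, and properties attached to both) can be sketched in a few lines of Python. This is a purely illustrative in-memory stand-in, not Neptune's storage format or API; all names ("person1", "bought", and so on) are made up.

```python
# Minimal in-memory sketch of the property-graph model described above.
# Nodes and edges each carry a label plus arbitrary key/value properties;
# the key feature is that edges can carry metadata of their own.

nodes = {
    "person1": {"label": "person", "properties": {"name": "Dave"}},
    "product1": {"label": "product", "properties": {"name": "Echo Dot"}},
}

edges = [
    {
        "id": "e1",
        "from": "person1",
        "to": "product1",
        "label": "bought",
        # Properties on the connection itself: how and when it was bought.
        "properties": {"date": "2022-05-01", "channel": "online"},
    }
]

def edge_properties(from_id: str, to_id: str, label: str) -> dict:
    """Return the properties stored on a matching edge, or {} if none."""
    for e in edges:
        if (e["from"], e["to"], e["label"]) == (from_id, to_id, label):
            return e["properties"]
    return {}
```

Looking up `edge_properties("person1", "product1", "bought")` returns the purchase metadata, which is exactly the "properties on the connection" idea the talk highlights.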
1477.039 -> the second graph model that we store in
1479.2 -> neptune or that you can choose to use in
1481.279 -> neptune is the resource description
1483.6 -> framework or rdf
1485.2 -> you might have heard this referred to as
1486.88 -> the semantic web construct but rdf
1489.52 -> represents data inside a graph as a set
1491.919 -> of triples each of those triples
1493.84 -> containing a subject a predicate and an
1496.32 -> object so if we go back to the earlier
1498.48 -> example of dave works at amazon
1500.48 -> if we wanted to look at this
1504.559 -> with an rdf data model we would
1505.919 -> represent that data with the subject being
1507.679 -> dave the predicate being worksat and
1510.4 -> the object being amazon
1513.679 -> if you choose to use the rdf data model
1515.76 -> inside neptune we support querying that
1517.84 -> data using sparql 1.1
1520.799 -> which is a w3c standard query language
1523.6 -> for
1524.48 -> querying rdf data
1530.96 -> thousands of customers are
1532.48 -> using neptune today in production
1535.039 -> this is just kind of a really quick
1537.2 -> short list of some of our neptune
1539.039 -> customers across different verticals and
1540.799 -> different use cases
1543.279 -> so
1545.36 -> what is neptune how does neptune
1547.12 -> work
1550 -> let's take a moment and look under the
1551.84 -> hood of neptune and how neptune is built
1553.919 -> what its architecture looks like and
1555.919 -> then talk a little bit about some of the
1557.679 -> the features and functionality
1560 -> of neptune
1561.76 -> so when i think about
1565.679 -> the architecture of
1566.64 -> neptune i think of it in these
1568.32 -> five
1570.32 -> basic areas the application layer the
1573.12 -> compute layer the shared storage layer
1575.36 -> set of service features and service
1576.799 -> integrations one of the unique aspects
1579.039 -> of the general architecture of neptune
1582 -> is that with neptune
1585.12 -> we were able to break out the compute
1587.36 -> layer from the shared storage layer it's
1589.12 -> a very key piece of the
1591.44 -> architecture this really enables us to
1593.84 -> provide a lot of these features
1595.6 -> that make your applications work so let's
1599.279 -> dive in and take a moment to look at some of
1600.72 -> these
1602.48 -> the first layer here we wanted to talk
1603.76 -> about is that application layer this is
1605.52 -> where you as the developer really live
1607.679 -> and work this is where you're building
1609.12 -> your social networking application or
1610.799 -> your fraud detection application maybe
1612.799 -> it's a knowledge graph or a security
1614.32 -> graph or one of the other use cases we
1615.679 -> talked about but this is where you're
1617.52 -> interacting with it and you do this
1619.279 -> through any of our three query languages
1622.32 -> as we mentioned
1626.24 -> for property graph we basically have a
1627.919 -> set of endpoints exposed via http
1632.96 -> a websocket or a bolt
1634.96 -> connection
1639.039 -> to be able to read and write data to and from
1640.88 -> the system so you're using a set
1642.48 -> of open source drivers or
1646.32 -> rest api calls to
1649.2 -> interact with this system
1652.08 -> these applications are going to interact
1653.919 -> with this compute layer as i mentioned
1655.84 -> the compute layer in in neptune is
1657.84 -> separated out from the storage layer
1660.72 -> in a unique way for a
1665.279 -> cloud-based graph database
1668.08 -> the compute instances are built to allow
1670.64 -> you to scale dynamic to dynamically
1672.96 -> scale as your application requires it's
1675.279 -> a instance based database so you your
1678.32 -> compute layers are can have up to 16
1680.799 -> different instances there's always one
1682.399 -> writer instance or one primary instance
1684.96 -> that primary instance can be scaled
1686.96 -> anywhere from our smallest being a t3
1689.52 -> medium
1690.48 -> up to an r524xl
1693.2 -> or
1694.559 -> recently we also if you have a high
1696.08 -> memory demands on your application we
1697.679 -> also recently released support for the
1699.44 -> x2g family in lines so you can scale
1701.919 -> that writer up from
1703.52 -> with to any of the sizes available
1705.2 -> within there
1706.399 -> in a vertical manner for reads most
1709.2 -> applicants most graph applications tend
1710.88 -> to be very read heavy
1712.72 -> so we allow you to scale out up to 15
1715.76 -> different read replicas on top of
1717.679 -> that same data you're storing that data
1719.84 -> one time each of these read replicas
1721.6 -> will read that same data and
1723.44 -> those read replicas allow you to
1725.679 -> scale
1726.64 -> yet again vertically from a t3
1729.279 -> medium all the way up to an r5 24xlarge or x2g
1732.88 -> as well as scale horizontally so
1735.84 -> from an instance perspective the write
1737.919 -> instances can scale vertically and the
1739.679 -> read instances can scale both
1741.12 -> horizontally and vertically
1743.52 -> all of this compute is
1745.36 -> separated out from our shared storage
1747.279 -> layer we'll jump into a lot more of
1749.039 -> the details about this in a few minutes
1752 -> but at a high level
1755.12 -> a strong feature of the cloud
1757.36 -> native aspect of neptune is this shared
1759.84 -> storage layer that's separated from the
1761.52 -> compute layer
1762.799 -> when you write data into neptune that
1764.48 -> data is automatically going to be stored
1766.32 -> six times twice in each of three
1768 -> availability zones
1769.84 -> the data scales independent of your
1771.76 -> compute but it scales automatically for
1773.52 -> you so when you start a neptune
1775.279 -> cluster it'll start with provisioned
1777.44 -> space for 10 gigabytes of data and
1779.6 -> as you add more and more data
1781.44 -> that provisioned space will grow
1783.679 -> up to 128 terabytes
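The automatic growth just described (start at 10 GB, grow as data is added, up to a 128 TB ceiling) is easy to sketch as arithmetic. This is an illustrative model of the behavior, not Neptune's internal provisioning code.

```python
import math

SEGMENT_GB = 10   # storage is provisioned in 10 GB increments
MAX_TB = 128      # cluster volume ceiling mentioned in the talk

def provisioned_segments(data_gb: float) -> int:
    """How many 10 GB segments would be provisioned for this much data.

    A fresh cluster starts with one segment even when empty, and the
    volume cannot exceed the 128 TB limit.
    """
    if data_gb > MAX_TB * 1024:
        raise ValueError("exceeds the 128 TB cluster volume limit")
    return max(1, math.ceil(data_gb / SEGMENT_GB))
```

For example, an empty cluster has one 10 GB segment, and 25 GB of graph data would occupy three segments.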
1786.159 -> because it's a fully managed service we
1788.399 -> have a lot of features around automated
1790.32 -> backup and restore functionality for you
1793.679 -> and we have this ability we call it the
1795.279 -> database fast clone which is really
1797.36 -> about enabling you as a user to quickly
1800.48 -> make a copy of your database to maybe do
1802.399 -> something like try out
1804.32 -> a new
1805.44 -> engine version or to try something in
1807.36 -> your application without having to take
1809.6 -> down your production cluster in order to
1811.52 -> make this happen
1813.52 -> neptune also comes with a set of service
1815.52 -> features
1817.12 -> that are part of the native
1818.88 -> implementation itself the first one
1820.48 -> being the ability to bulk load if you're
1822.72 -> going to bulk load data into neptune we
1825.039 -> provide an endpoint inside of the
1828.08 -> system that allows you to trigger this
1830.08 -> bulk load from s3 files
1833.679 -> the s3 files have to be in one of a few
1837.76 -> specified formats if you're doing
1839.36 -> property graph there's two different csv
1841.2 -> formats you can use if you're doing rdf
1843.84 -> there are three different rdf data
1845.52 -> formats you can use once your data is
1847.52 -> formatted into that you trigger this
1849.039 -> bulk load this bulk load then reads in
1851.12 -> all of that data optimizes and
1852.88 -> parallelizes the load to give you the
1854.559 -> maximum performance for getting that
1856.559 -> bulk data into your system
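As a concrete illustration of one of the property-graph formats mentioned above, here is a sketch of building Gremlin-style load CSVs. The `~`-prefixed system column headers follow the documented Neptune bulk loader layout, but treat the details (property type suffixes, optional columns) as an assumption to verify against the loader documentation.

```python
import csv
import io

# Sketch of the Gremlin CSV bulk-load format: one file of vertices,
# one file of edges, each with "~"-prefixed system columns.

def vertices_csv(rows):
    """Render vertex rows as [id, label, name] into load-format CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["~id", "~label", "name:String"])
    writer.writerows(rows)
    return buf.getvalue()

def edges_csv(rows):
    """Render edge rows as [id, from, to, label] into load-format CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["~id", "~from", "~to", "~label"])
    writer.writerows(rows)
    return buf.getvalue()
```

Files like these would be uploaded to S3 and the load triggered through the cluster's loader endpoint, as the talk describes.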
1859.2 -> we also have neptune streams neptune
1861.039 -> streams is a cdc type functionality on
1864.08 -> top of neptune that enables
1866.24 -> you to get a continuous stream of the
1871.84 -> data changes that are happening inside
1873.279 -> of your graph this can be used to
1875.279 -> trigger
1876.799 -> downstream workflows or to
1881.12 -> propagate those changes to
1882.88 -> other external systems
1885.2 -> we also have a status endpoint on the
1887.36 -> cluster this cluster status endpoint
1889.519 -> basically tells you just basic
1891.519 -> information about the health of your
1892.799 -> cluster
1894.399 -> we have query profiling and explain
1896.24 -> endpoints that allow you to take sparql
1898.72 -> gremlin or opencypher queries run them
1901.2 -> through these and get a very detailed
1903.2 -> look about how the engine is processing
1905.44 -> that data this allows you to then
1907.6 -> go a step further and be able to tune
1909.2 -> and optimize those queries based on the
1911.2 -> information being returned to you from
1913.679 -> those endpoints
1915.279 -> we also have a feature that allows you
1917.919 -> to auto scale read replicas we'll talk a
1920.399 -> bit about this in a moment but
1923.12 -> at a high level this feature enables you
1925.919 -> as the user to specify a set of
1928.72 -> thresholds at which point your
1932.72 -> cluster will scale up and down to match the
1934.96 -> workload demands based on those
1936.559 -> thresholds
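The threshold idea behind read-replica auto scaling can be sketched as a pure function: given a utilization metric and user-chosen thresholds, decide how many replicas the cluster should have. The numbers here are made up for illustration, and the real feature is configured through scaling policies rather than code like this.

```python
# Illustrative threshold-based scaling decision for read replicas.
# Thresholds and CPU values are made-up example numbers.

MIN_REPLICAS = 1
MAX_REPLICAS = 15   # Neptune's read-replica ceiling mentioned in the talk

def desired_replicas(current: int, avg_cpu_percent: float,
                     scale_out_at: float = 70.0,
                     scale_in_at: float = 30.0) -> int:
    """Return the replica count after one scaling decision.

    Scale out one step when average CPU exceeds the high threshold,
    scale in one step when it drops below the low threshold, and
    otherwise leave the cluster alone.
    """
    if avg_cpu_percent > scale_out_at:
        return min(current + 1, MAX_REPLICAS)
    if avg_cpu_percent < scale_in_at:
        return max(current - 1, MIN_REPLICAS)
    return current
```

So a cluster with three replicas at 85% average CPU would grow to four, while one already at the minimum stays put even when idle.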
1938.72 -> neptune also comes with a few sets of
1940.559 -> service integrations uh the first
1942.48 -> service integration is an integration
1944.64 -> with the aws backup service so if you're
1946.399 -> an enterprise and in your account
1948.72 -> you use aws backup you can now back up your
1953.2 -> neptune data using that same service
1955.679 -> we also have a feature called neptune ml
1957.919 -> but we'll talk a bit about that here in
1959.76 -> a moment but at a high level
1961.12 -> neptune ml is an integration
1963.84 -> between neptune and sagemaker to enable
1967.039 -> customers of
1968.64 -> the neptune graph database to build and
1970.96 -> automate the process of building graph
1972.96 -> neural network-based machine learning
1974.48 -> models and using those models to run
1976.72 -> real-time predictions inside of
1979.12 -> queries
1980.72 -> we also have an integration with amazon
1982.64 -> open search and our integration with
1984.399 -> amazon open search enables you
1987.12 -> as a user to run
1989.44 -> full text search type queries against
1993.44 -> open search and get
1995.679 -> that information
1997.679 -> propagated back into your query
1999.919 -> itself
2002.64 -> as i mentioned one of the key
2004.559 -> pieces of the architecture is
2006.96 -> the cloud native storage layer that we
2009.2 -> have inside neptune itself
2012.159 -> this is a piece of really
2013.84 -> battle-hardened technology inside
2015.919 -> amazon when you write data as i
2018.559 -> mentioned previously that data is
2020.399 -> automatically written in a
2022.159 -> highly available and secure manner this
2024.64 -> data is replicated six times as i
2027.12 -> said two times in each of three
2028.32 -> availability zones
2030.399 -> it's also continuously backed up to
2032.08 -> amazon s3 this storage layer itself was
2034.72 -> built for 11 9's of durability
2037.6 -> and it's based on a
2042.399 -> 10 gigabyte segment size as i said when
2044.48 -> you start a cluster you get
2047.2 -> one ten gigabyte data
2050.32 -> segment as you add more and more data to
2052.48 -> your cluster we automatically provision
2054.56 -> these additional data segments for
2057.04 -> you
2057.76 -> and this data segment is really the
2059.679 -> unit of repair that this
2062.48 -> system automatically uses to heal itself
2064.8 -> and to rebalance hot spots
2067.76 -> inside of the storage layer things
2070 -> like that
2070.879 -> a quorum system is used for reads and
2072.72 -> writes so it's very latency tolerant and
2075.28 -> and as i mentioned this storage
2077.359 -> volume will automatically grow up to 128
2080.32 -> terabytes
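The quorum idea mentioned above is why six-copy storage stays latency tolerant: reads and writes only need to reach enough copies that any read set overlaps any write set. The 4-of-6 write / 3-of-6 read split below is the classic choice for Aurora-style six-copy storage and is used here as an illustrative assumption, since the talk doesn't state Neptune's exact quorum sizes.

```python
# Sketch of quorum overlap for six-copy storage.
# A read is guaranteed to see the latest write iff every read quorum
# intersects every write quorum, i.e. read_quorum + write_quorum > copies.

COPIES = 6        # two copies in each of three availability zones
WRITE_QUORUM = 4  # illustrative Aurora-style choice
READ_QUORUM = 3   # illustrative Aurora-style choice

def quorums_overlap(copies: int, write_quorum: int, read_quorum: int) -> bool:
    """True when any read quorum must intersect any write quorum."""
    return read_quorum + write_quorum > copies
```

With 4-of-6 writes, up to two slow or failed copies can be ignored on the write path while 3-of-6 reads still always see the latest committed data; a 3/3 split would not give that guarantee.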
2086.159 -> read replicas beyond just
2090.48 -> scaling the number of
2092 -> transactions you can serve there are a
2093.28 -> couple of really key aspects about
2095.359 -> read replicas
2096.72 -> as i mentioned neptune is an
2098.32 -> instance-based database and you can run
2100.88 -> it with as few as a single instance in
2102.72 -> which case you know that instance will
2104.16 -> be a writer
2105.52 -> if you're running
2110.4 -> with only a single instance
2112.96 -> and something
2114.88 -> goes wrong with that writer node and
2116.32 -> for some reason that node fails it will
2118.88 -> automatically be rebooted by the system
2120.72 -> but that can take several minutes to
2122.48 -> come up so during that time you
2124.72 -> won't be able to serve any traffic
2128.48 -> from your cluster to any applications
2130.24 -> using it
2131.599 -> if you set up a read replica that read
2133.44 -> replica will basically be used as a
2135.76 -> failover so if that primary writer node
2138.32 -> goes down for some reason that read
2140.72 -> replica will be promoted to the writer
2142.88 -> and what was
2144.88 -> the writer node will be restarted as the
2146.72 -> read replica itself and this
2149.68 -> automatic failure
2151.359 -> detection will
2156 -> really minimize the amount of time that your
2157.68 -> application won't be able to serve
2158.96 -> traffic
2160.64 -> the other reason customers tend to use a
2162.56 -> lot of read replicas is to scale out the
2164.48 -> read traffic as i mentioned you can
2166.079 -> scale out the read traffic across
2169.52 -> up to 15 different read
2171.119 -> replicas and
2172.4 -> balance
2174.96 -> the read requests and responses
2177.119 -> across them using the reader endpoint
2179.92 -> so that brings us to
2183.44 -> how do you work
2185.28 -> with neptune how do you actually
2186.48 -> read and write data to it when you
2190.56 -> create a
2191.68 -> neptune cluster you're given two
2193.52 -> different endpoints a
2195.2 -> writer endpoint and a reader endpoint
2197.04 -> the writer endpoint also referred to as
2199.2 -> the cluster endpoint always points to
2201.2 -> the current writer
2203.52 -> the current primary instance this
2205.839 -> is the one you would use to do any sort
2207.28 -> of mutation queries on top of your data
2210.32 -> if you have read heavy queries as many
2212.48 -> applications do we also provide you a
2214.4 -> read endpoint
2216.4 -> and this read endpoint distributes those
2218.8 -> requests as they come into different
2220.72 -> read replicas so if you have three
2223.2 -> different read replicas this reader
2225.119 -> endpoint will for a specified
2234.48 -> period of time point to
2236 -> different instances so your requests will be
2237.44 -> distributed across your different
2239.04 -> instances that being said this reader
2241.44 -> endpoint is
2243.2 -> time-based there's no fairness or
2245.119 -> round-robin guarantees but this is often
2247.599 -> sufficient for many customers one
2250.96 -> point to note here is if you're using
2252.8 -> gremlin with websockets or using
2254.64 -> opencypher bolt connections
2256.4 -> those
2257.28 -> connections once created are sticky so
2259.599 -> they'll stick to a specific read replica
2262.4 -> so something to be aware of
2264.32 -> that you may end up with hot spots in
2266.24 -> your applications if you're using web
2268.64 -> sockets or bolt
2270.96 -> customers also can build custom request
2273.44 -> distribution strategies
2275.28 -> we've had customers that have
2276.64 -> built them around fairness or to
2278.24 -> optimize cache locality these have to
2280.88 -> be done at the application level if you
2285.04 -> want to build these sorts of
2286.079 -> custom distribution strategies
2288.16 -> and you also need to be aware of
2289.76 -> potential failover scenarios and how you
2292.16 -> are going to handle that
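An application-level distribution strategy like the ones described above can be sketched very simply: hold a list of instance endpoints and hand them out in turn. The endpoint names are placeholders, and a real client would also need to refresh the topology on failover, as the talk warns.

```python
import itertools

# Sketch of a client-side round-robin strategy across read-replica
# endpoints. Endpoint names are hypothetical placeholders.

class RoundRobinReaders:
    def __init__(self, reader_endpoints):
        if not reader_endpoints:
            raise ValueError("need at least one reader endpoint")
        self._cycle = itertools.cycle(reader_endpoints)

    def next_endpoint(self) -> str:
        """Endpoint to use for the next read request."""
        return next(self._cycle)

readers = RoundRobinReaders([
    "replica-1.example.com",
    "replica-2.example.com",
    "replica-3.example.com",
])
```

Unlike the managed reader endpoint, this gives per-request fairness, which matters for sticky websocket or bolt connections that would otherwise pin to one replica.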
2295.2 -> another key aspect of the neptune
2297.68 -> architecture is caching within neptune
2300.32 -> there are three different types of cache
2301.839 -> that are
2302.96 -> used
2303.92 -> the first being a buffer cache this
2306.079 -> is always on as soon as you spin up any
2308.24 -> instance this buffer cache is enabled
2310.24 -> and it's used to store pages of
2313.28 -> graph data from that shared storage
2315.2 -> layer in local memory this really
2317.52 -> enables much faster queries by being
2320 -> able to pull frequently used data from
2322.8 -> the local instance memory as opposed to
2325.44 -> having to go across
2327.2 -> and fetch that from the actual storage
2329.119 -> layer itself as i said this is always on
2331.44 -> it's enabled by default
2333.44 -> there's nothing you as the user needs to
2334.96 -> do here
2335.92 -> one thing i would recommend is
2338.8 -> to monitor the cloudwatch
2340.48 -> metric called buffer cache hit ratio
2342.96 -> you really want to make sure
2344.16 -> that's up in the
2346.72 -> high 90 percents
2349.839 -> and you can figure that out by monitoring
2353.2 -> that metric inside cloudwatch
2355.68 -> the second type of cache we have is an
2357.76 -> optional cache it's a lookup cache this
2360.4 -> lookup cache is only available
2364.16 -> if you have r5d instances because what
2366.64 -> it's doing is it's using the nvme
2369.52 -> based ssd disks to store
2374.079 -> property values and literals for use
2376.32 -> cases where you're
2378.079 -> frequently returning large numbers of
2380 -> these property values to the users this
2382.8 -> is something you can set up on a per
2384.72 -> instance basis and that
2390.88 -> data will automatically be populated
2392.32 -> into those nvme
2394.079 -> disks as it's being used
2396.96 -> the third one is the query results
2398.8 -> cache the query results cache uses the
2400.64 -> instance memory it's also an optional
2402.72 -> opt-in cache and it's
2405.44 -> exactly what you expect it to be it
2407.68 -> stores the query results so on a per
2410 -> query basis you can specify that you
2412.64 -> want to use this query results cache to
2417.52 -> cache those query results for
2420.56 -> a specified amount of time
2422.48 -> this is really used for cases where you
2424.72 -> where maybe the same query is highly
2426.96 -> repeated but the data being
2428.88 -> returned doesn't change
2431.04 -> the other and probably most common
2433.52 -> use case we've seen with customers is
2434.88 -> for pagination if you want to be able to
2437.2 -> run a query one time that
2440 -> pulls a lot of data orders it and
2442 -> returns the first 10 results and then
2443.68 -> you want to page through the second 10 results
2445.2 -> and the third 10 results this sort of query
2448.4 -> results cache is really
2450.48 -> very helpful there because
2452.4 -> you run that query one time
2454.24 -> you can cache
2456.319 -> those results for a ttl and the next
2459.52 -> time you go to pull the second page
2460.8 -> it'll be a
2463.359 -> few milliseconds sort of response
2465.92 -> time to get that back because you're
2467.2 -> pulling it directly from this query
2468.56 -> results cache
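The pagination pattern just described can be sketched as a small TTL cache: run the expensive ordered query once, keep the full result set under a time-to-live, and serve each page from memory. This is a stand-in for the access pattern only; Neptune's actual results cache is enabled per query on the server side.

```python
import time

# Sketch of TTL-cached pagination: the expensive query runs once,
# later pages are sliced out of the cached result set.

class ResultsCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}   # query text -> (expiry_time, full results)

    def get_page(self, query: str, run_query, page: int, page_size: int):
        """Return one page of results, running the query only on a miss."""
        now = time.monotonic()
        entry = self._store.get(query)
        if entry is None or entry[0] < now:
            # Cache miss or expired entry: execute once, cache everything.
            self._store[query] = (now + self.ttl, run_query(query))
        results = self._store[query][1]
        start = page * page_size
        return results[start:start + page_size]
```

The second and third pages come straight from the cached list, which is what makes the follow-up page fetches a few-millisecond operation in the scenario described.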
2471.44 -> security as with all
2473.92 -> databases is a very key aspect of the
2476.64 -> neptune architecture
2478.64 -> neptune is built with network isolation
2481.119 -> it is a vpc only database
2484.319 -> there is no public
2486.48 -> endpoint so any applications
2489.44 -> or users have to have
2490.88 -> access inside that vpc in order to talk
2493.28 -> to neptune
2495.04 -> with neptune data
2497.44 -> is encrypted at rest using aws kms you
2500.56 -> can either specify a specific key or use
2502.72 -> the default key for your application
2505.359 -> encryption is also handled in transit
2507.76 -> through ssl
2510.4 -> from a user perspective
2512.64 -> neptune is integrated with iam
2514.64 -> authentication so you can use iam
2516.56 -> policies in order to manage access into
2519.359 -> the neptune
2522.4 -> database
2523.44 -> any neptune cluster can be iam enabled
2525.92 -> so that every request to that database
2528.319 -> will require iam authentication
2531.04 -> and we have a
2534.16 -> fine-grained set of action-based access
2536 -> controls to really granularly
2539.04 -> grant people access to different
2541.2 -> data plane type actions
2543.2 -> inside the system
2545.839 -> as i mentioned neptune is a fully
2548.64 -> managed system so we have a variety of
2550.88 -> automated backup and restore
2552.24 -> functionalities
2553.76 -> daily automated backups will occur
2556.16 -> during a window that you can specify in
2557.839 -> which case a full storage volume
2559.599 -> snapshot is taken
2561.04 -> the retention period on these automated
2562.8 -> backups can be configured by
2565.599 -> the customer to be between 1 and 35 days
2568.72 -> you can also manually create snapshots
2571.2 -> of your data to back up an entire
2572.56 -> database instance uh manual snapshots
2575.44 -> can also be shared across accounts so if
2577.52 -> you have a use case where you need to be
2579.2 -> able to take a snapshot of your data and
2581.28 -> move it to maybe a prod account or from
2583.76 -> a prod account to a qa account to a test
2586.4 -> account things like that you can do that
2588.079 -> and they can be copied across regions
2590.319 -> not only do we have
2592.16 -> the ability to take a snapshot but you
2593.52 -> can restore those database snapshots
2596 -> to a new database instance this allows
2597.839 -> you to also do things like changing
2599.28 -> the parameter groups or security groups
2601.119 -> around it to test out new settings you
2602.96 -> can do that
2604.079 -> and neptune also supports point in time
2605.92 -> restore so you can restore the database
2608.079 -> instance to any specific time down to a
2610.24 -> one second granularity
2612.88 -> not listed here but as i mentioned
2614.72 -> briefly earlier is also a fast clone capability
2616.56 -> which
2618.72 -> quickly allows you to make an entire
2620.4 -> clone of your database to be able to
2622 -> test out new and different features
2626.48 -> another key aspect is the ability to
2628.4 -> monitor your cluster as it's running
2630.319 -> neptune integrates with the cloudwatch and
2632.16 -> cloudtrail services to
2637.359 -> log all the api calls that neptune
2639.76 -> makes as well as look at different
2641.28 -> metrics around the database we can
2643.119 -> look at the summary of the cpu and
2644.72 -> memory utilization of all of the
2646.24 -> instances in our system we can look at
2648.079 -> things like what is the query throughput
2650.079 -> what are the success and
2651.599 -> error rates what is my read
2653.68 -> and write throughput how much storage is
2656 -> my system using
2657.92 -> there's also an optional
2660.88 -> mechanism to
2662.48 -> enable audit logs audit logs will
2664.8 -> contain a very granular set of
2666.96 -> information about every query that's being
2670.56 -> processed by the
2672.96 -> engine it's going to have information
2675.2 -> like timestamps what are the server
2677.28 -> and client hosts what message was
2680.16 -> sent across and that's all going to be
2682.8 -> stored inside of your cloudwatch logs
2686.96 -> so that's a little bit about the
2687.839 -> architecture let's talk a little bit
2689.68 -> about some of the new features and
2691.599 -> functionality that neptune has and then
2693.599 -> we'll jump into a bit of a deep dive and
2695.52 -> demo on how that works
2698.72 -> one of the newest features we have for
2701.04 -> neptune is a python integration
2703.599 -> specifically targeted at
2704.8 -> running some graph analytics
2706.079 -> this is an open source python
2709.04 -> integration
2710.56 -> that allows you to easily read and write
2712.8 -> data stored in neptune by removing the
2714.76 -> undifferentiated heavy lifting
2716.96 -> of managing connections and
2720.079 -> getting data to
2722.64 -> and from the format that graph query
2724.88 -> languages return into something
2727.119 -> that's more usable for most customers
2729.599 -> in this case we're using
2731.44 -> pandas data frames as sort of the lingua
2733.68 -> franca between these two
2735.68 -> this library basically enables you to
2738.72 -> read and write data from neptune and to
2740.8 -> pull that data into python where you
2742.64 -> can use
2743.68 -> any of the popular open source
2745.68 -> python tools out there to do further
2747.44 -> analysis so maybe you wanted to run
2750 -> some sort of analytics or algorithm on
2751.839 -> top of it you could pull data down
2753.52 -> into a pandas data frame and then
2755.359 -> use a network analysis library like
2757.119 -> igraph or networkx to run some sort of
2759.839 -> small scale analysis on top
2762 -> of your data this also comes with a set
2764.079 -> of sample application notebooks showing
2766.72 -> how you can actually use it
2770.079 -> the next feature i wanted to highlight
2771.52 -> here was released in the spring of
2775.28 -> 2022
2776.96 -> and that is our support for opencypher
2779.28 -> opencypher is a widely adopted open
2782.079 -> query language for property graphs it
2784.079 -> provides an intuitive way to work with
2785.76 -> property graphs by providing developers
2788.72 -> business analysts or data scientists a
2791.359 -> sql inspired syntax that has a
2795.28 -> familiar structure that you can use to
2797.44 -> compose queries for graph applications
2800.4 -> this really enables customers
2802.8 -> that come from a relational background
2804.48 -> or are familiar with sql to have
2807.2 -> a smooth on-ramp onto working with
2809.76 -> graph databases
2811.44 -> One of the unique aspects of our
2814.56 -> support for openCypher
2817.119 -> is that we have built it in such a way
2820.079 -> that you can load your
2821.44 -> property graph data into Neptune once
2823.44 -> and then use either openCypher or
2825.28 -> the Gremlin query language on top of
2826.96 -> that same data. This really allows
2829.2 -> customers that are either migrating from
2830.88 -> other systems,
2832.48 -> or already have data and an application in
2834.64 -> Neptune, to start using
2836.64 -> openCypher very easily.
2838.64 -> Neptune also supports
2842.8 -> the openCypher Bolt protocol. This
2845.92 -> basically allows customers that are
2847.44 -> running current workloads
2849.28 -> to migrate those workloads to Neptune
2852.16 -> with a minimal amount of changes to
2854.24 -> their
2856.839 -> application. The next feature I wanted to
2858.88 -> talk about here is Neptune ML. As I
2861.04 -> alluded to earlier, Neptune ML is an
2863.52 -> integration between Amazon Neptune and
2865.599 -> SageMaker that enables graph developers to
2869.2 -> make machine-learning-based
2870.64 -> predictions on graph data without
2872.4 -> needing a ton of machine learning
2873.92 -> expertise. It does this by automating a
2876.4 -> lot of the choices you would need to
2878 -> make in order to build your machine
2879.359 -> learning model: which model is the
2881.52 -> best one to use,
2884 -> what training instance sizes you need, what
2886 -> processing instance sizes, things
2887.599 -> like that. This is all based on
2889.599 -> state-of-the-art machine learning
2891.52 -> techniques, specifically GNNs, or
2894 -> graph-neural-network-based machine
2895.839 -> learning techniques, and these have been
2897.839 -> shown in some
2899.599 -> external studies to be up to 15% more
2902.559 -> accurate than some other
2905.359 -> machine-learning-based
2907.119 -> paradigms out there.
2909.119 -> Neptune ML is really built to scale to
2911.359 -> the large datasets that our
2913.839 -> Neptune customers are using
2915.52 -> today, because when you're
2917.52 -> working with knowledge graphs, fraud
2918.96 -> detection, or product recommendations,
2920.24 -> you're talking about extremely large
2922.559 -> numbers of entities; you may be working with
2924 -> up to billions of relationships.
2926.96 -> Neptune ML went
2928.64 -> GA about a year ago, and
2931.44 -> recently we have added support for
2932.88 -> custom models, which allows customers
2935.839 -> that already have expertise in GNNs to
2938.24 -> build their own custom model
2939.52 -> implementations in Python but still be
2941.68 -> able to use them with the rest of the
2943.119 -> Neptune ML framework.
2945.44 -> And we added support for SPARQL, so now,
2947.52 -> whether you're using property
2949.2 -> graph or RDF, you can use
2952.319 -> Neptune ML to enable machine-learning-
2954.8 -> based predictions on that graph data.
2958.4 -> We also recently announced fine-grained
2960.319 -> access control for data plane
2962.079 -> actions. This is really about
2965.04 -> allowing
2966.24 -> customers to specify, at a very
2968.559 -> granular level, which sort of actions a
2971.68 -> specific IAM role can take. This
2974.319 -> allows you to specify roles that
2976.079 -> maybe only have access to read, or only
2978.559 -> have access to write, or only have access
2980.8 -> to trigger bulk loads, things of that
2983.04 -> nature. This really
2985.44 -> allows you to set up least-privilege
2988.559 -> access for applications that are using
2990.16 -> Neptune, giving them only the specific
2992.96 -> permissions that they need. This is now
2994.96 -> the default when you turn on IAM
2996.96 -> authentication as of the 1.2.0.0 release,
3000.8 -> and it really allows you to create
3003.04 -> separate policies for any data plane API.
3008.24 -> Another
3009.2 -> feature we recently added to simplify
3011.2 -> some of the operational headaches is
3013.44 -> auto-scaling read replicas. As we talked
3015.92 -> about earlier, Neptune is an instance-
3017.52 -> based database; it allows you to scale out
3019.76 -> up to 15 read replicas, and auto-
3022.48 -> scaling read replicas allows you to
3024.4 -> specify a minimum and a maximum capacity
3026.88 -> and a scaling threshold based on
3029.44 -> CloudWatch metrics, and then
3031.2 -> it will automate the scaling activities
3033.04 -> as your workload demands. As your
3035.2 -> workload ramps up, it will scale out
3037.44 -> additional read replicas;
3039.359 -> as your workload demand
3041.839 -> eases, it will scale them back. So it
3044 -> really helps
3046.319 -> automate the process of doing that, as
3049.28 -> well as provide some cost
3050.8 -> optimization by not having to run
3052.48 -> additional read replicas when they're
3054.079 -> not needed.
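A minimal sketch of the target-tracking configuration you might register with Application Auto Scaling for this; the metric type and field names are assumptions to verify against the Neptune auto-scaling documentation:

```json
{
  "targetTrackingScalingPolicyConfiguration": {
    "predefinedMetricSpecification": {
      "predefinedMetricType": "NeptuneReaderAverageCPUUtilization"
    },
    "targetValue": 60.0,
    "scaleOutCooldown": 600,
    "scaleInCooldown": 600
  }
}
```

The minimum and maximum replica counts mentioned above are set separately when registering the cluster as a scalable target.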
3056.48 -> Another operational-simplicity
3058.72 -> feature that we recently released is one
3061.04 -> I'm very excited about, and that's
3062.24 -> Neptune global databases. Neptune global
3064.72 -> databases allow you to deploy Neptune
3066.96 -> clusters across multiple AWS regions for
3070.24 -> fast cross-region disaster recovery or
3072.48 -> low-latency queries.
3075.119 -> At its core, Neptune
3077.599 -> global databases allow you to set up
3079.2 -> one primary cluster and up to five
3083.04 -> read-only secondary clusters in
3085.52 -> different AWS regions. There are two main
3088.319 -> reasons the customers we've talked with
3090 -> are interested in this feature.
3091.599 -> The first is disaster recovery: in a
3093.92 -> scenario where you may have
3096.559 -> a regional outage, customers want to be
3098.96 -> able to easily
3102.48 -> maintain business
3103.839 -> continuity of their data
3106.16 -> when a region is out, by failing over to
3108.16 -> another region. The second common use
3110.72 -> case we have talked with customers
3112.96 -> about is
3114.24 -> for global applications: being able
3116.8 -> to co-locate your data closer to where
3119.52 -> your users are, to enable lower-
3123.2 -> latency reads as part of a
3125.119 -> global data distribution strategy.
3128.96 -> As I mentioned, this is a
3131.2 -> managed feature, so
3133.44 -> it also allows for things like
3135.359 -> fast cross-region migrations, and it has
3138 -> low replica lag between
3140.079 -> regions: the data is
3141.44 -> automatically written into the
3142.96 -> primary region and then
3144.72 -> automatically replicated out with low
3146.8 -> latency to the other AWS regions where
3148.96 -> you have clusters configured.
3154.079 -> Beyond new
3156.88 -> features and optimizations
3159.44 -> for operations, we actually have
3161.68 -> several new features related to cost
3164.16 -> optimization for users of the system,
3167.04 -> the first being our support for Graviton2.
3170.559 -> Graviton2, in our testing, has been shown to
3173.2 -> improve query latency and lower cost
3175.2 -> compared to some x86 instance sizes.
3179.68 -> So
3180.4 -> not only do you get faster queries, you
3182 -> get to pay less for them, and that's
3184.319 -> always a real benefit to customers.
3186.72 -> There are a lot of
3188.64 -> reasons behind this, but part of it is
3190.079 -> that we inherit some of the benefits of
3192 -> the AWS Nitro System for private
3194.72 -> networking and fast local storage.
3198.079 -> As of today we support the T4g series
3201.44 -> for low-cost development-and-
3204.72 -> test type workloads; we also support the
3206.96 -> R6g series, and recently we added
3209.44 -> support for the X2g series
3214.72 -> for any sorts of use cases
3217.52 -> where having a lot of memory
3219.52 -> and a lot of buffer cache is
3222.16 -> beneficial.
3225.2 -> As of April we also now offer a
3228.319 -> free trial for Neptune.
3231.359 -> This free trial does not limit the
3232.96 -> features of Neptune. If
3234.48 -> you're in an organization that has not
3235.839 -> created a Neptune cluster, you can get
3237.599 -> started trying out
3240.079 -> Neptune for free, and this means you can
3241.76 -> use either of our two graph models and any
3243.52 -> of our three query
3244.88 -> languages to do it. Eligible
3246.64 -> customers get up to 750
3248.8 -> hours of t3.medium instances, 10 million
3252 -> I/Os, a gigabyte of storage, and a gigabyte of
3255.28 -> backup, and this is free for
3256.96 -> 30 days with no restrictions. After that
3259.76 -> 30 days, it
3261.839 -> reverts to a pay-as-you-go model,
3264.079 -> so you only pay for the
3265.2 -> resources you consume, with no up-front
3267.44 -> licensing costs.
3269.76 -> So,
3270.72 -> now that we've walked
3272.559 -> through a bit about what graphs
3274.319 -> are, what Amazon Neptune is, and some of its
3276.24 -> features,
3277.44 -> let's do a deep dive and demo and take a
3279.28 -> look at how you can set up a Neptune
3280.88 -> cluster and why you might want to
3282.24 -> use it
3283.68 -> for some of the common use cases we
3285.119 -> discussed.
3288.48 -> What we're looking at here is
3290.559 -> the Neptune console. What I'm
3292.799 -> going to walk through is: if
3294.319 -> you're coming in here, you're new to
3295.839 -> Neptune, and you want to create a
3296.88 -> database, let's take a look at what you
3298.4 -> need to do in order to make that happen.
3300.48 -> We're going to start here; we're going to
3301.599 -> click on Create database.
3303.839 -> It's going to provide you a list of
3305.92 -> options. You can specify the version
3309.44 -> of Neptune that you want to have;
3311.2 -> especially if you're testing
3313.599 -> it out, I almost always recommend you try
3315.52 -> out the newest version of the system. You
3318.4 -> can specify a specific identifier for
3320.96 -> this; in this case it defaults to a name
3323.44 -> of database-1. You also specify the
3325.839 -> template, and the template
3327.359 -> here basically lets you choose, or
3329.839 -> specifies, the set of instance sizes and
3332.559 -> some basic configuration parameters that
3334.48 -> you have. When you select the dev-and-
3337.28 -> test template, it
3339.28 -> provides you access to the t3.medium
3342.24 -> burstable class of instances. These are
3344.48 -> really only good for development and
3346 -> testing type scenarios, so if you want to
3347.599 -> create a production-type cluster, you
3349.68 -> want to select the production template,
3351.04 -> which is what I'm going to do here.
3353.92 -> From there I can select
3355.68 -> the instance sizes I have;
3357.839 -> as you can see, we have quite a few
3359.28 -> instance sizes to choose from. I'm just
3360.799 -> going to leave it at the default one
3362.48 -> right now, r6g.xlarge.
3365.599 -> When you
3368.319 -> select the production template, you also
3370 -> get the option to create a read replica
3372.559 -> in a different Availability Zone, right
3374.079 -> out of the box.
3377.2 -> If you have
3379.04 -> specific VPCs you want it in, you can
3380.799 -> specify that sort of thing. We also have
3382.96 -> the ability to create a notebook. I
3385.359 -> didn't mention this until now, but one of
3387.839 -> the features that comes with Neptune is
3389.92 -> this construct of a Neptune notebook. A
3392.559 -> Neptune notebook is a free, open-source
3394.88 -> package, which we'll see here in just a
3396.96 -> moment, and it runs on top of the Jupyter
3400.64 -> web-based IDE and provides the ability
3403.52 -> to interact with your cluster as sort of
3407.2 -> the IDE for Neptune.
3410.24 -> So we can automatically create one
3411.92 -> here. I already have a database
3414.24 -> up, so I'm not going to do it here,
3415.839 -> but if I clicked Create database at this
3418.319 -> point, what would happen is it would
3419.92 -> go in, it would provision out that
3422.16 -> storage, it would provision out the
3423.68 -> instances you had, and it would spin all
3425.2 -> of this up, and within a few minutes you
3427.359 -> would end up with a
3430.319 -> database that shows up here; it would
3432.4 -> show up as available. In this case, as you
3434.16 -> can see, I have two
3436.799 -> clusters: one called
3439.2 -> air-routes and one called altimeter.
3441.359 -> They both have a single writer instance
3443.44 -> and they're both currently available, so
3445.599 -> let's jump into one of these and
3448 -> start taking a look at what it looks
3449.44 -> like to actually work and interact
3451.44 -> with Neptune.
3453.599 -> As I mentioned,
3455.44 -> we're going to
3457.2 -> be using what is known as Neptune
3460.64 -> notebooks. This is a Jupyter
3462.48 -> notebook, a free, open-source
3464.16 -> package.
3468.559 -> If I come in here to Notebooks, you can
3470.24 -> see I have one notebook running; I can
3472.96 -> open it from there.
3475.68 -> This IDE
3477.68 -> is the IDE we
3480.96 -> provide for Neptune. You can run
3483.119 -> this
3484.079 -> either as a hosted piece
3486.16 -> of infrastructure or not. As you can see,
3488.48 -> when I click on this it pops up a hosted
3491.44 -> Jupyter notebook; this
3493.76 -> is hosted as part of SageMaker notebooks.
3496.559 -> You can either use this open-source
3499.04 -> Neptune notebooks package through these
3500.96 -> hosted instances, as I am here, or, if you
3503.52 -> have your own Jupyter server, you can
3505.359 -> install it there, and as long as you have
3506.799 -> connectivity into
3510.72 -> the VPC where
3513.44 -> Neptune exists, you'll be able to run
3515.44 -> these sorts of things.
3516.88 -> When we come in here, we'll see
3518.88 -> there's a set of
3520.88 -> notebooks in here automatically. We have
3523.839 -> a few getting-started notebooks that
3525.92 -> really go into
3528.4 -> a little bit about what
3529.68 -> Neptune notebooks are, how you can use
3531.76 -> them to access the graphs, and a few
3533.839 -> different examples.
3535.2 -> We have some
3537.599 -> example notebooks that go through the
3538.88 -> different types of visualizations that
3540.559 -> you can actually build and run with
3543.68 -> Neptune notebooks;
3545.2 -> we'll see those here in a moment. We
3546.64 -> also have some sample applications,
3548.16 -> specifically around fraud graphs,
3549.599 -> knowledge graphs, and identity graphs,
3551.44 -> that
3554.319 -> give you a set of predefined data that you
3559.839 -> can load into your cluster, as well as a
3561.76 -> set of predefined queries, so you can
3563.28 -> start to see how you might go about
3564.96 -> building one of these sorts of use-case
3566.64 -> applications.
3569.04 -> And we have a set on machine learning: if
3570.72 -> you're interested in using Neptune ML,
3572.72 -> you can come in here and look at the
3574.96 -> 04 machine-learning notebooks, and these
3576.72 -> will give you a very detailed
3578 -> walkthrough of the different features
3579.599 -> and functionality of
3582.96 -> that feature of Neptune.
3584.72 -> For this example I'm going to jump into
3586.079 -> this notebook that I have,
3587.68 -> called oc-gremlin-examples, and let's
3590.48 -> take a look at some
3591.839 -> of the things we can do here.
3593.76 -> As I mentioned, we
3596.079 -> have that status endpoint on our
3597.839 -> cluster. One thing
3601.04 -> I should say about Neptune notebooks:
3603.2 -> they provide a set of what are known
3605.04 -> in Jupyter as magics. Anything that
3606.48 -> starts with this percent sign is a magic, and it basically allows
3610.88 -> you to invoke a specific piece of functionality.
3614.559 -> In this case, this status magic is going
3616.88 -> out, querying that status
3618.88 -> endpoint, and bringing back the data. You
3620.48 -> can see the different types of
3621.44 -> information here: you can see when my
3623.119 -> cluster was started, what its role is,
3625.76 -> what engine version I'm working on,
3627.52 -> what versions of Gremlin, SPARQL, and
3629.2 -> openCypher are supported,
3631.2 -> any of the lab-mode or beta features I'm
3634.559 -> using, and any other
3636.079 -> features, such as the results cache, that I
3637.76 -> may have enabled.
3641.28 -> Beyond that, we can also do things in
3643.92 -> our cluster like bulk-load data. If
3645.839 -> you have
3646.96 -> data stored in an S3 bucket, you can use
3649.76 -> Neptune notebooks to give you a
3652 -> graphical walkthrough of
3654.72 -> setting up the call that
3657.28 -> triggers that load; I'm not
3659.119 -> going to do that here.
3660.559 -> We also have a fast-reset functionality:
3662.799 -> if you're at the stage of building
3665.44 -> your application where you need to be
3667.28 -> able to clear out your database quickly,
3669.44 -> you can use fast reset, in
3672 -> this case through the db-reset
3674.24 -> widget, to basically wipe all
3675.76 -> the data out of your cluster and start
3677.119 -> over from scratch.
3679.44 -> I'm not going to do that here,
3681.119 -> because I have some data loaded.
3684.64 -> The other command that's very useful
3686.24 -> here: as I mentioned, we have some seed
3688.48 -> datasets for very
3690.799 -> specific types of use cases
3693.04 -> that we provide to the user
3695.92 -> through this notebook, which you can
3697.44 -> use to get started. What
3699.599 -> we're going to use here (I've already
3701.119 -> loaded this data) is a
3703.44 -> set of air-routes data: this is
3705.52 -> airports and flights between airports,
3707.76 -> and we're going to use that to
3708.96 -> walk through a little bit of what
3710.88 -> querying these graphs
3713.04 -> looks like.
3714.16 -> In this case we're
3716.319 -> going to be working with property graphs,
3719.28 -> and I'm going to be
3721.2 -> talking a little bit about
3723.359 -> some of the basics of graph query
3724.799 -> languages and how you can use them.
3726.559 -> The first one we're looking at
3728.319 -> here is openCypher. openCypher, as
3732.16 -> we mentioned, has a SQL-inspired syntax,
3734.24 -> so if you're looking at this you can
3735.76 -> probably, right away,
3738 -> even without me telling you anything,
3739.599 -> have a reasonably good idea of
3741.599 -> what's going on here.
3744.64 -> openCypher is based on
3747.119 -> a pattern-matching syntax:
3749.119 -> anywhere you see parentheses, that
3752.24 -> represents a node, and, as we will see here
3755.599 -> in a minute, anywhere you see
3757.68 -> an arrow and a line represents
3759.599 -> connections, and you can use this to
3761.44 -> build more complex types of queries. In
3764.799 -> this case it's about the most basic
3766.4 -> query you can write: I
3768.24 -> want to match any node,
3771.359 -> with that node labeled n,
3773.359 -> where the code
3775.92 -> of the airport
3777.44 -> is ANC, for Anchorage airport. I can come
3780.48 -> in here,
3781.599 -> run this, and you'll see it gives
3783.92 -> me back the information.
3786.16 -> I can view this in the JSON format
3788.72 -> that's natively returned from Neptune,
3791.76 -> and I can also view this as a graph.
3794.4 -> This is one of the key features
3796.079 -> of Neptune notebooks: this ability
3798.48 -> to come in here, make this bigger, and
3800.72 -> look at this
3802.96 -> visually. In this case it's not as
3804.48 -> interesting as it would be in
3806.24 -> some other cases,
3807.52 -> because it's just a single entity, but I
3809.119 -> can come in here and look at all
3811.44 -> of the properties that were returned
3813.039 -> with it. So we can see this is that property
3817.52 -> graph
3818.72 -> model, where you're looking at nodes,
3820.64 -> the node in this case being an airport,
3822.64 -> as well as the
3824.24 -> properties associated with it.
3831.76 -> The other query
3833.68 -> language we support for property graphs
3836.079 -> is Gremlin. As
3839.92 -> compared to the SQL-inspired syntax of
3844.079 -> openCypher, Gremlin is more of a stream-
3847.28 -> processing type language, where data is
3850.079 -> pulled in from one side of a
3851.92 -> step, some process is
3854.24 -> done on it, and it's sent on to the next
3856.24 -> one. In this case,
3859.52 -> g is just a convention for
3862.96 -> starting with a graph.
3864.88 -> You can read this
3866.96 -> query as: starting with the graph, I want
3869.2 -> to find all of my vertices,
3872.079 -> and then I want to filter this down to
3873.68 -> only the ones where the
3876.799 -> property code matches ANC,
3880.16 -> for the Anchorage airport; and then
3881.92 -> elementMap basically says bring me back
3883.92 -> all of the properties associated with
3885.599 -> that. If I go in and run this,
3888.559 -> we'll see I get back a
3890.88 -> similar set of
3892.799 -> information (the formatting is
3894.4 -> different between them),
3896.72 -> and I can view this one as a graph
3898.96 -> as well. So those are the pieces
3902.319 -> of functionality there.
3905.28 -> That's the very
3907.359 -> basic way to use both Neptune
3909.359 -> notebooks and openCypher; let's take a
3913.28 -> look at some of the
3915.359 -> more complex things. Let's start using
3917.76 -> some of that pattern-based syntax. In
3919.92 -> this case, not only am I starting
3922.16 -> at the Anchorage airport,
3924.96 -> I have added this additional
3926.64 -> pattern, with these dashes and
3928.88 -> square brackets, to basically say:
3930.88 -> I want to now traverse out from the
3933.52 -> Anchorage airport along
3935.28 -> any edges that are labeled route
3938.319 -> to a destination airport. Basically, I
3940.559 -> want to find anywhere I can fly to or
3942.48 -> from Anchorage.
3944.16 -> I go and run this; we see it comes
3946.559 -> back very quickly,
3949.28 -> and I'm able to graph this out in such a
3951.28 -> way that I can see
3953.92 -> all of the
3956.079 -> airports that are connected to the
3957.76 -> Anchorage airport itself.
3959.599 -> Yet again, I could come in
3961.76 -> here and start going through and looking
3964.559 -> at the different
3966.16 -> properties associated with these
3967.92 -> items, as well as properties
3969.68 -> associated with the edges. As I mentioned
3971.28 -> earlier, edges can have
3975.68 -> attributes representing
3977.359 -> properties of those edges; in this case I
3979.52 -> have this property called dist, which
3981.039 -> represents the distance I need to
3982.799 -> fly in order to get there.
3984.72 -> It's very powerful to be able to
3986.48 -> create these sorts of patterns
3988.64 -> that filter very efficiently and
3991.359 -> effectively.
3994 -> I'm not going to go through exactly
3995.2 -> what's going on in all of
3997.119 -> the Gremlin examples here, but
3999.2 -> I'm able to run the same sorts of queries
4001.359 -> in both openCypher and Gremlin and
4003.52 -> get back
4005.599 -> the same sorts of answers.
4008.96 -> Next:
4011.44 -> one of the things I didn't mention
4013.039 -> about property graphs is that when I'm
4014.88 -> creating
4016.96 -> relationships inside of a property graph,
4018.64 -> those relationships have directions
4020.4 -> associated with them.
4022.48 -> There's a directionality aspect to it:
4024.24 -> this relationship is
4026 -> going from someplace to someplace.
4029.119 -> You can represent that inside of your
4031.76 -> graph query language. In the case of
4033.839 -> openCypher,
4035.839 -> you represent this using arrowheads
4037.92 -> pointing in the direction that you're
4039.599 -> looking for.
4040.72 -> So in this case, as opposed to finding
4044.96 -> all of the airports that I could fly to
4047.28 -> or from Anchorage, I am now only
4050.4 -> finding the places I can fly from
4052.48 -> Anchorage to: Anchorage is going to be
4054 -> my start location, and these are the
4056.24 -> airports I can fly to.
4058.079 -> We can represent the same thing inside
4060.4 -> of Gremlin, using a different set of
4063.28 -> steps that specifies the directionality,
4066.799 -> but the key thing to know here is that
4068.559 -> edges inside graphs have directions,
4071.2 -> and you can use that directionality as a
4073.2 -> filtering criterion.
4075.119 -> You can think about this, if you
4076.48 -> want, in the
4078.16 -> context of
4080.799 -> something like a social network: if
4082.24 -> you're on Twitter you might follow
4083.44 -> somebody, but that person may not follow
4085.2 -> you back. So using
4087.76 -> that directionality as an aspect
4089.359 -> of your data modeling is a
4090.64 -> really strong feature of graph
4093.359 -> databases.
4095.52 -> And you can keep extending this out.
4096.96 -> In this case I'm not
4098.64 -> just finding everywhere I can fly to, but
4100.64 -> I'm finding
4101.839 -> everywhere
4102.96 -> that I can fly to from Anchorage from which I
4105.04 -> can then fly to Austin. Basically,
4107.6 -> I want to find, in this case,
4109.359 -> two hops; I'm specifying the exact number
4111.12 -> of hops. If you want to think, yet
4112.96 -> again, in social-network
4114.319 -> terms, this is the friends-of-my-
4116 -> friends sort of idea.
4119.279 -> But when it really comes down to it,
4121.199 -> even these sorts of
4122.48 -> queries are ones that other databases
4124.08 -> can probably handle: you can probably
4125.6 -> write a query in
4128.64 -> SQL (you've probably written similar
4130.64 -> queries) where you join
4131.839 -> tables together multiple times. Really,
4134 -> the power of graph query languages starts
4136.159 -> to show when you have these variable-
4138.319 -> length queries,
4140 -> and these are represented naturally
4141.759 -> inside both Gremlin and the openCypher
4144 -> query language, which has native
4146.319 -> constructs to support
4147.759 -> this sort of thing. In this case,
4150 -> my query
4151.92 -> is finding all of the flights from
4153.6 -> Anchorage to Sydney that are within four
4156.08 -> hops. This is where the
4159.759 -> variable-length query
4161.52 -> syntax comes in: in the case of
4163.279 -> opencypher, you do this using the star,
4166.52 -> *1..4 in this example, and it's going to
4169.759 -> find them for me. I put a limit on this,
4172 -> because otherwise it's going to return
4173.199 -> quite a lot of data, but
4176.319 -> you can start to see how you can use
4177.679 -> this to
4179.199 -> very quickly and efficiently
4181.12 -> find these sorts of variable-length
4182.88 -> connections, in ways that, if you
4184.799 -> were going to do this in other
4186.48 -> technologies, you'd have to write
4188.56 -> recursive functions or some sort of CTE,
4191.52 -> a common table expression, in order to be
4192.96 -> able to return it.
4194.32 -> As I mentioned, this is supported
4195.92 -> natively both in openCypher and in
4199.6 -> the Gremlin query language. So
4202.48 -> that's a little bit of
4206.239 -> the basics of graph query languages and how
4208.8 -> you write them. Let's jump
4210.8 -> right in and take a moment to
4213.04 -> actually look at what this looks like
4214.32 -> for a real use case. In this case
4216.4 -> we're going to look at a security-graph
4219.52 -> use case and why you might want to use a
4221.36 -> graph to do this,
4222.96 -> and we're going to do it by graphing
4224.56 -> our AWS resources.
4226.4 -> We're all sitting here working
4228.48 -> on top of AWS; let's take a look at what
4230.88 -> our resources look like as a graph.
4235.12 -> Your resources in AWS
4237.36 -> naturally lend themselves to graph
4239.36 -> representations, because there are a lot of
4241.84 -> connections between these items, and
4243.36 -> those connections are, in many
4245.199 -> cases, the key things that you want to
4247.28 -> look at. Let's start by
4249.199 -> seeing if we have any policies
4251.679 -> out there that are potentially insecure:
4253.52 -> maybe anywhere you have
4255.12 -> the AdministratorAccess
4257.12 -> policy,
4258.4 -> or you have created any policies where
4261.12 -> there is a star in the
4264.159 -> document text. We can
4266.08 -> take a look very quickly and
4267.76 -> see that we have two of these policies:
4269.44 -> we have the AdministratorAccess
4270.719 -> policy, as does everybody, and also
4273.679 -> somebody has created this policy
4275.6 -> called open-s3.
4278.4 -> I'm not sure why somebody decided we
4279.92 -> should create an open S3 policy,
4281.84 -> but we should probably investigate that
4283.28 -> a bit further
To see exactly how it's being used, the first step is to take a moment and check whether any roles are using this policy, because you might have a policy, but if it's not being used, it may not be a big deal. If we take a look at this, we can see a few different roles using it. There's a role called altimeter that's using both this openS3 policy and the administrator policy, so we're going to want to dive a little deeper into how these are being used and see if we have any potential security issues.
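The hop from a flagged policy to the roles that have it attached is one traversal step across the policy-attachment edges in the graph. A minimal, hypothetical Python sketch of that step (the role names and attachments are invented):

```python
# Toy stand-in for the graph's role -> policy attachment edges.
attached = {
    "altimeter": ["openS3", "AdministratorAccess"],
    "i-need-admin-access": ["AdministratorAccess"],
    "lambda-basic": ["ReadOnlyDynamo"],
}
flagged_policies = {"AdministratorAccess", "openS3"}

# Keep any role with at least one flagged policy attached.
roles_using = sorted(
    role for role, pols in attached.items()
    if flagged_policies.intersection(pols)
)
print(roles_using)  # ['altimeter', 'i-need-admin-access']
```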
So let's jump to the next step and basically say: okay, we have these roles with this access, but what resources inside my system are actually using them? What we can see is that there's an EC2 instance using this openS3 role, and we can also come in here and see another EC2 instance using this i-need-admin-access role. Do they really need admin access? I don't know; we probably need to do a bit more investigation into that as well and take a look at what's going on there.
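Extending the traversal one more hop, from risky roles to the EC2 instances that assume them, can be sketched the same way (instance IDs here are hypothetical):

```python
# Toy stand-in for the graph's instance -> role edges.
instance_role = {
    "i-0aaa111": "altimeter",
    "i-0bbb222": "i-need-admin-access",
    "i-0ccc333": "lambda-basic",
}
risky_roles = {"altimeter", "i-need-admin-access"}

# Instances whose attached role was flagged in the previous step.
risky_instances = sorted(i for i, r in instance_role.items() if r in risky_roles)
print(risky_instances)  # ['i-0aaa111', 'i-0bbb222']
```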
So what all connects to these resources? As we can see, we're starting to build out a much more robust query: first we found these policies, then we found the roles, and now we're looking at all of the things connected to them. We're building upon this query in an iterative fashion, and we're using it to see more and more data about our system.
Based on the connections and the connectivity of these systems, we can start to see that there's a lot connected to this: a lot of different VPNs and internet gateways, and it looks like there's a database connected over here. So the fact that we have these very permissive policies is something we probably want to take a look at from a security perspective inside our own system.
You can then take this and build it out further and further, and as you're doing this, you can also filter it down to show only the things that are relevant to the specific question you're trying to answer. For example, I don't necessarily need to look at the VPCs associated with this, because I'm really more interested in which EC2 instances are using these very permissive policies and roles inside my AWS architecture.
And then finally, I want to look at the potential threat vectors associated with this. I want to see not just that these EC2 instances are using these roles, but whether any of them are actually exposed to the internet. In this case, as we can see, both of them are, so we would definitely want to go look at locking these sorts of things down to make sure we're securing our infrastructure appropriately.
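Putting the whole iterative traversal together, permissive policy to role to EC2 instance to internet exposure, can be sketched end to end. All identifiers below are hypothetical; in the demo this runs as a single Gremlin traversal over the Altimeter graph stored in Neptune:

```python
# Toy edges for each hop of the traversal (made-up data).
attached = {
    "altimeter": ["openS3", "AdministratorAccess"],
    "i-need-admin-access": ["AdministratorAccess"],
    "lambda-basic": ["ReadOnlyDynamo"],
}
instance_role = {
    "i-0aaa111": "altimeter",
    "i-0bbb222": "i-need-admin-access",
    "i-0ccc333": "lambda-basic",
}
internet_facing = {"i-0aaa111", "i-0bbb222"}  # reachable via an internet gateway

flagged = {"AdministratorAccess", "openS3"}
# Hop 1: policies -> roles with a flagged policy attached.
risky_roles = {r for r, ps in attached.items() if flagged.intersection(ps)}
# Hops 2-3: roles -> instances, keeping only internet-exposed ones.
exposed = sorted(i for i, r in instance_role.items()
                 if r in risky_roles and i in internet_facing)
print(exposed)  # ['i-0aaa111', 'i-0bbb222']
```

The instances that survive every filter are exactly the ones worth locking down first.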
So this is just a quick example of a real-world use case for a security graph on top of the Amazon Neptune graph database.
If you're interested in trying out that demo yourself, we have a couple of resources for you to start with. The tool I used to map out all of my AWS infrastructure is an open-source project from Tableau called Altimeter. Under the covers, given sufficient permissions, it goes out and reads the configurations of your AWS resources and generates a graph, which can be stored in Amazon Neptune. If you're interested in how to do this, the second link here is a blog post on exactly how to do that on top of Neptune: how to install it correctly and how to work with it inside Neptune.
So as we're wrapping up here, I wanted to give you a few additional resources on how you can get started with Neptune. The first link you see here is to the Neptune notebooks, or graph-notebook, project, the one I used for this demo, which provides that IDE-style interface for Neptune. If you're interested in reference architectures for using Neptune, you can follow the second link. We also have a set of sample applications, from full-stack applications to partial applications, which the third link will show you. And lastly, if you're interested in how other customers are using this, and in the blogs, code samples, and videos we have, the last link will take you to those areas specifically for Amazon Neptune.
Thank you for listening to me today. Once again, my name is Dave Bechberger, and you can find me on Twitter at bechbd. Thank you.

Source: https://www.youtube.com/watch?v=Y6jbFC8tvVw