AWS re:Invent 2021 - AWS Networking: Making all workloads possible
AWS Networking helps enable you to run any kind of workload in the cloud. In this session, join Dave Brown, VP of Amazon EC2 Compute and Networking Services, to learn how this is possible. Dave reviews the progress AWS made in the last year across networking and content delivery solutions, which are designed to be the most secure, have the highest network availability, deliver consistent high performance, and have the broadest global coverage. Dave also discusses new capabilities and how AWS customers are using our networking services to build on the AWS comprehensive network for all workloads.
ABOUT AWS
Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.
AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
Content
1.668 -> [music playing]
5.372 -> Please welcome Vice President
Amazon EC2, AWS, Dave Brown.
21.088 -> Well, hi, all.
22.122 -> And thanks very much
for joining me here today.
24.758 -> It's great to be back after
nearly two years of being at home
28.629 -> and having re:Invent
virtual last year.
30.964 -> I'm thoroughly enjoying the time
32.466 -> that I'm getting to spend
with customers,
34.334 -> learning about what they're doing
35.569 -> and what they've been
building with AWS.
40.274 -> We've come through a lot
in the last year
42.476 -> and the innovation in
the networking space continues.
45.846 -> We continue to innovate
for our customers
47.381 -> to ensure that they can
build networks on AWS
50.851 -> focused on a couple of key areas.
53.62 -> First, it's being able
to support you from the largest
56.023 -> and most scalable
global network.
58.325 -> Secondly, we want to take a look
at how we make sure
60.894 -> that we give you the best network
performance at all times.
64.565 -> Obviously in today's world
the security
66.834 -> is something
we're all thinking about
68.101 -> and I'll take a look
at some of the network security
69.736 -> that we've been doing as well.
71.905 -> Fourth, we'll look at network
for every single workload
74.141 -> and finally we'll look at what are we
doing to bring AWS and AWS Networking
78.979 -> closer to you,
wherever you may be in the world.
81.548 -> That's quite a bit to get going,
83.15 -> we're going to be jampacked
for the next 60 minutes or so
85.385 -> and I'm looking forward to it.
86.62 -> So, let's get started
with that largest
88.689 -> and most scalable global network.
92.492 -> You know, we've seen customers
from all segments using AWS.
96.763 -> I was fortunate enough to join EC2
in about 2007 in our Cape Town office
102.002 -> and it was incredibly early days
and back then you really needed,
105.973 -> you know, to have a PhD degree to do
anything really useful with EC2.
110.01 -> It was very simple,
we had a very simple network.
112.98 -> But today, you know,
across all sectors,
115.148 -> we've seen incredible adoption.
116.817 -> In the start-up space we have
Pinterest and Redfin,
120.053 -> enterprises like General Electric,
Intuit and Pfizer.
123.423 -> In the public sector
we have customers
124.925 -> like the American Heart Association,
FINRA
127.961 -> and the USDA
and software providers
130.264 -> and partners like Dedalus,
Adobe and Accenture,
132.966 -> and it's a real privilege to get
to work with all of these customers
135.903 -> as they do what they are trying
to do at incredible scale.
139.773 -> I wanted to highlight two
very quickly,
141.675 -> the first of them is Slack.
143.544 -> I'm sure most of you have used Slack
145.546 -> or are using Slack
on a day-by-day basis,
148.182 -> and I've had the privilege
of working with the Slack team
150.851 -> as they've gone through
their network journey on AWS.
154.321 -> Slack started out from a simple,
single VPC
157.724 -> and today they run hundreds of VPCs,
if not thousands of VPCs,
161.495 -> across regions all over the world.
163.664 -> They use Transit Gateway heavily
for that architecture
166.667 -> and every single Slack message
168.235 -> that you send travels
through a Transit Gateway
170.771 -> somewhere within an AWS region.
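For a sense of what that hub-and-spoke pattern looks like at the API level, here is a minimal, illustrative boto3 sketch of creating a Transit Gateway and attaching a VPC to it; the region, VPC ID, and subnet ID are placeholders, not anything from the talk.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

    # Create a Transit Gateway to act as the regional hub.
    tgw = ec2.create_transit_gateway(Description="example hub")
    tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

    # In practice you would wait for the Transit Gateway to become 'available',
    # then attach each spoke VPC (the IDs below are hypothetical).
    ec2.create_transit_gateway_vpc_attachment(
        TransitGatewayId=tgw_id,
        VpcId="vpc-0123456789abcdef0",
        SubnetIds=["subnet-0123456789abcdef0"],
    )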
173.674 -> The second customer
we'll look at is Twilio.
177.744 -> Twilio has been native on AWS
179.613 -> almost right from the beginning
of their journey
181.815 -> and they actually provide
programmatic access for developers
184.785 -> to make phone calls and send
and receive text messages
188.155 -> and to perform other
communication functions.
191.158 -> Obviously in the last year
with call center
193.76 -> needs just exploding the demand
for Twilio services really expanded
198.899 -> and very often developers are
making calls that span the globe –
201.869 -> your customer
could really be anywhere.
203.837 -> Now, we've been privileged to work
with Twilio
205.405 -> to provide them
with the low latency access
208.242 -> to the network no matter
where they may be.
210.544 -> When you make a call on Twilio today,
it actually travels over our AWS
214.181 -> Global Backbone to get to
the location that it needs to be at.
219.286 -> That's where I want to start,
220.487 -> is looking at our AWS
global infrastructure.
223.857 -> When I joined EC2 there was
really just a single region
226.927 -> and that was our US-East-1 region,
228.762 -> but we've expanded that a lot
in the last few years.
231.698 -> In the first five years of EC2
we added four regions.
235.502 -> We thought that was
pretty impressive.
237.704 -> In the next five years we added
seven,
240.274 -> and in the last five years
or so we've added another 14 regions
243.744 -> and we have another nine regions
ready for launch
246.18 -> that we've also already
proactively announced.
248.582 -> All of this… we also,
along with that,
250.884 -> have 275 CloudFront PoPs,
points of presence,
255.322 -> where you can get access
to our CloudFront CDN
257.791 -> in multiple countries
around the world.
259.86 -> We also have over 100 Direct
Connect locations
263.697 -> where you can bring in low latency,
MPLS-like network connectivity directly into AWS.
269.937 -> And, finally,
all of these locations are connected
272.973 -> with the AWS Global Backbone.
276.276 -> Every line that you see
on that screen
277.978 -> is actually
a piece of optical fiber –
280.113 -> very often 100
Gigabit optical fiber –
282.316 -> and many, many strands that are
actually owned and managed by AWS,
287.354 -> and this spans the globe.
288.689 -> It's an incredible undertaking
290.39 -> and the growth of that global network
has really been amazing.
295.095 -> Now we have to think
several years out.
297.497 -> Often when you think about the cloud
299.299 -> you think about this idea
of the illusion of infinite capacity,
302.769 -> that's something I probably say
to my teams
304.371 -> once a week where we say
"we have to provide our customers
307.474 -> with this illusion that we always
have capacity for them."
310.577 -> It is an illusion and it takes
an enormous amount of work
313.413 -> behind the scenes
to make that happen.
315.549 -> I often think about that
in terms of EC2 instances
317.951 -> but is also important
in the AWS Global Backbone.
321.755 -> Here you can see a couple
of locations
323.457 -> and those lines that are highlighted
are actually trans-oceanic fiber
328.328 -> and cables that we put in place
very often as part of a consortium.
332.199 -> We put the Hawaiki cable in
from New Zealand, through Sydney
334.902 -> and it actually terminates up
in Portland, Oregon.
337.571 -> We did the Jupiter cable, which started
in Singapore, terminated in Hong Kong
341.608 -> and then crossed the Pacific
to Los Angeles,
345.479 -> and the Maria cable, which runs from
the East Coast of the US to Europe.
349.55 -> And when we do those, we're actually
looking a number of years ahead
352.819 -> because we've got to make sure
that we always have enough
355.422 -> backbone capacity
to carry your workload.
358.425 -> Very often customers say to me
359.893 -> "you don't tell us enough
about the AWS Global Backbone"
362.763 -> and the reality is I don't really
want to be talking to you
364.865 -> about it cause it should just work.
366.667 -> You should never have to worry
368.001 -> if we have enough capacity
to carry your workload.
371.238 -> You know this literally is a ship
that goes out to sea
374.308 -> with the cable behind it
375.809 -> and in the case of the Hawaii cable
this is actually
379.046 -> 9000 miles long starting in Japan
and terminating in Portland
383.116 -> and here you can actually see this
is the coast of Japan
385.385 -> where they're actually putting
this cable off the beach –
388.021 -> you can see how they dug up
the sand there, and obviously they clear
389.923 -> that up so at some point
you have no idea
392.059 -> that a piece of optical fiber,
393.727 -> or very large cable with many strands
of optical fiber, came from there.
397.264 -> So a lot of work that goes into that.
399.933 -> Our AWS regions,
you've heard us speak a lot
401.735 -> about this over the years,
we have never lowered the standard
405.405 -> for what it means to have
a highly available AWS region.
409.209 -> Every one of our regions consists
of multiple availability zones,
412.312 -> at least three, and in many cases
we actually have regions
414.915 -> with many more
than three availability zones.
417.818 -> Those availability zones, and very
often a single availability zone,
421.154 -> actually consists
of multiple datacenters.
424.057 -> And when we think about the space
between these datacenters,
426.293 -> there's sort of a Goldilocks zone,
as we call it.
428.629 -> We want those datacenters
to be far enough apart
431.465 -> that they won't fail
for the same reason at the same time,
434.201 -> but not so far apart
that the network latency
437.137 -> actually exceeds about one
to two milliseconds.
439.973 -> So customers are able to use multiple
availability zones
442.342 -> for high availability without
having to worry about the latency
445.579 -> that they see and how
that would affect their application.
449.116 -> The network behind these availability
zones is critically important
452.986 -> and we actually have multiple
redundant pairs of fiber
457.591 -> that run between all of these
datacenters, and transit centers,
461.161 -> to ensure that there's always
multiple paths
463.964 -> between any of these locations.
466.2 -> We have to obviously assume that
failure could happen at any time
470.27 -> and it's always a little amusing
to see
471.939 -> what actually caused
a piece of optical fiber to break.
474.775 -> I can't tell you how many times
I've seen backhoes
476.844 -> digging up
fiber all around the world.
479.313 -> I even had a dumpster fire in Brazil
481.081 -> burn some optical fiber
at the top of a telephone pole.
484.451 -> And so it's always interesting
but we have to plan for it
487.855 -> and you'll see no customer
impact from fiber cuts
491.425 -> because we've planned
those alternative paths
493.493 -> and we really think about
the network convergence time.
495.896 -> We actually do that failover
at the optical level
498.832 -> so it's really just a handful
of packets that may be dropped
501.668 -> when a piece
of optical fiber is broken.
506.106 -> You know this year we actually just
celebrated 15 years of Amazon EC2.
511.845 -> It's an incredible milestone,
513.213 -> and it's an incredible journey
we've walked in that time,
516.583 -> and so I want to go back,
in this talk today,
519.119 -> and look at some of the things
that we learnt
521.522 -> over those years in the network
and the journey that we went on.
525.692 -> Firstly, as I said, EC2 launched
on August the 26th 2006.
530.364 -> We had no idea what we were building,
532.466 -> we certainly didn't think it would
become what it has become today.
536.103 -> Our network at this stage was
actually just a single flat subnet
540.24 -> and all of the instances
actually only had public
543.177 -> IPs, there was no NATing,
there was no private IP address,
547.08 -> we said "here's an instance
on the internet",
550.384 -> that's what we called Direct
Addressing.
554.021 -> That worked pretty well
for the first couple of months.
557.357 -> We ended up adding private
addressing,
559.059 -> which was a private IP address,
we didn't have elastic
561.328 -> IPs at that stage in 2007,
and I actually remember the day
565.399 -> that I interviewed at Amazon
in Cape Town,
568.535 -> I walked into the office
and one of the engineers –
571.138 -> somebody said hello to him and said
"how are you doing?"
572.94 -> and he said "oh,
I've been up all night
575.876 -> fighting with that network device"
and what had happened is,
579.546 -> a little bit less
than about six months,
581.381 -> less than a year after launching EC2
583.951 -> we started to see network devices
at the edge of our network
587.387 -> starting to struggle.
588.555 -> And what they were struggling with
was not necessarily
591.024 -> the data plane load,
592.492 -> they were struggling with
the control plane load.
595.395 -> Within a traditional datacenter
your networking team
598.131 -> is making a handful of changes
600.2 -> maybe every week or everyday
normally done via command line.
603.904 -> In the cloud environment
we provide API access
607.074 -> and we were launching instances
every few seconds
610.077 -> and having to reprogram
these devices with the NAT rules
612.579 -> that they needed at the time,
614.014 -> those devices just could
not keep up.
616.283 -> The control plane on there
just hadn't been designed for it.
619.753 -> And so we knew that we had to think
differently about
622.689 -> how we were building these
network devices and deploying them.
626.193 -> And so that kicked off a project,
in early 2008
629.93 -> we actually added elastic
631.164 -> IPs, which I'm sure many of you know
and love today –
633.934 -> the ability to attach a public
IP that you own to an EC2 instance.
637.838 -> The way we do that is actually just
639.273 -> to NAT that IP to your private
IP internally.
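To make that concrete from the customer side, here is a minimal boto3 sketch of allocating an Elastic IP and associating it with an instance; the region and instance ID are placeholders, and the NAT mapping described above happens behind the scenes.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

    # Allocate a new Elastic IP in the VPC scope.
    allocation = ec2.allocate_address(Domain="vpc")
    print("Allocated:", allocation["PublicIp"])

    # Associate it with an existing instance (hypothetical ID); AWS maintains
    # the public-to-private NAT mapping described above.
    ec2.associate_address(
        AllocationId=allocation["AllocationId"],
        InstanceId="i-0123456789abcdef0",
    )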
643.41 -> We knew we had to build custom
hardware,
645.646 -> we could see the rate of change
that elastic
647.614 -> IPs brought, there was no way
that standard devices out there,
650.651 -> network devices,
651.785 -> could keep up with that rate
of change that the cloud had brought.
654.755 -> And so we actually designed our
very first custom network device
658.792 -> which we call Blackfoot, named after
a specific species of penguin in Cape
663.497 -> Town, which obviously aligned with Linux
as well and the Linux logo.
667.267 -> And it was a Linux server
basically that just did basic
669.87 -> NAT translation with a number of
networking cards.
674.908 -> You know, Blackfoot turned out to be
a great success
678.078 -> and one of the things that proved
that – there were really two things –
681.048 -> it really simplified the approach.
683.083 -> We said this device doesn't need
to do all of the other stuff
686.153 -> that the alternative could do,
687.788 -> all it needs to do is
NAT translation.
690.09 -> And we actually were able to achieve
NAT translation times of less than one millisecond,
692.86 -> and back in 2009
less than a millisecond for
695.629 -> NAT translation
was pretty amazing
697.764 -> and actually better
than anything else out there.
700.3 -> And the way I know it was
really successful is we still use
703.103 -> Blackfoot today for every
single packet that comes into EC2
707.841 -> and other parts of AWS as well.
709.977 -> And it was so successful
they even named a building after it.
713.18 -> And so the main building for AWS
is actually called Blackfoot today.
719.286 -> In 2010 we realized that our custom
hardware was obviously the way to go
723.023 -> and we had started to see
other challenges with network
726.493 -> scalability in other places,
727.761 -> and the one we started to look at
then was our top-of-rack switches.
731.798 -> Every single EC2 rack
within our datacenters
733.834 -> obviously contains
a number of servers,
735.769 -> but it is connected to the rest of the
network using the top-of-rack switch.
739.84 -> And so we actually formed
one of our famous two pizza teams.
743.443 -> Now in South Africa the pizzas
are a lot smaller
745.379 -> so those teams
tend to be fairly small.
747.748 -> Yeah, that's about 8 to 12 people
with the American sized pizzas.
751.385 -> But we had a small team and we gave
them about 10 months
753.82 -> to actually go and build
a new top-of-rack switch from scratch
757.925 -> – and they were able
to deliver something.
759.66 -> I mean there are a couple of things,
you know, that really allowed us
761.828 -> to innovate rapidly
in the custom hardware,
765.232 -> custom network hardware space,
766.433 -> so some things we really
hold true today.
768.769 -> The first thing is we use
an incredibly simple design.
771.705 -> You know, we want to make sure that
these devices
774.107 -> do what we need them to do
and absolutely nothing else.
777.477 -> We don't want any complexity,
we don't want any code paths
780.514 -> that we don't use frequently.
We keep them incredibly simple.
784.251 -> We also design for high availability.
787.487 -> We want to make sure that not only
are the devices themselves
790.424 -> highly available with very,
very low annual failure rates
793.427 -> – and we've been able
to achieve numbers there
795.095 -> that we didn't think
would be possible.
796.997 -> But we also want to make sure that
when we actually make changes
798.866 -> to these devices that we do that
in a very consistent way.
802.669 -> One of the things we have to think
about in the datacenters
804.404 -> is network convergence,
806.039 -> if I'm rebooting a device my network
is going to converge and customers
809.176 -> are going to see periods
of connectivity issue.
811.445 -> That's not okay on AWS
and we don't have that today.
814.414 -> The way we get around that is every
single time
816.617 -> we make change to a device,
818.151 -> we refresh the operating
system completely –
820.554 -> before that we've taken it
out of service – we refresh it
823.19 -> and we put it back into service.
824.858 -> It's exactly the same workflow
as if I've added a new device
827.394 -> to the network and that eliminates
the problem of convergence times.
831.632 -> Also, because we build all this
hardware ourselves
833.734 -> from a custom point of view,
835.169 -> we have complete control of both
the hardware and the software
838.172 -> and that allowed us to lower costs
in the network,
841.375 -> improve our security
and improve our reliability.
844.344 -> When an issue happens I'm able
to see the code and fix it myself
847.548 -> and it's obviously given us
a big performance boost.
851.919 -> There are a number of things we do
as well that are kind of funny
854.555 -> and just very practical.
856.456 -> Obviously over the years
857.658 -> we've built many different
versions of this hardware
860.727 -> and when a top-of-rack switch fails,
a Data Tech has to go to a store,
865.132 -> check out a new top-of-rack switch
and go and replace it.
868.268 -> And one of the problems we had
was Data Techs would often go
871.004 -> and choose the wrong switch because
they all looked kind of similar
873.106 -> and we said
"well these aren't really devices
875.209 -> we're trying to sell to anybody,
876.777 -> so let's just make them
all a different color."
878.979 -> So if you go into
our data centers today
880.848 -> we have red ones and green ones
and blue ones and orange ones.
883.917 -> But when we actually have these made,
we get sent a Pantone color palette
887.955 -> by the manufacturer that's going
to bend the sheet metal for us.
891.325 -> And we say… we go and pick the color
and we send it back to them
893.493 -> and they've actually come back
to us and said
895.229 -> "here's a sample piece
of sheet metal,
898.465 -> is this okay with your design?"
900.4 -> And when we see it
we can't really tell –
902.436 -> is that color called magical blue?
905.405 -> If the color is called magical blue
we're good to go.
907.908 -> If it's not, well,
we're going to have to rework it.
909.476 -> So we were picking the names,
not the colors.
911.712 -> It works very well for us.
914.615 -> So our first top-of-rack switch
could do 10 gigabits per second.
919.386 -> We very quickly moved on to be
able to support 100 gigabits
921.889 -> and today we can support
400 gigabits at our top-of-racks.
926.026 -> In one of the earlier instantiations
of the network that we had,
929.363 -> we could support 460 terabits
of networking capacity
933.166 -> with a one-way latency
of 12 microseconds.
935.068 -> That was back in 2013.
937.437 -> In our innovation
of the top-of-rack switch
938.906 -> we've actually increased
that now to be able to do
941.208 -> 10,000 terabits per second
or ten petabits per second
944.945 -> and latency is all the way down
to about 7 microseconds.
947.948 -> That's almost half the latency
that we had in 2013.
952.519 -> It's incredible progress
that we've made.
955.989 -> Today we monitor over 11 trillion
events across the network.
961.695 -> With millions of network devices
spread across all of our regions
964.998 -> we have to make sure that
no device causes customer impact
968.869 -> and that includes hard failures
where a device is clearly down,
972.806 -> but also the infamous grey failure
974.675 -> where it might be doing
something to the packets
976.376 -> that's incredibly
difficult to monitor.
978.545 -> The way we do this is all of
our monitoring is end to end.
982.149 -> We do have a lot of monitoring
obviously at the device level
984.151 -> but really making sure that packets
can traverse from the source
987.654 -> to their destination with no problems
990.157 -> is what we're looking at
in 11 trillion different
992.86 -> monitoring metrics coming out
across the AWS network on a daily basis,
997.664 -> all monitored
at one-second granularity.
1001.869 -> You know, it's this investment
in building at a global scale
1006.24 -> and building a network that can scale
that really allowed us
1009.977 -> to be able to handle
what we saw during Covid-19.
1014.314 -> In a matter of days we saw
our network traffic increase by 400%
1019.386 -> when Italy was locked down
for Covid-19.
1023.156 -> There's no way that any planning
over a few days
1025.826 -> could have solved that problem,
1027.194 -> you had to be ready
for that sort of scale
1029.763 -> and planned many, many months
if not years in advance.
1033.233 -> We also had customers such as Zoom
and Peloton and Robinhood
1037.104 -> that have seen tremendous growth
either due to the pandemic
1040.607 -> or due to an IPO
that has really allowed us,
1043.01 -> from a networking scaling
point of view,
1044.978 -> to be able to support them
and it's been exciting to see.
1049.816 -> Next I'd like to take a look…
if we can just go to the next slide.
1057.791 -> Okay, let's go into the next one
which is our highest performance.
1062.496 -> You know, we've invested
an enormous amount over the years.
1066.466 -> I remember when we released
our very first EC2 instance,
1069.036 -> you know, you would see
latencies of 200 to 300
1071.505 -> milliseconds within the network
1073.707 -> and that was the virtualization
engine –
1075.909 -> obviously the hypervisor was adding
a whole lot of latency
1078.378 -> with Xen back in the day –
and very quickly we realized
1081.715 -> that for us to really get the scale
and the performance that we needed,
1084.952 -> we would have to think
very differently
1086.486 -> about how we build our servers.
1088.622 -> And we started a journey
in rethinking hypervisors
1092.125 -> to really lower the latency
1094.595 -> and also the jitter to make sure
that the latency was low
1097.531 -> and consistently low
for our customers,
1100.701 -> and the network was the first place
we started in this journey.
1104.104 -> In about 2013 we actually
shipped our first instance
1107.875 -> that we call Network Optimized,
it was our C3 instance,
1110.978 -> and it actually used
a network offload card,
1112.88 -> which was the first Nitro card
with an ARM-based chip on that card,
1116.884 -> where we actually did all of
the network processing on that card
1121.021 -> instead of using the CPU.
1123.457 -> And over the years we've actually
transferred all of our computing
1127.194 -> that we need to do
as a cloud provider,
1129.062 -> whether that's the security that we
need to process on every packet,
1131.698 -> whether that's the networking,
whether that's the storage,
1134.368 -> as well as all the management,
the billing,
1135.836 -> everything that happens
behind the scenes we've taken 100%
1139.072 -> of that away from the central
Intel, AMD, or Graviton processor.
1143.644 -> And so, today, we're the only cloud
provider that's actually
1145.779 -> able to give you 100% of that CPU
and 100% of the system memory
1151.418 -> and 100% of the storage
because we run in a separate system.
1155.956 -> Now you've actually seen
this innovation that we've done
1157.925 -> with Nitro
in the instance level bandwidth.
1160.494 -> How much bandwidth are we providing
at a single instance level.
1164.031 -> I mean back in the day when we
launched our first instance
1166.033 -> we actually provided
1 Gigabit of networking,
1168.168 -> which is pretty amazing back then,
1169.903 -> and you can see
how the baseline networking
1171.572 -> has increased over the years.
1173.373 -> As I said 2013 was our first instance
that actually was network optimized,
1178.712 -> 2019 we increased our baseline
to 25 Gigabits per second and in 2021
1183.784 -> we hit 50 Gigabits
per second on our latest Ice Lake
1187.955 -> and Milan processors,
our sixth generation series.
1193.327 -> We also have network optimized
instances where we've actually…
1196.163 -> we're the first cloud provider
to provide
1198.031 -> 100 Gigabits
of Ethernet connectivity.
1201.235 -> We had some instances that provided
1202.703 -> 400 Gigabits of Ethernet
connectivity and yesterday
1206.073 -> at re:Invent we launched
our first instance,
1208.175 -> the Trn1 instance used
for machine learning training.
1211.378 -> It actually provides 800 Gigabits
of networking –
1215.983 -> and we won't be
stopping there by the way.
1221.088 -> Customers have sometimes asked us,
1222.589 -> I actually saw a tweet
just three days ago
1225.158 -> that said
"can I get more than five Gigabits
1227.995 -> of network connectivity
to the Internet from Amazon EC2"
1231.698 -> and the answer, today,
is absolutely yes.
1234.401 -> And we've increased the outgoing
bandwidth from EC2 instances
1237.571 -> to 50 Gigabits per second
on our latest instance types
1241.441 -> and so now you're able
to get higher bandwidth
1243.11 -> both between applications
running in multiple regions –
1245.746 -> so inter-region data transfer.
1247.614 -> You're also able to get that
between the instance and the Internet
1250.384 -> and we also provide
that for data transfer
1252.819 -> between the instance
and your on-premises location
1255.455 -> via AWS Direct Connect.
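If you want to see what network performance a given instance type advertises, a quick boto3 sketch along these lines works; the instance type names here are just examples.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

    # Print the advertised network performance for a couple of example instance types.
    resp = ec2.describe_instance_types(InstanceTypes=["c6i.32xlarge", "c6gn.16xlarge"])
    for itype in resp["InstanceTypes"]:
        print(itype["InstanceType"], itype["NetworkInfo"]["NetworkPerformance"])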
1258.725 -> AWS Direct Connect, as I said earlier,
1260.294 -> provides you with MPLS-like
connectivity to AWS regions
1264.898 -> from an on-premises location
via a Direct Connect location.
1269.203 -> And Direct Connect started out
with 10 Gigabit connections,
1272.472 -> you could get 10 Gigabits
of connectivity to AWS,
1275.542 -> and that was the largest size.
1277.077 -> A couple of years ago we launched
support to be able to bring multiple
1281.982 -> 10 Gigabit connections together
as a single connection to AWS,
1285.586 -> but you still had to manage
those connections separately.
1288.488 -> And today I'm happy to announce
the availability of 100 Gigabits
1291.825 -> per second connections
via Direct Connect to AWS.
1296.496 -> This is full 100 Gigabit
port connectivity,
1299.8 -> you're getting access
to that full port, it's not shared
1302.87 -> and it's also not bringing together
a number of 10 Gigabit ports.
1307.074 -> It's a single port.
1308.709 -> We also provide security and privacy
through MACsec
1312.312 -> encryption that's available
on those ports as well
1314.781 -> if you want to ensure
those links are encrypted.
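As a rough sketch of how you might request one of these dedicated connections programmatically with boto3 – the location code and connection name below are placeholders – the MACsec-capable port is requested at creation time.

    import boto3

    dx = boto3.client("directconnect", region_name="us-east-1")  # region is a placeholder

    # Request a dedicated 100 Gbps port at a Direct Connect location
    # (location code and connection name are hypothetical examples).
    connection = dx.create_connection(
        location="EqDC2",
        bandwidth="100Gbps",
        connectionName="example-100g-connection",
        requestMACSec=True,  # ask for a MACsec-capable port for link-layer encryption
    )
    print(connection["connectionState"])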
1317.251 -> We work with a lot of customers that
do an enormous amount of bandwidth
1321.421 -> and one of those customers
is ByteDance.
1324.825 -> They have a number of social
media applications
1327.895 -> that just generate
an enormous amount of content,
1331.365 -> and so these 100 Gigabit network
connections on Direct Connect
1334.902 -> are something
that they've used to scale
1337.037 -> and you can see to support…
they said to support
1338.906 -> ByteDance applications
we need high speed, low latency
1341.842 -> connections capable of sending
exabytes of data around the world,
1345.879 -> and we're very excited
to be able to work with them
1347.681 -> and support
their incredible applications.
1351.451 -> If you think back to your first-year
networking course at college
1355.822 -> one of the things
you learned about was TCP congestion.
1359.026 -> You know traditional TCP routing
1360.894 -> does not effectively use
all available network capacity
1363.864 -> and can often lead
to network congestion,
1366.033 -> and that's something we've been
looking at within our datacenters.
1368.569 -> But we see we must be able to get
more out of the capacity
1371.371 -> we have available and not be limited
by what a single TCP flow can do.
1376.777 -> We developed a protocol
a few years ago
1378.278 -> called
Scalable Reliable Datagram,
1380.514 -> which we've spoken about previously,
and this is actually an Ethernet
1383.383 -> based protocol
that we developed internally.
1385.953 -> We looked at protocols like
InfiniBand
1387.788 -> and other low latency network protocols,
1389.523 -> but we thought you know we had
such an investment in Ethernet
1392.492 -> connectivity that we really wanted
to make sure we doubled down on that
1395.329 -> and see if we could innovate
to get around some of the challenges
1398.165 -> that TCP congestion brings.
1400.234 -> One of those challenges is TCP
requires that packets arrive in order
1404.404 -> and if a packet arrives out of order,
well what TCP will do
1406.84 -> is it will hold up the packets
and say
1408.575 -> "I missed the packet
can we please have a retransmit"
1411.178 -> and that's a lot of latency to add
and it also means that your flow
1414.748 -> actually has to follow a
common path through the network.
1418.719 -> And so we delivered Scalable
Reliable Datagram
1420.754 -> so that it doesn't require
'in order' packet delivery.
1423.357 -> We don't mind anymore in what order
those packets arrive
1425.492 -> and we rely on the layers in
the Nitro system above the data plane…
above the data layer to make sure
1430.13 -> that we're actually reassembling
those packets in the right way.
1432.966 -> We've also shipped
Elastic Fabric Adapter, or EFA,
1434.935 -> which uses SRD as its underlying protocol,
for network intensive applications.
1439.54 -> So when you use Elastic Fabric
Adapter with an EC2 instance today,
1443.01 -> you're able to get latencies
as low as 15 – that's 1-5 –
1447.181 -> microseconds between EC2 instances.
1449.082 -> It's incredibly popular in both
the high performance computing space
1452.519 -> and the deep learning
and machine learning spaces today.
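As an illustration of how an EFA gets attached from the API side, here is a hedged boto3 sketch; the AMI, subnet, and security group IDs are placeholders, and EFA is only available on specific instance types such as the one shown.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

    # Launch an instance with an Elastic Fabric Adapter (all IDs are hypothetical).
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="c5n.18xlarge",  # an EFA-capable instance type
        MinCount=1,
        MaxCount=1,
        NetworkInterfaces=[
            {
                "DeviceIndex": 0,
                "SubnetId": "subnet-0123456789abcdef0",
                "Groups": ["sg-0123456789abcdef0"],
                "InterfaceType": "efa",  # request an EFA instead of a standard ENI
            }
        ],
    )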
1455.989 -> Let's take a look at how that works.
1457.324 -> So when a network flow is routed
through the network,
1459.693 -> typically it will choose
a set of routers
1461.995 -> and all packets in that flow,
based on how it hashes,
1464.198 -> the source IP and destination
IP and ports,
1466.733 -> will route along those same paths.
1468.268 -> But you can see there's routers
in the network
1470.137 -> that aren't actually being utilized
effectively for that flow,
1472.739 -> and there's effectively
capacity available
1475.175 -> that we're not using
for that flow today.
1477.244 -> The other problem you see there
is a few of the routers
1479.112 -> are actually seeing multiple flows.
1480.614 -> As our flows get larger we might
have routers
1482.783 -> that actually become overloaded.
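To make that flow-hashing behaviour concrete, here is a small, purely illustrative Python sketch of ECMP-style path selection – not AWS's actual hashing code: every packet of a flow hashes its 5-tuple to the same next hop, so the whole flow is pinned to a single path.

    import hashlib

    # Hypothetical next hops in an equal-cost group.
    NEXT_HOPS = ["router-a", "router-b", "router-c", "router-d"]

    def pick_next_hop(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
        # Hash the flow's 5-tuple; every packet of the flow produces the same
        # digest, so the whole flow follows one path through the network.
        five_tuple = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
        digest = int.from_bytes(hashlib.sha256(five_tuple).digest()[:8], "big")
        return NEXT_HOPS[digest % len(NEXT_HOPS)]

    # Every packet of this flow takes the same next hop, while a protocol like
    # SRD is free to spray packets of one flow across many paths and let the
    # layer above put them back in order.
    print(pick_next_hop("10.0.0.5", "10.0.1.9", 41234, 443))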
1484.551 -> And so with Scalable Reliable
Datagram
1487.354 -> we're actually
able to send those packets,
1489.59 -> even for a single flow,
through multiple routers.
1492.025 -> It no longer matters in what
order they actually flow
1494.995 -> and so the results of this mean
you just see far less congestion,
1498.265 -> you have much better utilization
of your network and for you,
1500.834 -> as a customer, you see
significantly lower latencies.
1504.838 -> One of these customers
was Fox Sports,
1507.608 -> they like to do live broadcasts.
1509.443 -> If any of you do live broadcasts
you know how incredibly stressful
1513.113 -> that can be,
you don't get a second chance
1514.948 -> at a live broadcast
and what Fox had been trying to do
1517.417 -> was to see whether they could move
their live streams
1520.42 -> and their live broadcasts
to the cloud
1522.356 -> and they haven't been able to do that
with any other cloud provider.
1525.826 -> And they actually need latencies
as low as 16.6 milliseconds per frame
1530.264 -> to make sure that they can
effectively broadcast a 4K stream.
1533.567 -> And with EFA and now SRD protocol
we've been able to provide Fox Sports
1539.306 -> with access to being able
to live broadcast directly from AWS.
1545.712 -> Coming soon, in 2022,
1548.415 -> we are bringing the power of SRD
not only to HPC applications
1551.852 -> or for machine learning
or specific use cases,
1553.954 -> we're going to be integrating
SRD deeply into the Nitro system,
1558.091 -> and so into the VPC protocol.
1560.794 -> And so every single instance
that communicates
1563.263 -> either to the internet
or between instances
1565.299 -> is actually going to start
to see lower latencies.
1567.768 -> We expect a 300% increase
in single flow bandwidth
1571.104 -> and up to 90% reduction
in tail latencies.
1574.441 -> And so hopefully we're going to see…
you'll see that rolling out next year
1577.544 -> and really should improve
and lower the latencies
1579.613 -> that you're
seeing across your applications.
1583.217 -> Let's take a quick look at security.
1588.121 -> Here we started off with this
slide about our Global Backbone,
1590.691 -> and I mentioned it earlier,
but every single packet
1593.794 -> that flows across this Global
Backbone is actually encrypted.
1596.897 -> We don't have any flows that happen
between regions
1599.867 -> that aren't encrypted at the edge.
1601.735 -> We also encrypt everything
on the wire,
1603.437 -> we don't do it
on a per customer level,
1605.439 -> which means that your traffic
is just lost in the noise.
1607.708 -> Not only is it lost in the noise
it's also encrypted,
1610.01 -> which means that it's very difficult
to either decrypt or do
1613.413 -> any sort of sniffing
or interpretation
1615.382 -> of what the traffic could be.
1617.284 -> Within the region traffic
between availability zones
1620.02 -> and data centers
is also encrypted at the line level.
1624.558 -> Cross-region peering is also
encrypted on top
1628.061 -> of what's provided
natively at the line level.
1629.696 -> We do additional encryption there.
1631.598 -> We have encryption between instances
for any of our newer Nitro instances,
1635.969 -> so any of the fifth generation
instances
1638.105 -> that have an 'N' in the name,
as a suffix,
1640.941 -> or any of our sixth
generation instances
1643.01 -> and every single Nitro
instance going forward,
1645.746 -> will support native network
encryption within the VPC.
1649.283 -> Nothing that you need to do.
1650.784 -> We obviously always recommend
you do use HTTPS as well.
1653.887 -> Then we also have IPsec
on VPN tunnels
1656.323 -> and MACsec on Direct Connect
to your data center and elastic load
1659.893 -> balancing provides easy encryption
of any application traffic using TLS.
1667 -> The other thing we've had to think
about is how do we make sure
1669.603 -> that this encryption
is not only good today,
1672.339 -> but with the imminent arrival
in the next 10 years
1675.576 -> or so of quantum computers,
1677.644 -> which have significantly more
processing power,
1680.647 -> there's a chance that somebody
could capture traffic today
1683.016 -> and be able to decrypt that
in the future.
1685.452 -> And so to protect against that,
1686.753 -> all of our encryption algorithms
make use of AES-256 encryption
1691.425 -> to ensure that we're actually
quantum safe and reduce
1694.161 -> that risk of somebody recording
traffic today and, in 20 or 30 years' time,
1698.198 -> being able to do
something with it.
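For a deliberately simplified illustration of AES-256 symmetric encryption itself – not AWS's internal wire-encryption implementation – here is a short Python sketch using the widely available cryptography library.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Generate a random 256-bit key; in practice keys come from a key-management system.
    key = AESGCM.generate_key(bit_length=256)
    aesgcm = AESGCM(key)

    nonce = os.urandom(12)  # 96-bit nonce, unique per message
    plaintext = b"packet payload"

    # Encrypt and authenticate the payload with AES-256-GCM.
    ciphertext = aesgcm.encrypt(nonce, plaintext, None)

    # The receiver, holding the same key and nonce, can decrypt and verify it.
    assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext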
1704.872 -> You know, one thing you often see
when I talk to customers,
1707.241 -> especially when they've migrated
to the cloud,
1709.009 -> is they have some on-premises devices
that they absolutely love.
1712.613 -> They've been using them for years,
they've worked incredibly well
1715.349 -> and they want to be able
to use them within AWS.
1718.051 -> Now the good news is most of these
providers have taken their appliances
1722.322 -> and actually virtualized them
and made them available in the cloud,
1725.325 -> and many of you are using things
like Palo Alto Networks