AWS re:Invent 2022 - Trading up: Fidelity Investments takes trading to the cloud (FSI317)
The mainframe has been a cornerstone of the technical capabilities of Fidelity Investments for years. Recently, Fidelity instituted a strategic program to modernize its core brokerage platform and recast the mainframe’s capabilities into a cloud-based platform using elastic scalability and cost structures and more modern technologies while shedding technical debt. Join this session to learn more about Fidelity’s migration journey, which focuses on not only Fidelity’s technology stack (including Amazon EKS, Amazon MSK, Amazon DynamoDB, and AWS Lambda) but also its corporate culture and lessons learned from building one of the largest trading platforms on AWS.
ABOUT AWS Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.
AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
Content
0.12 -> - Hello everybody.
2.25 -> Thank you for coming here today.
3.6 -> I know that it's 5:30,
5.4 -> the first day of re:Invent.
6.54 -> All your colleagues are probably
at happy hour right now,
8.55 -> so we appreciate you being
here with us instead.
11.25 -> This is FSI317,
13.02 -> Trading up: Fidelity Investments
takes trading to the cloud.
16.86 -> My name is Jeremiah O'Connor.
18.18 -> I'm a Principal Solutions
Architect for AWS.
20.94 -> I've been working with the
Fidelity folks on stage
22.74 -> for about the last four years now,
24.33 -> helping them in their AWS journey.
26.82 -> So I'm gonna be joined
today by Louis Mancini,
29.46 -> who is Head of Equity & Options Trading Technology
32.13 -> for Fidelity Investments.
33.84 -> I'm also joined on stage
with Amr Abdelhalem,
36.81 -> who is the Head of Cloud Platforms
38.94 -> at Fidelity Investments.
42.06 -> So, we've got a very
useful agenda here today.
44.46 -> So, the first topic we're gonna get into
46.86 -> is we're gonna talk about
48.06 -> Fidelity's order management system.
50.46 -> Then, we're gonna segue into
the roadmap and challenges
53.64 -> that they had migrating
this order management system
56.22 -> to AWS, and really getting
this very latency-sensitive
61.32 -> application running on AWS.
63.45 -> Then, we're gonna dive into
the actual architecture itself,
65.73 -> so we're gonna dive into
what AWS components comprise
68.64 -> this architecture for this application.
70.92 -> Then, we're gonna kick it over to Amr,
72.12 -> who's gonna talk a little bit
about platform resiliency,
74.82 -> so the stuff under the hood
75.93 -> that this application actually runs on.
78.09 -> And then, finally, we're
gonna wrap it all up
79.71 -> with sort of an overview of
Fidelity's cloud platforms
82.335 -> and how they work today.
84.36 -> So with that, I'll pass it over to Louis,
86.07 -> who's gonna talk a little bit about
87.21 -> the order management system.
89.58 -> - Thanks Jeremiah.
94.05 -> So, my name's Louis Mancini.
95.561 -> I run Equity & Options Trading Technology
98.46 -> at Fidelity Investments,
99.87 -> and the last few years,
101.072 -> we have begun the modernization
103.02 -> of our trading stack at Fidelity.
105.3 -> The trading landscape
in the last few years
107.67 -> has heavily changed, as I'm
sure many of you are aware.
110.88 -> We've gone from building out older systems
116.189 -> to having a need to be able to
scale to a much larger extent
120.87 -> and be able to process much larger volumes
124.11 -> of data in our systems
to be able to handle
126.39 -> the ever increasing amount of volume
128.22 -> that's coming in from
both our retail traders
130.32 -> and our institutional traders.
132.54 -> This project began in 2019,
134.88 -> and today we have it running in production
137.46 -> and I'm gonna take you through
138.81 -> how we went about that journey,
140.43 -> what the reasons were
that we built that system,
143.58 -> and what we plan on doing
145.92 -> in the future
147.3 -> to get to where we are.
149.73 -> So, let's talk about the reasons
150.93 -> that we had to come up with to build this.
153.27 -> We started seeing capacity constraints.
154.89 -> As we said, up until about 2019,
156.901 -> volumes were relatively consistent.
159.36 -> We could kind of know
about the max volumes
161.88 -> that we were gonna have,
based upon our number
163.77 -> of clients and the amount of accounts
165.09 -> that we were seeing.
167.07 -> Into 2019, things drastically changed,
169.44 -> which I'm gonna show
in a couple of slides.
171.15 -> There were also some other reasons though,
172.71 -> that we needed to actually
go down this path, right?
174.99 -> We needed to modernize our technology.
176.64 -> Our technology was starting
to get a little old.
178.5 -> We've been building
on top of mainframes
180.3 -> and on top of x86 server architectures
182.198 -> that have been in existence since the 90s.
184.92 -> We also realized that we needed to have
188.67 -> a large amount of savings
in terms of our costs, right?
192.33 -> When we look at the cloud,
193.59 -> we believe that we could
have some significant savings
195.69 -> in both our hardware and licensing fees,
197.944 -> real estate costs, and the need
200.04 -> to staff our data centers.
201.99 -> Another key component that we realized
203.73 -> as we started to begin this
journey was speed to market.
206.28 -> Speed to market was
important in our ability
208.08 -> to change things, and getting
new products to market quickly
213.81 -> was becoming
ever more important,
216.45 -> especially as people were trading more
218.85 -> and we started to see much larger volumes.
222.66 -> So, let's talk about the
impetus for this, right?
224.79 -> Back in 2019, this is a graph,
226.977 -> and this is one of my favorite graphs,
228.48 -> probably of my career.
229.8 -> You can see that this
is the trading volumes
231.84 -> that we saw at Fidelity Investments.
233.49 -> In 2019, we had our historic high.
235.83 -> And you could see from the
little part on the graph
238.17 -> what we saw up until then.
240.113 -> And at that time, we thought
that was a very large number,
243.15 -> but then three things drastically changed
245.73 -> in the trading space.
247.77 -> We had fractional and notional trading
249.54 -> that came about, in terms
of being able to trade
252.12 -> portions of shares, which
ended up allowing people
254.76 -> to trade much more frequently.
257.37 -> We had zero commissions,
259.17 -> which also changed the
trading landscape, right?
261.42 -> You saw that all of a sudden
you could trade for $0,
264.27 -> and cost no longer was
a barrier to trading,
266.43 -> for you to be able to
execute and route an order.
268.38 -> So instead of maybe trading
one large lot of a 100,
270.69 -> you might have traded
three lots of 33, 33, 34.
275.61 -> And then, the last piece, of course,
276.96 -> was the meme stocks, right?
278.94 -> The meme stocks hit in 2021.
281.16 -> So to give an example,
282.15 -> which you could see from
our historic peak in 2019,
285.249 -> we are doing 4X our historic 2019 peak
289.095 -> just in our average daily volumes today,
292.71 -> and we are doing well over five to 6X
297.81 -> in our peak that we
saw in the meme stocks,
300.06 -> it was actually over
500% of our original peak.
302.85 -> And as most of you that
are involved in trading would know,
304.56 -> that's actually not even fully
306.51 -> the whole story, right?
307.71 -> What you really see is that
in the first half hour,
310.32 -> you see the vast majority of trades
312.57 -> versus the actual entire day.
315.24 -> So, if you were to look at this,
316.135 -> our ability to handle that volume
318.54 -> needed to increase very significantly,
321.27 -> and that's how this came about.
322.53 -> We had taken this system live
324.12 -> that we're gonna talk about
324.953 -> right before that meme stock crisis.
329.295 -> So, let's talk about how we actually went
331.26 -> about building these new systems.
333.66 -> The first thing we had to do,
334.65 -> is we had to build out a development team
336.33 -> capable of operating in the cloud, right?
339.36 -> That's a very different paradigm
341.28 -> than actually building out
systems today running on-prem.
344.1 -> We had to take our developers
345.87 -> that were very familiar
with trading systems
347.85 -> and retrain them on how
to operate in a cloud
351.09 -> to be prepared for anything and
everything to possibly fail.
354.48 -> To understand availability
zones, and regions,
357.6 -> and make sure that we could
transmit orders in a Tier Zero
361.11 -> system at any time under
any failure scenario.
364.17 -> So what that means in Fidelity terms,
366.21 -> is that even if we were to
lose an availability zone
368.391 -> or were to lose a region,
370.17 -> we can seamlessly work on orders
373.38 -> that were submitted in prior regions
375.09 -> or new orders without a customer
377.271 -> knowing about the outage.
379.98 -> Some of the key things that we had to do
381.66 -> is we started shortening release cycles,
384.36 -> we did smaller builds,
386.28 -> we did CI/CD pipelines, and
we did the standard stuff.
389.4 -> But some of the non-standard
stuff that we had to do
391.59 -> that's special to trading
392.79 -> is we had to build a custom chaos
396.63 -> and performance testing framework
398.55 -> that allowed us to actually build out
400.8 -> and simulate market-on-open loads,
404.151 -> 10X market loads from the original peak
407.04 -> that you would see over there,
408.45 -> and be able to simulate failures
410.37 -> during any and all of these times
412.38 -> in real time, in an environment
414.263 -> that mimics production
416.04 -> so that we can make sure
417.54 -> that we're actually able
to handle these trades,
419.46 -> and get these trades to the market
421.23 -> in a very fast period of time.
423.27 -> The reason for that is
because, unlike most systems,
425.259 -> if a system fails partway
through a transaction,
428.79 -> it can be completed later,
429.75 -> say like a credit card system, right?
431.28 -> It could always be recharged five
minutes or 10 minutes later
433.41 -> once the system comes back. A trade can't.
435.57 -> Once we've accepted a trade,
437.04 -> we need to get that trade to the market
438.982 -> or else the price might move
440.337 -> and the customer may be owed money
442.26 -> to fix that trade: the difference between the price
443.67 -> they should have gotten and
the price they got.
445.83 -> So, building out these chaos tools
448.14 -> and these performance tools
449.34 -> was very key to us being able to deliver
452.43 -> a trading system on AWS with
our tier zero requirements
456.84 -> of 24 by seven with 100% reliability.
462.39 -> So, let's talk about
some of the challenges
464.79 -> that we went through to get through,
466.47 -> and build what I just talked about.
468.36 -> Some of the challenges in trading
469.98 -> are that you have to
interact with other parties
473.61 -> and you have to interact
with older systems
475.74 -> that give you data.
477 -> Trading is not simply taking a trade,
478.65 -> verifying you have enough money
480.15 -> and then sending it to a broker
or an exchange to execute.
484.77 -> That's the most simple form of it.
486.45 -> But when we look at Fidelity
and how we trade,
488.43 -> we service an enormous
amount of business lines,
491.76 -> stock plan services for people
that have blackout calendars.
496.05 -> WI, which is our Workplace Investing,
498.45 -> we have to make sure
that we can do all sorts
500.22 -> of different types of trading,
501.09 -> and we also need to make sure
502.11 -> that we can handle complex order types.
504.45 -> So, we have to be able to integrate
506.04 -> with legacy systems that are both on-prem,
509.55 -> new systems that are in AWS for teams
512.1 -> that have already made
the progress to AWS,
514.144 -> and we have to be able to integrate
515.91 -> with all of the existing trading framework
518.31 -> around Wall Street.
519.45 -> And most of the trades
today, when you do a trade,
521.94 -> are sent via what many of you
are aware is called the FIX network.
525.18 -> FIX is a protocol that was
built 20-something years ago,
528.99 -> in the 90s.
529.92 -> It was designed around
servers that have, you know,
532.44 -> a disk attached to them.
533.79 -> It has heartbeats, and is
not designed for a cloud
536.79 -> where you need to have, you know,
537.96 -> storage that doesn't exactly
exist on your server.
540.93 -> So, we had to solve that by
building custom FIX engines
544.68 -> that can actually operate
in a Kubernetes environment,
547.86 -> and we had to build custom frameworks
549.63 -> to make sure that we
could actually operate
551.4 -> at multiple thousands of
transactions per second,
554.43 -> as it would look
on a normal FIX engine
556.71 -> that you would see,
557.543 -> and be able to transmit those orders
559.5 -> to the market in a timely fashion.
561.57 -> We also had to make sure
that those FIX engines
565.139 -> could operate multi-region,
567.672 -> which was another hard concept
that we had to get through,
570.33 -> which I'll explain a little bit more later
571.83 -> in some of the architecture diagrams.
574.32 -> Some of the other pieces
575.28 -> that we had to take into
account is, as I said before,
577.89 -> we had to do it in a multi-region setup.
580.98 -> What does that mean?
581.836 -> We need to make sure that
583.08 -> if a customer's submitting an order,
584.76 -> it could go to either region one
586.74 -> or region two, and if anything
was to happen to region one,
590.85 -> a customer could act upon
that order in region two
594.941 -> and be able to cancel
or replace that order
597.84 -> without actually knowing
that the primary site
600.45 -> that they sent that
order to has failed.
603.33 -> We also need to make sure
604.53 -> that if we were actually
down to a single region,
607.74 -> that we would be able to survive
609.57 -> multiple availability
zone failures as well.
612.81 -> And we did this in
multiple different ways.
614.813 -> We did this by utilizing
616.909 -> many of the services from Amazon
619.02 -> such as MSK and DynamoDB,
621.39 -> which we'll show later in
our architecture diagrams,
623.79 -> as well as having to build
a lot of custom toolkits
626.22 -> to allow us to circuit break,
628.77 -> be able to tell when some things go wrong,
630.6 -> reroute orders that are in flight
632.55 -> in case of an issue,
633.78 -> detect failures, say
in underlying storage,
637.05 -> and be able to actually
work on orders in real time,
640.26 -> triage them automatically via the system
642.87 -> and be able to make sure
643.74 -> that they get to market in real time.
645.69 -> And if for some reason they don't,
647.07 -> we can actually notify
and have a corrected price
649.71 -> for the customer.
650.543 -> So this should all be
invisible to the customer.
653.97 -> Some other pieces that we had to deal with
655.655 -> is the transition between
public cloud and legacy.
658.755 -> We have an enormous amount of data
660.93 -> that's going through our legacy systems.
662.55 -> It was impossible for us
664.62 -> to actually just build a new system,
666.42 -> which would actually be many systems
669.03 -> and just reroute everything
to the new system in one day.
672.24 -> So, we've had to stand up
a new system inside of AWS,
675.96 -> as well as link it back to our old system
678.6 -> that's actually currently today on-prem.
681.9 -> And the way we've had to do that,
683.01 -> is we've had to forward bridge
and backwards bridge data
686.4 -> and we'll go through that in
the architectural diagram.
688.71 -> And some of the key components to that
690.15 -> in the trading system here
692.04 -> is, unlike the traditional
low-latency trading systems
694.74 -> that you would see in any of the other
697.47 -> margin broker-dealers,
698.76 -> we also have to support
order inquiry in real time.
702.33 -> Customers utilizing our
Active Trader Pro platform,
705.33 -> or fidelity.com, need
to know what the status
708.09 -> of their order is immediately
after it's submitted,
710.46 -> and immediately after it's been executed.
712.74 -> That requires us to have the ability
714.87 -> to serve those customers from
either our old system out,
718.14 -> or our new system independent
720.653 -> of where we've actually sent
that order to the marketplace.
725.94 -> And of course, the last
piece of the challenges
727.59 -> that we've had to deal with
coming from legacy systems,
730.92 -> we were able to control
our hardware changes.
733.62 -> In the cloud, we cannot always
control our hardware changes,
736.23 -> and as a 24 by seven system,
738 -> we need to be able to
route away from changes
740.34 -> if there are major changes happening,
741.9 -> or be able to just absorb those
changes in the trading day.
745.71 -> That required a large
amount of understanding
748.26 -> from our developers to make sure
749.73 -> that they can code and
be able to absorb changes
754.41 -> that happen intraday on
these cloud based systems.
759.48 -> So, let's talk a little bit
about some of the technology
761.074 -> we use to overcome some of the challenges
765.24 -> that I just spoke about.
767.97 -> One of the key pieces
was the data stores we had to use.
771.33 -> Some of the data stores
we used are DynamoDB,
773.46 -> as well as some
custom in-house cache systems,
776.85 -> as well as a traditional RDBMS to satisfy
779.43 -> some of the needs that
couldn't be satisfied
781.74 -> in a standard key-value
pair system such as Dynamo.
785.76 -> Another really big key component,
787.44 -> is the logging and visibility.
789.54 -> We had to build an entire framework
792.03 -> around logging and visibility
794.55 -> so that we could make sure
that we could track a trade
797.82 -> from when that trade begins,
799.86 -> and hits the system through
every single component
802.59 -> of the system, in real time,
804.81 -> to make sure that trade
is actually executing,
807.57 -> and a problem hasn't occurred.
809.28 -> So, we have systems today
810.96 -> that are actually going to listen
812.16 -> to all the different pieces.
813.15 -> So, we will accept the trade,
814.8 -> we will validate it, we will
then begin the routing process,
817.95 -> we will send it to an exchange,
819.33 -> we will receive an acknowledgement,
820.62 -> we will receive executions,
822 -> and we will validate in real time
824.04 -> to make sure that all of those processes
825.99 -> are performing as they're supposed to.
827.97 -> And if they're not performing
as they're supposed to,
830.49 -> we have circuit breakers that
we've built into the system
833.07 -> that will actually take
pieces of the system,
835.2 -> whether it's a pod in Kubernetes,
836.49 -> an availability zone,
842.7 -> or a whole region in AWS,
844.59 -> and knock them out in real time
846.36 -> in an automated fashion
847.92 -> so that the customer is not impacted.
849.87 -> So, we can triage, or the
system could automatically try
852.54 -> to correct itself in any event
855.07 -> where, all of a sudden,
857.49 -> our trades are not going to the market
858.99 -> in as timely a fashion as necessary
860.88 -> to provide the customer an execution
862.539 -> that will give them the best price.
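To make the circuit-breaker idea concrete, here is a minimal, hypothetical sketch of a per-target failure counter; the class name, threshold, and region labels are ours, not Fidelity's actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal circuit-breaker sketch: track failures per routing target
 * (a pod, availability zone, or region) and take the target out of
 * rotation once failures cross a threshold.
 */
public class CircuitBreaker {
    private final int maxFailures;
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();

    public CircuitBreaker(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Record a failed health probe or late acknowledgement for a target. */
    public void recordFailure(String target) {
        failures.merge(target, 1, Integer::sum);
    }

    /** Record a successful round trip; resets the target's failure count. */
    public void recordSuccess(String target) {
        failures.remove(target);
    }

    /** A target stays routable until it trips the breaker. */
    public boolean isRoutable(String target) {
        return failures.getOrDefault(target, 0) < maxFailures;
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3);
        breaker.recordFailure("us-east-1");
        breaker.recordFailure("us-east-1");
        System.out.println(breaker.isRoutable("us-east-1")); // still under threshold
        breaker.recordFailure("us-east-1");
        System.out.println(breaker.isRoutable("us-east-1")); // tripped: route away
    }
}
```

A production breaker would of course add time windows and probing before re-admitting a target; this only illustrates the knock-out decision the talk describes.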
865.71 -> Like I mentioned before,
866.91 -> our enhanced test tools
868.203 -> on top of just the standard test tools
870.39 -> that we had to build
for like unit testing,
872.891 -> we had to build an entire custom chaos
876.24 -> and performance framework for that.
878.28 -> I mentioned that a little bit before,
879.87 -> but let me go into a little
bit more detail on that, right?
882.84 -> We've built an entire replica
884.19 -> that we can stand up
and stand down at will
886.16 -> of our production environment,
887.82 -> and have built replicas
of our entire data sets
890.58 -> at 1X, 2X, and up to 10X
892.219 -> or even more of our maximum day,
894.96 -> like I showed on that green graph
896.4 -> at the beginning of the presentation.
898.32 -> We are then able to
replicate that environment,
901.44 -> send data in, have an expectation
904.38 -> of what we would see at our
99th, 99.9th, etc. percentiles,
908.31 -> and then we can inject automated faults
910.65 -> in every part of the system
911.76 -> that we can at least come
up with that could fail
914.13 -> to make sure that as those failures occur,
917.168 -> we are able to actually
respond in real time,
920.37 -> and be able to correct the system
921.96 -> so that the customer will not be aware.
923.97 -> And that goes from everything
925.23 -> from the smallest Kubernetes pod,
927.48 -> to a large scale database failure,
929.85 -> to a failure of an availability zone,
934.05 -> to an entire region failure.
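As an illustration of the fault-injection side of such a framework, a toy version might wrap each component call and fail it with a configured probability; the component name and failure rate below are invented for the example, not part of Fidelity's tooling:

```java
import java.util.Random;
import java.util.function.Supplier;

/**
 * Toy fault-injection sketch: wrap a component call and replace the
 * result with a failure at a configured rate, so a load test at 1x-10x
 * volume also exercises the recovery paths.
 */
public class FaultInjector {
    private final Random random;
    private final double failureRate;

    public FaultInjector(double failureRate, long seed) {
        this.failureRate = failureRate;
        this.random = new Random(seed); // seeded for reproducible test runs
    }

    /** Call a component, sometimes throwing an injected failure instead. */
    public <T> T call(String component, Supplier<T> action) {
        if (random.nextDouble() < failureRate) {
            throw new RuntimeException("injected fault in " + component);
        }
        return action.get();
    }

    public static void main(String[] args) {
        FaultInjector injector = new FaultInjector(0.25, 42L);
        int ok = 0, failed = 0;
        for (int i = 0; i < 1_000; i++) {
            try {
                injector.call("order-validator", () -> "ACCEPTED");
                ok++;
            } catch (RuntimeException e) {
                failed++; // the system under test must recover from these
            }
        }
        System.out.println(ok + " accepted, " + failed + " injected failures");
    }
}
```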
936.18 -> So, we can fail in regions.
938.52 -> And we are set up in
a way that's hot, hot,
940.74 -> which I'll explain a little
bit later in the diagram.
943.26 -> But what that means is we
can send order flow today
946.14 -> to both regions at all times.
947.79 -> So today,
948.93 -> if you were to be sending
orders on fidelity.com
951.21 -> and you were to be utilizing it,
952.65 -> there is a roughly coin flip probability
955.98 -> that you'll be sending orders
to either our new system,
958.98 -> or to our old system.
960.57 -> And there's also another
roughly coin flip probability,
962.79 -> if you went to the new system,
964.98 -> of which region
you'd be going to:
967.26 -> the first region or the second region.
968.76 -> There is no primary region.
970.35 -> There's two regions that
run in a hot, hot fashion
972.96 -> replicating in real time to each other.
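The hot-hot, coin-flip routing just described could be sketched like this; the region names and the simple health flags are illustrative assumptions, not the real mechanism:

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Sketch of hot-hot routing between two regions: a roughly coin-flip
 * split while both are healthy, and all flow to the surviving region
 * once one is marked down. There is no primary region.
 */
public class HotHotRouter {
    private volatile boolean region1Up = true;
    private volatile boolean region2Up = true;

    /** Mark a region unhealthy (e.g. after its circuit breaker trips). */
    public void markDown(String region) {
        if (region.equals("region-1")) region1Up = false;
        else region2Up = false;
    }

    /** Pick a destination region for the next order. */
    public String pickRegion() {
        if (region1Up && region2Up) {
            return ThreadLocalRandom.current().nextBoolean() ? "region-1" : "region-2";
        }
        if (region1Up) return "region-1";
        if (region2Up) return "region-2";
        throw new IllegalStateException("no healthy region");
    }

    public static void main(String[] args) {
        HotHotRouter router = new HotHotRouter();
        System.out.println(router.pickRegion()); // coin flip between the two
        router.markDown("region-1");
        System.out.println(router.pickRegion()); // always region-2 now
    }
}
```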
976.71 -> Some other pieces that we did
977.76 -> that were a little bit more standard,
979.248 -> we utilized mostly standard middleware
981.93 -> and messaging packages.
983.52 -> We used a lot of MSK,
the Amazon managed service for Kafka,
987.6 -> and we also did a little bit
of custom middleware messaging
990.48 -> like I mentioned before,
991.65 -> where we had to build
some custom FIX engines,
993.54 -> and some custom underlying technology
995.37 -> to make those work inside
of a Kubernetes environment.
999.15 -> And in terms of language,
1000.05 -> we use mostly standard Java,
1001.73 -> but we use a little
smattering of everything else.
1006.56 -> So, let's talk a little
bit about our order,
1008.12 -> our high level order architecture.
1010.58 -> The way we've structured this,
1011.66 -> is you can see we have the gray diagram,
1014.42 -> which is our legacy systems,
1016.34 -> which will both take orders in
1018.38 -> and send orders to the market.
1021.68 -> Our green systems are our new AWS systems,
1023.87 -> which will also take orders
in and send orders to the market.
1027.11 -> Both systems will also process
our customer inquiry traffic,
1032.54 -> which can be very large.
1034.25 -> We serve and operate as a
standard order management system
1036.86 -> for a lot of different customers.
1038.96 -> We have our own system,
1040.13 -> such as Active Trader Pro and fidelity.com,
1044.36 -> but there are many channels
and other business lines
1046.7 -> inside Fidelity and outside of Fidelity
1048.98 -> in our clearing business
1050.99 -> that utilize Fidelity's
trading infrastructure.
1053.51 -> And all of them require the ability
1055.55 -> to know exactly where an order is,
1057.223 -> at what state that order is,
1059.21 -> and be able to act upon
that order at any time.
1062.24 -> The key to what we did here
1063.46 -> is we wanted to make this
invisible to the customer.
1066.89 -> So as you can see in the upper left,
1068.45 -> there's a box that's called
director that runs on site.
1071.399 -> All trades and all inquiry statuses
1075.71 -> will go through director.
1077.12 -> Today, trades do; inquiry and statuses
1079.43 -> are almost complete.
1080.9 -> Director has the ability based upon rules
1083.026 -> and circuit breaker knowledge
1084.92 -> of where to route an order.
1087.59 -> Should it go to the new system,
1089.18 -> or should it go to the old system?
1091.25 -> What this does is it makes
it invisible to the customer.
1093.8 -> We gave one API that's in front.
1095.63 -> So if someone's building
out a new trading system,
1097.757 -> and they need to connect to us,
1099.2 -> someone has an old trading system,
1100.519 -> they don't know if they're
going to the new system,
1102.83 -> because that allows us to migrate
1105.23 -> our flow piece by piece by piece,
1106.881 -> so as we build out new
pieces of the system,
1110.03 -> we can continue to add functionality
1112.85 -> without customers needing to be tied to us
1115.25 -> to make sure that they make
the appropriate changes
1117.65 -> and they are tied to us in releases.
1119.96 -> We became independent of their releases.
1123.59 -> The other key component
here, which I'll stress,
1125.415 -> is our ability to back bridge
1127.43 -> and forward bridge our data.
1129.11 -> We are in a hybrid mode right now,
1130.79 -> and we've been in a
hybrid mode for two years,
1133.07 -> and we will be in a hybrid
mode in multiple years
1135.47 -> coming until we complete this project.
1138.05 -> The ability for us to back bridge
1140.36 -> and forward bridge our data
1141.322 -> is another piece of the puzzle
1143.21 -> that allows it so that customers
1145.34 -> don't have to worry
about where their inquiry
1147.62 -> or where their order goes.
1149.09 -> And if we were to move
customers, it would be invisible.
1152.24 -> So, our AWS system has a copy
of all of our trading data
1156.68 -> that's going on in real
time on our legacy systems.
1160.61 -> Our legacy systems have a copy
1162.71 -> of all of our trading data
1164.3 -> that's going on inside of our AWS systems,
1168.41 -> so that in the event
of a customer inquiry,
1170.93 -> they can go to either system
and get served their data.
1174.26 -> It also allows us a lot of freedom
1176.93 -> to release at a much faster rate.
1178.73 -> We don't have to necessarily,
1180.5 -> if we want to do a large release,
1181.82 -> we can scale volume down
and scale volume up.
1184.16 -> If we want to add new functionality,
1185.772 -> we can add that new functionality
1187.28 -> and slowly move customers onto it
1190.1 -> to make sure that we won't cause an outage
1191.78 -> or cause customer dissatisfaction.
1193.97 -> That's the key to how
we've been building this
1196.22 -> and that migration has
allowed us to continue
1199.13 -> to go forward.
1201.2 -> So, as you see in the picture,
1202.64 -> there's a forward bridge,
and a backwards bridge.
1204.98 -> We're real time asynchronously
1206.39 -> replicating across both of them.
1208.55 -> - [Audience Member] Quick question.
1209.383 -> - Sure.
1210.216 -> - [Audience Member] So,
are you all doing something
like a strangler fig
pattern to basically do
1215.63 -> the traffic shift as you
keep adding new services?
1220.61 -> - It's very similar to a
strangler pattern, yes.
1222.65 -> It's not exactly, but it's
basically a rule-based engine
1226.229 -> that underlying it has the 50
different, or whatever X amount,
1230.36 -> of rules that there are
that would comprise
1232.16 -> 100% of the order set, and then slowly
1234.26 -> we go click down one by one by one,
1236 -> until eventually hopefully there is none.
1237.5 -> - [Audience Member] Thank you.
1239.12 -> - No problem.
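A rule-based director of the kind described in that answer might look roughly like this; the Order fields, rule shapes, and system labels are hypothetical, chosen only to show how flow migrates rule by rule:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Sketch of a "director"-style rule engine: each rule claims a slice
 * of the order flow for the new cloud system; anything unmatched falls
 * through to the legacy system. Migration proceeds by adding rules
 * until nothing falls through.
 */
public class Director {
    public record Order(String symbol, String orderType, String channel) {}

    private final List<Predicate<Order>> newSystemRules = new ArrayList<>();

    /** Add a rule that moves one more slice of flow to the new system. */
    public void migrate(Predicate<Order> rule) {
        newSystemRules.add(rule);
    }

    /** Decide, per order, which system it should be routed to. */
    public String route(Order order) {
        for (Predicate<Order> rule : newSystemRules) {
            if (rule.test(order)) {
                return "NEW";
            }
        }
        return "LEGACY";
    }

    public static void main(String[] args) {
        Director director = new Director();
        // Start by migrating plain market orders from the retail channel.
        director.migrate(o -> o.orderType().equals("MARKET")
                && o.channel().equals("RETAIL"));
        System.out.println(director.route(new Order("AAPL", "MARKET", "RETAIL"))); // NEW
        System.out.println(director.route(new Order("AAPL", "GTC", "RETAIL")));    // LEGACY
    }
}
```

Because callers only ever see the director's single API, rules can be added or reordered without any client release, which matches the release-independence point made above.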
1241.49 -> So, let's go forward on
our platform resiliency.
1243.35 -> This is one of the hardest
pieces of the puzzle
1244.853 -> that we needed to solve when
building something on Amazon,
1248.12 -> especially something that
needed to be tier zero.
1252.77 -> We needed to make sure
that we could operate,
1254.27 -> as I said before,
1255.103 -> a multi-region multi
availability zone pattern
1258.71 -> and be able to operate
on orders that existed
1260.78 -> in either region at any time.
1263.54 -> So, couple of key decisions
that we made at the beginning.
1266.66 -> All of our application
logic is in Kubernetes,
1270.082 -> it is not in any sort of EC2 instances,
1273.41 -> there is no application logic,
including our FIX engines,
1276.17 -> operating outside of Kubernetes.
1277.94 -> That allows us to be able to scale
1281 -> whatever we need to scale.
1282.68 -> It also allows us to add processes
1284.42 -> so that in the event of, say,
1285.41 -> we have another very, very large spike
1287.54 -> that is unforeseen due to
some sort of market event,
1290.3 -> we can simply click a button,
1291.95 -> and change our pod count from 20 to 50
1293.987 -> based upon our built up
orders that we see overnight.
1297.44 -> One of the ways in trading
that we can determine that,
1299.27 -> especially in a more
customer-facing system,
1301.94 -> is our overnight orders give us a guess
1303.74 -> as to what
1304.94 -> we're gonna see
in the first 30 minutes.
1306.92 -> So, unlike a traditional on-prem system
1308.9 -> where someone's gonna have to run
1310.28 -> and hopefully there'll be extra hardware
1311.81 -> lying around that we could set up,
1313.1 -> we never have to worry
about that problem again.
1315.11 -> We literally just spin
up a bunch more pods,
1317.373 -> and we're ready to go with
that particular point.
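The overnight-orders heuristic could, for example, translate a backlog into a morning pod count along these lines; every constant here is an invented placeholder, not Fidelity's actual tuning:

```java
/**
 * Sketch of the capacity heuristic described above: size the morning
 * pod count from the overnight order backlog, clamped to a floor and
 * ceiling. All numbers are illustrative assumptions.
 */
public class MorningCapacity {
    /** Orders one pod can absorb in the first 30 minutes (assumed). */
    static final int ORDERS_PER_POD = 50_000;
    static final int MIN_PODS = 20;
    static final int MAX_PODS = 200;

    /** Overnight backlog is used as a proxy for the market-open burst. */
    static int targetPods(long overnightOrders) {
        long needed = (overnightOrders + ORDERS_PER_POD - 1) / ORDERS_PER_POD;
        return (int) Math.max(MIN_PODS, Math.min(MAX_PODS, needed));
    }

    public static void main(String[] args) {
        System.out.println(targetPods(400_000));   // quiet night: floor of 20 applies
        System.out.println(targetPods(2_500_000)); // heavy night: scales up to 50
    }
}
```

The computed count would then be applied to the deployment, e.g. via `kubectl scale` or an autoscaler target, rather than by procuring hardware.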
1320.324 -> Some of the other keys
that we needed to do
1323.24 -> to make sure that this would work
1324.38 -> is we need to be able to fail over,
1325.79 -> and this was something that
was very, very difficult
1327.44 -> for us to build.
1328.55 -> We need to be able to
seamlessly, within seconds,
1331.34 -> fail over our entire FIX infrastructure
1333.92 -> from one region to another region.
1336.86 -> So, if you trade a market order today,
1338.841 -> it's very quick, it's gonna execute
1341.36 -> or it's not gonna execute immediately.
1343.46 -> Half of the orders though
are not market orders.
1345.44 -> They're much more complex order types:
1347.3 -> limit orders, GTCs (good
'til canceled), trailing stops,
1351.74 -> and they could exist for minutes, hours,
1354.2 -> days, weeks, months.
1355.823 -> Some could even go into the multi-month
1358.16 -> to even year timeframe.
1360.02 -> We need to make sure that if those orders
1361.7 -> exist on exchanges, then a
customer could actually operate
1364.79 -> on those orders even in
the event of a full failure
1368.27 -> of our entire region.
1370.43 -> So, to be able to accomplish that task,
1372.74 -> we've built the functionality
1374.03 -> that we can with the click of a button,
1375.95 -> be able to move our FIX engines
1377.63 -> from the affected bad
region to the good region,
1380.78 -> and be able to take all
of our customer flow
1382.7 -> from the director and point
it only to a single region.
1385.46 -> We have actually done this, and it
has been a very large benefit
1388.88 -> to us in production,
1389.713 -> in that we can actually
go to a single region
1392.57 -> with minimal to no
customer impact, if any,
1395.87 -> and be able to actually trade
1397.61 -> in that other region.
1399.23 -> That was a very difficult problem,
1400.063 -> and one of the key pieces
to how we built this system.
1405.17 -> So, let me finish up with the
last slide on where we are.
1412.278 -> So, we go to the next slide.
1413.111 -> So where we are today,
1414.29 -> we began this journey about
four years ago in 2019.
1417.62 -> We took it live on AWS somewhere
1420.98 -> around two years ago from
the original POC that we did.
1424.219 -> You could see: these
are real volumes here,
1426.051 -> as you could actually
see we've graphed them.
1429.32 -> You could actually see
in the red line here,
1431.75 -> what our legacy system is processing,
1433.49 -> and it kind of matches a
little to that green chart
1435.17 -> I showed at the beginning
of the presentation.
1437.09 -> And in the blue line,
1438.71 -> what our new system is processing,
1441.14 -> and you could see that there
are actually even days recently
1443.21 -> where we have actually
processed more trades
1444.814 -> on our new system versus our old system.
1448.88 -> Over the next couple
of years, coming years,
1451.04 -> we look to continue
doing this for equities
1453.2 -> and then we eventually look to scale
1454.55 -> this pattern to options
1455.93 -> as well as many of our
other different type
1457.58 -> of order types and business lines
1459.32 -> so that we can complete the migration
1461.33 -> from our older on premise legacy systems
1465.62 -> to our new cloud based systems.
1468.38 -> Thank you everybody for your time.
1470.561 -> (Louis chuckles)
1471.89 -> Thanks.
1472.723 -> Amr's gonna talk about
the cloud component of it,
1474.95 -> and what Fidelity did to build out
1476.33 -> their cloud architecture next,
1477.65 -> and then we'll take some questions.
1479.84 -> - Thank you, Louis.
1480.737 -> (audience applauds)
1489.56 -> All right, so let me take a
different spin for the story.
1494.3 -> This is actually the first slide
1495.53 -> that we had in the presentation.
1496.82 -> - [Louis] For sure.
1497.653 -> - I'm going just backward a little bit,
1498.486 -> like, wanna talk about two
dates here or two numbers here.
1502.04 -> The first one was 2019,
1504.316 -> and that was when Fidelity
basically made the announcement,
1506.87 -> at KubeCon in San Diego,
1511.16 -> of our strategy to move to public cloud,
1513.47 -> multi public cloud first.
1515.96 -> And then like three years
later, which is today,
1518.72 -> we're at 5,700 applications
running in the public cloud.
1522.95 -> So, they're all running in this platform,
1525.26 -> and in the next 20 minutes,
1526.34 -> I'm gonna do my best
to tell you the story,
1528.47 -> how we build it, where is it today,
1531.17 -> what's our vision for tomorrow,
1532.91 -> and how we are hosting many applications
1536.15 -> that you know, actually,
1537.2 -> I would love to interact with you guys.
1539.72 -> Can you guys raise your
hand if you are using today
1541.79 -> fidelity.com, Fidelity Mobile, NetBenefits,
1545.489 -> all of our Fidelity products,
1548.18 -> and if you look around, it's awesome.
1550.88 -> Thank you for your
business, thank you so much.
1553.34 -> And literally you are interacting
with this platform today.
So, this is a very
interesting slide, actually.
1561.59 -> It does show like you
know, what was our vision,
1563.42 -> how we started that journey,
1565.07 -> and how we focus.
1565.903 -> When you scale 5,700 applications
1568.64 -> and more to come in that platform,
1570.53 -> you have to start first of all
by building the foundation.
1573.74 -> This foundation has to
be your security foundation,
1576.56 -> your compliance foundation,
your infrastructure foundations,
1580.43 -> where you are gonna host your application,
1582.14 -> your data, your event streaming.
1584.3 -> All these components need to be in place.
1586.64 -> And I'm gonna go a little bit deep
1587.96 -> about that in a few slides.
1590 -> Also, you have to reimagine
these applications.
1592.43 -> So Louis and his platform,
1593.72 -> here is one of these.
1595.01 -> If you imagine since we
talk about Kubernetes here,
1597.65 -> and about containers, imagine
like all of these platforms
1600.65 -> are running as clusters of containers,
1603.2 -> they're all running like ships or boats
1605.3 -> or carriers that carry these containers.
1607.19 -> We have massive numbers of them today.
Some of them are criticality
level tier zero
1612.35 -> to tier one, to tier three and so on.
1614.6 -> But that reimagining of
the application itself
1617.09 -> to run encapsulated with
your API, with your data,
1620.66 -> with event streaming, with your
observability and security,
1623.93 -> all encapsulated, that's one of the values
1626.57 -> that was added to this platform.
1628.79 -> And obviously, working in investments,
1631.22 -> we care about our numbers.
1633.02 -> The FinOps model, I'm gonna
show a few slides about that,
1635.3 -> but the FinOps for us is critical.
1637.85 -> It can get really expensive
1639.14 -> when you move this amount of
applications to the cloud.
1642.11 -> How we manage that today,
1643.16 -> and how we are continuing to manage that,
1645.038 -> that's gonna be one of the discussions
1647.18 -> we're gonna have here.
1648.53 -> And last but not least, we
manage thousands of developers,
1652.34 -> hundreds of developer
teams, many business units
1655.31 -> or business partners
working with us around that.
1658.46 -> So, we're definitely
like focusing this year
1661.55 -> and next few years on
the developer experience
1663.92 -> about having, you know,
1665.27 -> and building our Fidelity
open source projects,
1668.09 -> we're gonna discuss that,
show it to you guys as well,
1670.61 -> and how we are attracting
the talent in the company.
1674.629 -> So, just full disclaimer
before I go on this slide,
1678.92 -> this is one way to build the platform.
1681.56 -> There's many other ways that
we can build this platform,
1683.6 -> but this way is bulletproof.
1685.67 -> We tried it, it did work,
1687.59 -> and in this slide I
wanna just share with you
1690.02 -> like you know, what are the rules?
1691.73 -> Rule number one that we used,
1693.11 -> we use open source technology.
1695.03 -> We focus in containers,
1696.2 -> we focus in Kubernetes,
1697.37 -> we focus in many of the CNCF
products that we use today.
1701.72 -> We're using Envoy.
We are actually part of the
Envoy open source project itself.
1705.56 -> We're big in the telemetry side,
1708.02 -> and the OpenTelemetry project as well.
1710.03 -> So, that was one of the key strategies
1711.86 -> that we announced in 2019 at KubeCon.
1716.12 -> The second part was to use managed services.
1719.3 -> So, while Kubernetes is awesome,
1721.49 -> we don't wanna get busy managing Kube.
1724.46 -> As a matter of fact, we
have a major private cloud
1726.529 -> in our data center today running Kube,
1729.17 -> and it's a big job and big task
1732.11 -> operating that platform.
1734.27 -> So, definitely one of the
recommendations I would say,
1736.64 -> is to start using, you know,
1737.99 -> managed services like EKS,
1739.82 -> and I'm gonna go a little bit in depth
1741.08 -> on how we're using that today,
1742.79 -> and managing like hundreds
and hundreds of clusters today,
1746.15 -> or just like container ships as well.
1749.124 -> Number three, definitely,
1750.86 -> you need to focus on building
your network strategy.
1754.7 -> So we literally, we have inside Fidelity
1756.35 -> similar like a map like
this, you know subway map,
1759.08 -> this is New York subway map,
1760.82 -> I couldn't share the right one
that we have inside Fidelity,
1763.79 -> but we have a map that
shows all the regions
1767.15 -> in all the cloud provider,
it shows all the colors,
1770.06 -> it shows all of our data centers,
1772.13 -> the exchange and the latency
between all these areas.
1776.15 -> So Louis and his team and other product
1778.64 -> teams can literally see where they should
place their applications.
1781.97 -> What is a sunny day versus a rainy day,
1784.67 -> what is my, like, you know,
1786.02 -> like, in New York, if you
guys are from the New York area,
1787.64 -> you know there is an express subway
1789.02 -> and a local subway; when you're
taking the express subway,
1792.17 -> what happens when you
have, like, this disaster
1794.09 -> and you have to go through the local subway.
1796.58 -> So the network is definitely
one of the investment
1798.95 -> that we did, and since we
start using managed services
1802.4 -> like EKS and MSK,
1804.5 -> we start focusing in building
1806.166 -> our Fidelity Kubernetes program
1809.03 -> or our container program on top of that.
1811.52 -> So, we focus in the fleet management,
1813.5 -> how we manage these clusters,
1816.2 -> how we manage the multi-tenancy,
1817.88 -> how we can host multiple
applications in these clusters.
1821.57 -> We also focus in
application management side,
1824.36 -> how we integrate our
platform or applications
1827.36 -> with the security in the
backend with observability,
1831.95 -> with other components,
1833.282 -> And last but not least,
1835.07 -> integrating that with the
cloud services itself.
1837.68 -> Like, you wanna manage your clusters
1841.04 -> and you manage your fleet
from FinOps perspective
1843.98 -> for resource management,
1845.48 -> you wanna do that optimization.
1847.25 -> And our program focuses on that.
1849.117 -> On top of that,
1850.13 -> we start building all
of our core components.
1852.17 -> So today, literally we are now running
1854.376 -> our event streaming programs there,
1856.37 -> we're running our API program
on top of that platform.
1859.25 -> We're running many other programs
1861.29 -> including our future data programs itself
1863.36 -> is running on top of that platforms.
1865.52 -> And last but not least, obviously,
1867.56 -> trade and fidelity.com and others
1869.33 -> are running on top of that.
1874.34 -> There is multiple ways you can manage
1875.978 -> fleet of containers in Kubernetes.
1878.39 -> One way, you can buy a product.
1880.46 -> Second way, you can do like
what we did three years ago,
1883.49 -> go and assemble multiple
open source projects
1885.74 -> and build your own project
1887.27 -> or build your own open source program,
1889.37 -> or you can use ours.
1890.36 -> Ours is available, it's
free, it's open source.
1893.27 -> We'll be very happy to
collaborate with you,
1895.31 -> as a matter of fact, like, you know,
1896.683 -> there's a couple of banks
already collaborating with us
1899.33 -> around that, and we'd love it if you guys
1901.4 -> wanna partner with us on it.
1904.28 -> Just to give you, like, and
highlight the program itself.
1907.58 -> The first piece is like how
you can connect your fleet.
1909.95 -> So imagine you have a fleet of ships
1911.99 -> that are running your containers,
and you wanna have your
thousands of developers
1916.07 -> access these platforms or
these clusters safely,
1920.93 -> and understand what role
1922.1 -> and what authentication and
authorization they can get in with.
1924.68 -> That's what our KConnect tool does.
1927.541 -> Second one, is our Kraan,
1930.5 -> and Kraan is our framework that we built
1933.013 -> to manage these clusters,
1936.14 -> to build all the operators
1937.55 -> and the integration that
we have in these clusters
1939.8 -> and how we can safely
upgrade this cluster.
As a matter of fact,
1945.86 -> the Kubernetes program itself, and CNCF,
1948.223 -> and AWS require you to
upgrade every three months.
1951.26 -> So, you have to upgrade your
environment every three months.
1953.48 -> We have a rehydration requirement as well
1955.34 -> that goes almost on a monthly basis.
1958.28 -> So, with that program, and with Kraan,
1960.14 -> we've managed over 12,000
upgrades in the last few years.
1964.31 -> And that's all happened seamlessly
1966.17 -> without you know,
interference for the business.
1968.99 -> And you will need definitely
a kind of, like, framework
1972.11 -> that will manage this infrastructure
1973.58 -> for you or on your behalf.
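The roughly quarterly upgrade cadence just described can be sketched as simple fleet bookkeeping. This is an illustrative sketch only — the 90-day window, cluster names, and dates are assumptions, not Kraan's actual logic:

```python
# Illustrative sketch (not Kraan's actual logic): pick which clusters in a
# fleet are due for their roughly-quarterly Kubernetes upgrade.
from datetime import date, timedelta

UPGRADE_CADENCE = timedelta(days=90)  # "upgrade every three months"

def clusters_due(last_upgraded: dict[str, date], today: date) -> list[str]:
    """Return cluster names whose last upgrade is older than the cadence."""
    return sorted(name for name, when in last_upgraded.items()
                  if today - when >= UPGRADE_CADENCE)

fleet = {
    "trading-east": date(2022, 7, 1),
    "trading-west": date(2022, 10, 1),
    "batch": date(2022, 5, 15),
}
print(clusters_due(fleet, date(2022, 11, 28)))  # ['batch', 'trading-east']
```

At fleet scale — hundreds of clusters, plus a monthly rehydration pass — this kind of tracking is what lets thousands of upgrades happen without the application teams noticing.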
1975.86 -> Last but not least,
1976.82 -> we're very focused today in
resiliency and operation.
1979.82 -> So, we're actually releasing
our Theliv program,
1982.34 -> and Theliv is on GitHub,
1986.101 -> and it's a way of integrating
our fleet of clusters
1990.23 -> or containers or Kubernetes
1991.481 -> with our Prometheus infrastructure
1993.674 -> that we're gonna be launching in future,
1996.53 -> and collecting all of this data,
1998.06 -> and all of our data analytics
1999.53 -> for all our operational
data in the backend.
2002.65 -> And we're building a framework
2003.76 -> where it actually can
programmatically diagnose issues
2007.87 -> in your application or event issues.
2010.03 -> For instance, if you have
an auto scaling event
2012.43 -> or you have a deployment event,
it will, on your behalf,
2015.43 -> it'll do the checkup that you ask it to do
2017.188 -> and it'll figure out where the issues
2019.27 -> and the challenges are, and will
provide you with some, you know,
2024.22 -> solutions, and hopefully in the
future be intelligent enough
2027.97 -> by adding some machine learning
Ops model on top of that.
2031.72 -> But this is a future for us.
2036.91 -> Now, the real foundation under all of that
2040.36 -> is an EKS cluster.
2042.28 -> Our EKS clusters are very systematic,
2045.82 -> meaning we provide like
one single template
2050.991 -> for how the cluster runs
for all of our applications
2053.92 -> and all our systems.
2055.939 -> They are deployed, as Louis was saying,
2058.21 -> in multiple regions, multiple
availability zones.
2061.6 -> So they come prebuilt
2063.13 -> for all of our application teams to use.
2065.74 -> We also provide policies,
like policies about
2069.55 -> how our routing is happening,
2070.93 -> how our DNS service in
the backend is being set up,
2073.42 -> how our LBs are all configured.
2075.749 -> All of that becomes prebuilt.
2078.55 -> For all of our application team
2080.23 -> to host their application in that.
Beside that, we actually
do the cluster management side,
2090.85 -> hosting all of what we
call the Kraan program itself.
2094.12 -> That has all become, as well, prebuilt.
2096.1 -> So when you deploy your
application to our platform,
2099.82 -> you actually literally
just deploy your application,
2102.49 -> preconfigured for observability,
preconfigured for routing,
2106.33 -> preconfigured for security,
preconfigured for FinOps,
2110.073 -> preconfigured for east-
west communication,
2114.61 -> and preconfigured for, you
know, additional tasks
2118.3 -> like, you know, how you do things
2120.07 -> like service discovery and others,
2121.63 -> and, futuristically, how we're
gonna do service mesh
2124.48 -> overall across all these clusters.
2131.11 -> I mentioned developer experience,
2132.67 -> that's something actually
we started this year.
2135.574 -> What we found after releasing
all these containers today,
we have over a quarter
million containers running
2141.31 -> critical workloads in production.
2144.19 -> And what we found is that we
need to start building this
2146.446 -> convergence, or consolidation,
2148.75 -> and unify our developer experience.
2151.558 -> Today, we have multiple
projects working on that.
2154.72 -> This is one of them, called
the Starling project.
2157.48 -> And the Starling project
2158.41 -> is our application management platform.
2161.77 -> It's very focused around, like,
how you have a unified experience
2166.09 -> when you onboard applications,
2167.89 -> how you can manage the
small things in the backend,
2170.62 -> how you can integrate the
teams to onboard applications,
2173.71 -> onboard multiple applications,
2175.51 -> how you can start, like, you know,
2177.1 -> building a prescriptive
model around deployment,
2181.09 -> using some frameworks like, you know,
2183.55 -> Argo CD and other frameworks.
2185.59 -> How you actually manage your cluster
2187.54 -> so you can manage your upgrade,
2188.71 -> you can manage your hydration,
2190.39 -> and provide all of that
through single portal
2193.18 -> that can be self-service for all the teams
2195.55 -> and all the application teams,
2197.17 -> and all of the Ops teams as well.
2199.78 -> You can manage that.
2204.25 -> Behind that, we actually have,
2206.03 -> and I have to recognize that
and I have to mention this,
2208.66 -> this is Bombayer, our sister
group, very much focused
2212.32 -> on building this modern development cycle.
2215.68 -> So, before your application actually
2218.05 -> gets onboarded to our platform,
2219.7 -> and before your application
can get inside the platform,
2222.34 -> you have to go through that pipeline.
2224.38 -> This is our like, you know,
2227.26 -> I would say like one of
the most like you know,
awesome programs that I
saw for innersourcing,
2233.29 -> because it gets all of our
2234.86 -> thousands of developers
collaborating
2239.65 -> and that model system, actually,
2240.91 -> it does focus on the governance side,
2243.31 -> it does focus in building,
like, consistency
2245.29 -> around the CI processes,
2246.97 -> how this application is being built
2248.92 -> around the test frameworks,
2250.75 -> around the security aspect of that,
2252.67 -> and it lands into our production,
2254.62 -> where our cloud platform is,
2255.97 -> and where our applications
are being deployed.
2261.876 -> I wanna focus a little bit in the FinOps,
2263.89 -> there is actually one
presentation there, it's awesome
2266.5 -> you know, the presentation
information is there
2267.97 -> around the FinOps side,
2269.83 -> but I wanna go through how
we started the FinOps model
2272.74 -> in the cloud platform.
2274.687 -> On day one, it's like
when you buy a house,
2278.82 -> so you go there and you are
very excited about the new house
2282.31 -> but then you get hit by
the first mortgage bill,
2284.301 -> And be like, "Oh my god,
2286.18 -> I have to worry about that
and I have to manage as well."
2288.88 -> So the first thing you do,
2290.02 -> you're kind of looking at
refinancing information,
2292.54 -> you know, possibilities,
and that's what we did.
2295.39 -> So, definitely, like, you know,
2297.28 -> you wanna go through a discussion
2299.56 -> about using reserved instances, in a sense,
2301.9 -> and how doing that
2302.989 -> will provide you, kind of, like, you know,
2306.386 -> definitely cost management in this case.
2309.55 -> But the next thing, the next
one that you do after that,
2312.01 -> you start looking in your rooms,
2314.05 -> and you start seeing lights on,
2315.97 -> and you start like
shutting down the lights
2317.95 -> behind your kids and everyone
in the families, right?
2320.347 -> And we do that as well.
2321.91 -> We go like every weekend and every night,
2323.95 -> and we see which systems are not being used
2326.2 -> or underutilized, and we
do shut down these systems
2328.96 -> in the backend.
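That "turn off the lights" pass can be sketched as a small utilization filter. This is a hedged illustration — the 5% CPU threshold, system names, and off-hours flag are assumptions for the sketch, not Fidelity's actual policy:

```python
# Hypothetical sketch of the nightly/weekend FinOps pass described above:
# flag systems whose utilization is below a threshold during off-hours.
IDLE_CPU_THRESHOLD = 5.0  # percent average CPU over the sample window (assumed)

def systems_to_shut_down(utilization: dict[str, float], is_off_hours: bool) -> list[str]:
    """Return systems safe to stop: off-hours and effectively unused."""
    if not is_off_hours:
        return []  # never shut anything down during business hours
    return sorted(name for name, cpu in utilization.items()
                  if cpu < IDLE_CPU_THRESHOLD)

weekend_sample = {"dev-sandbox": 1.2, "perf-test": 0.4, "trading-prod": 38.0}
print(systems_to_shut_down(weekend_sample, is_off_hours=True))  # ['dev-sandbox', 'perf-test']
```

The point of the analogy holds: production stays up, but idle non-production environments stop burning money overnight.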
2330.7 -> And the next one, you
start thinking about like,
2332.717 -> "Why don't I put, like, intelligence
2334.48 -> in our power system?
2336.52 -> Why don't start using solar,
2337.87 -> using some of this smart system
2340.45 -> and smart devices in the houses."
2342.52 -> And that's what we start
doing as well in our side.
So we start, like, utilizing
the Spot Instances,
2347.83 -> that was one of the things
2349.03 -> that we released about like two years ago,
2350.68 -> and in our management infrastructure,
2353.17 -> we were able to get up to like
40% of savings by using Spot.
2357.612 -> About two months ago, we
released Graviton,
2361.48 -> and that's additional saving
that we're gonna experience,
2365.17 -> and we're still actively
evaluating that right now,
2367.45 -> but we're expecting that
might get to an additional 30%
2371.29 -> of the, you know, of cost
saving as well in that.
2374.98 -> But we feel like what
is the most critical thing
2377.38 -> is really, like, you know,
applications being developed
2380.56 -> toward a financial aspect,
or how you can drive
2386.38 -> the culture inside your development teams,
2388.45 -> and your development community,
2390.1 -> to start thinking about cost saving
2392.59 -> and start thinking about
2394.66 -> how we can optimize
the application itself.
2396.91 -> How we can use a smart auto scaler,
2399.31 -> where the auto scaler will understand
more than memory and CPUs.
2402.67 -> It will understand when this
application is being utilized
2405.28 -> and how it can, likely, you know,
2407.08 -> in a sense, be reduced
2408.07 -> when you use these kinds of things.
2409.72 -> And that's actually a futuristic thing
2411.19 -> that we're trying to do right now as well.
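The "smart auto scaler" idea — sizing on a business signal rather than only CPU and memory — could look something like this. Purely speculative sketch: the orders-per-second metric, per-replica capacity, and bounds are invented for illustration:

```python
# Speculative sketch of scaling on a business signal (hypothetical
# orders-per-second rate) instead of only CPU/memory utilization.
import math

def desired_replicas(orders_per_sec: float, per_replica_capacity: float = 100.0,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Replicas needed to absorb current load, clamped to a safe range."""
    needed = math.ceil(orders_per_sec / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))

print(desired_replicas(0.0))    # 1  (overnight: scale down to the floor)
print(desired_replicas(950.0))  # 10 (market-open burst)
```

An app that only trades during market hours scales to its floor overnight instead of idling at a CPU-derived replica count — which is exactly the cost-aware developer culture the talk is describing.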
2416.59 -> I did speak about
observability a little bit
2419.05 -> when I mentioned Theliv,
2420.303 -> but this is one of the things
2421.87 -> that we're working on right now in our lab
2423.49 -> and we're gonna be
releasing that massively
2425.08 -> across of all of our
Fidelity cloud platforms,
2429.04 -> and what we found around
the observability side,
2431.41 -> it's very interesting,
2432.58 -> because when you are in your data center,
2435.61 -> you are literally fine
2437.32 -> with having traditional monitoring tools.
2439.87 -> But when you start
building a hybrid model,
2442.615 -> and hosting your application,
2444.64 -> part of your application on premise
2445.99 -> and other part is moving to the cloud
2447.61 -> and moving from one
region to another region,
2449.83 -> and moving from one
cloud to another cloud,
2451.477 -> and you start to decompose
your monolith application
2454.84 -> toward, like, you know, microservices.
2457.12 -> So one single app becomes
like 30 or 40 microservices,
2460.39 -> and you wanna manage
all this communication,
2462.73 -> what you find is that the
observability tools
2464.672 -> have more noise, they're more expensive,
2467.587 -> and they don't provide the end-to-end view
2471.43 -> that you're looking for.
2473.17 -> So, one of the things
that we're doing right now
2475 -> is start investing in building
our observability pipeline.
2477.963 -> It's based on CNCF OpenTelemetry.
2481.75 -> It is GA right now
for metrics and for traces.
2486.047 -> We're still working with them today
2488.157 -> around the log side as well.
2490.21 -> And I think this is one of the areas
2491.157 -> that's gonna be our
future for observability.
2494.65 -> Once we turn the pipeline on,
2498.16 -> this means we can remove
most of this noise
2500.26 -> out of observability.
2501.45 -> We will be able also to drive
2505.073 -> like the actual data,
or the critical data,
2508.21 -> to our premium solution of observability,
2510.7 -> and move the non-critical data
2512.44 -> to S3 storage or other solutions
2515.8 -> that can be used in the backend,
2517.69 -> and using Theliv and
using the CNCF technology
2521.078 -> and Kubernetes, I think,
kube-prometheus and others,
2523.63 -> we'll be able to collect this data.
2525.79 -> You know, for example,
2527.073 -> when you connect to an API server in Kube,
2530.53 -> you might be able to extract
a thousand metrics per second
2533.562 -> out of your API server.
2536.251 -> This by itself is very expensive
2539.26 -> if you're using like traditional
cloud observability tools.
2543.31 -> But using that method, you'll
be able to filter that.
2545.92 -> And you'll be able to see which area,
2547.75 -> which metrics that you care about
2549.7 -> at certain times and you
can program the other one
2552.31 -> to use them or not use them as well.
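The pipeline idea described here — keep the metrics you care about for the premium backend, route the rest to cheap storage — can be sketched as a name-based splitter. A minimal sketch under stated assumptions: the metric names, prefixes, and the premium/archive split are invented, and a real implementation would sit in an OpenTelemetry Collector processor rather than application code:

```python
# Minimal sketch of metric filtering/routing in an observability pipeline.
# Prefixes and the "premium"/"archive" buckets are assumptions for illustration.
PREMIUM_PREFIXES = ("apiserver_request_duration", "order_latency")

def route_metrics(metrics: dict[str, float]) -> tuple[dict, dict]:
    """Split a scrape into (premium, archive) buckets by metric-name prefix."""
    premium, archive = {}, {}
    for name, value in metrics.items():
        bucket = premium if name.startswith(PREMIUM_PREFIXES) else archive
        bucket[name] = value
    return premium, archive

scrape = {
    "apiserver_request_duration_seconds": 0.12,
    "order_latency_ms": 4.2,
    "go_gc_cycles_total": 1800.0,  # noisy, rarely needed: send to cheap storage
}
premium, archive = route_metrics(scrape)
print(sorted(premium), sorted(archive))
```

With thousands of metrics per second coming off each API server, the filter is what keeps the premium observability bill proportional to the data you actually look at.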
2562.21 -> Now, the data pattern
is an interesting topic
2567.67 -> because (laughs) we
started the data journey
2570.093 -> using our traditional RDBMS,
2573.34 -> and, you know, SQL databases,
2575.091 -> and Louis mentioned that as well
2577.9 -> in the first section of the presentation.
2580.044 -> What we found is that while they work well
2586.18 -> inside our data center,
2587.32 -> when you move to the cloud side,
2588.79 -> you have to worry about failures
2591.19 -> and you have to worry
about synchronization
2593.02 -> between multi regions,
2594.64 -> and between six availability
zones for tier zero
2597.49 -> applications, like Louis' app.
2599.62 -> And with that, we have to start investing
2601.45 -> in, like, a newer pattern.
2602.77 -> This is one of the patterns
that we invested in today
2605.56 -> using DynamoDB.
2607.03 -> So, it's used as a caching layer
2609.4 -> in front of our RDBMS
database in the backend,
2613.48 -> and it does hot-hot synchronization
2615.31 -> between the two regions.
2617.77 -> That's how we guarantee that
the orders, on the failure
2620.92 -> of one of the regions or the failure of
2622.42 -> one of the availability zones,
2625.93 -> can be recovered in the right SLAs,
2629.2 -> when we have the data synchronization
2631.36 -> happening almost in near real time.
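The DynamoDB-in-front-of-RDBMS pattern described above is essentially cache-aside. Here is a minimal sketch under stated assumptions: a plain dict stands in for the DynamoDB global table, a stub function stands in for the relational read, and the names (`get_order`, `load_order_from_rdbms`) are invented for illustration, not Fidelity's code:

```python
# Cache-aside sketch: a dict stands in for the DynamoDB (global table) layer,
# a stub stands in for the authoritative RDBMS read. Illustrative only.
cache: dict[str, dict] = {}  # stand-in for DynamoDB, replicated across regions

def load_order_from_rdbms(order_id: str) -> dict:
    """Stand-in for the authoritative relational read in the backend."""
    return {"order_id": order_id, "status": "open"}

def get_order(order_id: str) -> dict:
    """Read through the cache; populate it from the RDBMS on a miss."""
    if order_id not in cache:
        cache[order_id] = load_order_from_rdbms(order_id)  # fill cache on miss
    return cache[order_id]

first = get_order("A-1001")   # miss: hits the RDBMS, fills the cache
second = get_order("A-1001")  # hit: served from the DynamoDB layer
print(second["status"], len(cache))  # open 1
```

In the real pattern, the dict would be a DynamoDB global table, so the cached order state is replicated hot-hot across regions and survives a regional or availability-zone failure within the stated SLAs.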
2636.174 -> And I wanna end with this:
2639.28 -> it's been a great journey
in the last three years.
2643 -> I think, you know, we used
multiple, like, newer technologies,
2646.45 -> we had a lot of players,
2647.44 -> but what really mattered
was the Fidelity culture.
2652.03 -> Having these four
pillars between security,
2655.42 -> between the platform, between the SRE,
2657.945 -> between the applications,
2660.34 -> having harmony between the four,
2661.99 -> collaboration between the four pillars.
2665.71 -> That's actually what made
our platform successful.
2669.7 -> We chat, we argue, (laughs) we discuss,
2673.87 -> we change plans, but
at the end of the day,
2676.42 -> having these four pillars integrated
2678.67 -> and collaborating together,
2680.83 -> understanding that it's not
like a traditional data center,
2684.01 -> not traditional practices that
can solve the problem.
2687.49 -> And instead, like, every team is worried
2689.554 -> and every team is focused on
what the other team is doing,
2692.62 -> security team is helping
the platform team,
2695.41 -> our platform team is, you
know, focused on SRE.
2698.41 -> Our SRE team is helping in engineering,
2700.9 -> our application team is
everywhere helping us with that.
2703.48 -> That's what matters, that's
what Fidelity culture,
2706.637 -> the cloud platform culture is about.
2708.97 -> And I think that's why
we're successful today,
2710.71 -> moving 5,700 applications to the cloud
2713.74 -> and thank you so much for that.
2715.052 -> (audience applauds)
2721.45 -> - All right, thank you Amr.
2723.19 -> So, I promised we will
leave room, I should say,
2726.38 -> for questions from the audience.
2728.59 -> I do have one question, though, for Louis.
2729.88 -> I'm gonna flip back to this
architecture slide real quick,
2732.55 -> 'cause I think there's some
interesting data points
2735.43 -> here that I wanna discuss
real quick. Let me find it.
2739.24 -> So Louis, earlier on in your presentation,
2741.7 -> you were talking about
orders queuing up, right?