With a 200-year history as a pioneer in banking, JPMorgan Chase faces the challenge of meeting the ever-changing needs of its customers. Launched in 2021, Chase International combines the trustworthiness of an established brand with the seamless experience of a digital retail bank. Learn how Chase is providing fast access and personalized service 24/7 using AWS services, including Amazon Connect, Amazon EKS, AWS Glue, and Amazon OpenSearch Service. Chase details how it built a modern banking platform and scaled a new entity in an established market, attracting more than 500,000 consumers and over $10 billion in deposits.
#reInvent2022 #AWSreInvent2022 #AWSEvents
Content
1.06 -> - Hi everyone and welcome
to FSI session 308.
5.832 -> Chase International, always-on
customer experience at scale.
10.28 -> My name is Colin Marden and
I'm an AWS solutions architect.
15.341 -> As an architect, I help AWS customers
17.979 -> build secure, scalable,
18.812 -> and reliable architectures on AWS.
23.821 -> I'm part of the AWS team that's
dedicated to JP Morgan Chase
28.445 -> and more specifically their
payments, merchant services,
31.464 -> and international businesses.
33.996 -> It's my second re:Invent,
35.568 -> but it's special for me 'cause
at this re:Invent I get to
38.776 -> introduce you to Chase International,
40.706 -> to their CIO Paul Clark
and to Courtney de Lautour
44.384 -> an executive director
and engineering lead.
48.011 -> Chase is a digital only retail bank
51.469 -> that launched around one year ago,
57.301 -> and they're owned by JP Morgan
Chase, the larger business.
60.961 -> If you've not heard of JP Morgan Chase,
62.862 -> they're a leading global
financial services firm
65.814 -> with operations worldwide
67.263 -> and around $3.4 trillion in assets.
71.603 -> They're a leader in investment
banking, commercial banking,
74.874 -> consumer and community
banking for small businesses,
78.738 -> financial transaction
processing and asset management.
82.717 -> With that, let's take a look
at the agenda for the day.
86.579 -> So we'll start with a few
introductions to our speakers
89.563 -> and to Chase International.
91.596 -> From there we'll discuss
customer experience
94.152 -> and why that matters so
much to Chase International.
97.492 -> Before we dive a little
deeper into the architecture
100.555 -> and exactly how the cloud has
enabled them to move fast,
104.4 -> but equally to stay secure.
106.989 -> Finally, we'll close by walking through
108.769 -> some of the key takeaways and challenges
111.124 -> from the last 12 months of operation.
115.159 -> So why did JP Morgan Chase
build Chase International?
119.479 -> To answer that question,
I'm joined by Paul Clark.
122.432 -> Paul, can you tell us a little more
124.242 -> about the JP Morgan
opportunity in retail banking
127.322 -> and exactly why you built Chase?
130.408 -> - Yeah, thanks Colin.
131.499 -> Hi everybody. It's nice to be here.
133.792 -> It's my first re:Invent,
135.558 -> so I'm sure for those in the
audience that are American,
138.456 -> you're familiar with Chase,
the brand and the bank.
141.532 -> It's never been launched
outside the US before.
144.142 -> It had been considered
but it was not thought
147.275 -> to be the right time.
148.501 -> Launching a bank requires
a lot of infrastructure.
151.801 -> You have to build branches,
153.026 -> you have to do a lot of brand marketing.
154.417 -> It's, it's very, very expensive.
156.865 -> But however, you know,
158.057 -> the world has moved on a little
bit and we're much more used
160.092 -> to now working in kind of digital ways
163.032 -> and digital banking is
becoming much more prevalent.
165.279 -> If you are in the UK,
166.921 -> we have banks like Starling,
168.428 -> we have banks like Monzo. In Europe,
170.048 -> there's N26. Over here,
171.192 -> you guys have got Chime and
Venmo and people like that.
173.928 -> So people are much more used to it.
176.203 -> And we saw this point in time
as the point in time
179.346 -> to launch a digital bank. You know,
181.141 -> we have a quite an advantage
over the kind of newer kind of
184.539 -> entrant banks if you like,
186.159 -> having the heritage of JP
Morgan behind us and having that
188.625 -> kind of weight behind us.
190.731 -> So we set out to build a
platform that would scale,
193.606 -> would scale to millions of customers,
194.814 -> would scale up and down
as it was required.
197.628 -> You know,
198.461 -> AWS was what we chose to do it on
200.278 -> and the reason we chose
to do it, you know,
202.311 -> was because to do it that
way is because people,
204.446 -> everybody uses digital.
205.816 -> Everybody in this room will
have a phone in their pocket.
207.873 -> Everybody in this room
uses apps all the time.
210.238 -> And what we want to do is
build a digital bank that would
212.23 -> scale but at the same time provide
214.313 -> the best possible service.
216.769 -> Because people don't compare banks, right?
218.924 -> You don't compare your
bank to your friend's bank.
221.289 -> People compare their
banks to the digital apps.
223.421 -> They're used to using their Instagrams,
225.807 -> their Facebooks, whatever
your favorite mobile app is.
229.332 -> And so we wanted to build a
bank using digital first kind of
232.808 -> methods that would scale that would
235.391 -> give that kind of a wonderful service.
236.947 -> And then ironically we decided
to do it in the hardest
238.771 -> possible way.
239.703 -> We decided to launch it in the
UK and we decided to do that
243.424 -> because for those of you
familiar with the idea of do the
245.576 -> hardest thing first,
246.557 -> the UK is a very competitive
banking landscape and it has
249.842 -> the most kind of
competitive digital banks.
252.349 -> It also has a very high bar
when it comes to regulation.
256.464 -> The Bank of England and the PRA and FCA,
259.231 -> who regulate us, are considered to be kind
261.2 -> of like almost the gold standard really.
263.03 -> So if we could launch a
digital bank in the UK,
266.208 -> if we could make that successful,
267.518 -> if we could do that in
that regulatory regime,
269.44 -> we kind of ticked a box that
this thing would have legs
272.43 -> potentially to get even bigger.
276.499 -> So in building that bank and
as we set out to to build it,
280.819 -> we wanted to kind of create
the best possible experiences
284.15 -> I've already said, but we also
had to create a trust, right?
287.302 -> Banks are in a kind of unique position,
288.952 -> unlike the other apps
that sit on your phone,
291.609 -> we have all your money and
people really care about their
294.03 -> money and particularly care if
they can't get their hands on
297.171 -> their money.
298.397 -> So we needed to build a bank
that would scale a bank that
301.153 -> was easy to use, but also a
bank that you could trust.
305.28 -> And we needed to kind of
do all that and make it
307.169 -> frictionless at the same
time. Because you know,
308.936 -> back to that point about you
don't compare your banking app
311.192 -> to someone else's banking app.
312.35 -> What you do is you compare your
banking app to your favorite
314.443 -> digital apps.
316.286 -> But we have a balance
318.472 -> we need to strike. At the same
time as we want to make that
320.686 -> experience as good as possible
for you and as frictionless
323.203 -> as possible for you,
324.512 -> we have to keep the bad guys
out because we've got your
327.024 -> money.
327.904 -> You don't wanna let the bad guys in.
329.604 -> So we really kind of focused
on the customer experience and
331.891 -> how we could use customer experience,
334.649 -> to build that trust with you
and to give you a wonderful
337.163 -> experience of banking.
338.742 -> So for instance, you know,
339.745 -> every time you tap your card
you get an instant notification
341.68 -> of your cards, you've used
your card and you might think,
343.814 -> well I've just used it, I,
345.264 -> I know that I've just used my card.
347.139 -> Well that's not the point.
348.142 -> What we're doing is we're
training you in some way to kind
350.537 -> of recognize that when your
card is used you get an alert.
352.926 -> So if you were to get an
alert but you hadn't used your
355.192 -> card, then somebody else
must be using your card.
357.587 -> So we can kind of build
that kind of trust with you.
361.795 -> We send you real time push notifications.
363.528 -> We do other things like we
pre-auth you when you phone up,
367.403 -> if you need to phone us up,
368.595 -> we pre-auth you so you
phone us from the app,
371.531 -> you're biometrically secured because
372.854 -> you've just come in via your phone,
374.664 -> you come through to the contact center.
376.311 -> Courtney's gonna talk a
little bit about how the magic
377.899 -> happens later,
378.984 -> but we know it's you
379.817 -> and we can take you
straight into the journey.
381.851 -> If you phoned from something
other than your phone,
384.496 -> we'd send you a push notification
to your phone and say,
386.552 -> hey, is this you?
387.6 -> Same way as you get two factor
authentication when you're
389.595 -> using Google to log in and all
these things help build trust
393.083 -> and help build a frictionless experience.
395.635 -> And that then allows us to
kind of offer you more services
398.584 -> because you know, we're
building that relationship.
406.123 -> Talked about kind of
building a bank in the UK,
407.796 -> it's a great place to start.
408.879 -> I'm sure some of you look at
kind of Jamie Dimon when he
411.158 -> gives investor updates.
412.776 -> You have an idea about how
much we're spending, you know,
415.503 -> we are not building a bank
for the UK I think, you know,
418.785 -> we started in the UK and
that's definitely the plan.
422.048 -> But to be sustainable as a bank,
424.304 -> you have to kind of be
quite a large scale.
427.184 -> And now we have a big advantage
here against the legacy
430.244 -> banks because of the way we've
set about building the bank.
433.234 -> And Courtney's gonna talk about
that in far greater detail
435.138 -> than I could ever understand.
437.866 -> We, because of the way we've built it,
439.871 -> it means that we can scale
it up and the kind of,
442.388 -> and kind of marginal cost of
adding new customers to that
444.197 -> platform is, is quite low.
446.375 -> So compared to the legacy
banks which might end up having
449.154 -> kind of a system per geography let's say,
451.882 -> and then all the costs
that come with that,
453.42 -> if we do this right and I
believe we are doing it right and
456.47 -> what we should be able to
do is scale the bank up into
458.335 -> multiple geographies,
459.314 -> scale the bank up into millions
and millions of customers
461.959 -> and we should be able to keep
our costs low because of the
464.628 -> way we use the underlying
platforms that support the way we
467.799 -> built it.
468.632 -> This will give us a much better
cost income ratio than the
471.234 -> legacy banks and really,
472.943 -> really help drive
profitability in the long term.
475.573 -> So yes, we've had kind of
a lot of initial input,
477.918 -> you know, a lot of cash
input at the beginning
479.852 -> to get this thing set up,
480.958 -> but really the kind of payback
comes in the long term as we
483.588 -> scale the platform on and on.
488.276 -> We worked really hard to try
490.457 -> and get everything on a slide to,
493.06 -> but we couldn't work out how
to do that for the last one.
494.87 -> So apologies for, for not
being able to do that.
497.87 -> But yeah,
499.057 -> these are the core development tenets
500.765 -> and it's not just for development,
502.253 -> it's how we really think about
503.086 -> how we build a bank from scratch.
507.025 -> Now if other people stood on
the stage here today and talked
510.006 -> about their wonderful startup
or their digital company,
512.913 -> they'd all probably talk about similar,
514.817 -> similar kinds of things, right?
516.518 -> We wanna make it really,
517.351 -> really easy for our engineers
to do the right thing.
519.51 -> We wanna make it really
easy for our engineers to,
521.815 -> to do what they're good
at, which is creating IP,
524.614 -> creating services that our
customers love and, and you know,
527.905 -> really kind of delivering
value constantly.
531.174 -> And we work in a very heavily
regulated environment.
536.301 -> We have a control,
537.145 -> a control environment which
is there for all very,
539.756 -> very good reasons, right?
540.804 -> To protect us, to protect you, the customers,
543.209 -> to protect the money, to
protect our reputation.
546.238 -> And what we set out to do at
the very beginning was ask
549.48 -> ourselves the question,
550.313 -> how could we build tool
chains and our kind of
553.529 -> automation that would allow us
to really kind of manage that
557.969 -> control environment in a way
which is almost invisible to
560.37 -> the engineers so they can
just get on and do their job,
562.905 -> but in doing their job,
563.738 -> they're doing the right
things in the right ways
565.696 -> to protect us, to protect the customer.
568.923 -> And for us it was about building
platforms and you hear the
572.082 -> joke, you know,
572.924 -> it's turtles all the way down,
kind of platforms all the way
575.863 -> down these days really.
577.102 -> So you know, we built a
banking platform, you know,
578.683 -> customers come on they
can kind of do their,
581.41 -> I think the Americans call
it a checking account,
584.062 -> we call it a current account.
585.088 -> They can come in and use
their bank account, you know,
587.091 -> that's on a banking platform.
588.371 -> It's powered by our developer
experience platform that
590.865 -> allows me to kind of build, test, deploy,
592.187 -> operate my software. That sits, you know,
594.371 -> next to our observability
platform so we can keep an eye on
596.776 -> everything that's happening.
598.338 -> You know,
599.171 -> all of the output of that gets
fed into our data platform
602.305 -> that sits on top of
our kind of tech stack,
604.648 -> our tech platform,
605.481 -> which eventually at
the bottom sits on AWS.
608.547 -> I think that scales up
and down, you know, and,
610.95 -> and what we did here is, you know,
613.099 -> we tried to simplify it for
the engineers that they could
615.026 -> just get on and do the right
thing and we could then move at
617.877 -> pace and at scale we do somewhere
in the region about 4,000
620.695 -> production releases a year. You know,
622.447 -> we try to make it really easy
for our engineers to do the
624.77 -> right thing, deliver value constantly.
627.815 -> The other thing to mention
here is that last point,
630.002 -> low power distance. You know,
631.861 -> we're not talking about hierarchy,
633.402 -> although within Chase in the
UK we do try to maintain a
637.36 -> really flat hierarchy.
638.709 -> So I sit out on the floor
with everybody else,
640.258 -> but low power distance means
allowing the engineers to just
643.33 -> be able to get on with stuff self-serve.
644.981 -> So what we don't want to do is
create a series of platforms
647.157 -> that everybody has to raise tickets for.
648.723 -> We want the engineers to be
able to get their hands on the
651.206 -> tools so they can get ahead
and they can then create the
654.287 -> software that's required.
655.62 -> So we have low power
distance both in hierarchy
658.605 -> and in in how the
engineers kind of go about
660.65 -> their day-to-day jobs.
662.431 -> And this isn't just for how
we think about kind of writing
665.743 -> code and engineering. This is
how we think about everything.
667.724 -> So we think about resilience,
669.813 -> both technical and business resilience.
671.423 -> We try to kind of make sure
that we can automate those
673.621 -> processes and kind of get
all those -ilities in place.
679.294 -> With that said,
682.295 -> I'm gonna hand over to Courtney
and Courtney's gonna talk in
685.306 -> a little bit more detail
about all that happens,
687.642 -> how all that happens.
688.76 -> Courtney runs our
engineering team at Chase.
693.392 -> - Thanks Paul.
694.909 -> Paul has mentioned a lot of
platform teams there and because
697.961 -> one of my teams is
called the platform team,
700.033 -> it's probably good to
explain which that one is.
701.665 -> And it's a team that you might
have heard referred to as a
703.9 -> DevOps team infrastructure
or cloud enablement team.
707.342 -> So it fits into that
kind of platform space.
709.74 -> And one of the things that
we are doing here is we are
712.668 -> building a platform and it's a product.
714.561 -> So we are calling the teams
platform teams because these are
717.38 -> related to products.
718.95 -> Products have customers,
DevOps doesn't have a customer,
721.921 -> it's a way of working. The
cloud isn't a product.
725.121 -> It might be Colin's actually,
but no you lied to me.
732.018 -> So we have the cloud, which
is a technology choice for us,
735.965 -> but it's not a product.
737.141 -> Our product that we are building
is for the engineers that are
739.499 -> building the bank for us.
741.158 -> So we need to empower them
to be able to build great
744.422 -> customer experiences with as
minimal friction as they can.
747.181 -> And one of the things that
happens when you have a platform
749.185 -> team or a core engineering team
or an infrastructure team is
752.689 -> it might be tempting to
consolidate control into that one
755.457 -> team and erode the low power distance.
757.844 -> And that's because it seems to
be an easy place to implement
760.487 -> these controls and requirements.
762.05 -> Imagine that you wanna mandate
that you have an 80% code
765.358 -> coverage throughout your experience.
767.139 -> This is great for the team who
768.472 -> wants to have 90% code coverage.
770.228 -> They can meet that
requirement really easily.
772.121 -> But what about the team who
doesn't want to do unit testing
773.989 -> at all?
774.822 -> They want to do their testing
through integration testing
777.13 -> might not be something I would recommend,
778.71 -> but they're not gonna have any
code coverage metrics that'll
781.337 -> feed into your system and then
you will have reduced their
783.583 -> ability to do what they want to do.
785.553 -> It's our job as a platform
team to now enable
788.502 -> other teams to do their job
789.781 -> and not to tell them how to do it.
793.738 -> You've probably heard the
term go fast and break things,
796.24 -> right?
797.073 -> This is a fairly common
thing around startups
799.018 -> and moving on.
799.871 -> But as a bank and to maintain
that trust that we have that
802.701 -> Paul was just talking
about, we have to go,
804.591 -> we have to go fast but
we can't break things.
807.965 -> So one of the things that we
started with was coming up with
810.525 -> our account structure at the
start we knew that we were
813.905 -> gonna be building a new bank
with a new way of working new
816.744 -> principles and potentially even
a slightly different culture
819.333 -> than the rest of JP Morgan as a whole.
821.823 -> So one of the first things that
we knew we wanted to do was
824.351 -> to separate out all of our
accounts and our workloads.
827.623 -> It's quite easy to understand
that we'd want to have our
829.958 -> services, our banking, our accounts,
831.653 -> et cetera in different locations.
833.734 -> But we wanted to go a
step further and we ended up
836.053 -> separating out so that we run
our own Bitbucket instance,
839.952 -> we run our own tool chain,
841.677 -> we run our own observability stack,
843.277 -> we run our own data platform,
we own everything separate.
846.295 -> And one of the main problems
that you face when you're
848.317 -> starting on this is where do
you actually start on designing
850.895 -> your account structure?
852.353 -> You could start off with the
AWS Well-Architected Framework.
855.162 -> So it's a great resource and
you can go into the finer
857.517 -> details of the implementations
that you might want to do.
860.357 -> This might end up getting you
towards analysis paralysis
862.895 -> where you're asking do we want
to have a singular account
865.117 -> where everything is in one
place or do we want to have a
867.063 -> cellular architecture where
everything is isolated?
870.263 -> But for us we wanted to have a
look at what the requirements
873.162 -> of these accounts were gonna
be and why we would do them.
875.86 -> For instance,
876.693 -> maybe we wanna have role
based access control for some
878.791 -> accounts or dedicated access for others.
881.013 -> Some of these accounts might
need to have three nines of
882.887 -> availability or others might need five.
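To make the difference between three and five nines concrete, here is a minimal sketch of the arithmetic. It assumes a 365-day year, and the function name is illustrative rather than anything from the talk. Three nines leaves a budget of roughly 8.8 hours of downtime a year; five nines leaves a little over five minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Return the yearly downtime budget for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001 (99.9% available)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (3, 5):
        minutes = allowed_downtime_minutes(nines)
        print(f"{nines} nines: {minutes:.2f} minutes of downtime per year")
```

A gap of two orders of magnitude in the downtime budget is one reason to settle the availability target per account before choosing how to build it.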
886.109 -> RPO and RTO.
888.074 -> There's gonna be some things
that do need an RPO of zero,
890.924 -> but then there's gonna be a
lot of things that can actually
892.634 -> tolerate some failure.
894.085 -> And in terms of failure domain,
you might need to think,
897.186 -> oh if we separate everything,
898.698 -> nothing can impact each other if it fails.
900.741 -> But then you're trading off
economies of scale and you're
902.906 -> gonna be paying a lot of money
for compute and resources
905.098 -> that you're not using.
906.57 -> And I think given we're in an FSI track,
908.975 -> a lot of us probably are
regulated to some extent here.
911.782 -> And so you need to think about
auditability of your systems
913.81 -> as well.
914.879 -> You're gonna be asked by your
regulators to explain how
917.199 -> things work and where you draw
the box around your accounts
919.904 -> is gonna impact how much you
have to explain to them when
923.281 -> they're asking you how does this work?
924.943 -> And also if you create them too small,
927.217 -> you're gonna have to
explain the relationships
928.675 -> between all of those,
929.944 -> which is probably just as big a work
930.777 -> as doing it all into a single account.
932.941 -> So what we have here is
we have the internet edge,
935.89 -> which is kind of like a DMZ.
937.104 -> It does the same things,
938.173 -> it does load shedding when
you have incorrect payloads,
940.501 -> it does attack analysis,
941.789 -> it does high level authentication
for requests before we
944.273 -> dive into zero trust,
945.538 -> lower down in core banking stack
947.533 -> and the core banking accounts,
948.741 -> this is where we run most of
the services that make the bank
950.945 -> actually work and the customer experience
952.632 -> associated with it.
953.85 -> You have an account, we
need to track the balance
955.702 -> of that account somewhere you
956.899 -> have scheduled payments to
transfer some money to mom,
959.642 -> we need to run that somewhere.
961.344 -> And we also have savings accounts
which deliver some market
963.96 -> leading interest rates on them,
965.56 -> but they also need to pay into somewhere.
967.856 -> Now these two accounts that
I've just described have to be
970.886 -> highly available and have low
latency because when you're
973.148 -> trying to pay for a train
ticket home at 9:00 PM at night,
975.8 -> you don't wanna be waiting for
976.862 -> your bank 'cause it's not working.
978.534 -> However, to the right hand side of that,
980.315 -> we have the analytics
stack or data platform.
984.102 -> This data platform is allowing
us to measure whether or not
987.625 -> our market leading interest
rates are actually having impact
991.02 -> on the customer experience.
992.426 -> Are people actually enjoying
having more money paid into
994.831 -> their accounts at the end of the month?
996.282 -> Probably yes.
997.984 -> And then,
998.998 -> but then one of the things
there is that this doesn't need
1001.206 -> to be nearly as available
1002.664 -> or have as low latency
as the other systems.
1004.85 -> Yes, if they're not performing well
1006.642 -> or there's something wrong,
1007.89 -> it's gonna impact internal customers,
1009.885 -> internal users of that platform.
1011.802 -> But these customers are probably
gonna be more tolerant of
1014.458 -> things not working quite so well.
1016.402 -> And to the right hand side of that,
1017.642 -> we have our observability platform,
1019.413 -> which is collecting metrics
from all of the points
1021.224 -> throughout these other areas
into a single pane of glass,
1024.274 -> which lets us know whether
or not our internet edge,
1026.872 -> our core banking and our
analytics platforms are actually
1029.401 -> working as we expect them to.
1031.109 -> Linking all of these together,
1032.642 -> We have a set of shared services
that run message brokers,
1035.47 -> distribute artifacts and a
few other common things and
1038.32 -> that's on top of the JP Morgan
infrastructure backbone,
1041.59 -> which is a direct connect between
1043.006 -> the two networks that we have.
1046.179 -> Once we've started on some accounts,
1047.728 -> we end up working out what's
going to go into them.
1050.199 -> Again, the AWS Well-Architected
Framework comes into play here,
1053.603 -> but if you're looking at
this diagram you're probably
1055.256 -> thinking, oh that's pretty
simple, why am I here?
1057.432 -> And you can learn most of
what's in this in the associate
1060.683 -> cloud practitioner course,
1062.211 -> which I think you can actually get
1063.112 -> certified downstairs at the moment.
1065.276 -> We have public and
1066.109 -> private subnet pairs
and we run all of
1068.85 -> our workloads inside the private subnets.
1070.97 -> We use managed services as
much as possible across these.
1074.173 -> For instance,
1075.092 -> we're running Kubernetes using
EKS and while we have several
hundred EC2 instances running
across all of our accounts,
1081.723 -> we only manage a number of these
1083.354 -> that's around about 10 or so.
1084.956 -> Like, we do use
managed services a lot here.
1089.799 -> We make heavy use of AWS PrivateLink,
as you can probably see
1092.108 -> going horizontally across here
to enable access for accounts
1095.295 -> to talk between each other.
1098.106 -> This is to allow us to manage
access control and it also
1101.928 -> means that we can create new
accounts and connectivity
1105.212 -> patterns without having to
worry about managing CIDR
1107.639 -> ranges.
1109.199 -> We can create as many accounts
as we want and connect them
1111.295 -> however we want without
planning ahead for this.
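To illustrate the CIDR planning problem that this approach sidesteps: routed connectivity such as VPC peering requires that the connected networks do not overlap, whereas AWS PrivateLink endpoints work even when both sides use the same address range. The sketch below, using hypothetical VPC ranges and a hypothetical helper name, shows the overlap check you would otherwise have to run every time a new account is added.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(cidrs):
    """Return every pair of CIDR blocks that share at least one address."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


if __name__ == "__main__":
    # Hypothetical ranges: two accounts independently picked the same block.
    vpc_cidrs = ["10.0.0.0/16", "10.1.0.0/16", "10.0.0.0/16"]
    for a, b in overlapping_pairs(vpc_cidrs):
        print(f"{a} overlaps {b}: cannot be routed to each other directly")
```

With endpoint-based connectivity, a non-empty result here is not a blocker, which is what allows accounts to be created without coordinating address space up front.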
1113.61 -> And you can see even on the
right hand side of this diagram
1116.996 -> that we have a high
security area where the only
1119.999 -> connectivity is via private
subnet PrivateLink.
1123.773 -> And then for some areas we
don't even need to use direct
1127.034 -> connectivity through anything at all.
1128.746 -> We can utilize AWS cross
account access to mean that
1131.756 -> services can write data
directly into the source where
1134.676 -> it's needed.
1135.509 -> And we'll touch a little bit of that,
1136.922 -> how we do that for logging a bit later.
1139.348 -> Then in analytics and tool chain
and other kind of accounts,
1141.984 -> we need to have the ability
to let engineers show their
1145.188 -> accounts to people who are
actually on their workstations,
1147.391 -> which is where the transit
gateway and the backbone come
1149.898 -> into play to make this work.
1154.148 -> So we know that our success
is gonna be through our
1157.863 -> ability to innovate.
1159.029 -> You know we're gonna
have competitive pressure
1160.745 -> that comes in,
1161.657 -> people are gonna be coming up with
1162.74 -> new banking products all the time.
1164.507 -> Our interest rate may not be quite
1165.715 -> so competitive as it once was,
1168.264 -> but we're also running in
a regulated industry and
1171.16 -> regulators are gonna be doing
things in the interest of
1174.304 -> people outside of us and
they're gonna be able to legally
1176.86 -> mandate us to do these things
and we're not gonna be able to
1179.136 -> say, you know what,
1179.969 -> we're just gonna deprioritize
that because we want to
1181.662 -> implement this feature further.
1183.155 -> We have to do it when they tell us to.
1184.997 -> And you can see that
some of these regulations are
1186.629 -> starting to come into industries
outside of finance as well.
1189.035 -> You know,
1189.868 -> there's a lot of privacy related
things coming in in the EU.
1192.701 -> We have GDPR which requires
us to be able to display the
1196.437 -> data that we have on any given
customer and, if they ask us,
1199.088 -> to delete it, and we have similar
things coming in around the
1201.923 -> world such as the California
Consumer Privacy Act,
1205.135 -> customers are expecting us to
deliver them new value and new
1207.893 -> features all the time while
regulators are expecting us to
1211.81 -> be compliant with them and
they don't really necessarily
1214.373 -> care which one you're trying
to do at the same time,
1216.835 -> this means that it's important
for us to be able to respond
1219.096 -> quickly across all of our
services and to go quick.
1222.929 -> To do this successfully, you do need
1225.25 -> organizational design.
1226.41 -> Creating sympathy between
people, processes,
1228.634 -> culture and people all across
the organization and teams.
1233.244 -> There's one thing that you need to do,
1234.622 -> even if you're doing all of
those things perfectly and that
1236.94 -> thing is actually build your code fast.
1239.293 -> If you try to build a bank in assembly,
1241.007 -> you're probably not gonna
get there very quickly.
1243.225 -> And in some cases microservices
might be a little bit closer
to assembly than a lot of
engineers would like to admit.
1248.866 -> So when we turn to the contact center,
1251.3 -> which is something that Paul
was describing before as one of
1253.57 -> our key brand differentiators
and important aspects of what
1256.647 -> we deliver our customers.
1258.319 -> Remember we're a digital only bank,
1259.991 -> so there's only two ways people
can interact with us through
1262.308 -> the app or by calling us.
1263.927 -> So this is really important
for us to get right.
1266.132 -> Connect allowed us to build this,
1267.791 -> this channel in a really simple way.
1269.738 -> If you look at this diagram,
1271.239 -> you can see that Connect
is actually only one of these
1273.722 -> icons on here,
1274.859 -> but the rest of them are
other AWS resource types, and the
1278.011 -> power really comes from
the ability to integrate
1280.721 -> Connect into this ecosystem.
1284.084 -> One of the things that we do
for authentication when people
1287.063 -> call in is a feature which
we call click to call.
1290.596 -> I'm sure that you've had an
experience where you've rung up
1292.831 -> your bank trying to answer
a relatively simple question
1294.956 -> about your banking profile
and they actually ask you
1296.972 -> something more complex
to prove who you are.
1298.994 -> So they might say, ah,
1300.301 -> can you tell us how much you spent on
1301.759 -> the Tuesday of last week?
1303.098 -> Or how much did you have to
pay on your credit card at the
1305.034 -> end of the month? Because we've
got you in the app and the
1307.378 -> biometrics associated, we
already know who you are.
1310.498 -> And so we don't need
to ask these questions.
1312.573 -> What we do is when you click the button
1314.031 -> in the app to call us,
1315.275 -> we log into our IDP,
create a short lived token,
1318.541 -> and then once the call is
connected to the contact center,
1321.13 -> we send this short lived token via DTMF,
1324.687 -> which is dual-tone
multi-frequency signaling.
1326.93 -> And those are the sounds that
happen when you press the keys
1328.754 -> on your keyboard.
1329.842 -> Once the call is connected
and those are sent,
1331.975 -> Amazon Connect is then able to
interpret those, call into our
1335.119 -> IDP and it knows who we are
within a matter of milliseconds.
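The click-to-call handoff described above can be sketched roughly as follows. This is a minimal illustration, not Chase's implementation: the `TokenStore` class, the 60-second lifetime, and the 8-digit token length are all assumptions, and a real system would hold tokens in the identity provider rather than in memory. The token is numeric because DTMF tones can only carry keypad symbols.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 60  # assumed lifetime; the talk only says "short lived"


class TokenStore:
    """In-memory stand-in for the identity provider's token table (hypothetical)."""

    def __init__(self, now=time.monotonic):
        self._now = now          # injectable clock, which makes expiry easy to test
        self._tokens = {}        # token -> (customer_id, expiry time)

    def issue(self, customer_id: str, digits: int = 8) -> str:
        """Create a numeric token the app can send as DTMF tones once the call connects."""
        token = "".join(secrets.choice("0123456789") for _ in range(digits))
        self._tokens[token] = (customer_id, self._now() + TOKEN_TTL_SECONDS)
        return token

    def resolve(self, token: str):
        """Return the customer id for a token, or None if it is unknown, used, or expired."""
        entry = self._tokens.pop(token, None)  # pop makes every token single use
        if entry is None:
            return None
        customer_id, expires_at = entry
        return customer_id if self._now() <= expires_at else None
```

In this sketch the app would call `issue` when the customer taps the call button, and the contact flow would call `resolve` with the digits it heard. A token that cannot be resolved simply falls back to the ordinary verification questions.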
1338.399 -> This means that once you've
completed the IVR flow and
1340.946 -> you're talking to a real agent,
1342.415 -> they are able to have all of
the information about you in
1345.154 -> front of them, everything we're
able to see that's relevant,
1347.61 -> including if you've been
using our in-app chat feature.
1350.66 -> But we don't have to stop there.
1352.753 -> Once we know who you are
at the start of this flow,
1355.398 -> we can dynamically change the
IVR flow that you are taking.
1358.954 -> One of the common kind of
cases that you have on this is
1361.367 -> when you have a fraud alert.
1363.369 -> And that is if you've made
a purchase that's slightly
1365.595 -> outside of your normal spending pattern,
1367.588 -> the bank will sometimes send
you a message or give you a
1369.419 -> call and say, "Hey, was this
transaction actually you?"
1373.216 -> In our case, when someone
calls up about this thing,
1376.226 -> we are able to look into your account,
1378.714 -> determine do you have any
fraud alerts pending and then
1381.875 -> direct you into a different
IVR flow which will connect you
1384.452 -> directly to the fraud
specialist for you to have that
1386.445 -> conversation with them directly.
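The routing decision described here amounts to a small lookup once the caller is identified. The sketch below is only illustrative: the flow names and the shape of the `pending_fraud_alerts` mapping are assumptions, not the actual contact flow configuration.

```python
from typing import Mapping, Optional


def choose_ivr_flow(customer_id: Optional[str],
                    pending_fraud_alerts: Mapping[str, int]) -> str:
    """Pick a contact flow name (hypothetical names) for an incoming call."""
    if customer_id is None:
        # Token missing or unresolved: fall back to the normal verification path.
        return "standard_with_verification"
    if pending_fraud_alerts.get(customer_id, 0) > 0:
        # Heuristic from the talk: an open fraud alert is probably why they are calling.
        return "fraud_specialist"
    return "everyday_banking"
```

As the speaker notes, this heuristic can be wrong, so the fraud specialist path still needs a way to handle or transfer unrelated queries.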
1388.339 -> In other situations you might have to
1389.763 -> go through an IVR flow,
1391.112 -> get connected to an everyday
banking consultant who will
1393.417 -> then transfer you to a fraud specialist.
1395.651 -> But obviously we don't always
get this heuristic right and
1398.531 -> sometimes we will connect
people who aren't calling about
1401.102 -> fraud to a fraud specialist.
1402.707 -> In this case, the fraud specialists have
1404.499 -> also been trained to handle most cases
1406.652 -> and they can probably help
without having to transfer,
1408.718 -> but they can still do that
transfer if they need to.
1411.769 -> And because the call center
is one of the key
1415.494 -> differentiators for our proposition,
1418.443 -> we knew that we had to
start training everyone
1420.401 -> on how this would work
1421.919 -> and build the muscle for this early.
1424.759 -> We started doing this in
about early 2020 in fact,
1428.14 -> which, as a number of
companies experienced,
1431.148 -> brought with it work-from-home
requirements and was a
1435.036 -> little bit of a surprise to us,
and it was no different with
1437.132 -> our contact center. Anyone
who's tried to listen to a
1438.324 -> YouTube video from their remote
desktop through their home
1443.862 -> speakers probably knows
where this is going.
1446.951 -> Turns out when you're trying
to connect a soft phone through
1449.685 -> a remote desktop to another cloud service,
1451.652 -> voice quality drops a lot.
1454.238 -> We were able to modify our flow,
1456.591 -> create a new AWS Cognito pool
to allow call center agents to
1460.83 -> log in from their Chromebook,
1463.108 -> which we distributed to them
so they could work from home.
1465.513 -> They were able to log into that
and then connect directly to
1468.932 -> Connect without having to
use their remote desktop.
1471.916 -> They could then open their remote desktop,
1473.649 -> use their normal applications
on network and then when a
1476.097 -> call was received from
this mock soft phone,
1478.826 -> it would go directly from
Connect to their Chromebook and
1481.442 -> this dramatically increased the quality
1483.525 -> of the voice calls as we know.
1486.226 -> We knew this quantitatively
because we are disseminating
1488.508 -> a lot of our information from
1490.101 -> the contact center through
to our observability stack.
1493.213 -> This includes the number of
soft tokens that we've issued,
1495.611 -> the number that have failed to
resolve, how many calls we've
1497.917 -> routed intelligently, customer
and agent transaction records,
1502.418 -> and we've even included
some code in our soft phone
1505.381 -> so we can track the amount
of background noise
1506.214 -> and the packet loss in the call itself.
1509.892 -> By using Connect,
1511.013 -> we've been able to easily
deliver a great customer
1513.488 -> experience to our users and
some innovative features that
1516.277 -> you won't see anywhere else.
1517.523 -> And we've been able to do this with
1518.731 -> minimal engineering effort.
1522.204 -> But Amazon hasn't been able
to turn everything into a
1525.07 -> managed service just yet.
1528.028 -> You know,
1528.861 -> we do a lot of things with
microservices and we currently
1530.909 -> are running several hundred
of these in production today.
1533.931 -> When I was starting
my work at JP Morgan,
1537.277 -> I was going through the interview process,
1539.002 -> I was talking to an engineer
called Alex and he was,
1540.867 -> you know, telling me about the
project, how it was greenfield,
1543.395 -> how the culture was great,
how we were a startup,
1545.207 -> cloud native, all these great things.
1547.304 -> And one of the things that
he mentioned was that we were
1549.237 -> building a microservice
architecture and, being a developer,
1552.513 -> microservice architecture is something
1553.721 -> that I'm quite keen on.
1555.274 -> And I did ask him,
1556.138 -> why are you doing
microservice architectures?
1558.394 -> There's a lot of things where
a monolithic application can
1561.021 -> probably serve you just as
well and putting a network call
1563.429 -> in between those things isn't gonna
1564.68 -> make things better for you.
1566.104 -> But Alex was able to answer
me in a somewhat paraphrased
1569.635 -> thing here saying that we're
using microservices because
1573.039 -> it's gonna empower our teams
to operate independently with
1575.854 -> flexibility,
1576.896 -> but will also give us
consistency where it counts.
1580.851 -> When we're talking about a microservice,
1582.683 -> there's a lot more that goes
into them than just code.
1585.278 -> And sometimes we might
think about the JAR file
1587.736 -> or a Docker image and say
this is a microservice,
1589.977 -> but really it's a lot more than that.
1591.389 -> And we wanna make sure
that we're giving teams the
1593.131 -> flexibility and empowering
them to make the choices that
1595.335 -> they want to do.
1596.552 -> They're able to use open source
projects where they want,
1599.069 -> they're able to use vendor
products if they want to as
1603.389 -> well, if they want to go
through the hassle of doing a
1606.085 -> commercial arrangement with them,
1607.699 -> which can't always be
the easiest thing to do.
1609.501 -> We don't mandate any frameworks
1611.192 -> or slightly customized
proprietary ones,
1615.828 -> and we let microservice
teams use cloud services
1619.362 -> in their microservice.
1620.801 -> For instance, if you're using Redis,
1623.21 -> you might want to use a managed
version of that through
1624.934 -> ElastiCache.
1625.83 -> You might want to use S3, KMS,
or you might even want to go
1628.637 -> off and do something a
little bit more crazy
1630.47 -> and use something like Ground Station.
1633.803 -> As a platform team, it's not our
1635.627 -> job to choose what people
are using; it's our
1637.702 -> responsibility to empower them to do that.
1640.265 -> But these cloud resources and
the infrastructure as code that goes
1643.582 -> with them are part of the
microservices themselves.
1645.955 -> This means that we have to
have the code next to the
1648.232 -> microservice and not in a
centralized cloud repository
1651.603 -> somewhere.
1653.174 -> We do have some common
interfaces that microservices are
1655.029 -> expected to implement,
1657.355 -> such as structured logging and
Prometheus for metrics, and there
1660.091 -> are some API standards
around this as well.
1662.366 -> We request that everyone uses Swagger to
1664.574 -> define their REST APIs
1666.32 -> and Avro for event-based APIs.
1668.286 -> And the reason for this is that,
while we do wanna make sure that
1671.326 -> teams have flexibility
to build what they want,
1673.794 -> we are creating a
cohesive ecosystem overall.
1676.462 -> It makes a lot of sense that
people can do what they want in
1678.895 -> their domain context, but
1680.198 -> the power of microservices
is in the relationships
1681.031 -> that are built with other teams.
1685.926 -> Along with this,
1686.792 -> Paul mentioned before that
we have a you build it,
1688.808 -> you run it mindset.
1689.959 -> This means that teams are
responsible for owning everything
1692.038 -> to do with their
microservice, the database,
1694.24 -> the schema of that data
associated with it.
1697.478 -> So as a platform team,
1699.391 -> we're empowering teams to
choose what data they want to use
1702.954 -> and how to store it.
1703.834 -> They could, if they wanted to, run Postgres
1706.029 -> on their own pods and manage
it, make all the backups,
1708.727 -> do all the compliance and
all of that sort of thing.
1711.212 -> They might take a step away
and use RDS and configure that
1713.77 -> directly.
1714.677 -> Or they could use one of the
paved highways that we have as
1718.221 -> a platform team and they could
use a Kubernetes resource
1720.477 -> that goes off and creates
it all scaffold for them.
1722.845 -> And they know everything is
compliant but the power is in
1725.119 -> their hands and they're able to choose.
1728.827 -> As part of you build it, you run it,
1730.568 -> they're also responsible for
the SLAs associated with that
1733.106 -> code base.
1733.971 -> So we've just gone through
Black Friday and Cyber Monday,
1736.692 -> it's the team's responsibility
who are operating these
1738.935 -> services to be collecting the metrics,
1740.626 -> to know what's expected of
their service when things launch
1743.767 -> and to make sure they're
able to cope with this.
1746.316 -> And having gone through this event,
1748.73 -> we went through it perfectly fine.
1749.567 -> A couple of teams made some
adjustments to the scaling but
1751.71 -> they were able to do that fine.
1754.082 -> With our microservice
architecture that we have,
1756.239 -> we've empowered our teams
to make their own choices,
1757.916 -> their own flexibility because
they're able to use open
1760.383 -> source technologies,
1761.38 -> they can come in and they
can use their industry wide
1763.756 -> experience in our teams.
1765.799 -> And because of a couple of small
interfaces that we request,
1769.906 -> there's a cohesive ecosystem across
1771.535 -> Chase International as a whole.
1775.108 -> But you have to build
these microservices somehow
1777.191 -> and then you have to manage them.
1779.074 -> If you are building a monolith
and you had 200
1781.146 -> engineers working on it,
1782.426 -> it might make a lot of sense
to just have five people who
1784.771 -> were there just tweaking the knobs,
1786.142 -> making sure everything kept running fine,
1788.123 -> deploying it when there was a
change that needed to be made,
1790.009 -> et cetera.
1791.075 -> But when you have 30 teams
with seven engineers on each,
1793.791 -> it doesn't make quite so much sense
1795.785 -> to have five of those
engineers doing the same
1797.337 -> thing.
1798.395 -> So you need to rely on automation
1800.353 -> for them obviously.
1802.235 -> When we're creating a microservice,
1804.072 -> we start off with what we call starters.
1806.6 -> These starters are kind
of prepackaged recipes
1809.808 -> for creating a service.
1811.325 -> They're not going to stick there,
1813.164 -> they're not going to be long-lived
1814.497 -> and you don't have to fit into them.
1816.038 -> They're a library rather than a framework.
1817.985 -> These starters are actually managed
1819.193 -> by the microservice teams themselves.
1821.436 -> They go into a centralized area
1822.581 -> and they're evolved in
an inner sourced model.
1825.664 -> The platform team at imark,
1827.914 -> we contribute to these as
we would any other service,
1830.263 -> but it's important that
we are not the ones
1831.971 -> mandating what those are.
1833.495 -> When we're talking about what
the best practices are for a
1835.612 -> Java service,
a bunch of Go developers
aren't really gonna be able to
1838.7 -> answer that question well.
1841.692 -> We also use Helm charts and
Terraform modules, and in these
1843.149 -> areas it's still the
microservice teams who are
1847.698 -> responsible for these. As a platform team,
1849.723 -> because we're working with
these technologies a lot more
1852.373 -> often, we provide more input into these,
1854.194 -> but it's a consulting role,
1855.738 -> not one of ownership and control.
1858.474 -> We're again enabling our teams
to make their own decisions.
1861.786 -> And when we get there, it's
interesting that on the slide,
1864.754 -> that's one part there that
happens once per service.
1867.988 -> So we don't need to scale that that much,
1869.73 -> but everything else happens many times.
1871.965 -> And when we started Chase International,
1873.714 -> we started with the stretch
goal of a thousand releases per
1876.557 -> day, which is quite a lot,
1879.498 -> particularly since we had no
services to release at all.
1881.752 -> So we couldn't manage that.
1883.453 -> And to be honest, we're still
not close to that today.
1885.789 -> But what that statement does
is it focuses your mindset
1889.304 -> and thinking towards what are
the things that we need to do
1891.548 -> to enable that if we are able to do it,
1894.16 -> it means that we can't do things
like have manual processes
1896.664 -> in the middle that's slowing things down.
1898.538 -> We can't have an approval
that takes two days.
1900.528 -> There's no way you're gonna get that.
1902.281 -> But in a regulated
environment such as we are,
1904.608 -> we need to have confidence
that the things that are going
1906.272 -> into production are what we need.
1908.736 -> So we make sure that we
have repeatable builds
1911.694 -> cause we need to know
what's in production.
1914.101 -> And if your build isn't repeatable,
1915.378 -> what you're gonna find is
that it works fine and that at
1918.003 -> some point you're gonna
have to come back and go,
1920.149 -> actually what was in that
artifact that I just deployed?
1923.352 -> It's gonna slow you down.
1925.794 -> And the same goes for the artifacts that
1927.002 -> are produced out of these builds.
1928.645 -> They need to remain immutable.
1930.821 -> Immutability sounds relatively
simple when you're creating a
1933.264 -> docker image or something like that,
1935.33 -> but it can come in in interesting ways
1937.629 -> and you need to think about this.
1939.08 -> So if you're deploying
something using Terraform,
1941.219 -> you need to look at the modules
that your Terraform module
1943.296 -> is referencing.
1944.224 -> Are you pinning them to an exact version
1945.682 -> or are you using an approximate version?
1949.259 -> And are you deploying things by a pipeline?
1951.724 -> And if in there you have some templating
1953.682 -> and injecting environments
from your pipeline,
1956.024 -> do you know what version of
your pipeline ran and what the
1958.43 -> value of those
variables was at the time that
1960.243 -> something went into production?
1961.681 -> And will you be confident
about that in a year's time?
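The question about exact versus approximate module versions can be checked mechanically. The sketch below classifies a Terraform-style version constraint string as an exact pin or not. The function name and the decision to require a full three-part version are assumptions made for illustration, and it covers only registry version constraints, not Git refs or other source types.

```python
import re

# A bare "1.2.3" or "= 1.2.3" (optionally with a pre-release suffix) selects one version.
# Operators such as "~>", ">=", "<", "!=" or comma-separated ranges can resolve
# to different versions over time, so they are treated as not pinned.
_EXACT = re.compile(r"^\s*=?\s*\d+\.\d+\.\d+(?:-[0-9A-Za-z.]+)?\s*$")


def is_exact_pin(constraint: str) -> bool:
    """Return True only when the constraint names a single full version."""
    return bool(_EXACT.match(constraint))


def unpinned(modules: dict) -> list:
    """Given {module name: constraint}, list the modules whose version can drift."""
    return sorted(name for name, c in modules.items() if not is_exact_pin(c))
```

A check like this could run as a build step so that an artifact built today resolves the same module versions when it is rebuilt in a year's time.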
1965.667 -> To enable all this,
1966.798 -> you need to have good quality metrics
1968.006 -> about what you're deploying.
1969.609 -> I don't necessarily mean
Prometheus metrics here,
1972.196 -> which are telling us how many
transactions per second our
1975.156 -> service is getting, but
we do need those as well.
1977.623 -> But instead I'm talking
about the code quality.
1980.422 -> What is cyclomatic complexity of the code?
1982.966 -> What is the test coverage?
1984.385 -> If you're using that or something?
1985.782 -> What are the supply chain and
static analysis results when
1988.372 -> you've been scanning this?
1990.455 -> What are the results of
the integration load tests?
1993.336 -> Are you doing chaos testing,
1995.47 -> which we've got as part of this as well?
1997.343 -> And then once you've deployed
1998.583 -> to your production environment,
2000.275 -> you can still do more things
and cover that such as smoke
2002.477 -> test and canary releases.
2004.734 -> Now
2016.684 -> The last thing on this step
is once you've deployed,
2019.779 -> you need to have the ability
to automatically roll back if
2022.344 -> you find something has gone wrong.
2024.278 -> A lot of engineers that I've
talked to in the past have
2026.03 -> said, you know, our,
2027.777 -> our strategy when we release
something bad is to fix forward.
2031.022 -> And to me that's,
2031.855 -> that's an incredibly scary
concept and that's because when I
2035.592 -> talk to people about
estimating a piece of work,
2038.161 -> they often use things like
perfect day estimates or T-shirt
2041.182 -> sizing or something like this.
2043.263 -> And these are assuming that
you know the problem up front.
2045.571 -> Now imagine that you're trying to
2046.779 -> solve this problem at 2:00 AM.
2048.164 -> You've got a whole bunch
of stress because people
2050.28 -> can't buy their tickets home
2052.392 -> and you need to solve it quickly.
2054.67 -> Instead,
2055.503 -> the better option in most of
the cases is to roll back to
2058.162 -> your last known good version.
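Rolling back to the last known good version, rather than fixing forward, can be sketched as below. This is an illustration under stated assumptions: `deploy_with_rollback`, the list of known good versions, and the `smoke_test` callable are hypothetical names, and a real pipeline would perform the actual deployment where this sketch only decides which version ends up running.

```python
from typing import Callable, List


def deploy_with_rollback(known_good: List[str],
                         candidate: str,
                         smoke_test: Callable[[str], bool]) -> str:
    """Return the version left running after attempting to deploy `candidate`.

    `known_good` holds previously verified versions, newest last.
    """
    if smoke_test(candidate):
        known_good.append(candidate)  # candidate becomes the new last known good
        return candidate
    if not known_good:
        raise RuntimeError("candidate failed and there is no version to roll back to")
    # No debugging at 2:00 AM: go straight back to the last version that worked.
    return known_good[-1]
```

The point of the design is that the decision needs no human estimate of how long a fix will take; the failing version is investigated later, off the critical path.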
2059.937 -> And you need to have this ability,
2062.137 -> and you'll be noticing that
I've been talking a lot about
2063.811 -> deployments, but there's
another element here,
2067.054 -> which is that a deployment
doesn't mean that anyone's using
2069.774 -> your service.
2070.757 -> You know,
2071.59 -> we've got a few people
around here who are selling,
2073.355 -> you know, vendor products
to help us deploy
2076.623 -> and then release separately.
2078.467 -> And deployment is an engineering practice.
2080.622 -> Releasing is a business practice
and you need to separate
2083.289 -> these two things and you can
deploy a thousand times a day
2086.401 -> and release once a week.
2088.977 -> And in summary,
2089.814 -> we have several hundred
microservices that we release daily.
2095.43 -> So what does this look like for
2096.778 -> an engineer on the Chase
team? On the left-hand side,
2099.398 -> we have a fairly standard
Helm chart and I hope that
2101.875 -> everyone in the back can read that,
2103.598 -> but it wouldn't be a technology
kind of discussion if we
2106.578 -> didn't put some small font on
there and hope everyone can
2108.205 -> see it.
2109.443 -> And this Helm chart is for an
awesome and amazing service
2113.161 -> that's been written by an amazing team.
2115.326 -> You can see here that it has
a dependency on the awesome
2117.852 -> team starter. That is,
2119.454 -> that the awesome team has
decided this is what they want
2122.353 -> their services to look like.
2123.614 -> This is how they get reuse
across all of the services that
2125.989 -> they're building.
2128.054 -> It might be that that awesome
starter that they've used
2130.187 -> references
2131.386 -> the Chase International starter,
2133.344 -> or they might have gone
their own way completely.
2135.49 -> Inside there, we can expect it to
2137.011 -> be defining all the resources
that we expect to make a
2138.928 -> service run on Kubernetes:
deployment, service,
2141.97 -> auto scaling policies,
authorization policies, et cetera.
2145.716 -> On the right hand side you
can see a definition or an
2148.415 -> extract of our pipelines as code.
2150.679 -> You can see here we've defined
an environment specific
2153.073 -> pipeline that this is gonna
be reused across the different
2155.808 -> environments.
2157.281 -> What it does is it sequentially
applies some Terraform, does
2160.349 -> a Helm apply and then runs a smoke test.
2163.321 -> And you'll notice that this
looks like a programming
2165.253 -> language. In this case
it's JavaScript.
2168.444 -> And the reason for that is
quite important and that is that
2171.001 -> we've empowered our engineers
to use technology that
2174.215 -> they want to use and to
evolve the ecosystem.
2177.367 -> This wasn't produced
as part of a platform.
2180.519 -> This was produced by an engineer
who wanted to change the
2182.73 -> way that they were releasing.
2185.52 -> You can see here that we're using some,
2186.92 -> some variables that are coming
from the pipeline and some
2189.408 -> that are presumably coming
from a package that is being
2192.137 -> reused across them.
2193.777 -> And I think that one of the
important things here with
2195.892 -> having this pipeline that's
been defined in code and by
2200.026 -> engineers is that they were
able to make that choice
2203.063 -> themselves.
2204.935 -> But how do we let them do that?
2207.361 -> Well,
2208.194 -> we need to have some guardrails
in place to make sure that
2209.937 -> things are safe, right?
2211.895 -> Here we have a guardrail that
2214.103 -> prevents engineers from
deploying Docker images that are
2217.58 -> coming from any random
source. In this case,
2219.9 -> we only allow people to deploy
from our internal repository
2222.841 -> and from ECR itself.
2225.346 -> And those are the actual
EKS AMIs in that bottom one.
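The image-source guardrail on the slide is not reproduced in the transcript, but the rule it describes can be sketched as a simple allowlist check. Everything specific here is an assumption: the internal registry hostname `registry.internal.example` is a placeholder, and in a cluster a rule like this would normally be enforced by an admission policy rather than a Python function.

```python
import re

INTERNAL_REGISTRY = "registry.internal.example"  # placeholder hostname, not the real one
# Private Amazon ECR registries use hosts of the form <account>.dkr.ecr.<region>.amazonaws.com
ECR_HOST = re.compile(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com$")


def registry_of(image: str) -> str:
    """Extract the registry host from an image reference.

    Following the usual Docker convention, the first path component is a
    registry only if it contains a dot or colon, or is "localhost";
    otherwise the image comes from Docker Hub.
    """
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_allowed(image: str) -> bool:
    """Allow only images from the internal registry or from ECR."""
    host = registry_of(image)
    return host == INTERNAL_REGISTRY or bool(ECR_HOST.match(host))
```

A check this narrow is in keeping with the point made next in the talk: it blocks one catastrophic class of mistake and otherwise stays out of the way.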
2230.208 -> Now guardrails I think are a
really interesting thing in our
2233.389 -> space. It can be really easy
to go too far with them.
2237.862 -> And an analogy I'd like to use,
2239.262 -> which I don't really like
analogies in general,
2241.104 -> but imagine that you're
driving down the highway,
2243.451 -> you're doing about 60 miles
an hour, there's three lanes,
2245.764 -> you can pick which one you
want and somewhere over to the
2247.889 -> side you've got a guard rail
that you probably haven't even
2250.448 -> noticed and you're able to
drive down that 60 miles fine.
2253.838 -> You know what you're doing, you're fine.
2255.643 -> Now imagine that we start bringing
2256.476 -> these guardrails in closer.
2258.214 -> Imagine we put it right at
the edge of the highway.
2260.251 -> Do you think you'd still
be doing 60 or maybe 55?
2262.768 -> And if we keep bringing
it in closer and closer,
2265.462 -> imagine that the car just
fits in between them.
2266.931 -> You can't even open the doors.
2268.443 -> You're probably not gonna be
going much more than five miles
2270.283 -> an hour at that point.
2272.104 -> So guard rails are there
2273.624 -> in the case of a catastrophic error.
2276.118 -> They're not there to make sure
that you do things exactly
2278.032 -> the same way or consistently
or to keep you on the path.
2280.695 -> Exactly, you know?
2283.848 -> And when we're doing
engineering guardrails,
2285.31 -> we need to think the same thing.
2286.526 -> We need to trust our engineers.
2287.683 -> We're hiring great people and
we need to make sure that they
2290.142 -> have the ability to go
where they need to go.
2294.061 -> In this case of this guardrail
that we have here, you know,
2297.057 -> people aren't gonna
hit into it very often.
2299.334 -> Doesn't mean that they haven't.
2300.76 -> And it's just something that happens
2302.528 -> as part of that process.
2304.36 -> Now we can use these guard rails
2306.592 -> and keep them to the right hand side.
2308.082 -> I think an important bit
here is that this guard rail,
2311.285 -> we shift things to the right
hand side with these rails.
2314.92 -> And that's a bit different
because a lot of what we hear
2317.234 -> today is we need to shift everything left.
2319.792 -> What we need to do is shift feedback left,
2322.086 -> not necessarily controls.
2323.89 -> If we were to shift this guard
rail to the left hand side,
2326.154 -> we wouldn't be allowing
engineers to use Docker Hub or
2329.789 -> anything like that for development.
2331.522 -> Take it to an extreme case
with security scanning.
2333.864 -> If someone wrote some code
that had a buffer overrun in it
2336.796 -> and they weren't able to run
2339.202 -> that to see what the app was
2339.202 -> doing, it'd be a terrible experience.
2341.167 -> What's much better is to say
you've written some code,
2343.541 -> it's got a buffer overflow
and you're vulnerable to it.
2346.643 -> But to not allow that
to run in production,
2348.785 -> we wanna shift that as far
to the right as we can.
2351.606 -> So we want to protect what
we really care about.
2354.897 -> And by doing so,
2355.982 -> we also wanna make sure
that people can go fast.
2360.161 -> So that's a little bit about
preventative controls now,
2363.684 -> but here we come into what happens when
2366.176 -> you're running services,
2367.393 -> things will go wrong in
production and they're not
2369.259 -> necessarily things that you
can think of ahead of time.
2371.902 -> They're gonna be things
that go out and so on.
2374.531 -> So we need to be able
to see what's happening.
2376.517 -> Now for our logging stack,
which you can see on here,
2378.097 -> we've chosen to make sure
that this is as reliable and
2382.888 -> available as possible. And
to do that we've, again,
2385.67 -> turned to managed services.
2386.777 -> You can see here we have a
number of log sources all feeding
2389.504 -> into CloudWatch,
2390.665 -> which then get synced
into S3 via subscription.
2393.913 -> And we can also see that our
containers are using Fluent Bit
2397.084 -> to push to S3.
2398.523 -> What happens here is that
with all of these sources,
2403.809 -> we're collecting them into
a single place that's highly
2405.62 -> available and resilient.
2410.489 -> Now in this bucket we've set
up something that's slightly
2412.881 -> different than what a
normal pattern would be.
2415.158 -> We care a lot more about who
can read this bucket than who
2418.11 -> can write to this bucket.
2419.417 -> We have a lot of sources all
sending logs in and if they put
2421.701 -> something in there that
we're not expecting,
2423.587 -> we can probably deal with
it on the right hand side,
2425.529 -> but we're secretly implementing
another guard rail in here.
2428.873 -> And that's because we
might end up logging PII
2431.514 -> in any number of ways.
2432.834 -> A team might enable a debug
flag in production to try and
2436.273 -> debug and understand what's
happening in their service.
2438.284 -> And that might inherently
enable request logging.
2440.967 -> And you might get
request headers in there,
2442.597 -> which might contain stuff
we don't wanna share with people.
2444.69 -> It might be that an object has
ended up in a stack trace and
2447.596 -> that's logged in,
2448.429 -> or maybe someone's just logged an object
2449.887 -> that has it in there,
2451.258 -> but we can't let that go in
there and we need to be able to
2453.117 -> support cleansing, which is
where Log Guardian comes in.
2456.684 -> Log Guardian is a tool
that was written by the
2458.186 -> observability platform team
whose job it
2461.548 -> is to sanitize this PII data
out before it gets into the
2464.042 -> hands of any operators.
2467.704 -> So Log Guardian is subscribing to SQS queues,
2470.083 -> which are triggered by S3
event notification topics.
2472.82 -> It pulls those objects in, sanitizes them,
2476.153 -> and then pushes these into OpenSearch.
2479.536 -> This gives us a couple
of benefits, actually.
2481.519 -> One, this is now a pull-based system.
2483.665 -> If OpenSearch is under heavy
load for whatever reason,
2486.051 -> all that's gonna happen is Log
Guardian is gonna slow down,
2489.273 -> stop pushing things into
OpenSearch and it's gonna slow down
2492.14 -> on how fast it consumes things.
2493.964 -> All that's gonna happen here
is that our log messages will
2496.399 -> be slightly delayed getting
into our OpenSearch and
2499.079 -> container instances. They won't be lost.
2501.14 -> And eventually once the
OpenSearch cluster becomes more
2503.847 -> available, they'll make their way through.
2505.983 -> Secondly,
2507.09 -> it also means that we're
able to scrub our indexes.
2510.756 -> So Log Guardian is built using
a series of filters that look
2516.251 -> for known patterns of PII. We can't
just get rid of everything,
2519.481 -> 'cause then the logging system
wouldn't be very useful.
2521.856 -> So what happens is sometimes
there'll be a new form of PII
2525.535 -> that comes in and Log Guardian
doesn't actually know that
2527.375 -> it's PII, so it makes its way
through to our OpenSearch.
2530.035 -> With this model,
2531.179 -> we're able to purge all
the OpenSearch indexes,
2534.938 -> update the Log Guardian rules so that
2536.271 -> it filters everything out
2537.979 -> and then republish the messages onto SQS,
2540.842 -> causing Log Guardian to
pull all the logs again,
2542.816 -> run the updated rule set
and then publish them back.
2546.42 -> So in this way,
2547.596 -> we're able to respond to developer
errors or changes in PII.
2553.343 -> Fine.
2556.107 -> Behind this we have some metrics.
2557.938 -> So we need to know what's
happening on these systems and how
2560.335 -> they all work and how
they're performing. You know,
2563.01 -> we talked about the internet
edge and the latency
2565.388 -> requirements.
2566.564 -> So what happens is services
are generating Prometheus
2568.689 -> metrics. They just expose those
on different ports
2572.596 -> as they determine. So Java
services might do port 1990,
2575.924 -> infrastructure services might do six 90,
2579.242 -> and then who knows what else. So again,
2581.681 -> we make sure that we are
empowering engineers by using
2584.62 -> Prometheus Operator and letting
the engineering teams who are
2586.783 -> responsible for the services
define how
2589.178 -> those metrics are scraped.
2591.119 -> In some cases these services
might need a resolution of once
2594.322 -> every second to be useful,
and in other cases,
2596.159 -> once every five minutes might be enough.
2598.617 -> Again, this is in the control
2599.663 -> of the team who's running the service.
2602.302 -> These Prometheus instances
are actually running in the clusters
2604.857 -> that are next to the services.
2606.551 -> So you need to be able to log
into the cluster and find it,
2609.58 -> which can be a little bit of a pain,
2611.088 -> particularly over a kind of
a distributed system running
2613.485 -> many, many, many Kubernetes clusters.
2615.943 -> So to make this a bit easier, we have
2618.783 -> a centralized Thanos instance.
2620.899 -> And this Thanos instance is
federating all of the Prometheus
2623.183 -> instances that we know about.
2624.743 -> It's collecting them at a
lower resolution and archiving
2626.987 -> that data and then displaying
that into a single pane of
2629.353 -> glass that everyone can use to
get kind of a decent picture
2632.375 -> of what's happening over the entire estate.
2635.276 -> And finally,
2636.332 -> another part of observability is tracing.
2638.785 -> Logging and metrics
2641.009 -> are great for getting a high level
2642.119 -> view of what's going on.
2643.879 -> You can set up alerts on metrics,
2645.287 -> you know, our response rate has changed,
2647.896 -> but what happens when a
single thing goes wrong?
2650.1 -> So tracing,
2651.167 -> we're able to instrument a
request at the moment it enters
2654.49 -> our system and follow that
request all the way through the
2656.894 -> system that's managing.
2659.116 -> At the moment,
2659.949 -> we sample 100% of these traces
and we do this because we
2662.9 -> don't know which request we're
gonna want to know more data
2665.497 -> on and what to find. We do
this using Jaeger, OpenTracing.
2670.67 -> We're looking into ADOT for
Amazon managed services and we
2674.75 -> store this again into OpenSearch.
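To make the tracing idea concrete, here is a minimal sketch of a request being given a trace ID where it enters the system and child spans reusing that ID downstream. It is a toy model, not the Jaeger or OpenTracing API; the `Tracer` and `Span` classes and the span names are hypothetical. A `sample_rate` of 1.0 mirrors the "sample 100%" choice described above.

```python
import random
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str              # shared by every span in one request
    span_id: str               # unique to this unit of work
    parent_id: Optional[str]   # None for the span created at the entry point
    name: str


class Tracer:
    """Toy in-memory tracer (hypothetical), recording spans for sampled requests."""

    def __init__(self, sample_rate=1.0):
        self.sample_rate = sample_rate
        self.spans = []

    def start_trace(self, name):
        # random.random() is in [0.0, 1.0), so a rate of 1.0 samples every request.
        if random.random() >= self.sample_rate:
            return None
        span = Span(uuid.uuid4().hex, uuid.uuid4().hex, None, name)
        self.spans.append(span)
        return span

    def child(self, parent, name):
        # Unsampled requests carry no parent span, so nothing is recorded.
        if parent is None:
            return None
        span = Span(parent.trace_id, uuid.uuid4().hex, parent.span_id, name)
        self.spans.append(span)
        return span


tracer = Tracer(sample_rate=1.0)
root = tracer.start_trace("edge-gateway")
ledger = tracer.child(root, "ledger-service")
print(ledger.trace_id == root.trace_id)
```

Sampling everything costs more storage, but it means the one request that went wrong is always available to inspect.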
2678.104 -> So we've been able to prevent
some mistakes with guardrails.
2681.086 -> We've been able to get visibility
into what's going on with
2683.35 -> some observability and
detective. And you know,
2686.824 -> as Werner said, you know,
2688.595 -> everything's going wrong all the time.
2690.486 -> So I pass you back to Paul,
2691.967 -> he'll tell you a little bit
more about what we do.
2694.324 -> And things do go wrong.
2704.647 -> - I mean, this is great, right?
2705.631 -> Because I'm just gonna talk
about event response and what
2708.773 -> happens when something goes wrong.
2710.453 -> I think the gods are smiling on me.
2712.397 -> So as Courtney already
discussed, we kind of run,
2717.053 -> you build it, you run it.
2719.41 -> And we do this for very
good reasons, right?
2722.863 -> As we said, Werner Vogels
famously said,
2724.956 -> everything fails all the time.
2726.246 -> You have to kind of deal with it. And we,
2727.645 -> despite our best efforts and you know,
2729.459 -> everything that Courtney's
been talking about
2731.866 -> incidents still happen.
2733.766 -> And we believe that the fastest way
2735.816 -> to shut down an incident and to
ensure that we're kind of
2737.86 -> maintaining that kind of
fantastic customer experience we
2740.165 -> talked about earlier
about how we build trust,
2742.711 -> is to have the experts, the
people that build the service,
2744.743 -> deal with the alerts. And so
that's exactly what we do.
2748.098 -> If you build a service,
you own that service.
2750.234 -> If the service alerts at
2:00 AM in the morning on a
2753.089 -> Saturday night,
2753.922 -> you get the alert 2:00
AM on a Saturday night.
2756.546 -> And it's incredible once you
start putting your engineers on
2759.375 -> call, how quickly your
quality of services improves.
2762.935 -> So I can, I can highly recommend it.
2765.084 -> However,
2765.917 -> we work in a very highly
regulated environment as we
2768.54 -> previously discussed.
2770.391 -> And you know, at the point
2772.783 -> an incident happens and it
escalates beyond what an engineer
2775.775 -> could just kind of shut
down very quickly by,
2777.042 -> by dealing with alerts.
2778.407 -> We need to bring in an
incident management team
2780.294 -> to deal with that.
2781.603 -> We have many stakeholders that
need to be communicated to
2784.142 -> both internally and externally.
Our regulator, the PRA,
2786.771 -> the FCA, these people want
2788.719 -> to understand what's going on. And so
2790.298 -> we have a major incident
management team. Now,
2792.574 -> the job of the major incident
management team is not to fix
2795.407 -> the problem.
2796.754 -> The job of the major incident
management team is to
2798.978 -> coordinate the response and
in coordinating that
2801.838 -> response,
2803.028 -> what they're doing is they're
freeing up the engineering
2805.695 -> resource that we have to really
focus on what the problem is
2808.967 -> and to make sure that the problem
is shut down as quickly as
2812.442 -> possible. And then once the
problem has been shut down,
2814.756 -> it's the job of the
incident management team
2816.322 -> to make sure that the postmortems are
2817.871 -> run, the problem tickets are raised,
2819.653 -> and then crucially the
problem tickets are shut down.
2822.685 -> We have a very kind of
strict mindset on this one.
2826.403 -> When we identify a problem,
2827.936 -> we don't just fix the problem
where it was;
2830.615 -> we step back and we ask
ourselves a question.
2832.213 -> Are there any other classes?
2834.549 -> Are there any other places
where this could happen?
2837.549 -> What is similar to this?
2839.613 -> And we read across the estate
horizontally and we say,
2841.696 -> well if it could happen
here, could it happen there?
2843.8 -> And we tend to broaden out our
search quite a lot to try and
2846.528 -> find out whether or not an
incident could happen elsewhere.
2850.827 -> And what that allows us to
do over time, you know,
2853.313 -> is to improve the quality of
the platform for everybody.
2856.083 -> Incidents are gonna happen anyway.
2858.247 -> It's really how well you
coordinate your response to it.
2862.241 -> And you know, we're just
like everybody else, right?
2864.752 -> We take best practice
where we can find it.
2867.235 -> We kind of model ourselves roughly
2869.286 -> on how incident
responders respond to real
2873.277 -> world incidents.
2874.115 -> So kind of your fire brigades
and your whatnot who are
2876.472 -> dealing with disasters, how
do they organize themselves?
2879.248 -> We tend to align along that.
2881.165 -> But the important thing here
is have your expertise manage
2884.052 -> the alerts and get them on the
call as quickly as possible.
2889.185 -> So we've talked a lot,
2890.317 -> we've talked about the culture,
we've talked about how we
2891.877 -> build things, we've gone
into detail about many,
2895.233 -> many of the elements of the platform and,
2897.437 -> and hopefully there's a lot
there for you to think about.
2900.476 -> I could, I could read out all
the bullet points on this,
2903.381 -> but it's probably as boring
for you as it would be for me.
2906.923 -> So what would be easier would
be for me to kind of just call
2909.483 -> out the things that, you know,
2910.457 -> if you were thinking about
doing this yourself are the most
2912.49 -> important.
2914.467 -> So for me it's about
building the tooling, right?
2916.64 -> We keep talking about hiring
great engineers and creating
2918.51 -> value for customers and
building a bank that people can
2921.307 -> trust.
2922.766 -> To do that,
2923.599 -> we have to have an extensive
amount of tooling to automate
2926.577 -> how we build, test, deploy,
2927.843 -> and operate software and how
we do that within the control
2929.759 -> environment we want.
2931.169 -> And how we do that in a
way which hands autonomy to
2932.665 -> engineers.
2934.789 -> So do that, start off with that.
2936.263 -> But you don't have to build
the whole gold-plated thing in
2939.004 -> the first go.
2939.852 -> You have to build the first
version of it that will support
2941.553 -> what you need. And as you
iterate your platform,
2944.333 -> iterate your services so you
iterate your tool chains as
2947.57 -> well.
2948.403 -> And so all those things
kind of moved together.
2950.861 -> When you're doing that,
optimize for self-service.
2953.578 -> You don't wanna create
bottlenecks where people are
2956.224 -> constantly blocked waiting for
someone else to do something
2959.086 -> on their behalf as much as possible.
2960.413 -> What you wanna do is kind of
that low power distance: you
2963.776 -> want to kind of push responsibility
to the people that are
2966.133 -> writing the code and getting things done.
2968.802 -> And then thirdly, the reason
we do this is autonomy.
2972.185 -> We want to go out and hire the
best engineers that we could
2974.659 -> possibly find.
2975.93 -> We want to bring them in and
we want to give them every
2978.033 -> opportunity to create great
experiences for our customers.
2981.064 -> And if that is what we want
to achieve, that autonomy,
2984.8 -> we need to bring those people
in and then we need to get out
of their way because bringing
them in and getting
2989.68 -> out of their way allows them
to do their work and to create
2991.794 -> that kind of wonderful
experience for our customers.
2994.221 -> That platform that scales,
2995.287 -> that platform that we talked
about that can kind of build
2997.15 -> trust.
2997.983 -> And that's what we've been doing
for the last three years at
3000.399 -> Chase. We launched, you know,
back in September last year,
3002.857 -> we've already acquired
over a million customers.
3005.421 -> We've got over 10 billion
pounds in deposits.
3009.395 -> I think it's kind of working.
3010.835 -> I see a really kind of
bright future for it.
3012.549 -> So that would be my advice to you.
3014.573 -> So they're the final words
for me and I'm gonna hand over
3017.191 -> to Colin and I think
he's gonna close it out.
3019.999 -> - Try and keep my microphone on.
3022.555 -> And just last opportunity
to say a big thank you.
3026.545 -> Firstly, thank you to our
speakers, Paul and Courtney,
3029.696 -> for making the time to tell their story.
3032.15 -> Secondly, to our audience,
3033.722 -> I'm sure many of you have
traveled quite a long way to be
3035.813 -> here, so thanks for coming.
3037.888 -> And finally to our customers
at Chase International who make
3041.445 -> all of this possible.
3042.646 -> So please make sure to fill
out the survey and if you see
3046.37 -> any of us about throughout the event,
3048.505 -> please do make sure to come
and say hi at some point.
3051.154 -> Brilliant. All right, thanks everyone.