AWS re:Invent 2022 - Chase International: Always-on customer experience at scale (FSI308)

With a 200-year history as a pioneer in banking, JPMorgan Chase faces the challenge of meeting the ever-changing needs of its customers. Launched in 2021, Chase International combines the trustworthiness of an established brand with the seamless experience of a digital retail bank. Learn how Chase is providing fast access and personalized service 24/7 using AWS services, including Amazon Connect, Amazon EKS, AWS Glue, and Amazon OpenSearch Service. Chase details how it built a modern banking platform and scaled a new entity in an established market, attracting more than 500,000 consumers and over $10 billion in deposits.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.

Content

1.06 -> - Hi everyone and welcome to FSI session 308.
5.832 -> Chase International, always-on customer experience at scale.
10.28 -> My name is Colin Marden and I'm an AWS solutions architect.
15.341 -> As an architect, I help AWS customers
17.979 -> build secure, scalable,
18.812 -> and reliable architectures on AWS.
23.821 -> I'm part of the AWS team that's dedicated to JPMorgan Chase
28.445 -> and more specifically their payments, merchant services,
31.464 -> and international businesses.
33.996 -> It's my second re:Invent,
35.568 -> but it's special for me 'cause at this re:Invent I get to
38.776 -> introduce you to Chase International,
40.706 -> to their CIO, Paul Clark, and to Courtney de Lautour,
44.384 -> an executive director and engineering lead.
48.011 -> Chase is a digital-only retail bank
51.469 -> that launched around one year ago,
57.301 -> and it's owned by JPMorgan Chase, the larger business.
60.961 -> If you've not heard of JPMorgan Chase,
62.862 -> they're a leading global financial services firm
65.814 -> with operations worldwide
67.263 -> and around $3.4 trillion in assets.
71.603 -> They're a leader in investment banking, commercial banking,
74.874 -> consumer and community banking for small businesses,
78.738 -> financial transaction processing and asset management.
82.717 -> With that, let's take a look at the agenda for the day.
86.579 -> So we'll start with a few introductions to our speakers
89.563 -> and to Chase International.
91.596 -> From there we'll discuss customer experience
94.152 -> and why that matters so much to Chase International,
97.492 -> before we dive a little deeper into the architecture
100.555 -> and exactly how the cloud has enabled them to move fast,
104.4 -> but equally to stay secure.
106.989 -> Finally, we'll close by walking through
108.769 -> some of the key takeaways and challenges
111.124 -> from the last 12 months of operation.
115.159 -> So why did JPMorgan Chase build Chase International?
119.479 -> To answer that question, I'm joined by Paul Clark.
122.432 -> Paul, can you tell us a little more
124.242 -> about the JPMorgan opportunity in retail banking
127.322 -> and exactly why you built Chase?
130.408 -> - Yeah, thanks Colin.
131.499 -> Hi everybody. It's nice to be here.
133.792 -> It's my first re:Invent.
135.558 -> So I'm sure those of you in the audience who are American
138.456 -> are familiar with Chase, the brand and the bank.
141.532 -> It's never been launched outside the US before.
144.142 -> It had been considered, but it was not thought
147.275 -> to be the right time.
148.501 -> Launching a bank requires a lot of infrastructure.
151.801 -> You have to build branches,
153.026 -> you have to do a lot of brand marketing.
154.417 -> It's very, very expensive.
156.865 -> However, you know,
158.057 -> the world has moved on a little bit and we're much more used
160.092 -> to now working in kind of digital ways
163.032 -> and digital banking is becoming much more prevalent.
165.279 -> If you are in the UK,
166.921 -> we have banks like Starling,
168.428 -> we have banks like Monzo; in Europe,
170.048 -> there's N26; over here,
171.192 -> you guys have got Chime and Venmo and people like that.
173.928 -> So people are much more used to it.
176.203 -> And we saw this as the point in time
179.346 -> to launch a digital bank. You know,
181.141 -> we have quite an advantage over the newer
184.539 -> entrant banks, if you like,
186.159 -> having the heritage of JPMorgan behind us and having that
188.625 -> kind of weight behind us.
190.731 -> So we set out to build a platform that would scale,
193.606 -> would scale to millions of customers,
194.814 -> would scale up and down as it was required.
197.628 -> You know,
198.461 -> AWS was what we chose to do it on,
200.278 -> and the reason we chose to do it that way, you know,
202.311 -> is because people,
204.446 -> everybody, uses digital.
205.816 -> Everybody in this room will have a phone in their pocket.
207.873 -> Everybody in this room uses apps all the time.
210.238 -> And what we want to do is build a digital bank that would
212.23 -> scale but at the same time provide
214.313 -> the best possible service.
216.769 -> Because people don't compare banks, right?
218.924 -> You don't compare your bank to your friend's bank.
221.289 -> People compare their banks to the digital apps
223.421 -> they're used to using: their Instagrams,
225.807 -> their Facebooks, whatever your favorite mobile app is.
229.332 -> And so we wanted to build a bank using digital-first
232.808 -> methods, a bank that would scale and that would
235.391 -> give that kind of wonderful service.
236.947 -> And then ironically we decided to do it in the hardest
238.771 -> possible way.
239.703 -> We decided to launch it in the UK, and we decided to do that
243.424 -> because, for those of you familiar with the idea of doing the
245.576 -> hardest thing first,
246.557 -> the UK is a very competitive banking landscape and it has
249.842 -> the most competitive digital banks.
252.349 -> It also has a very high bar when it comes to regulation.
256.464 -> The Bank of England, the PRA and the FCA,
259.231 -> who regulate us, are considered to be
261.2 -> almost the gold standard, really.
263.03 -> So if we could launch a digital bank in the UK,
266.208 -> if we could make that successful,
267.518 -> if we could do that in that regulatory regime,
269.44 -> we'd have ticked a box that this thing would have legs
272.43 -> potentially to get even bigger.
276.499 -> So in building that bank, and as we set out to build it,
280.819 -> we wanted to create the best possible experiences,
284.15 -> as I've already said, but we also had to create trust, right?
287.302 -> Banks are in a kind of unique position,
288.952 -> unlike the other apps that sit on your phone,
291.609 -> we have all your money and people really care about their
294.03 -> money and particularly care if they can't get their hands on
297.171 -> their money.
298.397 -> So we needed to build a bank that would scale, a bank that
301.153 -> was easy to use, but also a bank that you could trust.
305.28 -> And we needed to kind of do all that and make it
307.169 -> frictionless at the same time. Because you know,
308.936 -> back to that point about you don't compare your banking app
311.192 -> to someone else's banking app.
312.35 -> What you do is you compare your banking app to your favorite
314.443 -> digital apps.
316.286 -> But there's a balance
318.472 -> we need to strike: at the same time as we want to make that
320.686 -> experience as good as possible for you and as frictionless
323.203 -> as possible for you,
324.512 -> we have to keep the bad guys out because we've got your
327.024 -> money.
327.904 -> You don't wanna let the bad guys in.
329.604 -> So we really kind of focused on the customer experience and
331.891 -> how we could use customer experience
334.649 -> to build that trust with you and to give you a wonderful
337.163 -> experience of banking.
338.742 -> So for instance, you know,
339.745 -> every time you tap your card, you get an instant notification
341.68 -> that you've used your card, and you might think,
343.814 -> well, I've just used it,
345.264 -> I know that I've just used my card.
347.139 -> Well that's not the point.
348.142 -> What we're doing is we're training you in some way to kind
350.537 -> of recognize that when your card is used you get an alert.
352.926 -> So if you were to get an alert but you hadn't used your
355.192 -> card, then somebody else must be using your card.
357.587 -> So we can kind of build that kind of trust with you.
361.795 -> We send you real time push notifications.
363.528 -> We do other things like we pre-auth you when you phone up,
367.403 -> if you need to phone us up,
368.595 -> we pre-auth you: you phone us from the app,
371.531 -> you're biometrically secured because
372.854 -> you've just come in via your phone,
374.664 -> you come through to the contact center.
376.311 -> Courtney's gonna talk a little bit about how the magic
377.899 -> happens later,
378.984 -> but we know it's you
379.817 -> and we can take you straight into the journey.
381.851 -> If you phoned from something other than your phone,
384.496 -> we'd send you a push notification to your phone and say,
386.552 -> hey, is this you?
387.6 -> It's the same way as you get two-factor authentication when you're
389.595 -> using Google to log in. All these things help build trust
393.083 -> and help build a frictionless experience.
395.635 -> And that then allows us to kind of offer you more services
398.584 -> because you know, we're building that relationship.
406.123 -> I talked about building a bank in the UK;
407.796 -> it's a great place to start.
408.879 -> I'm sure some of you watch Jamie Dimon when he
411.158 -> gives investor updates.
412.776 -> You'll have an idea of how much we're spending, you know.
415.503 -> We are not building a bank just for the UK, I think. You know,
418.785 -> we started in the UK, but going beyond it is definitely the plan.
422.048 -> But to be sustainable as a bank,
424.304 -> you have to be at quite a large scale.
427.184 -> And now we have a big advantage here against the legacy
430.244 -> banks because of the way we've set about building the bank.
433.234 -> And Courtney's gonna talk about that in far greater detail
435.138 -> than I could ever understand.
437.866 -> Because of the way we've built it,
439.871 -> we can scale it up,
442.388 -> and the marginal cost of adding new customers to that
444.197 -> platform is quite low.
446.375 -> So compared to the legacy banks, which might end up having
449.154 -> a system per geography, let's say,
451.882 -> and then all the costs that come with that,
453.42 -> if we do this right, and I believe we are doing it right,
456.47 -> what we should be able to do is scale the bank up into
458.335 -> multiple geographies,
459.314 -> scale the bank up into millions and millions of customers
461.959 -> and we should be able to keep our costs low because of the
464.628 -> way we use the underlying platforms that support the way we
467.799 -> built it.
468.632 -> This will give us a much better cost-income ratio than the
471.234 -> legacy banks and really,
472.943 -> really help drive profitability in the long term.
475.573 -> So yes, we've had kind of a lot of initial input,
477.918 -> you know, a lot of cash input at the beginning
479.852 -> to get this thing set up,
480.958 -> but really the kind of payback comes in the long term as we
483.588 -> scale the platform on and on.
488.276 -> We worked really hard to try
490.457 -> and get everything on a slide,
493.06 -> but we couldn't work out how to do that for the last one.
494.87 -> So apologies for not being able to do that.
497.87 -> But yeah,
499.057 -> these are the core development tenets,
500.765 -> and it's not just for development,
502.253 -> it's how we really think about
503.086 -> how we build a bank from scratch.
507.025 -> Now if other people stood on the stage here today and talked
510.006 -> about their wonderful startup or their digital company,
512.913 -> they'd all probably talk about
514.817 -> similar kinds of things, right?
516.518 -> We wanna make it really,
517.351 -> really easy for our engineers to do the right thing.
519.51 -> We wanna make it really easy for our engineers
521.815 -> to do what they're good at, which is creating IP,
524.614 -> creating services that our customers love, and, you know,
527.905 -> really delivering value constantly.
531.174 -> And we work in a very heavily regulated environment.
536.301 -> We have a control
537.145 -> environment which is there for very,
539.756 -> very good reasons, right?
540.804 -> To protect us, to protect you, the customers,
543.209 -> to protect the money, to protect our reputation.
546.238 -> And what we set out to do at the very beginning was ask
549.48 -> ourselves the question,
550.313 -> how could we build toolchains and
553.529 -> automation that would allow us to really manage that
557.969 -> control environment in a way which is almost invisible to
560.37 -> the engineers, so they can just get on and do their job,
562.905 -> but in doing their job,
563.738 -> they're doing the right things in the right ways
565.696 -> to protect us, to protect the customer.
568.923 -> And for us it was about building platforms, and you hear the
572.082 -> joke, you know,
572.924 -> it's turtles all the way down; it's kind of platforms all the way
575.863 -> down these days, really.
577.102 -> So you know, we built a banking platform, you know,
578.683 -> customers come on they can kind of do their,
581.41 -> I think the Americans call it a checking account,
584.062 -> we call it a current account.
585.088 -> They can come in and use their bank account, you know,
587.091 -> that's on a banking platform.
588.371 -> It's powered by our developer experience platform that
590.865 -> allows me to build, test, deploy and
592.187 -> operate my software. That sits, you know,
594.371 -> next to our observability platform so we can keep an eye on
596.776 -> everything that's happening.
598.338 -> You know,
599.171 -> all of the output of that gets fed into our data platform
602.305 -> that sits on top of our kind of tech stack,
604.648 -> our tech platform,
605.481 -> which eventually at the bottom sits on AWS.
608.547 -> I think that scales up and down, you know, and
610.95 -> what we did here is, you know,
613.099 -> we tried to simplify it for the engineers so that they could
615.026 -> just get on and do the right thing, and we could then move at
617.877 -> pace and at scale. We do somewhere in the region of 4,000
620.695 -> production releases a year. You know,
622.447 -> we try to make it really easy for our engineers to do the
624.77 -> right thing, deliver value constantly.
627.815 -> The other thing to mention here is that last point,
630.002 -> low power distance. You know,
631.861 -> we're not talking about hierarchy,
633.402 -> although within Chase in the UK we do try to maintain a
637.36 -> really flat hierarchy.
638.709 -> So I sit out on the floor with everybody else,
640.258 -> but low power distance means allowing the engineers to just
643.33 -> be able to get on with stuff and self-serve.
644.981 -> So what we don't want to do is create a series of platforms
647.157 -> that everybody has to raise tickets for.
648.723 -> We want the engineers to be able to get their hands on the
651.206 -> tools so they can get ahead and they can then create the
654.287 -> software that's required.
655.62 -> So we have low power distance both in hierarchy
658.605 -> and in how the engineers go about
660.65 -> their day-to-day jobs.
662.431 -> And this isn't just for how we think about kind of writing
665.743 -> code and engineering. This is how we think about everything.
667.724 -> So we think about resilience,
669.813 -> both technical and business resilience.
671.423 -> We try to make sure that we can automate those
673.621 -> processes and get all those -ilities in place.
679.294 -> With that said,
682.295 -> I'm gonna hand over to Courtney, and Courtney's gonna talk in
685.306 -> a little bit more detail about
687.642 -> how all that happens.
688.76 -> Courtney runs our engineering team at Chase.
693.392 -> - Thanks Paul.
694.909 -> Paul has mentioned a lot of platform teams there, and because
697.961 -> one of my teams is called the platform team,
700.033 -> it's probably good to explain which one that is.
701.665 -> It's a team that you might have heard referred to as a
703.9 -> DevOps team, an infrastructure team or a cloud enablement team.
707.342 -> So it fits into that kind of platform space.
709.74 -> And one of the things that we are doing here is we are
712.668 -> building a platform and it's a product.
714.561 -> So we are calling the teams platform teams because these are
717.38 -> related to products.
718.95 -> Products have customers. DevOps doesn't have a customer;
721.921 -> it's a way of working. The cloud isn't a product.
725.121 -> It might be Colin's, actually, but no, you lied to me.
732.018 -> So we have the cloud, which is a technology choice for us,
735.965 -> but it's not a product.
737.141 -> The product that we are building is for the engineers who are
739.499 -> building the bank for us.
741.158 -> So we need to empower them to be able to build great
744.422 -> customer experiences with as minimal friction as they can.
747.181 -> And one of the things that happens when you have a platform
749.185 -> team or a core engineering team or an infrastructure team is
752.689 -> it might be tempting to consolidate control into that one
755.457 -> team and erode the low power distance.
757.844 -> And that's because it seems to be an easy place to implement
760.487 -> these controls and requirements.
762.05 -> Imagine that you wanna mandate 80% code
765.358 -> coverage across your codebase.
767.139 -> This is great for the team who
768.472 -> wants to have 90% code coverage.
770.228 -> They can meet that requirement really easily.
772.121 -> But what about the team who doesn't want to do unit testing
773.989 -> at all?
774.822 -> They want to do their testing through integration testing.
777.13 -> That might not be something I would recommend,
778.71 -> but they're not gonna have any code coverage metrics that'll
781.337 -> feed into your system, and then you will have reduced their
783.583 -> ability to do what they want to do.
785.553 -> It's our job as a platform team to enable
788.502 -> other teams to do their job
789.781 -> and not to tell them how to do it.
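To make the coverage example concrete, here is a minimal sketch (every name is hypothetical; this is not Chase's tooling) contrasting a single organization-wide 80% mandate with a platform-provided gate that checks each team against the test strategy the team itself declared:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamTestPolicy:
    """Hypothetical policy a team declares for its own service."""
    strategy: str                   # "unit" or "integration"
    min_unit_coverage: float = 0.0


@dataclass
class BuildResult:
    unit_coverage: Optional[float]  # None when the team runs no unit tests
    integration_passed: bool


def global_mandate(result: BuildResult, threshold: float = 0.8) -> bool:
    """Centralized rule: every team must report at least 80% unit coverage."""
    return result.unit_coverage is not None and result.unit_coverage >= threshold


def team_gate(policy: TeamTestPolicy, result: BuildResult) -> bool:
    """Platform-provided gate that enforces whatever strategy the team chose."""
    if policy.strategy == "unit":
        return (result.unit_coverage is not None
                and result.unit_coverage >= policy.min_unit_coverage)
    if policy.strategy == "integration":
        return result.integration_passed
    raise ValueError(f"unknown strategy: {policy.strategy}")


# An integration-testing team fails the global mandate but passes its own gate.
integration_team_build = BuildResult(unit_coverage=None, integration_passed=True)
print(global_mandate(integration_team_build))                         # False
print(team_gate(TeamTestPolicy("integration"), integration_team_build))  # True
```

The platform team still owns the gate, but the team owns the rule, which keeps the control in place without taking the decision away from the people doing the work.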
793.738 -> You've probably heard the term go fast and break things,
796.24 -> right?
797.073 -> This is a fairly common thing around startups
799.018 -> and moving on.
799.871 -> But as a bank, and to maintain that trust that we have that
802.701 -> Paul was just talking about,
804.591 -> we have to go fast but we can't break things.
807.965 -> So one of the things that we started with was coming up with
810.525 -> our account structure. At the start, we knew that we were
813.905 -> gonna be building a new bank with a new way of working, new
816.744 -> principles and potentially even a slightly different culture
819.333 -> than the rest of JPMorgan as a whole.
821.823 -> So one of the first things that we knew we wanted to do was
824.351 -> to separate out all of our accounts and our workloads.
827.623 -> It's quite easy to understand that we'd want to have our
829.958 -> services, our banking, our accounts,
831.653 -> et cetera in different locations.
833.734 -> But we wanted to go a step further, and we ended up
836.053 -> separating things out so that we run our own Bitbucket instance,
839.952 -> we run our own toolchain,
841.677 -> we run our own observability stack,
843.277 -> we run our own data platform, we own everything separate.
846.295 -> And one of the main problems that you face when you're
848.317 -> starting on this is where do you actually start on designing
850.895 -> your account structure?
852.353 -> You could start off with the AWS Well-Architected Framework.
855.162 -> It's a great resource, and you can go into the finer
857.517 -> details of the implementations that you might want to do.
860.357 -> This might end up getting you towards analysis paralysis
862.895 -> where you're asking do we want to have a singular account
865.117 -> where everything is in one place or do we want to have a
867.063 -> cellular architecture where everything is isolated?
870.263 -> But for us we wanted to have a look at what the requirements
873.162 -> of these accounts were gonna be and why we would do them.
875.86 -> For instance,
876.693 -> maybe we wanna have role-based access control for some
878.791 -> accounts or dedicated access for others.
881.013 -> Some of these accounts might need to have three nines of
882.887 -> availability, while others might need five.
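Each extra nine cuts the allowed downtime tenfold, which is why the gap between three and five nines matters so much. A quick calculation, assuming a 365-day year (the talk doesn't say what period availability is measured over):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime that a target availability allows over the period."""
    return (1.0 - availability) * period_minutes


for label, target in [("three nines", 0.999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_budget_minutes(target):.1f} minutes per year")
# three nines: 525.6 minutes per year  (roughly 8.8 hours)
# five nines: 5.3 minutes per year
```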
886.109 -> RPO and RTO.
888.074 -> There's gonna be some things that do need an RPO of zero,
890.924 -> but then there's gonna be a lot of things that can actually
892.634 -> tolerate some failure.
894.085 -> And in terms of failure domain, you might need to think,
897.186 -> oh if we separate everything,
898.698 -> nothing can impact each other if it fails.
900.741 -> But then you're trading off economies of scale and you're
902.906 -> gonna be paying a lot of money for compute and resources
905.098 -> that you're not using.
906.57 -> And I think given we're in an FSI track,
908.975 -> a lot of us probably are regulated to some extent here.
911.782 -> And so you need to think about the auditability of your systems
913.81 -> as well.
914.879 -> You're gonna be asked by your regulators to explain how
917.199 -> things work and where you draw the box around your accounts
919.904 -> is gonna impact how much you have to explain to them when
923.281 -> they're asking you how does this work?
924.943 -> And also if you create them too small,
927.217 -> you're gonna have to explain the relationships
928.675 -> between all of those,
929.944 -> which is probably just as much work
930.777 -> as doing it all in a single account.
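One way to start from requirements rather than from a blank diagram is to write down each workload's requirements and see which workloads share a profile; each distinct profile is a candidate account boundary. A minimal sketch, with invented workload names and values (not Chase's actual accounts):

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Requirements(NamedTuple):
    access: str        # "rbac" or "dedicated"
    availability: str  # e.g. "99.9" or "99.999"
    rpo_seconds: int   # 0 means no data loss tolerated


# Hypothetical workloads, for illustration only.
WORKLOADS: Dict[str, Requirements] = {
    "balances": Requirements("rbac", "99.999", 0),
    "scheduled-payments": Requirements("rbac", "99.999", 0),
    "analytics": Requirements("rbac", "99.9", 3600),
    "reporting": Requirements("rbac", "99.9", 3600),
    "high-security-service": Requirements("dedicated", "99.999", 0),
}


def candidate_accounts(workloads: Dict[str, Requirements]) -> Dict[Requirements, List[str]]:
    """Group workloads that share an identical requirement profile."""
    groups: Dict[Requirements, List[str]] = defaultdict(list)
    for name, req in workloads.items():
        groups[req].append(name)
    return dict(groups)


for req, names in candidate_accounts(WORKLOADS).items():
    print(req, "->", names)
```

This only gives a first cut: failure domains, cost and how much you would have to explain to a regulator can still move the boundaries.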
932.941 -> So what we have here is we have the internet edge,
935.89 -> which is kind of like a DMZ.
937.104 -> It does the same things:
938.173 -> it does load shedding when you have incorrect payloads,
940.501 -> it does attack analysis,
941.789 -> it does high-level authentication for requests before we
944.273 -> dive into zero trust
945.538 -> lower down. In the core banking stack
947.533 -> and the core banking accounts,
948.741 -> this is where we run most of the services that make the bank
950.945 -> actually work and the customer experience
952.632 -> associated with it.
953.85 -> You have an account; we need to track the balance
955.702 -> of that account somewhere. You
956.899 -> have scheduled payments to transfer some money to Mom;
959.642 -> we need to run that somewhere.
961.344 -> And we also have savings accounts which deliver some
963.96 -> market-leading interest rates on them,
965.56 -> but they also need to pay into somewhere.
967.856 -> Now these two accounts that I've just described have to be
970.886 -> highly available and have low latency because when you're
973.148 -> trying to pay for a train ticket home at 9:00 PM,
975.8 -> you don't wanna be waiting for
976.862 -> your bank because it's not working.
978.534 -> However, to the right-hand side of that,
980.315 -> we have the analytics stack or data platform.
984.102 -> This data platform is allowing us to measure whether or not
987.625 -> our market-leading interest rates are actually having an impact
991.02 -> on the customer experience.
992.426 -> Are people actually enjoying having more money paid into
994.831 -> their accounts at the end of the month?
996.282 -> Probably yes.
997.984 -> But
998.998 -> one of the things there is that this doesn't need
1001.206 -> to be nearly as available
1002.664 -> or have as low latency as the other systems.
1004.85 -> Yes, if they're not performing well
1006.642 -> or there's something wrong,
1007.89 -> it's gonna impact internal customers,
1009.885 -> internal users of that platform.
1011.802 -> But these customers are probably gonna be more tolerant of
1014.458 -> things not working quite so well.
1016.402 -> And to the right-hand side of that,
1017.642 -> we have our observability platform,
1019.413 -> which is collecting metrics from all of the points
1021.224 -> throughout these other areas into a single pane of glass,
1024.274 -> which lets us know whether or not our internet edge,
1026.872 -> our core banking and our data platforms are actually
1029.401 -> working as we expect them to.
1031.109 -> Linking all of these together,
1032.642 -> we have a set of shared services that run message brokers,
1035.47 -> distribute artifacts and a few other common things and
1038.32 -> that's on top of the JP Morgan infrastructure backbone,
1041.59 -> which is a Direct Connect link between
1043.006 -> the two networks that we have.
1046.179 -> Once we've started on some accounts,
1047.728 -> we end up working out what's going to go into them.
1050.199 -> Again, the AWS Well-Architected Framework comes into play here,
1053.603 -> but if you're looking at this diagram you're probably
1055.256 -> thinking, oh that's pretty simple, why am I here?
1057.432 -> And you can learn most of what's in this in the associate
1060.683 -> cloud certification course,
1062.211 -> which I think you can actually get
1063.112 -> certified downstairs at the moment.
1065.276 -> We have public
1066.109 -> and private subnet pairs, and we run all of
1068.85 -> our workloads inside the private subnets.
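As a sketch of the public/private pairing, assuming a /16 VPC range, /20 subnets and the London region's Availability Zone names (all assumptions, not Chase's actual layout), Python's standard `ipaddress` module can carve one public and one private subnet per zone:

```python
import ipaddress
from typing import Dict, List


def subnet_pairs(vpc_cidr: str, azs: List[str],
                 new_prefix: int = 20) -> Dict[str, Dict[str, ipaddress.IPv4Network]]:
    """Allocate consecutive public/private subnet pairs, one pair per AZ."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    # Dict displays evaluate left to right, so "public" takes the earlier block.
    return {az: {"public": next(blocks), "private": next(blocks)} for az in azs}


for az, pair in subnet_pairs("10.0.0.0/16", ["eu-west-2a", "eu-west-2b", "eu-west-2c"]).items():
    print(az, "public:", pair["public"], "private:", pair["private"])
# eu-west-2a public: 10.0.0.0/20 private: 10.0.16.0/20
# eu-west-2b public: 10.0.32.0/20 private: 10.0.48.0/20
# eu-west-2c public: 10.0.64.0/20 private: 10.0.80.0/20
```

In this kind of layout the workloads sit in the private subnets, and the public subnets typically hold only internet-facing pieces such as load balancers or NAT gateways.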
1070.97 -> We use managed services as much as possible across these.
1074.173 -> For instance,
1075.092 -> we're running Kubernetes using EKS, and while we have several
1078.143 -> hundred EC2 instances running across all of our accounts,
1081.723 -> we only manage a small number of these directly,
1083.354 -> around about 10 or so.
1084.956 -> We really do use managed services a lot here.
1089.799 -> We make heavy use of PrivateLink, as you can probably see
1092.108 -> going horizontally across here to enable access for accounts
1095.295 -> to talk between each other.
1098.106 -> This is to allow us to manage access control and it also
1101.928 -> means that we can create new accounts and connectivity
1105.212 -> patterns without having to worry about managing CIDR
1107.639 -> ranges.
1109.199 -> We can create as many accounts as we want and connect them
1111.295 -> however we want without planning ahead for this.
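The CIDR point is worth spelling out. Routed connectivity such as VPC peering cannot join VPCs with overlapping address ranges, so every new account's range would have to be planned against all the others. PrivateLink instead exposes a service through an endpoint inside the consumer's VPC, so the two VPCs' ranges are allowed to overlap. A small sketch, with hypothetical VPC names, of the check you would otherwise have to keep running:

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_vpcs(vpcs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of VPCs whose CIDR ranges overlap (these could not be peered)."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpcs.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2) if nets[a].overlaps(nets[b])]


# Hypothetical accounts, two of which reuse the same range.
vpcs = {
    "core-banking": "10.0.0.0/16",
    "analytics": "10.0.0.0/16",
    "observability": "10.1.0.0/16",
}
print(overlapping_vpcs(vpcs))  # [('analytics', 'core-banking')]
```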
1113.61 -> And you can see, even on the right-hand side of this diagram,
1116.996 -> that we have a high-security area where the only
1119.999 -> connectivity is via PrivateLink.
1123.773 -> And then for some areas we don't even need to use direct
1127.034 -> connectivity through anything at all.
1128.746 -> We can utilize AWS cross-account access so that
1131.756 -> services can write data directly into the resource where
1134.676 -> it's needed.
1135.509 -> And we'll touch a little bit on
1136.922 -> how we do that for logging later.
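The talk doesn't say which resources are written to this way. Assuming, purely for illustration, a central S3 bucket that other accounts write logs into, the destination side is a bucket policy granting `s3:PutObject` to the writer accounts. The bucket name and account ID below are placeholders:

```python
import json
from typing import List


def cross_account_write_policy(bucket: str, writer_account_ids: List[str]) -> dict:
    """Bucket policy for the destination account: lets the listed accounts write objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountWrites",
                "Effect": "Allow",
                "Principal": {"AWS": [f"arn:aws:iam::{acct}:root" for acct in writer_account_ids]},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Placeholder bucket name and account ID.
print(json.dumps(cross_account_write_policy("example-central-logs", ["111122223333"]), indent=2))
```

Cross-account access needs both sides to allow the action: the writing account's IAM role still needs its own identity policy permitting `s3:PutObject` on that bucket.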
1139.348 -> Then in analytics, toolchain and other kinds of accounts,
1141.984 -> we need to have the ability to let engineers reach these
1145.188 -> accounts from their workstations,
1147.391 -> which is where the transit gateway and the backbone come
1149.898 -> into play to make this work.
1154.148 -> So we know that our success is gonna come through our
1157.863 -> ability to innovate.
1159.029 -> You know we're gonna have competitive pressure
1160.745 -> that comes in,
1161.657 -> people are gonna be coming up with
1162.74 -> new banking products all the time.
1164.507 -> Our interest rate may not be quite
1165.715 -> so competitive as it once was,
1168.264 -> but we're also running in a regulated industry and
1171.16 -> regulators are gonna be doing things in the interest of
1174.304 -> people outside of us and they're gonna be able to legally
1176.86 -> mandate us to do these things and we're not gonna be able to
1179.136 -> say, you know what,
1179.969 -> we're just gonna deprioritize that because we want to
1181.662 -> implement this feature further.
1183.155 -> We have to do it when they tell us to.
1184.997 -> And you can see that some of these regulations are
1186.629 -> starting to come into industries outside of finance as well.
1189.035 -> You know,
1189.868 -> there's a lot of privacy-related regulation coming in, in the EU.
1192.701 -> We have GDPR, which requires us to be able to show the
1196.437 -> data that we hold on any given customer and, if they ask us,
1199.088 -> to delete it, and we have similar things coming in around the
1201.923 -> world, such as the California Consumer Privacy Act.
1205.135 -> Customers are expecting us to deliver them new value and new
1207.893 -> features all the time, while regulators are expecting us to
1211.81 -> be compliant, and they don't really
1214.373 -> care which one you're trying to do at the same time.
1216.835 -> This means that it's important for us to be able to respond
1219.096 -> quickly across all of our services and to go fast.
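Requests like GDPR access and erasure touch every service that holds customer data, which is one reason responding quickly across all services matters. A minimal sketch, assuming each service exposes hypothetical `export` and `erase` hooks that a central handler fans out to (in a real bank, legal retention obligations would limit what can actually be erased):

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for one service's customer data store."""

    def __init__(self, name: str, records: Dict[str, List[dict]]):
        self.name = name
        self.records = records

    def export(self, customer_id: str) -> List[dict]:
        return list(self.records.get(customer_id, []))

    def erase(self, customer_id: str) -> int:
        """Delete the customer's records and return how many were removed."""
        return len(self.records.pop(customer_id, []))


def subject_access_request(stores: List[InMemoryStore], customer_id: str) -> Dict[str, List[dict]]:
    """Collect everything held about one customer, keyed by service."""
    return {s.name: s.export(customer_id) for s in stores}


def erasure_request(stores: List[InMemoryStore], customer_id: str) -> Dict[str, int]:
    """Ask every service to delete one customer's data."""
    return {s.name: s.erase(customer_id) for s in stores}


stores = [
    InMemoryStore("payments", {"c-42": [{"amount": 12.5}]}),
    InMemoryStore("chat", {"c-42": [{"msg": "hi"}, {"msg": "thanks"}]}),
]
before = subject_access_request(stores, "c-42")
erased = erasure_request(stores, "c-42")        # {'payments': 1, 'chat': 2}
after = subject_access_request(stores, "c-42")  # {'payments': [], 'chat': []}
```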
1222.929 -> To do this successfully, you don't just need
1225.25 -> organizational design,
1226.41 -> creating sympathy between people, processes
1228.634 -> and culture all across the organization and teams.
1233.244 -> There's one thing that you need to do,
1234.622 -> even if you're doing all of those things perfectly, and that
1236.94 -> thing is actually building your code fast.
1239.293 -> If you try to build a bank in assembly,
1241.007 -> you're probably not gonna get there very quickly.
1243.225 -> And in some cases microservices might be a little bit closer
1245.906 -> to assembly than a lot of engineers would like to admit.
1248.866 -> So let's turn to the contact center,
1251.3 -> which is something that Paul was describing before as one of
1253.57 -> our key brand differentiators and an important aspect of what
1256.647 -> we deliver to our customers.
1258.319 -> Remember, we're a digital-only bank,
1259.991 -> so there are only two ways people can interact with us:
1262.308 -> through the app or by calling us.
1263.927 -> So this is really important for us to get right.
1266.132 -> Amazon Connect allowed us to build
1267.791 -> this channel in a really simple way.
1269.738 -> If you look at this diagram,
1271.239 -> you can see that Connect is actually only one of these
1273.722 -> icons on here,
1274.859 -> but the rest of them are other AWS resource types, and the
1278.011 -> power really comes from the ability to integrate
1280.721 -> Connect into this ecosystem.
1284.084 -> One of the things that we do for authentication when people
1287.063 -> call in is a feature which we call click to call.
1290.596 -> I'm sure that you've had an experience where you've rung up
1292.831 -> your bank trying to answer a relatively simple question
1294.956 -> about your banking profile and they actually ask you
1296.972 -> something more complex to prove who you are.
1298.994 -> So they might say, ah,
1300.301 -> can you tell us how much you spent on
1301.759 -> the Tuesday of last week?
1303.098 -> Or how much did you have to pay on your credit card at the
1305.034 -> end of the month? Because we've got you in the app with the
1307.378 -> associated biometrics, we already know who you are,
1310.498 -> and so we don't need to ask these questions.
1312.573 -> What we do is when you click the button
1314.031 -> in the app to call us,
1315.275 -> we log into our IdP, create a short-lived token,
1318.541 -> and then once the call is connected to the contact center,
1321.13 -> we send this short-lived token via DTMF,
1324.687 -> which stands for dual-tone multi-frequency.
1326.93 -> Those are the sounds that happen when you press the keys
1328.754 -> on your phone's keypad.
1329.842 -> Once the call is connected and those are sent,
1331.975 -> Amazon Connect is then able to interpret them, call into our
1335.119 -> IdP, and know who you are within a matter of milliseconds.
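A minimal Python sketch of the click-to-call token flow described above, assuming an in-memory store in place of the real IdP. The names `issue_token` and `resolve_dtmf`, the 8-digit length and the 60-second lifetime are all hypothetical, not Chase's implementation. The token is numeric because DTMF can only carry the digits 0-9 plus * and #.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_DIGITS = 8        # hypothetical token length
TOKEN_TTL_SECONDS = 60  # hypothetical lifetime of a click-to-call token

# token -> (customer_id, expiry time); stands in for the IdP-backed token store
_tokens: Dict[str, Tuple[str, float]] = {}


def issue_token(customer_id: str, now: Optional[float] = None) -> str:
    """Called when an already-authenticated app user taps the 'call us' button."""
    now = time.time() if now is None else now
    # DTMF can only carry 0-9, * and #, so the token is purely numeric
    token = "".join(secrets.choice("0123456789") for _ in range(TOKEN_DIGITS))
    _tokens[token] = (customer_id, now + TOKEN_TTL_SECONDS)
    return token


def resolve_dtmf(digits: str, now: Optional[float] = None) -> Optional[str]:
    """Called by the contact flow with the digits received over DTMF."""
    now = time.time() if now is None else now
    entry = _tokens.pop(digits, None)  # single use: removed on first lookup
    if entry is None:
        return None
    customer_id, expires_at = entry
    return customer_id if now <= expires_at else None
```

Making the token single-use and short-lived limits what an eavesdropper could do with the tones; a `None` result simply means the caller has not been identified this way.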
1338.399 -> This means that once you've completed the IVR flow and
1340.946 -> you're talking to a real agent,
1342.415 -> they are able to have all of the information about you in
1345.154 -> front of them, everything we can see that's relevant,
1347.61 -> including if you've been using our in-app chat feature.
1350.66 -> But we don't have to stop there.
1352.753 -> Once we know who you are at the start of this flow,
1355.398 -> we can dynamically change the IVR flow that you are taking.
1358.954 -> One of the common cases where this helps is
1361.367 -> when you have a fraud alert.
1363.369 -> And that is if you've made a purchase that's slightly
1365.595 -> outside of your normal spending pattern,
1367.588 -> the bank will sometimes send you a message or give you a
1369.419 -> call and say, "Hey, was this transaction actually you?"
1373.216 -> In our case, when someone calls up about this,
1376.226 -> we are able to look into your account,
1378.714 -> determine whether you have any fraud alerts pending, and then
1381.875 -> direct you into a different IVR flow which will connect you
1384.452 -> directly to the fraud specialist for you to have that
1386.445 -> conversation with them directly.
1388.339 -> In other situations, you might have to
1389.763 -> go through an IVR flow,
1391.112 -> get connected to an everyday banking consultant who would
1393.417 -> then transfer you to a fraud specialist.
1395.651 -> But obviously we don't always get this heuristic right and
1398.531 -> sometimes we will connect people who aren't calling about
1401.102 -> fraud to a fraud specialist.
1402.707 -> In this case, the fraud specialists have
1404.499 -> also been trained to handle most cases
1406.652 -> and they can probably help without having to transfer,
1408.718 -> but they can still do that transfer if they need to.
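The routing decision itself can be sketched in a few lines of Python. The customer record, the flow names and the single-rule heuristic are hypothetical; in practice the decision would live in the contact flow (Amazon Connect can, for example, call out to an AWS Lambda function), but the talk doesn't say how Chase wires it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Customer:
    customer_id: str
    pending_fraud_alerts: List[str] = field(default_factory=list)


def choose_ivr_flow(customer: Optional[Customer]) -> str:
    """Pick an IVR flow once the caller has (or hasn't) been identified."""
    if customer is None:
        # Not identified via click to call: use the standard menu
        return "standard-ivr"
    if customer.pending_fraud_alerts:
        # Probably calling about the alert: go straight to a fraud specialist
        return "fraud-specialist-queue"
    return "everyday-banking-queue"
```

Because the heuristic can guess wrong, the agents on the fraud queue still need to be able to handle or transfer general calls.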
1411.769 -> And because the call center is one of the key
1415.494 -> differentiators for our proposition,
1418.443 -> we knew that we had to start training everyone
1420.401 -> on how this would work
1421.919 -> and build the muscle for this early.
1424.759 -> We started doing this in early 2020, in fact,
1428.14 -> which, as a number of companies experienced,
1431.148 -> brought working-from-home requirements with it and was a
1435.036 -> little bit of a surprise to us. It was no different with
1437.132 -> our contact center, and anyone who's tried to listen to a
1438.324 -> YouTube video on their remote desktop through their home
1443.862 -> speakers probably knows where this is going.
1446.951 -> Turns out when you're trying to connect a soft phone through
1449.685 -> a remote desktop to another cloud service,
1451.652 -> voice quality drops a lot.
1454.238 -> We were able to modify our flow,
1456.591 -> create a new Amazon Cognito pool to allow call center agents to
1460.83 -> log in from their Chromebooks,
1463.108 -> which we distributed to them so they could work from home.
1465.513 -> They were able to log in on those and then connect directly to
1468.932 -> Connect without having to use their remote desktop.
1471.916 -> They could then open their remote desktop,
1473.649 -> use their normal applications on the network, and then when a
1476.097 -> call was received on this soft phone,
1478.826 -> it would go directly from Connect to their Chromebook, and
1481.442 -> this dramatically increased the quality
1483.525 -> of the voice calls.
1486.226 -> We know this quantitatively because we are sending
1488.508 -> a lot of our information from
1490.101 -> the contact center through to our observability stack.
1493.213 -> This includes the number of soft tokens that we've issued,
1495.611 -> the number that have failed to resolve, how many calls we've
1497.917 -> routed intelligently, customer and agent transaction records,
1502.418 -> and we've even included some code in our software so that
1505.381 -> we can track the amount of background noise
1506.214 -> and the packet loss in the call itself.
1509.892 -> By using Connect,
1511.013 -> we've been able to easily deliver a great customer
1513.488 -> experience to our users and some innovative features that
1516.277 -> you won't see anywhere else.
1517.523 -> And we've been able to do this with
1518.731 -> minimal engineering effort.
1522.204 -> But Amazon hasn't been able to turn everything into a
1525.07 -> managed service just yet.
1528.028 -> You know,
1528.861 -> we do a lot of things with microservices, and we're currently
1530.909 -> running several hundred of these in production.
1533.931 -> When I was starting my work at JP Morgan,
1537.277 -> I was going through the interview process,
1539.002 -> I was talking to an engineer called Alex, and he was,
1540.867 -> you know, telling me about the project, how it was greenfield,
1543.395 -> how the culture was great, how we were a startup,
1545.207 -> cloud native, all these great things.
1547.304 -> And one of the things that he mentioned was that we were
1549.237 -> building a microservice architecture, and being a developer,
1552.513 -> microservice architectures are something
1553.721 -> that I'm quite keen on.
1555.274 -> And I did ask him,
1556.138 -> why are you doing microservice architectures?
1558.394 -> There's a lot of things where a monolithic application can
1561.021 -> probably serve you just as well and putting a network call
1563.429 -> in between those things isn't gonna
1564.68 -> make things better for you.
1566.104 -> But Alex answered me with something I'll paraphrase
1569.635 -> here: we're using microservices because
1573.039 -> it's gonna empower our teams to operate independently with
1575.854 -> flexibility,
1576.896 -> but will also give us consistency where it counts.
1580.851 -> When we're talking about a microservice,
1582.683 -> there's a lot more that goes into them than just code.
1585.278 -> And sometimes we might point at the JAR file
1587.736 -> or a Docker image and say, this is the microservice,
1589.977 -> but really it's a lot more than that.
1591.389 -> And we wanna make sure that we're giving teams the
1593.131 -> flexibility and empowering them to make the choices that
1595.335 -> they want to make.
1596.552 -> They're able to use open source projects where they want,
1599.069 -> they're able to use vendor products if they want to as
1603.389 -> well, if they want to go through the hassle of doing a
1606.085 -> commercial arrangement with them,
1607.699 -> which isn't always the easiest thing to do.
1609.501 -> We don't mandate any frameworks,
1611.192 -> not even slightly customized proprietary ones,
1615.828 -> and we let microservice teams use cloud services
1619.362 -> in their microservice.
1620.801 -> For instance, if you're using Redis,
1623.21 -> you might want to use a managed version of that through
1624.934 -> ElastiCache.
1625.83 -> You might want to use S3 or KMS, or you might even want to go
1628.637 -> off and do something a little bit more crazy
1630.47 -> and use something like Ground Station.
1633.803 -> As a platform team, it's not our
1635.627 -> job to choose what people are using; it's our
1637.702 -> responsibility to empower them to choose.
1640.265 -> But these cloud resources and the infrastructure as code that goes
1643.582 -> with them are part of the microservices themselves.
1645.955 -> This means that we have to have the code next to the
1648.232 -> microservice and not in a centralized cloud repository
1651.603 -> somewhere.
1653.174 -> We do have some common interfaces that microservices are
1655.029 -> expected to implement,
1657.355 -> such as structured logging and Prometheus for metrics, and there
1660.091 -> are some API standards around this as well.
1662.366 -> We request that everyone uses Swagger to
1664.574 -> define their REST APIs
1666.32 -> and Avro for event-based APIs.
1668.286 -> The reason for this is that while we do wanna make sure that
1671.326 -> teams have flexibility to build what they want,
1673.794 -> we're also creating a cohesive ecosystem overall.
1676.462 -> It makes a lot of sense that people can do what they want in
1678.895 -> their domain context, but
1680.198 -> the power of microservices is in the relationships
1681.031 -> that are built with other teams.
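One way a service might satisfy the structured-logging interface is to emit one JSON object per log line. This Python sketch uses only the standard `logging` module; the field names (`timestamp`, `level`, `logger`, `message`) and the `fields` convention for extra context are assumptions, since the talk doesn't specify Chase's log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Context passed as extra={"fields": {...}} is merged in as top-level keys
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate output if called twice
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage would look like `make_logger("payments").info("payment accepted", extra={"fields": {"payment_id": "p-1"}})`. A shared, machine-readable shape like this is what lets a central pipeline parse every team's logs without knowing how each service is built.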
1685.926 -> Along with this,
1686.792 -> Paul mentioned before that we have a you build it,
1688.808 -> you run it mindset.
1689.959 -> This means that teams are responsible for owning everything
1692.038 -> to do with their microservice: the database,
1694.24 -> the schema, and the data associated with it.
1697.478 -> So as a platform team,
1699.391 -> we're empowering teams to choose what database they want to use
1702.954 -> and how to store their data.
1703.834 -> They could, if they wanted to, run Postgres
1706.029 -> on their own pods and manage it, make all the backups,
1708.727 -> do all the compliance and all of that sort of thing.
1711.212 -> They might take a step away and use RDS and configure that
1713.77 -> directly.
1714.677 -> Or they could use one of the paved highways that we have as
1718.221 -> a platform team: a Kubernetes resource
1720.477 -> that goes off and creates it all, scaffolded for them.
1722.845 -> And they know everything is compliant, but the power is in
1725.119 -> their hands and they're able to choose.
1728.827 -> As part of you build it, you run it,
1730.568 -> they're also responsible for the SLAs associated with that
1733.106 -> code base.
1733.971 -> We've just gone through Black Friday and Cyber Monday,
1736.692 -> and it's the responsibility of the teams operating these
1738.935 -> services to be collecting the metrics,
1740.626 -> to know what's expected of their service when things launch,
1743.767 -> and to make sure they're able to cope with it.
1746.316 -> And we came through this event
1748.73 -> perfectly fine.
1749.567 -> A couple of teams made some adjustments to their scaling, but
1751.71 -> they were able to do that without any problems.
1754.082 -> With our microservice architecture,
1756.239 -> we've empowered our teams to make their own choices and given them
1757.916 -> flexibility, because they're able to use open
1760.383 -> source technologies,
1761.38 -> so they can come in and bring their industry-wide
1763.756 -> experience to our teams.
1765.799 -> And because of a couple of small interfaces that we request,
1769.906 -> there's a cohesive ecosystem across
1771.535 -> Chase International as a whole.
1775.108 -> But you have to build these microservices somehow
1777.191 -> and then you have to manage them.
1779.074 -> If you were building a monolith and you had 200
1781.146 -> engineers working on it,
1782.426 -> it might make a lot of sense to just have five people who
1784.771 -> were there just tweaking the knobs,
1786.142 -> making sure everything kept running fine,
1788.123 -> deploying it when there was a change that needed to be made,
1790.009 -> et cetera.
1791.075 -> But when you have 30 teams of seven engineers each,
1793.791 -> it doesn't make quite so much sense
1795.785 -> to have five of those engineers doing the same
1797.337 -> thing.
1798.395 -> So you need to rely on automation,
1800.353 -> obviously.
1802.235 -> When we're creating a microservice,
1804.072 -> we start off with what we call starters.
1806.6 -> These starters are kind of prepackaged recipes
1809.808 -> for creating a service.
1811.325 -> They're not going to stick around,
1813.164 -> they're not long-lived,
1814.497 -> and you don't have to fit into them.
1816.038 -> They're a library rather than a framework.
1817.985 -> These starters are actually managed
1819.193 -> by the microservice teams themselves.
1821.436 -> They go into a centralized area
1822.581 -> and they're evolved in an inner-source model.
1825.664 -> The platform team that I'm on,
1827.914 -> we contribute to these as we would any other service,
1830.263 -> but it's important that we are not the ones
1831.971 -> mandating what those are.
1833.495 -> When we're talking about what the best practices are for a
1835.612 -> Java service,
1836.807 -> a bunch of Go developers aren't really gonna be able to
1838.7 -> answer that question well.
1841.692 -> We also use Helm charts and Terraform modules, and in these
1843.149 -> areas it's still the microservice teams who are
1847.698 -> responsible for them. As a platform team,
1849.723 -> because we're working with these technologies a lot more
1852.373 -> often, we provide more input into these,
1854.194 -> but it's a consulting role,
1855.738 -> not one of ownership and control.
1858.474 -> Again, we're enabling our teams to make their own decisions.
1861.786 -> And it's interesting that on the slide,
1864.754 -> there's one part there that happens once per service,
1867.988 -> so we don't need to scale that very much,
1869.73 -> but everything else happens many times.
1871.965 -> And when we started Chase International,
1873.714 -> we started with the stretch goal of a thousand releases per
1876.557 -> day, which is quite a lot,
1879.498 -> particularly since we had no services to release at all.
1881.752 -> So we couldn't manage that.
1883.453 -> And to be honest, we're still not close to that today.
1885.789 -> But what that statement does is it focuses your mindset
1889.304 -> and thinking towards the things that we need to do
1891.548 -> to enable it. If we are going to do it,
1894.16 -> it means that we can't do things like have manual processes
1896.664 -> in the middle that slow things down.
1898.538 -> We can't have an approval that takes two days.
1900.528 -> There's no way you're gonna get there with that.
1902.281 -> But in a regulated environment such as ours,
1904.608 -> we need to have confidence that the things that are going
1906.272 -> into production are what we intend.
1908.736 -> So we make sure that we have repeatable builds
1911.694 -> because we need to know what's in production.
1914.101 -> And if your build isn't repeatable,
1915.378 -> what you're gonna find is that it works fine until at
1918.003 -> some point you're gonna have to come back and ask,
1920.149 -> what was actually in that artifact that I just deployed?
1923.352 -> It's gonna slow you down.
1925.794 -> And the same goes for the artifacts that
1927.002 -> are produced out of these builds.
1928.645 -> They need to remain immutable.
1930.821 -> Immutability sounds relatively simple when you're creating a
1933.264 -> Docker image or something like that,
1935.33 -> but it can come up in interesting ways,
1937.629 -> and you need to think about this.
1939.08 -> So if you're deploying something using Terraform,
1941.219 -> you need to look at the modules that your Terraform module
1943.296 -> is referencing.
1944.224 -> Are you pinning them to an exact version,
1945.682 -> or are you using an approximate version?
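A small lint-style check for the pinning question, sketched in Python under the assumption that module names and their version constraints have already been extracted from the Terraform configuration (parsing HCL is out of scope here). It treats a bare `X.Y.Z` or `= X.Y.Z` as an exact pin and anything else, such as `~> 1.2` or `>= 1.0.0`, as an approximate version; the module names are hypothetical.

```python
import re
from typing import Dict, List

# A bare "X.Y.Z" or "= X.Y.Z" selects exactly one release; anything else is a range.
_EXACT = re.compile(r"^\s*=?\s*\d+\.\d+\.\d+\s*$")


def is_exactly_pinned(constraint: str) -> bool:
    return bool(_EXACT.match(constraint))


def unpinned_modules(modules: Dict[str, str]) -> List[str]:
    """Return the names of modules whose version constraint is not an exact pin."""
    return sorted(name for name, constraint in modules.items()
                  if not is_exactly_pinned(constraint))
```

This simplified pattern would also flag pre-release versions such as `1.2.3-rc1`; a real check would probably want to accept those as exact pins too.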
1949.259 -> Are you deploying things through a pipeline?
1951.724 -> And if in there you have some templating
1953.682 -> and you're injecting environment variables from your pipeline,
1956.024 -> do you know what version of your pipeline ran and what the
1958.43 -> value of those variables was at the time that
1960.243 -> something went into production?
1961.681 -> And will you be confident about that in a year's time?
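One way to be able to answer those questions a year later is to record every input of a production change and fingerprint it. A minimal Python sketch, assuming a hypothetical `DeploymentRecord` that would be stored immutably alongside each deployment; the field names are illustrative, not taken from the talk.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class DeploymentRecord:
    service: str
    image_digest: str      # immutable digest such as "sha256:...", not a mutable tag
    pipeline_version: str  # commit of the pipeline definition that actually ran
    variables: Dict[str, str] = field(default_factory=dict)  # values injected at deploy time

    def fingerprint(self) -> str:
        """Stable hash of everything that determined this deployment."""
        # sort_keys makes the JSON, and therefore the hash, independent of dict order
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two deployments with the same fingerprint had identical inputs; any change to the image, the pipeline version or an injected variable produces a different one.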
1965.667 -> To enable all this,
1966.798 -> you need to have good quality metrics
1968.006 -> about what you're deploying.
1969.609 -> I don't necessarily mean Prometheus metrics here,
1972.196 -> which are telling us how many transactions per second our
1975.156 -> service is getting, though we do need those as well.
1977.623 -> Instead, I'm talking about code quality.
1980.422 -> What is the cyclomatic complexity of the code?
1982.966 -> What is the test coverage,
1984.385 -> if you're measuring that?
1985.782 -> What are the supply chain and static analysis results when
1988.372 -> you've been doing your scanning?
1990.455 -> What are the results of the integration and load tests?
1993.336 -> Are you doing chaos testing,
1995.47 -> which we've got as part of this as well?
1997.343 -> And then once you've deployed
1998.583 -> to your production environment,
2000.275 -> you can still do more things to cover that, such as smoke
2002.477 -> tests and canary releases.
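Instead of a two-day manual approval, metrics like these can feed an automated promotion gate. A minimal Python sketch; the `QualityReport` fields mirror the questions above, but the thresholds and the gate itself are hypothetical illustrations, not Chase's actual criteria.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QualityReport:
    max_cyclomatic_complexity: int
    test_coverage: float           # fraction between 0.0 and 1.0
    critical_vulnerabilities: int  # from supply-chain and static-analysis scanning
    load_test_p99_ms: float        # 99th-percentile latency from the load test


# Hypothetical thresholds, chosen only for illustration.
THRESHOLDS: Dict[str, float] = {
    "max_cyclomatic_complexity": 15,
    "min_test_coverage": 0.80,
    "max_critical_vulnerabilities": 0,
    "max_load_test_p99_ms": 250.0,
}


def gate_failures(report: QualityReport, t: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the reasons a build should not be promoted (an empty list means pass)."""
    failures = []
    if report.max_cyclomatic_complexity > t["max_cyclomatic_complexity"]:
        failures.append("cyclomatic complexity too high")
    if report.test_coverage < t["min_test_coverage"]:
        failures.append("test coverage below threshold")
    if report.critical_vulnerabilities > t["max_critical_vulnerabilities"]:
        failures.append("critical vulnerabilities found")
    if report.load_test_p99_ms > t["max_load_test_p99_ms"]:
        failures.append("load test p99 latency too high")
    return failures
```

Returning reasons rather than a bare boolean gives the engineer immediate, specific feedback on why a release was held back.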
2004.734 -> Now,
2016.684 -> the last thing on this step is that once you've deployed,
2019.779 -> you need to have the ability to automatically roll back if
2022.344 -> you find something has gone wrong.
2024.278 -> A lot of engineers that I've talked to in the past have
2026.03 -> said, you know,
2027.777 -> our strategy when we release something bad is to fix forward.
2031.022 -> And to me,
2031.855 -> that's an incredibly scary concept, and that's because when I
2035.592 -> talk to people about estimating a piece of work,
2038.161 -> they often use things like perfect-day estimates or T-shirt
2041.182 -> sizing or something like this.
2043.263 -> And these assume that you know the problem up front.
2045.571 -> Now imagine that you're trying to
2046.779 -> solve this problem at 2:00 AM.
2048.164 -> You've got a whole bunch of stress because people
2050.28 -> can't buy their tickets home
2052.392 -> and you need to solve it quickly.
2054.67 -> Instead,
2055.503 -> the better option in most cases is to roll back to
2058.162 -> your last known good version.
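The roll-back-first policy can be sketched as a deploy wrapper in Python. Here `deploy` and `healthy` are placeholder callables (a real pipeline might invoke Helm, which also offers its own `helm rollback` command, and watch error-rate metrics), and the list of known-good versions is an assumption about how history is tracked.

```python
from typing import Callable, List


def deploy_with_rollback(
    history: List[str],
    new_version: str,
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> str:
    """Deploy new_version; if it is unhealthy, redeploy the last known good version.

    `history` holds versions that previously passed health checks, newest last.
    Returns the version left running.
    """
    deploy(new_version)
    if healthy(new_version):
        history.append(new_version)
        return new_version
    if not history:
        raise RuntimeError(f"{new_version} is unhealthy and there is no known good version")
    last_good = history[-1]
    deploy(last_good)  # roll back automatically instead of fixing forward at 2:00 AM
    return last_good
```

Only healthy versions are added to the history, so a rollback always lands on something that has already worked in production.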
2059.937 -> And you need to have this ability.
2062.137 -> You'll have noticed that I've been talking a lot about
2063.811 -> deployments, but there's another element here,
2067.054 -> which is that a deployment doesn't mean that anyone's using
2069.774 -> your service.
2070.757 -> You know,
2071.59 -> we've got a few people around here who are selling
2073.355 -> vendor products to help you deploy
2076.623 -> and then release separately.
2078.467 -> Deployment is an engineering practice.
2080.622 -> Releasing is a business practice, and you need to separate
2083.289 -> these two things. You can deploy a thousand times a day
2086.401 -> and release once a week.
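A common way to separate the two is a feature flag: the code is deployed but dark, and releasing means raising the share of customers who see it. A minimal Python sketch; the flag name, the percentage-based store and the hashing scheme are hypothetical and not modelled on any particular vendor product.

```python
import hashlib
from typing import Dict

# Hypothetical flag store: flag name -> percentage of customers who see the feature.
FLAGS: Dict[str, int] = {"new-statement-view": 0}


def bucket(customer_id: str, flag: str) -> int:
    """Map a customer deterministically to a bucket in 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{customer_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_released(flag: str, customer_id: str, flags: Dict[str, int] = FLAGS) -> bool:
    # Unknown flags default to 0%, so newly deployed code stays dark until released.
    return bucket(customer_id, flag) < flags.get(flag, 0)
```

Because bucketing is deterministic, raising a flag from 10% to 50% keeps the original 10% of customers enabled rather than reshuffling who sees the feature.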
2088.977 -> And in summary,
2089.814 -> we have several hundred microservices that we release daily.
2095.43 -> So what does this look like
2096.778 -> for an engineer on the Chase team? On the left-hand side,
2099.398 -> we have a fairly standard Helm chart, and I hope that
2101.875 -> everyone in the back can read that,
2103.598 -> but it wouldn't be a technology discussion if we
2106.578 -> didn't put some small font on there and hope everyone can
2108.205 -> see it.
2109.443 -> This Helm chart is for an awesome and amazing service
2113.161 -> that's been written by the awesome team.
2115.326 -> You can see here that it has a dependency on the awesome
2117.852 -> team starter; that is,
2119.454 -> the awesome team has decided this is what they want
2122.353 -> their services to look like.
2123.614 -> This is how they get reuse across all of the services that
2125.989 -> they're building.
2128.054 -> It might be that the awesome starter they've used
2130.187 -> references
2131.386 -> the Chase International starter,
2133.344 -> or they might have gone their own way completely.
2135.49 -> Inside there, we can expect it to
2137.011 -> be defining all the resources that we expect to make a
2138.928 -> service run on Kubernetes: deployment, service,
2141.97 -> autoscaling policies, authorization policies, et cetera.
2145.716 -> On the right-hand side you can see a definition, or rather an
2148.415 -> extract, of our pipelines as code.
2150.679 -> You can see here we've defined an environment-specific
2153.073 -> pipeline that's gonna be reused across the different
2155.808 -> environments.
2157.281 -> What it does is sequentially apply some Terraform, do
2160.349 -> a Helm apply, and then run a smoke test.
2163.321 -> And you'll notice that this looks like a programming
2165.253 -> language; in this case it's JavaScript.
2168.444 -> And the reason for that is quite important and that is that
2171.001 -> we've empowered our engineers to use technology that
2174.215 -> they want to use and to evolve the ecosystem.
2177.367 -> This wasn't produced as part of a platform.
2180.519 -> This was produced by an engineer who wanted to change the
2182.73 -> way that they were releasing.
2185.52 -> You can see here that we're using
2186.92 -> some variables that are coming from the pipeline and some
2189.408 -> that are presumably coming from a package that is being
2192.137 -> reused across them.
2193.777 -> And I think that one of the important things here with
2195.892 -> having this pipeline that's been defined in code and by
2200.026 -> engineers is that they were able to make that choice
2203.063 -> themselves.
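The pipeline on the slide is written in JavaScript; this is only a rough Python analogue of the same idea, one ordered list of steps reused for every environment and stopped at the first failure. The step names and the `Step`/`run_pipeline` helpers are hypothetical. In a real pipeline the steps would shell out to Terraform, Helm and a smoke-test suite; here they are plain callables so the sequencing logic can be exercised on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Variables = Dict[str, str]


@dataclass
class Step:
    name: str
    run: Callable[[Variables], bool]  # returns True if the step succeeded


def run_pipeline(env: str, shared_vars: Variables, steps: List[Step]) -> List[str]:
    """Run the same ordered steps for any environment, stopping at the first failure.

    Returns the names of the steps that completed.
    """
    # Shared values (e.g. from a reused package) plus the pipeline-supplied environment
    variables = {**shared_vars, "ENVIRONMENT": env}
    completed: List[str] = []
    for step in steps:
        if not step.run(variables):
            break
        completed.append(step.name)
    return completed
```

Because the definition is ordinary code, an engineer can change how releases work through a normal pull request, which is the point the talk makes about this pipeline.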
2204.935 -> But how do we let them do that?
2207.361 -> Well,
2208.194 -> we need to have some guardrails in place to make sure that
2209.937 -> things are safe, right?
2211.895 -> Here we have a guardrail that
2214.103 -> prevents engineers from deploying Docker images that are
2217.58 -> coming from any random source. In this case,
2219.9 -> we only allow people to deploy from our internal repository
2222.841 -> and from ECR itself.
2225.346 -> And those are the actual EKS AMIs in that bottom one.
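In Kubernetes this kind of guardrail usually runs as an admission policy (OPA Gatekeeper and Kyverno are common choices; the talk doesn't name Chase's tool). The core check is small enough to sketch in Python; both registry hostnames below are hypothetical placeholders.

```python
from typing import Iterable, List

# Hypothetical allowlist: an internal registry and one account's ECR registry.
ALLOWED_REGISTRIES = (
    "registry.internal.example.com/",
    "123456789012.dkr.ecr.eu-west-2.amazonaws.com/",
)


def image_allowed(image: str, allowed: Iterable[str] = ALLOWED_REGISTRIES) -> bool:
    # Each prefix ends in "/" so a look-alike host such as
    # "registry.internal.example.com.evil.io" cannot slip through.
    return any(image.startswith(prefix) for prefix in allowed)


def rejected_images(pod_images: List[str]) -> List[str]:
    """Images in a pod spec that this admission check would refuse."""
    return [image for image in pod_images if not image_allowed(image)]
```

An image reference with no registry host, such as `nginx:latest`, implicitly means Docker Hub, so it is rejected too.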
2230.208 -> Now, guardrails, I think, are a really interesting thing in our
2233.389 -> space. It can be really easy to go too far with them.
2237.862 -> Here's an analogy I'd like to use,
2239.262 -> even though I don't really like analogies in general:
2241.104 -> imagine that you're driving down the highway,
2243.451 -> you're doing about 60 miles an hour, there are three lanes,
2245.764 -> you can pick which one you want, and somewhere over to the
2247.889 -> side you've got a guardrail that you probably haven't even
2250.448 -> noticed, and you're able to drive down there at 60 just fine.
2253.838 -> You know what you're doing, you're fine.
2255.643 -> Now imagine that we start bringing
2256.476 -> these guardrails in closer.
2258.214 -> Imagine we put it right at the edge of the highway.
2260.251 -> Do you think you'd still be doing 60 or maybe 55?
2262.768 -> And if we keep bringing it in closer and closer,
2265.462 -> imagine that the car just fits in between them.
2266.931 -> You can't even open the doors.
2268.443 -> You're probably not gonna be going much more than five miles
2270.283 -> an hour at that point.
2272.104 -> So guardrails are there
2273.624 -> in case of a catastrophic error;
2276.118 -> they're not there to make sure that you do things exactly
2278.032 -> the same way or consistently, or to keep you on the path
2280.695 -> exactly, you know?
2283.848 -> And when we're doing engineering guardrails,
2285.31 -> we need to think the same way.
2286.526 -> We need to trust our engineers.
2287.683 -> We're hiring great people and we need to make sure that they
2290.142 -> have the ability to go where they need to go.
2294.061 -> In the case of this guardrail that we have here, you know,
2297.057 -> people aren't gonna hit it very often.
2299.334 -> That doesn't mean that they never have;
2300.76 -> it's just something that happens
2302.528 -> as part of that process.
2304.36 -> Now, we can use these guardrails
2306.592 -> and keep them to the right-hand side.
2308.082 -> I think an important bit here is that with this guardrail,
2311.285 -> we shift things to the right-hand side.
2314.92 -> And that's a bit different, because a lot of what we hear
2317.234 -> today is that we need to shift everything left.
2319.792 -> What we need to do is shift feedback left,
2322.086 -> not necessarily controls.
2323.89 -> If we were to shift this guardrail to the left-hand side,
2326.154 -> we wouldn't be allowing engineers to use Docker Hub or
2329.789 -> anything like that during development.
2331.522 -> Take it to an extreme case with security scanning:
2333.864 -> if someone wrote some code that had a buffer overrun in it
2336.796 -> and they weren't able to run it to see what the app was
2339.202 -> doing, it'd be a terrible experience.
2341.167 -> What's much better is to say: you've written some code,
2343.541 -> it's got a buffer overflow, and you're vulnerable to it,
2346.643 -> but we won't allow that to run in production.
2348.785 -> We wanna shift that control as far to the right as we can.
2351.606 -> So we want to protect what we really care about,
2354.897 -> and by doing so,
2355.982 -> we also wanna make sure that people can go fast.
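"Shift feedback left, controls right" can be expressed as a policy that reports the same findings everywhere but only blocks in the environment that matters. A minimal Python sketch; the finding strings and the choice of `prod` as the only enforcing environment are hypothetical.

```python
from enum import Enum
from typing import List


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"    # feedback only: the engineer sees the finding but keeps working
    BLOCK = "block"  # control: the change cannot proceed


# Hypothetical: only production enforces the control.
ENFORCING_ENVIRONMENTS = {"prod"}


def evaluate(findings: List[str], environment: str) -> Action:
    """Give feedback about findings everywhere, but only block where it matters."""
    if not findings:
        return Action.ALLOW
    if environment in ENFORCING_ENVIRONMENTS:
        return Action.BLOCK
    return Action.WARN
```

The engineer in `dev` learns about the buffer overflow immediately and can still run the code to investigate it, while production stays protected.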
2360.161 -> So that's a little bit about preventative controls. Now,
2363.684 -> here we come to what happens when
2366.176 -> you're running services:
2367.393 -> things will go wrong in production, and they're not
2369.259 -> necessarily things that you can think of ahead of time.
2371.902 -> They're gonna be things that go out and so on.
2374.531 -> So we need to be able to see what's happening.
2376.517 -> Now, for our logging stack, which you can see on here,
2378.097 -> we've chosen to make sure that this is as reliable and
2382.888 -> available as possible. And to do that, we've again
2385.67 -> turned to managed services.
2386.777 -> You can see here we have a number of log sources all feeding
2389.504 -> into CloudWatch,
2390.665 -> which then get synced into S3 via a subscription.
2393.913 -> And we can also see that our containers are using Fluent Bit
2397.084 -> to push to S3.
2398.523 -> What happens here is that with all of these sources,
2403.809 -> we're collecting them into a single place that's highly
2405.62 -> available and resilient.
2410.489 -> Now, with this bucket, we do something that's slightly
2412.881 -> different than what a normal pattern would be.
2415.158 -> We care a lot more about who can read this bucket than who
2418.11 -> can write to this bucket.
2419.417 -> We have a lot of sources all sending logs in and if they put
2421.701 -> something in there that we're not expecting,
2423.587 -> we can probably deal with it on the right hand side,
2425.529 -> but we're secretly implementing another guard rail in here.
2428.873 -> And that's because we might end up logging PII
2431.514 -> in any number of ways.
2432.834 -> A team might enable a debug flag in production to try and
2436.273 -> debug and understand what's happening in their service.
2438.284 -> And that might inherently enable request logging.
2440.967 -> And you might get request headers in there,
2442.597 -> which might contain stuff we don't wanna share with people.
2444.69 -> It might be that an object has ended up in a stack trace and
2447.596 -> that's been logged,
2448.429 -> or maybe someone's just logged an object
2449.887 -> that has it in there,
2451.258 -> but we can't let that through, and we need to be able to
2453.117 -> support cleansing, which is where Log Guardian comes in.
2456.684 -> Log Guardian is a tool that was written by the
2458.186 -> observability platform team, whose job it
2461.548 -> is to sanitize this PII data out before it gets into the
2464.042 -> hands of any operators.
2467.704 -> Log Guardian subscribes to SQS queues,
2470.083 -> which are fed by S3 event notification topics.
2472.82 -> It pulls those objects in, sanitizes them,
2476.153 -> and then pushes these into OpenSearch.
2479.536 -> This gives us a couple of benefits, actually.
2481.519 -> One, this is now a pull-based system.
2483.665 -> If OpenSearch is under heavy load for whatever reason,
2486.051 -> all that's gonna happen is Log Guardian is gonna slow down,
2489.273 -> stop pushing things into OpenSearch and it's gonna slow down
2492.14 -> on how fast it consumes things.
2493.964 -> All that's gonna happen here is that our log messages will
2496.399 -> be slightly delayed getting into OpenSearch,
2499.079 -> but they won't be lost.
2501.14 -> And eventually once the OpenSearch cluster becomes more
2503.847 -> available, they'll make their way through.
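The pull model's behaviour under load can be sketched in Python. A `deque` stands in for the SQS queue and `push_to_sink` for the OpenSearch write; both are simplifications. In real SQS a received message that is not deleted reappears after its visibility timeout, and standard queues don't guarantee ordering, whereas here the batch is simply put back at the front.

```python
import time
from collections import deque
from typing import Callable, Deque, List

Sink = Callable[[List[str]], bool]  # returns False when the sink is overloaded


def poll_once(queue: Deque[str], push_to_sink: Sink, batch_size: int = 10) -> int:
    """Take one batch; acknowledge it only if the sink accepted it."""
    batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    if not batch:
        return 0
    if push_to_sink(batch):
        return len(batch)  # accepted: the messages leave the queue for good
    # Rejected: put the batch back in order, as if it became visible again
    queue.extendleft(reversed(batch))
    return 0


def consume(queue: Deque[str], push_to_sink: Sink,
            sleep: Callable[[float], None] = time.sleep, max_backoff: float = 30.0) -> int:
    """Drain the queue, backing off exponentially while the sink is busy."""
    backoff, delivered = 0.5, 0
    while queue:
        accepted = poll_once(queue, push_to_sink)
        if accepted:
            delivered += accepted
            backoff = 0.5
        else:
            sleep(backoff)  # slow down instead of dropping logs
            backoff = min(backoff * 2, max_backoff)
    return delivered
```

Because the consumer decides its own pace, an overloaded sink only slows delivery; nothing is acknowledged until it has been written.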
2505.983 -> Secondly,
2507.09 -> it also means that we're able to scrub our indexes.
2510.756 -> So Log Guardian is built using a series of filters that look
2516.251 -> for known patterns of PII. It can't just get rid of everything,
2519.481 -> or the logging system wouldn't be very useful.
2521.856 -> So sometimes there'll be a new form of PII
2525.535 -> that comes in and Log Guardian doesn't actually know that
2527.375 -> it's PII, so it makes its way through to our OpenSearch.
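A filter chain of that kind might look like the following Python sketch. The three patterns are hypothetical examples chosen for illustration; Log Guardian's actual rule set isn't described in the talk and would certainly be larger and tuned to the bank's data.

```python
import re

# Hypothetical rules: (label, compiled pattern) pairs applied in order.
RULES = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("CARD_NUMBER", re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")),
    ("BEARER_TOKEN", re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")),
]


def sanitize(line: str, rules=RULES) -> str:
    """Replace every match of every known PII pattern with a labelled placeholder."""
    for label, pattern in rules:
        line = pattern.sub(f"[REDACTED:{label}]", line)
    return line
```

Anything that matches no rule passes through untouched, which is exactly how a new form of PII can slip into the index.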
2530.035 -> With this model,
2531.179 -> we're able to purge all the OpenSearch indexes,
2534.938 -> update the Log Guardian rules so that
2536.271 -> it filters everything out
2537.979 -> and then republish the messages onto SQS,
2540.842 -> causing Log Guardian to pull all the logs again,
2542.816 -> run the updated rule set and then publish them back.
2546.42 -> So in this way,
2547.596 -> we're able to respond to developer errors or changes in PII.
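The purge-and-replay recovery can be sketched in Python with dictionaries standing in for the S3 bucket and the OpenSearch index; iterating over the bucket replaces the SQS republish step. The two rule versions are hypothetical, with `rules_v2` adding a UK National Insurance number pattern that `rules_v1` missed.

```python
import re
from typing import Callable, Dict, List

RawBucket = Dict[str, List[str]]  # S3 object key -> raw log lines (never modified)
Index = Dict[str, List[str]]      # stand-in for the OpenSearch index


def rules_v1(line: str) -> str:
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED]", line)


def rules_v2(line: str) -> str:
    # v2 adds a hypothetical rule for National Insurance numbers that v1 missed
    line = rules_v1(line)
    return re.sub(r"\b[A-Z]{2}\d{6}[A-D]\b", "[REDACTED]", line)


def rebuild_index(raw: RawBucket, index: Index, sanitize: Callable[[str], str]) -> int:
    """Purge the index, then replay every raw object through the current rules."""
    index.clear()  # 1. purge the contaminated index
    for key, lines in raw.items():  # 2. replay every object (an SQS republish in the talk)
        index[key] = [sanitize(line) for line in lines]  # 3. apply the updated rules
    return len(index)
```

This only works because the raw bucket is the source of truth and is never rewritten; the index is disposable and can always be rebuilt from it.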
2556.107 -> Beyond this, we have some metrics.
2557.938 -> We need to know what's happening on these systems, how
2560.335 -> they all work, and how they're performing. You know,
2563.01 -> we talked about the internet edge and the latency
2565.388 -> requirements.
2566.564 -> So what happens is services generate these
2568.689 -> metrics and expose them on whichever ports they
2572.596 -> choose. So Java services might use port 1990,
2575.924 -> infrastructure services might use six 90,
2579.242 -> and then who knows what else. So again,
2581.681 -> we make sure that we are empowering engineers by using the
2584.62 -> Prometheus Operator and letting the engineering teams who are
2586.783 -> responsible for the services define how
2589.178 -> those metrics are scraped.
2591.119 -> In some cases these services might need a resolution of once
2594.322 -> every second to be useful, and in other cases,
2596.159 -> once every five minutes might be enough.
2598.617 -> Again, this is in the control
2599.663 -> of the team who's running the service.
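Under the Prometheus Operator, a team typically declares how its service is scraped with a `ServiceMonitor` resource. This Python sketch builds one as a plain dictionary (Kubernetes accepts JSON manifests as well as YAML, so `json.dumps` of the result could be applied with `kubectl apply -f`); the service name, labels, port name and intervals are hypothetical.

```python
def service_monitor(service: str, team: str, port: str,
                    interval: str = "30s", path: str = "/metrics") -> dict:
    """Build a ServiceMonitor in which the owning team picks port, path and interval."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": service, "labels": {"team": team}},
        "spec": {
            # Scrape every Kubernetes Service carrying this label...
            "selector": {"matchLabels": {"app": service}},
            # ...on the named Service port, at the team's chosen resolution
            "endpoints": [{"port": port, "path": path, "interval": interval}],
        },
    }
```

A latency-sensitive service might ask for `interval="1s"`, while a batch service might be happy with `"5m"`; that choice stays with the team that runs the service.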
2602.302 -> These Prometheus instances actually running in the clusters
2604.857 -> that are next to the services.
2606.551 -> So you need to be able to log into the cluster and find it,
2609.58 -> which can be a little bit of a pain,
2611.088 -> particularly over a kind of a distributor system running
2613.485 -> many, many, many Kubernetes clusters.
2615.943 -> So to make this a bit easier, we have
2618.783 -> a centralized Thanos instance.
2620.899 -> This Thanos instance federates all of the Prometheus
2623.183 -> instances that we know about.
2624.743 -> It collects their data at a lower resolution, archives
2626.987 -> that data, and displays it in a single pane of
2629.353 -> glass that everyone can use to get a decent picture
2632.375 -> of what's happening across the entire estate.
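To make "collecting at a lower resolution" concrete, here is a simplified sketch of downsampling: raw samples are grouped into fixed time windows and only a per-window summary (count, sum, min, max) is kept. Thanos's compactor does produce 5-minute and 1-hour downsampled data with aggregates of this kind, but this function is an illustrative stand-in, not Thanos's implementation, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Aggregate:
    """Per-window summary kept instead of the raw samples."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def downsample(samples: list[tuple[float, float]], window_s: int) -> dict[int, Aggregate]:
    """Group (timestamp_seconds, value) samples into fixed windows of window_s seconds.

    Returns a mapping from each window's start time to its aggregate.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of the window it falls in.
        start = int(ts // window_s) * window_s
        buckets[start].append(value)
    return {
        start: Aggregate(len(vals), sum(vals), min(vals), max(vals))
        for start, vals in sorted(buckets.items())
    }


# Ten minutes of hypothetical 1-second samples collapse into two 5-minute windows.
raw = [(t, float(t % 60)) for t in range(600)]
five_min = downsample(raw, 300)
```

Keeping min and max alongside the mean means a short spike inside a window is still visible in the long-term, lower-resolution view.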
2635.276 -> And finally,
2636.332 -> another part of observability is tracing.
2638.785 -> Logging and metrics
2641.009 -> are great for getting a high-level
2642.119 -> view of what's going on.
2643.879 -> You can set up alerts on metrics,
2645.287 -> for example when our response rate changes,
2647.896 -> but what happens when a single thing goes wrong?
2650.1 -> With tracing,
2651.167 -> we're able to instrument a request the moment it enters
2654.49 -> our system and follow that request all the way through the
2656.894 -> systems we manage.
2659.116 -> At the moment,
2659.949 -> we sample 100% of these traces. We do this because we
2662.9 -> don't know in advance which request we'll want more data
2665.497 -> on. We do this using Jaeger and OpenTracing.
2670.67 -> We're looking into ADOT (AWS Distro for OpenTelemetry) for Amazon managed services, and we
2674.75 -> store this, again, in OpenSearch.
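The idea of following one request through many services can be sketched with standard-library Python: the edge creates a trace ID and a sampling decision, and every downstream service continues the same trace with a new span ID. For illustration this uses the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`); that choice is an assumption, since Jaeger's own clients historically used a different header, and the function names here are hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str             # 32 lowercase hex chars, shared by every span in the request
    span_id: str              # 16 lowercase hex chars, unique per span
    sampled: bool
    parent_id: Optional[str] = None


def start_trace(sample_rate: float = 1.0) -> SpanContext:
    """Create the root span at the edge. sample_rate=1.0 keeps every trace."""
    # The draw is always in [0, 1), so a rate of 1.0 samples 100% of requests.
    sampled = secrets.randbelow(10_000) / 10_000 < sample_rate
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialise as a W3C traceparent header: version-traceid-parentid-flags."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def child_from_header(header: str) -> SpanContext:
    """In a downstream service, continue the trace with a new span ID."""
    version, trace_id, parent_id, flags = header.split("-")
    if version != "00" or len(trace_id) != 32:
        raise ValueError(f"unsupported traceparent: {header!r}")
    sampled = (int(flags, 16) & 0x01) == 1
    return SpanContext(trace_id, secrets.token_hex(8), sampled, parent_id)


root = start_trace()
header = to_traceparent(root)
child = child_from_header(header)
```

Because the sampling decision travels in the header, sampling 100% at the edge guarantees every hop records its span, so any single failing request can later be reconstructed end to end.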
2678.104 -> So we've been able to prevent some mistakes with guardrails.
2681.086 -> We've been able to get visibility into what's going on with
2683.35 -> observability and detective controls. And,
2686.824 -> as Werner said,
2688.595 -> everything's going wrong all the time.
2690.486 -> So I'll pass you back to Paul,
2691.967 -> who'll tell you a bit more about what we do
2694.324 -> when things do go wrong.
2704.647 -> - I mean, this is great, right?
2705.631 -> Because I'm just going to talk about incident response and what
2708.773 -> happens when something goes wrong.
2710.453 -> I think the gods are smiling on me.
2712.397 -> So as Courtney already discussed, we run
2717.053 -> "you build it, you run it."
2719.41 -> And we do this for very good reasons.
2722.863 -> As Werner Vogels once famously said,
2724.956 -> everything fails all the time.
2726.246 -> You have to deal with it. And
2727.645 -> despite our best efforts and
2729.459 -> everything that Courtney's been talking about,
2731.866 -> incidents still happen.
2733.766 -> And we believe that the fastest way
2735.816 -> to shut down an incident, and to ensure that we're
2737.86 -> maintaining the fantastic customer experience we
2740.165 -> talked about earlier as how we build trust,
2742.711 -> is to have the experts, the people that built the service,
2744.743 -> deal with the alerts. And so that's exactly what we do.
2748.098 -> If you build a service, you own that service.
2750.234 -> If the service alerts at 2:00 AM on a
2753.089 -> Saturday night,
2753.922 -> you get the alert at 2:00 AM on a Saturday night.
2756.546 -> And it's incredible, once you start putting your engineers on
2759.375 -> call, how quickly your quality of service improves.
2762.935 -> So I can highly recommend it.
2765.084 -> However,
2765.917 -> we work in a very highly regulated environment, as we
2768.54 -> previously discussed.
2770.391 -> And at the point
2772.783 -> an incident happens and escalates beyond what an engineer
2775.775 -> could shut down very quickly
2777.042 -> by dealing with alerts,
2778.407 -> we need to bring in an incident management team
2780.294 -> to deal with it.
2781.603 -> We have many stakeholders that need to be communicated with,
2784.142 -> both internally and externally. Our regulators, the PRA
2786.771 -> and the FCA, want
2788.719 -> to understand what's going on. And so
2790.298 -> we have a major incident management team. Now,
2792.574 -> the job of the major incident management team is not to fix
2795.407 -> the problem.
2796.754 -> The job of the major incident management team is to
2798.978 -> coordinate the response, and in coordinating that
2801.838 -> response,
2803.028 -> they free up the engineering
2805.695 -> resources we have to really focus on what the problem is
2808.967 -> and to make sure that the problem is shut down as quickly as
2812.442 -> possible. And then once the problem has been shut down,
2814.756 -> it's the job of the incident management team
2816.322 -> to make sure that the postmortems are
2817.871 -> run, the problem tickets are raised,
2819.653 -> and then, crucially, the problem tickets are closed.
2822.685 -> We have a very strict mindset on this one.
2826.403 -> When we identify a problem,
2827.936 -> we don't just fix the problem where it was;
2830.615 -> we step back and ask ourselves a question.
2832.213 -> Are there any other classes of this problem?
2834.549 -> Are there any other places where this could happen?
2837.549 -> What is similar to this?
2839.613 -> And we read across the estate horizontally and we say,
2841.696 -> well, if it could happen here, could it happen there?
2843.8 -> And we tend to broaden our search quite a lot to try and
2846.528 -> find out whether or not an incident could happen elsewhere.
2850.827 -> And what that allows us to do over time
2853.313 -> is to improve the quality of the platform for everybody.
2856.083 -> Incidents are going to happen anyway.
2858.247 -> It's really about how well you coordinate your response to them.
2862.241 -> And we're just like everybody else, right?
2864.752 -> We take best practice where we can find it.
2867.235 -> We model ourselves roughly
2869.286 -> on how incident responders deal with
2873.277 -> real-world incidents.
2874.115 -> So your fire brigades and whatnot who are
2876.472 -> dealing with disasters: how do they organize themselves?
2879.248 -> We tend to align with that.
2881.165 -> But the important thing here is to have your experts manage
2884.052 -> the alerts and get them on the call as quickly as possible.
2889.185 -> So we've talked a lot.
2890.317 -> We've talked about the culture. We've talked about how we
2891.877 -> build things, and we've gone into detail about
2895.233 -> many of the elements of the platform, and
2897.437 -> hopefully there's a lot there for you to think about.
2900.476 -> I could read out all the bullet points on this slide,
2903.381 -> but that's probably as boring for you as it would be for me.
2906.923 -> So it would be easier for me to just call
2909.483 -> out the things that,
2910.457 -> if you were thinking about doing this yourself, are the most
2912.49 -> important.
2914.467 -> So for me, it's about building the tooling.
2916.64 -> We keep talking about hiring great engineers, creating
2918.51 -> value for customers, and building a bank that people can
2921.307 -> trust.
2922.766 -> To do that,
2923.599 -> we need an extensive amount of tooling to automate
2926.577 -> how we build, test, deploy,
2927.843 -> and operate software, how we do that within the control
2929.759 -> environment we want,
2931.169 -> and how we do that in a way that hands autonomy to
2932.665 -> engineers.
2934.789 -> So start off with that.
2936.263 -> But you don't have to build the whole gold-plated thing on
2939.004 -> the first go.
2939.852 -> You have to build the first version of it that will support
2941.553 -> what you need. And as you iterate your platform and
2944.333 -> iterate your services, you iterate your tool chains as
2947.57 -> well.
2948.403 -> And so all those things move together.
2950.861 -> When you're doing that, optimize for self-service.
2953.578 -> You don't want to create bottlenecks where people are
2956.224 -> constantly blocked waiting for someone else to do something
2959.086 -> on their behalf. As much as possible,
2960.413 -> what you want is low power distance: you
2963.776 -> want to push responsibility to the people who are
2966.133 -> writing the code and getting things done.
2968.802 -> And then thirdly, the reason we do this is autonomy.
2972.185 -> We want to go out and hire the best engineers that we can
2974.659 -> possibly find.
2975.93 -> We want to bring them in and we want to give them every
2978.033 -> opportunity to create great experiences for our customers.
2981.064 -> And if autonomy is what we want to achieve,
2984.8 -> we need to bring those people in and then we need to get out
2987.066 -> of their way, because bringing them in and getting
2989.68 -> out of their way allows them to do their work and to create
2991.794 -> that wonderful experience for our customers.
2994.221 -> That's the platform that scales,
2995.287 -> the platform that we talked about that can build
2997.15 -> trust.
2997.983 -> And that's what we've been doing for the last three years at
3000.399 -> Chase. We launched back in September last year, and
3002.857 -> we've already acquired over a million customers.
3005.421 -> We've got over 10 billion pounds in deposits.
3009.395 -> I think it's working.
3010.835 -> I see a really bright future for it.
3012.549 -> So that would be my advice to you.
3014.573 -> So those are my final words, and I'm going to hand over
3017.191 -> to Colin, who I think is going to close it out.
3019.999 -> - Let me try and keep my microphone on.
3022.555 -> And just a last opportunity to say a big thank you.
3026.545 -> Firstly, thank you to our speakers, Paul and Courtney,
3029.696 -> for making the time to tell their story.
3032.15 -> Secondly, to our audience,
3033.722 -> I'm sure many of you have traveled quite a long way to be
3035.813 -> here, so thanks for coming.
3037.888 -> And finally to our customers at Chase International who make
3041.445 -> all of this possible.
3042.646 -> So please make sure to fill out the survey, and if you see
3046.37 -> any of us around the event,
3048.505 -> please do come and say hi at some point.
3051.154 -> Brilliant. All right, thanks everyone.

Source: https://www.youtube.com/watch?v=kzsxVZvdoJA