AWS re:Invent 2022 - How Moody’s uses serverless and microservices for ESG scores (FSI205)

Environment, social, and governance (ESG) considerations are reshaping financial markets and risk-based decision making for investors. Moody’s offers a scalable solution to publish ESG data-driven scores with transparency to support detailed analytics. ESG results are incorporated into client-facing platforms and risk-based models for portfolio management. Doing so requires unifying multiple data sources, cleansing that data, and analyzing it without manual intervention. In this session, learn how Moody’s built a scalable serverless platform on AWS to address this challenge. Moody’s shares how its serverless and microservices architecture allows them to seamlessly publish ESG scores with data lineage and transparency using AWS services.

Content

0.45 -> - Thank you for joining us today, welcome everyone.
3.96 -> I know it's been really busy.
6.66 -> It's still the first day.
8.73 -> My name is Sri Sontineni, and I have Divya Elaty here.
14.28 -> Divya is the Senior Vice President at Moody's ESG,
20.31 -> and she leads the cloud platform engineering team.
24.93 -> My role at AWS is a senior engagement manager,
28.92 -> and I deliver cloud projects for financial services firms
35.46 -> such as Moody's.
36.78 -> And this is one of the projects that we delivered,
41.46 -> and it's around how we use serverless and microservices
47.73 -> to generate ESG data that can be leveraged
52.5 -> by other financial services firms.
56.25 -> So without further ado, starting off with the introduction.
62.52 -> Starting off with the agenda.
65.73 -> What we want to cover today is
68.25 -> give a quick introduction of Moody's
70.26 -> and explain what Moody's business is.
75.72 -> Then give a basic understanding of ESG
80.82 -> and the factors which are used to measure ESG
83.76 -> so that you can connect with the story.
86.52 -> And after that, we'll go a little bit deeper
90.27 -> into the architecture and explain the product,
94.14 -> the various steps and how we achieved the product,
100.32 -> which was released earlier this year
103.2 -> to thousands of companies.
109.68 -> Quickly talking about Moody's.
112.53 -> Moody's is a global risk assessment firm,
116.43 -> and it does two things in a great way,
121.32 -> which is primarily providing credit rating opinions,
125.64 -> and also providing financial risk assessment solutions
130.89 -> that can be leveraged by companies
134.07 -> to do business better and make decisions faster.
141.06 -> Moody's has two entities.
142.71 -> One is Moody's Investors Service (MIS), which does the credit ratings.
147.9 -> And these credit ratings are used by investors and issuers
153.15 -> for their debt instruments.
155.01 -> And most of you might have heard about the credit ratings
158.633 -> and the opinions that Moody's provides.
162.24 -> Moody's or MIS has been doing this
165.72 -> for the last hundred years.
168.93 -> So far, MIS has rated 73 trillion dollars in debt,
174.87 -> and this was done across 35,000 entities and transactions.
183.48 -> Moving on to Moody's Analytics.
185.49 -> Moody's Analytics was founded as a separate entity in 2008,
193.74 -> and Moody's Analytics initially started off
198.03 -> as a licensing of credit rating opinions,
202.32 -> but slowly it has spread, and currently it provides
207.51 -> analytics solutions that help with
210.42 -> risk management and growth.
212.64 -> And this can be a licensed product
216.577 -> that can be used for your business
220.17 -> and for gaining insights.
224.37 -> Currently, there are 14,900 customers
227.73 -> who use Moody's Analytics.
230.31 -> And going onto the next slide.
236.13 -> A quick introduction of ESG.
239.577 -> ESG stands for environmental, social, and governance.
242.79 -> And I believe you all know that, that's why you are here.
248.16 -> So diving a little deeper into
251.88 -> what constitutes these three pillars.
254.73 -> So for the environmental pillar,
259.95 -> as you already know,
261.93 -> conservation of the natural habitat and environment
265.89 -> forms the crux of it.
267.87 -> And this is measured using carbon emissions, right?
272.13 -> Which is a common term,
273.99 -> I'm sure while you're taking a flight here,
276.84 -> you know exactly what is the carbon emissions
279.51 -> that we are leaving, right?
283.32 -> Apart from that, we do have various other factors
286.77 -> such as water usage.
289.14 -> And I was so happy to hear the news that
293.04 -> AWS is gonna be water positive by 2030, right?
298.26 -> That was the news that came in this morning.
301.29 -> What that means is all of the data centers
303.39 -> will produce more water than they consume.
308.34 -> Other factors also include deforestation,
311.73 -> water usage, and many more.
315.9 -> The second pillar is social.
319.2 -> Social talks about the relationships between
323.85 -> the different people businesses interact with.
327.27 -> These can be customers, these can be the employees,
332.91 -> these can be the communities the businesses serve.
336.15 -> So the social is measured using factors
341.43 -> such as workforce diversity.
346.02 -> Workforce diversity is a factor that can help us understand
349.77 -> how underrepresented groups are represented and make sure there is equality.
356.16 -> Gender and equal pay policies, and health and safety,
360.87 -> are some of the other factors that contribute to social.
366.51 -> Finally, we have the governance.
369.21 -> Governance is the set of standards for a company
374.22 -> on how it is running its business,
378.21 -> and the ethical policies that ensure
381.99 -> that the company is running in the right way.
386.85 -> And this can be primarily measured
392.04 -> using the board structure,
394.5 -> the audit structure to make sure there is transparency.
398.43 -> Executive compensation to just make sure that
402.36 -> it is following standards with lower policy.
408.24 -> These, along with tax transparency, are some of the factors
411.27 -> that contribute to this governance pillar.
416.1 -> Now that we know what ESG is,
419.85 -> how many of you all think that ESG is a mandatory regulation
425.73 -> that companies have to do
427.32 -> and it is another reporting that you have to do?
431.49 -> You can raise your hand or nod your head
434.25 -> if you think it's a regulation.
440.49 -> So, I was in the same bucket.
443.64 -> I was thinking it was a regulation, right?
445.59 -> This was a few years back.
448.26 -> But soon, soon I realized it's a mistake, right?
452.43 -> Firms shouldn't treat ESG as a mandatory regulation,
456.3 -> though there are some reporting needs.
459.36 -> It should be treated like a risk.
462.51 -> And primarily for financial services,
464.79 -> it is a reputational risk, right?
467.55 -> The risk, and I can deep dive a little bit into it
471.54 -> because it's close to my heart.
473.58 -> It is a reputational risk because of the V.
479.22 -> Everyone here is making decisions, right?
483.66 -> Three years back,
486.96 -> you probably wouldn't have considered the things that you now take
492.78 -> into your decision-making process
495.6 -> because of things like Black Lives Matter, right?
499.14 -> So Black Lives Matter is an example of social justice
504.554 -> where equality is at its core, right?
509.58 -> The second example we have is global pandemic,
513.75 -> where we have clearly seen there are
516.42 -> economic fissures across the world.
520.77 -> And finally, we all have seen wildfires, right?
526.2 -> And wildfires across the globe and within the US,
530.4 -> they really changed our mind about climate, right?
534.54 -> So climate change is a current thing,
543.72 -> and we all have to make decisions
545.64 -> so we can reverse that or at least stop the decline.
550.77 -> So with that in mind,
552.18 -> reputational risk is the key word I want you to take away
555.72 -> from this slide.
558.36 -> Now with that in mind,
560.79 -> there are various financial services priorities
565.29 -> that we have.
566.123 -> Some of the examples that we do have is
569.04 -> primarily asset management, right?
571.17 -> So an asset management firm or a fund manager
576 -> wants to create sustainable investment funds
579.6 -> with the help of underlying ESG data.
585.81 -> And secondly, banks.
589.74 -> So banks are the biggest, what do you say?
596.55 -> Taking a step back, banks lend us loans, right?
600.54 -> Banks lend loans to businesses as well.
603.27 -> So the biggest emissions are on its balance sheet, right?
608.46 -> If you're lending loans to a firm that causes emissions,
613.26 -> banks are asked to start disclosing the emissions
618 -> and this is called financed emissions, right?
620.34 -> The financed emissions need to be disclosed by the banks.
623.58 -> And this slowly but surely is becoming a mandate
626.67 -> because of Scope 3 GHG reporting; according to that,
631.89 -> you have to disclose the financed emissions.
636.03 -> And finally, every firm is reporting its ESG
641.07 -> using sustainability reports
643.17 -> and that also needs benchmarking data.
646.11 -> So the common layer that you see across all the priorities
650.16 -> is this central ESG data that is curated, certified,
659.31 -> and has lineage where you can make sure that everyone knows
666.09 -> where this data is sourced from and how I can use the data
672.06 -> and provide feedback in case
674.67 -> the customers need any additional factors.
683.07 -> Going to the next slide.
685.14 -> So, so Sri, what's the big deal?
687.15 -> Go collect the data.
688.41 -> Every asset fund manager can do it,
690.42 -> every bank can do it, right?
692.16 -> Just go do reporting.
694.08 -> So the key challenge that you have is, first thing,
700.56 -> there is no single standard of reporting.
703.35 -> What that means is the sustainability report of Amazon is different
707.19 -> from the sustainability report that's produced by Moody's.
710.25 -> So when you deal with unstructured data,
712.78 -> there is a lot of manual work involved in it,
715.8 -> and that causes cost, right?
718.35 -> And errors.
719.58 -> So that's the first challenge, right?
722.01 -> And also it's voluminous.
723.21 -> When you think of large caps
725.61 -> in an index fund, right? S&P.
727.86 -> So it's a lot of firms that you have to
731.85 -> get the data for, curate it, make it usable, right?
737.64 -> Forget about insights.
740.64 -> The second challenge that we do have is
744.33 -> primarily infrequent reporting.
747.42 -> So ESG sustainability reporting is only done annually.
752.7 -> So if you are doing it annually,
755.88 -> let's take the energy and gas sector
758.25 -> where there is an oil spill.
760.59 -> All of a sudden that oil spill is not reported.
764.49 -> It might even be in the news, but an asset fund manager,
767.88 -> if he's not actively catching up, he wouldn't get that feed.
771.39 -> And what that means is your data, ESG data
775.32 -> is not gonna correlate with your stock price
778.02 -> or the financial information,
780.39 -> and the data will soon become irrelevant, right?
784.08 -> So that data has to be faster and proper and you know,
791.31 -> and should be accessible to the asset managers.
794.76 -> And finally, after all of this,
797.49 -> uncovering the insights from the ESG data.
801.84 -> There are 400 plus factors that you can consider
807.18 -> to make a decision.
808.74 -> Now, what is the weight
810.42 -> for each of the factors that you're gonna use?
813.48 -> This is something that you need to go to an SME,
818.16 -> create a model, build a weighted model to figure out
822.81 -> how you can even start using it.
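[Editor's sketch] The weighting problem described here can be illustrated with a simple weighted average. This is a minimal sketch, not Moody's methodology: the factor names, the weights, and the shared 0-100 scale are all assumptions made up for illustration.

```python
from typing import Mapping


def weighted_score(values: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of factor values that share one scale (assumed 0-100)."""
    # Only factors that have both a weight and a reported value contribute.
    used = [name for name in weights if name in values]
    total_weight = sum(weights[name] for name in used)
    if total_weight <= 0:
        raise ValueError("no weighted factors with reported values")
    return sum(values[name] * weights[name] for name in used) / total_weight


# Hypothetical factors and weights; in practice an SME would set these.
weights = {"carbon_emissions": 0.5, "water_usage": 0.25, "board_independence": 0.25}
values = {"carbon_emissions": 60.0, "water_usage": 80.0, "board_independence": 40.0}
score = weighted_score(values, weights)
```

Dividing by the sum of the weights actually used keeps the score on the same scale when a company has not reported every factor, which is one possible way to handle gaps in disclosure.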
825.9 -> So these present a challenge to all the asset managers.
832.11 -> In a survey
839.67 -> asking asset managers why they are unable to create
842.31 -> sustainable funds using ESG data,
844.32 -> 66% of them said that the reason is
850.47 -> accessibility of this data, right?
855.48 -> This slide is talking about the financials,
858.6 -> like, well, there is a business case, I agree with you,
861.84 -> but does it make financial sense?
864.93 -> And absolutely it does, is what this slide is talking about.
869.91 -> The first three things that you see here
873.18 -> is just talking about the increase in sustainable funds,
879.84 -> investment funds, and how they are performing, right?
884.97 -> When you compare them with other funds
888.75 -> that are not in these sustainable indexes,
892.2 -> the sustainable funds are performing much better.
895.11 -> And finally, this is the market capitalization.
900.48 -> If someone starts providing the ESG data tomorrow,
905.07 -> the licensing of that data can fetch you this much.
909.12 -> It's a high-level analysis,
911.07 -> but it just brings us to the business case.
914.01 -> Is there a business case?
915.63 -> And this is talking about the business case.
921.09 -> Quickly moving to
922.92 -> why Moody's is interested in this
926.07 -> and what Moody's wants to do.
929.67 -> So as I have introduced at the beginning that
933.42 -> Moody's is in risk management and assessment space,
939.42 -> and it has hundred years of experience in that space,
943.26 -> so it's only natural that Moody's customers
946.59 -> are going to ask, Hey, why don't you take
948.54 -> this data into your risk models, right?
951.57 -> So that, that's the inception of it.
957.84 -> Also, there's an increased ask
960.66 -> for primarily regulation needs,
962.76 -> the SFDR reports, which is a European regulation,
968.49 -> so there is a need for the reports.
972.21 -> So also tackling that for Moody's customers.
976.53 -> And then the final thing is creating a new revenue stream
980.52 -> as we have seen there is capital, there is, you know,
986.85 -> new revenue stream that they can tap into.
991.02 -> With that I'll hand over to Divya so that
994.29 -> we can go through the five steps
996.36 -> and deep dive into the architecture.
1003.65 -> - Thank you Sri.
1005.87 -> Hello everyone.
1006.703 -> I'm Divya Elaty, the Senior Vice President at Moody's,
1010.7 -> heading the ESG cloud engineering team for data, right.
1014.93 -> So I'm here today to walk through our journey
1018.92 -> in gathering ESG data, right?
1022.07 -> So what does it involve to gather ESG data?
1025.52 -> So let me break this down into five simple steps.
1029.99 -> Step one, before you start capturing any of the ESG data,
1035.36 -> you would like to, you know,
1037.22 -> lay the foundation for your ESG reference data.
1040.55 -> And I'll walk you through
1041.72 -> details of each of these steps, right?
1044.18 -> So lay that foundation of ESG reference data.
1048.05 -> Step two, once you have that foundation
1050.51 -> where you know what you are after,
1053.54 -> then you go and start collecting
1055.88 -> the publicly disclosed ESG documents by a given company.
1061.94 -> That's step two.
1062.84 -> Once you have the documents collected, right?
1066.11 -> Now, your goal is to extract the ESG related information
1070.01 -> relevant to your methodology from within these documents.
1074.42 -> That's step three, where AI and machine learning come into the picture.
1079.01 -> And step four, once you have extracted
1082.07 -> the information from these documents,
1084.8 -> now you enrich the data, you vet this data.
1088.55 -> That's where ESG specialist analysts come into the picture
1092.15 -> to review this data and validate it.
1098.12 -> Once it's vetted and validated,
1100.31 -> in the last step it's ready for distribution and
1105.56 -> is given ESG scores based on your methodology.
1110.27 -> Sounds so simple and straightforward, right?
1113.26 -> So now let's see what the architecture
1116.93 -> looks like if I combine all of these five steps
1120.23 -> together into one picture.
1124.28 -> That's what it looks like.
1126.47 -> Looks like a mini village of its own, right?
1129.83 -> But I will break down this architecture
1133.55 -> in detailed steps in the upcoming slides.
1137.84 -> All right, let's go to step one.
1141.56 -> Gather your ESG reference data.
1146.21 -> So what would gathering ESG reference data mean, right?
1151.19 -> Establish a list of your reference tables
1154.22 -> that might contain, for example,
1157.46 -> what is your ESG methodology that you want to go after?
1162.05 -> And the methodology would involve, you know,
1164.45 -> what are these metrics, ESG related metrics?
1168.05 -> What are its criteria?
1169.58 -> What are its indicators to name a few, right?
1173.03 -> Once you have that foundation laid,
1175.52 -> the next step is obviously, which companies
1178.7 -> are you really interested in capturing this data?
1181.13 -> What does your coverage look like for ESG data, right?
1186.35 -> So these are some minimal examples of reference data,
1190.52 -> to name a few, right?
1192.17 -> Once you have this,
1194.87 -> and of course because it's reference data,
1196.91 -> tightly establish the relationships within these data sets, right?
1202.55 -> We also have a need where we wanna get certain
1206.48 -> additional attributes associated with these reference datasets
1209.42 -> from our trusted external sources.
1213.68 -> For example, index data or exchange rate data to name a few.
1218.87 -> So you have to have a pipeline to fetch this data.
1222.62 -> The last, and a very important, step in this process
1226.55 -> is reference data management, right?
1229.55 -> So how do you manage these reference data?
1231.86 -> So what does reference data management involve?
1235.37 -> So it could be as simple as a change management
1238.34 -> where you might wanna add a company to your coverage,
1241.22 -> or you might wanna remove a company from the coverage,
1244.4 -> or you might wanna add a metric for your methodology,
1247.88 -> or you wanna remove them from your methodology.
1252.44 -> Sounds simple but if I dig deeper into further scenarios,
1257.12 -> what happens if your company moves sectors?
1261.44 -> From one sector to another sector, right?
1265.22 -> How does it affect your subsequent
1268.61 -> data capturing process?
1271.13 -> What happens when there are acquisitions
1273.32 -> or mergers within a company?
1274.97 -> How does that affect your reference data, right?
1279.71 -> And how do you ensure like all of this is tied together
1284.36 -> before going into the next subsequent steps.
1287.21 -> So these are the core foundations
1288.77 -> of capturing your reference data.
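[Editor's sketch] The change-management scenarios described above (adding a company, removing it from coverage, moving it between sectors) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class names, fields, and the choice to keep a sector history are hypothetical and are not taken from Moody's system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Company:
    company_id: str
    name: str
    sector: str
    in_coverage: bool = True
    # (effective_date, old_sector, new_sector) entries, so that data captured
    # under the old sector can still be traced after a move.
    sector_history: List[Tuple[date, str, str]] = field(default_factory=list)


class ReferenceData:
    """Hypothetical in-memory stand-in for a reference data service."""

    def __init__(self) -> None:
        self.companies: Dict[str, Company] = {}

    def add_company(self, company: Company) -> None:
        if company.company_id in self.companies:
            raise ValueError(f"duplicate company id: {company.company_id}")
        self.companies[company.company_id] = company

    def remove_from_coverage(self, company_id: str) -> None:
        # Soft removal: the record stays, so earlier documents keep a valid reference.
        self.companies[company_id].in_coverage = False

    def move_sector(self, company_id: str, new_sector: str, effective: date) -> None:
        company = self.companies[company_id]
        company.sector_history.append((effective, company.sector, new_sector))
        company.sector = new_sector
```

Keeping removals soft and recording sector moves with an effective date is one way to preserve lineage for downstream steps; mergers and acquisitions would need additional handling that this sketch does not attempt.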
1292.22 -> Now let me walk you through what are some of our challenges
1295.34 -> that we have faced in handling the reference data.
1300.44 -> Data governance is one of our biggest challenges.
1305.33 -> And it's not just for these five steps of the ESG-centric
1308.48 -> data gathering process that it is a challenge.
1312.35 -> No, it's across the enterprise.
1314.99 -> When you look at this data, right?
1317.45 -> We have learned that it is not just for ESG,
1321.98 -> but it is an enterprise-level taxonomy
1324.77 -> used across enterprise products and services
1328.4 -> that might include your credit ratings
1330.65 -> or risk assessment products.
1332.72 -> You gotta have this, you know,
1335.18 -> consistent system of record
1337.01 -> across your enterprise data sets.
1339.95 -> So how can this ESG data evolve into core, fundamental
1344.03 -> reference data across an enterprise?
1348.68 -> How did we go about solving for some of these challenges?
1353.48 -> So this is what the high-level architecture looks like.
1357.08 -> On my right side you see
1361.37 -> a pipeline that is
1366.56 -> pulling the data from our trusted external vendors.
1371.36 -> We have leveraged AWS Glue for that.
1373.85 -> And on my left side,
1376.79 -> you see an interface for change management, right?
1381.77 -> From a user perspective.
1384.74 -> In the middle, you see the microservices.
1388.61 -> Now, these microservices are not just used
1391.1 -> for change management of this reference data alone,
1395.54 -> but they are also integrated with the subsequent five steps
1399.47 -> that I'll walk you through in this ESG data journey.
1403.01 -> This is all reusable, and the reason is
1405.267 -> that these microservices give you two benefits.
1408.5 -> One, you're not duplicating your code.
1411.89 -> Second, you know, this is ensuring your data lineage
1415.82 -> and consistency across your other systems,
1419.3 -> whether those systems are internal to ESG
1422.3 -> or external to your enterprise.
1428.45 -> There.
1429.283 -> So now we have collected our reference data.
1433.34 -> Now we are ready to go after our documents for ESG.
1439.49 -> So most companies disclose ESG data.
1441.41 -> You know, these are publicly available documents
1445.1 -> that we can go and collect.
1447.02 -> Now, what does this document collection process involve?
1451.76 -> So we have global ESG analysts across the world
1456.62 -> who are experts in ESG and are
1458.99 -> going after these companies and collecting these
1461.81 -> publicly disclosed ESG documents.
1468.59 -> Now, what is important as part of this collection process is
1471.56 -> you cannot just collect a document without
1474.62 -> tagging metadata to the document, right?
1477.44 -> So you wanna have that lineage intact.
1480.77 -> So as part of this collection process,
1483.08 -> you want to tag the document with the metadata associated
1488.21 -> that you have created, or sorry, what we have created
1491.66 -> as part of the step one process, right?
1494.6 -> So tagging that metadata to that document
1497.48 -> is important from data lineage and consistency
1499.88 -> point of view.
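[Editor's sketch] Tagging a collected document with reference-data metadata, as described above, might look like the following. All field names are assumptions for illustration; the talk does not specify the metadata schema. A content hash is included as one possible way to keep lineage verifiable.

```python
import hashlib
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class DocumentRecord:
    document_id: str
    company_id: str   # must exist in the step-one reference data
    source_url: str
    report_year: int
    collected_by: str
    sha256: str       # fingerprint of the exact file that was collected


def tag_document(content: bytes, company_id: str, source_url: str,
                 report_year: int, collected_by: str,
                 known_company_ids: Set[str]) -> DocumentRecord:
    """Refuse to store a document that cannot be tied to a known company."""
    if company_id not in known_company_ids:
        raise ValueError(f"unknown company id: {company_id}")
    digest = hashlib.sha256(content).hexdigest()
    return DocumentRecord(
        document_id=f"{company_id}-{report_year}-{digest[:12]}",
        company_id=company_id,
        source_url=source_url,
        report_year=report_year,
        collected_by=collected_by,
        sha256=digest,
    )
```

Rejecting untagged or unknown-company documents at collection time means every later data point can be traced back to a specific file and a specific reference record.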
1501.68 -> Now, once you have collected these documents,
1504.74 -> you prepare these companies into batches or portfolios
1509.66 -> as we call it internally,
1511.34 -> to feed them into the next step in this journey,
1515.69 -> which is extracting the information
1517.28 -> off of these documents, right?
1520.22 -> Sounds straightforward? Not really.
1523.19 -> Now let's see what are some of the challenges that we faced.
1527.45 -> Data inconsistency and manual collection, right?
1532.4 -> Why data inconsistency?
1534.59 -> As we stated, not all companies
1537.38 -> disclose this information in a standard form.
1540.44 -> There is no standard way of reporting ESG data,
1544.82 -> and every company has its own way and style.
1547.52 -> Some companies give this out in PDFs,
1550.04 -> some companies have their own HTML pages
1553.1 -> and some companies even go a little deeper into these
1557.39 -> where they can have embedded HTML links
1560.12 -> inside the same webpage
1561.41 -> and that level can go as deep as it can, right?
1567.2 -> So that's part of the challenge.
1569.72 -> Now, the other part of the challenge is
1574.19 -> not all companies have the same cycle to report the data.
1577.85 -> Some of them do quarterly, some of them do annually, right?
1581.69 -> So that is also causing a challenge in capturing this data.
1585.26 -> You don't have a set cycle or a set window
1587.9 -> when a company can disclose its information.
1593.03 -> And data lineage, I know you will be hearing me
1594.92 -> talking about data lineage over and over,
1596.84 -> bear with me because it is the most important piece
1599.45 -> in this collection process.
1602.78 -> And of course it's a manual process today
1605.6 -> where analysts are spending so much time
1608.63 -> in capturing this data.
1611.3 -> We are in the process of automating this,
1613.46 -> mostly leveraging machine learning libraries,
1618.05 -> but we are not really there yet in that journey.
1620.99 -> For now, it's semi-automated, as I call it, right?
1625.37 -> So what does the architecture look like, right?
1628.82 -> Same serverless framework as you see in the reference data.
1632.99 -> The only difference is, you know,
1634.52 -> you have a front-end interface to upload this data
1638.12 -> on the left side,
1638.953 -> and on my right side you see DynamoDB
1642.44 -> that is used to store the metadata of these documents
1645.44 -> and S3 to store the PDFs or any version of the documents.
1650.51 -> In the middle, again, you see microservices
1653.03 -> and these microservices are not just
1656.3 -> microservices used to capture the documents,
1658.91 -> but they're also tightly integrated
1660.74 -> with the microservices that we have created
1662.72 -> in step one of reference data
1664.85 -> from a system of record point of view, right?
1669.17 -> And bear with me, these microservices are also
1673.01 -> gonna be tightly integrated with the next step
1676.07 -> in this journey, right?
1680.36 -> Great.
1681.193 -> So now we have captured our reference data,
1684.77 -> collected the documents, what next?
1688.4 -> Now we are ready to extract the information
1691.49 -> from the PDFs or HTMLs or unstructured data.
1696.62 -> So that's where AI and machine learning come into play here.
1703.22 -> What does this involve?
1704.81 -> Now remember, we have created portfolios
1707.42 -> and those portfolios can include
1710.39 -> a group of companies that have completed
1712.97 -> the document collection process.
1715.55 -> Or it can also include, you know,
1718.85 -> companies that are grouped under a specific sector
1722.48 -> or any combination.
1724.82 -> Now this portfolio of documents
1727.4 -> is fed into our
1731.27 -> AI and machine learning process,
1733.31 -> which will go and extract the information
1737 -> relevant to our ESG methodology
1740.24 -> and pull these key paragraphs or data out.
1745.16 -> Right.
1747.08 -> Now once it's extracted, we have to prepare this data
1752.57 -> and get it ready for our analysts to come in
1756.02 -> and review this data in the next step.
1759.32 -> Now let's see what are some of the challenges we faced
1763.76 -> during this process.
1765.29 -> Now, with any machine learning model,
1770.12 -> you won't get a hundred percent accuracy on day one.
1773.84 -> It's bound to have some noise
1776.06 -> or more noise depending, right?
1778.91 -> So some of the challenges
1782.87 -> involve redundancy of the data.
1784.46 -> For example, you know,
1785.87 -> the same information about our methodology
1790.1 -> can be found on multiple pages of a single document.
1795.53 -> That's one of the challenges.
1797.45 -> Second challenge involves the complexity
1800.63 -> of how companies might embed this data inside the document.
1805.19 -> Let's say, I don't know, maybe another HTML box
1809.09 -> or a table format, something that
1811.25 -> machine learning did not anticipate to pull this
1813.56 -> or expect to have the data inside that, right?
1817.79 -> Or it could be something as basic as the page number
1821.45 -> where the information was found being missing.
1825.59 -> So for machine learning to improve its efficiency
1829.64 -> over a period of time, it needs active learning, right?
1833.54 -> So you gotta make sure that the analysts
1836.39 -> who review this data, and any corrections they make,
1839 -> get actively fed back into this process
1843.14 -> and make sure that the model learns using this data.
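[Editor's sketch] The feedback half of the loop described above, where analyst reviews become training signal, could be captured like this. It covers only how a correction is turned into a labeled example; it does not show model retraining or the sample selection that a full active-learning setup would include. The record shapes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Extraction:
    document_id: str
    metric: str
    predicted_text: str
    page: Optional[int]  # may be missing, as noted in the talk


@dataclass
class Review:
    accepted: bool
    corrected_text: Optional[str] = None


def to_training_example(extraction: Extraction, review: Review) -> Dict[str, Union[str, bool]]:
    """Turn one analyst review into a labeled example for later retraining."""
    if review.accepted:
        label = extraction.predicted_text
    elif review.corrected_text is not None:
        label = review.corrected_text
    else:
        raise ValueError("a rejected extraction needs a corrected value")
    return {
        "document_id": extraction.document_id,
        "metric": extraction.metric,
        "label": label,
        "was_corrected": not review.accepted,
    }
```

Recording accepted predictions alongside corrections gives a retraining job both positive confirmations and fixes, rather than corrections alone.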
1851.03 -> And the last but not least is scaling, right?
1855.68 -> The portfolio of companies can include X number.
1860.9 -> It could be big at a point in time depending on the cycle
1863.99 -> on which companies are disclosing information,
1866.42 -> or it could be small, right?
1868.16 -> So your model should be in a position to scale itself
1871.13 -> to accommodate that kind of data.
1875.51 -> Now, in this architecture, I'm gonna walk you through
1881.15 -> what happens after machine learning
1883.13 -> extracts the information, right?
1885.29 -> Once the portfolio of batches
1889.67 -> is sent to our machine learning model
1892.94 -> and the processing is done,
1894.71 -> it sends a signal to the next process.
1898.73 -> And we have leveraged AWS Lambda to scale up or down
1905.69 -> depending on the number of records or
1909.02 -> the amount of data the machine learning model has,
1911.99 -> you know, pulled from that particular portfolio.
1915.8 -> So depending on how big or how small the portfolio could be,
1920.36 -> your Lambda just scales up and down
1923.12 -> and consumes or ingests this data
1925.25 -> for the next step to come in.
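[Editor's sketch] A Lambda-style ingestion function for this step might be shaped as below. The `handler(event, context)` signature is the standard Python Lambda entry point, but the event layout (`portfolio_id`, `records`), the batch size, and `ingest_batch` are assumptions for illustration; the talk does not describe the actual trigger or payload.

```python
from typing import Any, Dict, Iterator, List

BATCH_SIZE = 25  # hypothetical write batch size


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ingest_batch(portfolio_id: str, batch: List[Dict[str, Any]]) -> int:
    # Placeholder: a real function would write the batch to a data store here.
    return len(batch)


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Stateless per-invocation work, so many copies can run in parallel."""
    portfolio_id = event["portfolio_id"]
    records = event.get("records", [])
    ingested = sum(ingest_batch(portfolio_id, batch)
                   for batch in chunked(records, BATCH_SIZE))
    return {"portfolio_id": portfolio_id, "ingested": ingested}
```

The scaling itself comes from the platform running more concurrent invocations as more extraction results arrive; the code only needs to stay stateless so that each invocation can handle its own slice of a portfolio.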
1931.28 -> Great.
1932.113 -> So we have reference data, we have documents,
1935.3 -> we have extracted information off of these documents.
1938.3 -> What next? Right?
1940.46 -> So this is what we call data enrichment,
1944.81 -> where our analyst comes in to review this data, right?
1950.72 -> So let's see what this involves.
1955.19 -> So the data points that we receive from
1957.47 -> the machine learning process
1959.81 -> aren't ready to be consumed by an analyst yet;
1963.26 -> there are certain steps that happen in between.
1966.62 -> So what does that involve? Right?
1969.92 -> So the first and foremost is it standardizes the data.
1973.46 -> What is standardization? Right?
1976.73 -> Standardization is nothing but, you know,
1979.52 -> depending on where the company is located
1981.56 -> or what their convention looks like,
1984.65 -> the metrics can be displayed in its own convention.
1988.01 -> For example, some companies can disclose them in kilos,
1992.81 -> some companies disclose the same information in tons
1995.57 -> depending on where they are
1996.95 -> or what their standard approach is, right?
2000.46 -> So we go through standardizing that data
2003.37 -> into a local convention to make it easy for an analyst to
2008.77 -> review the data so they don't have to do the math
2010.99 -> in their head while they're reviewing the data, right?
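[Editor's sketch] The kilos-versus-tons example above amounts to converting every reported value into one target unit. A minimal sketch, assuming metric tonnes as the target and a small hand-written conversion table (both are illustrative choices, not the platform's actual conventions):

```python
# Conversion factors into metric tonnes. "tons" in a real report may mean short
# or long tons; treating it as metric here is an assumption of this sketch.
TO_TONNES = {
    "kg": 0.001,
    "kilos": 0.001,
    "t": 1.0,
    "tons": 1.0,
    "tonnes": 1.0,
    "kt": 1000.0,
}


def to_tonnes(value: float, unit: str) -> float:
    """Convert a reported quantity into metric tonnes."""
    key = unit.strip().lower()
    if key not in TO_TONNES:
        # Fail loudly so an unrecognized unit goes to an analyst instead of
        # being silently treated as tonnes.
        raise ValueError(f"unrecognized unit: {unit!r}")
    return value * TO_TONNES[key]
```

Raising on unknown units rather than guessing keeps a mis-scaled number from reaching the review screen looking like a valid value.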
2014.88 -> And the second thing that happens is the data validation.
2019.6 -> As we discussed, there could be some noise
2022.72 -> generated from machine learning, right?
2024.61 -> So you filter that noise as much as possible
2028.42 -> to reduce overhead on an analyst.
2031.33 -> So drop those redundant data sets,
2033.61 -> which you don't need, right?
2036.67 -> So that's one part of the cleansing.
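[Editor's sketch] Dropping redundant extractions, such as the same passage found on several pages of one document, can be sketched as keyed de-duplication. The record fields and the choice of key (document, metric, whitespace-normalized text) are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def drop_redundant(points: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each (document, metric, text) combination."""
    seen: Set[Tuple[str, str, str]] = set()
    kept: List[Dict[str, str]] = []
    for point in points:
        # Collapse whitespace and case so trivially different copies match.
        normalized = " ".join(point["text"].split()).lower()
        key = (point["document_id"], point["metric"], normalized)
        if key in seen:
            continue
        seen.add(key)
        kept.append(point)
    return kept
```

Keeping the first occurrence preserves the original order, so an analyst still sees data points in the sequence the extraction produced them.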
2038.17 -> And the second step that happens is making sure that
2041.8 -> this data is neatly tied with all the previous steps
2046.12 -> that we have described.
2047.32 -> Make sure it's tied with your reference data.
2049.09 -> Make sure it's tied up with your document data
2051.13 -> that you have collected.
2052.63 -> So once you have that system of record nicely organized,
2056.47 -> it's ready for your analyst to come in and review it.
2060.58 -> That's what happens, you know, in all the pre-steps.
2064.69 -> Now it's ready for analysts to review it, right?
2068.47 -> What does the analyst review process look like?
2072.31 -> So we have, it's done in two phases and four steps.
2078.7 -> So that's how rigorous our review process
2081.37 -> is around this data.
2085.36 -> And I'll talk more about the architecture.
2087.37 -> What does that look like
2088.45 -> from an analyst point of view, right?
2090.697 -> And the last step is, you know, finalizing this data
2094.45 -> and making it ready for consumption globally.
2099.43 -> What are some of the challenges?
2102.25 -> There.
2103.083 -> Depending again on the size of your portfolio,
2106.42 -> you might have data coming back from machine learning model.
2111.43 -> Millions and millions of data points.
2114.1 -> So the architecture should be scalable enough
2117.82 -> to handle this volume.
2119.2 -> This is per portfolio by the way, right?
2122.29 -> And the second piece is you have to obviously make sure
2125.77 -> your architecture is highly available
2128.47 -> because of the global presence of our analysts
2131.56 -> who are consuming this data.
2135.22 -> The next step is, while we were part of this journey,
2139.36 -> we were still evolving in terms of data requirements.
2143.02 -> So your design should be flexible enough
2147.7 -> to handle those changes or modifications, right?
2152.71 -> And the last step obviously, which is,
2154.87 -> which I haven't stressed in my previous steps
2157.06 -> is data validation,
2158.38 -> which happens at every step from completeness perspective.
2161.8 -> Hey, did you capture all your documents in your step two?
2165.31 -> Did you capture, you know,
2166.99 -> did machine learning generate or capture
2169.09 -> all the metrics associated to your methodology
2171.76 -> in your step three?
2173.14 -> Step four comes in,
2174.58 -> did your analyst review every single data point
2177.28 -> surrounding your methodology
2178.69 -> from completeness perspective?
2181.09 -> So, these are some of the data validations.
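A completeness check of the kind described (did every metric the methodology requires actually get captured or reviewed?) reduces to a set difference. The function name and the idea of passing metric names as plain strings are assumptions for illustration.

```python
def missing_metrics(required, reviewed):
    """Return the required metric names that have not been reviewed, sorted."""
    return sorted(set(required) - set(reviewed))

def is_complete(required, reviewed):
    """True when every required metric has been reviewed."""
    return not missing_metrics(required, reviewed)
```

The same check can run after each step (documents collected, metrics extracted, data points reviewed) with a different `required` list.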
2184.9 -> So what does the architecture look like?
2190.48 -> So this is not as straightforward as it looks on paper,
2194.95 -> but when you look at it from a left-to-right point of view,
2200.23 -> on the left side is where we have our user interface,
2204.04 -> where users come in to look at this data,
2207.82 -> and on the right side is where we have our backend,
2212.83 -> it's on DynamoDB obviously.
2215.65 -> And you can also see in this picture streams.
2220.54 -> So DynamoDB streams is what we use to standardize this data.
2224.8 -> We also have leveraged Athena connector for DynamoDB
2229.09 -> for anyone who is SQL proficient,
2233.23 -> if they want to look at this data,
2235.42 -> we don't have to do anything additional.
2238.27 -> In the middle is where you see the microservices, right?
2243.76 -> Now what are some of the technical checks
2248.41 -> that happen around
2250.12 -> this tool here that we are talking about?
2252.76 -> Flexibility, as I said is very important.
2255.28 -> What is important when you are considering the architecture
2258.34 -> is understand your query patterns,
2261.49 -> understand your needs of the data.
2264.25 -> What kind of data is an analyst asking for,
2267.52 -> what kind of flexibility.
2268.78 -> For example, are they looking at the data
2271.57 -> at a criteria level?
2273.07 -> Are they looking at the data at a sector level?
2275.86 -> Are they looking at the data at a company level?
2278.65 -> Right?
2279.483 -> So give that flexibility.
2281.53 -> For you to give that flexibility
2283.45 -> you have to nicely organize the data
2285.37 -> in such a way in your DynamoDB to leverage, you know,
2289.12 -> those indexes and keys nicely organized
2291.73 -> so it can give you that optimal performance and efficiency.
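One way to organize keys so that company-level, criteria-level, and sector-level lookups are all cheap is a composite sort key plus a secondary index. The sketch below only builds the item; the attribute names (`pk`, `sk`, `gsi1pk`, `gsi1sk`) and the key layout are assumptions, not the schema from the talk.

```python
def build_item(company_id, sector, criteria, metric, value):
    """Build one item whose keys serve company, criteria, and sector lookups."""
    return {
        # Base table: everything for one company shares a partition,
        # sorted by criteria and then metric.
        "pk": f"COMPANY#{company_id}",
        "sk": f"CRITERIA#{criteria}#METRIC#{metric}",
        # Hypothetical secondary index: all companies in a sector.
        "gsi1pk": f"SECTOR#{sector}",
        "gsi1sk": f"COMPANY#{company_id}#CRITERIA#{criteria}",
        "value": value,
    }

def criteria_prefix(criteria):
    """Sort-key prefix for a begins_with query on one criteria."""
    return f"CRITERIA#{criteria}#"
```

With this layout, a company-level view is a query on `pk`, a criteria-level view adds a `begins_with` condition on `sk`, and a sector-level view queries the index instead of scanning the table.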
2297.55 -> So flexibility was our biggest task.
2301.39 -> Now it isn't depicted in this picture,
2304.06 -> but you don't want to be waiting
2306.34 -> for analysts to complete the review and then figure out,
2309.25 -> oh you know what, this is not right, this is not right,
2312.52 -> this you have to go and fix,
2313.81 -> this you have to go and fix.
2315.64 -> So the tool has to be flexible enough that
2318.28 -> it does that check as the analyst is reviewing the data
2322 -> and notify them of these issues.
2324.28 -> Let's say, you know, what happens actually
2327.01 -> when an analyst comes into this tool is
2329.8 -> they can do two things.
2331.66 -> They can create some new entries
2334.87 -> where machine learning was not able
2336.46 -> to pull it out of their document, right?
2339.07 -> Or they can modify an existing entry where
2342.19 -> they think that machine learning did not pick
2343.807 -> the right information.
2347.17 -> So as part of these corrections,
2349.24 -> you have to ensure that the lineage is not broken at all.
2353.77 -> It has to be still consistent across the board.
2357.16 -> So all those checks should actively happen in the tool
2360.94 -> while they're reviewing the data and notify them,
2364.15 -> Hey, go and make this correction.
2368.32 -> Right.
2370.84 -> So that's just naming a few of the data validation checks
2374.26 -> that happen in this tool,
2375.49 -> but there are many more.
2378.07 -> You might also wanna ensure that
2381.31 -> you notify your analysts of
2384.85 -> any outliers or gaps in this data
2387.52 -> as they're correcting it.
2388.69 -> And you also don't want to,
2390.46 -> pretty sure you don't want them to
2392.14 -> be looking at the same company, obviously, right?
2394.9 -> Two different analysts
2395.77 -> cannot be working on the same data set.
2398.56 -> So you have to build in all of those
2400.57 -> checks and balances inside the tool
2403.18 -> for them to be able to do their day-to-day jobs.
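The talk does not say how the "one analyst per company" rule is enforced. One common approach on DynamoDB is a conditional write that succeeds only if no lock item exists yet. The sketch below builds the request parameters for such a write; the table name, key format, and attribute names are hypothetical.

```python
def claim_request(table_name, company_id, analyst_id):
    """Build put_item parameters that succeed only if the company is unclaimed."""
    return {
        "TableName": table_name,
        "Item": {
            "pk": {"S": f"LOCK#{company_id}"},
            "analyst_id": {"S": analyst_id},
        },
        # The write is rejected if an item with this key already exists,
        # so a second analyst cannot claim the same company.
        "ConditionExpression": "attribute_not_exists(pk)",
    }
```

These parameters are shaped for the low-level boto3 DynamoDB client's `put_item` call, which raises a conditional check failure when the condition is not met; the tool can turn that into a message telling the second analyst the company is already under review.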
2408.01 -> That's just to name a few,
2411.4 -> but there is more that really happens in this process.
2415.21 -> Now that the analysts have vetted this data,
2419.41 -> the last step of this is score and distribute.
2423.07 -> We are now ready to roll, right?
2426.01 -> The analyst reviews the completeness of the data
2429.58 -> for an entity, prepares a report, and then says,
2433.75 -> Hey, you know what?
2435.07 -> I'm done with these companies.
2436.87 -> Boom!
2437.703 -> It goes to the next process,
2441.04 -> which generates the scores, and
2444.61 -> I'll talk through what are different steps involved
2446.77 -> when they're, you know, scoring a company.
2450.31 -> Obviously, you know,
2451.33 -> validate and score: the company data points are again
2455.05 -> validated in the scoring process
2457.03 -> just to ensure completeness and beyond.
2460.06 -> And the last leg is distribution.
2463.18 -> What are some of the challenges, right?
2468.07 -> So alignment, this is more of a process challenge
2470.5 -> than a technical implementation.
2472.69 -> Alignment across different teams to ensure, you know,
2475.96 -> requirements of how they score a company versus
2479.59 -> how you collect the data should always be consistent.
2484.36 -> You cannot have
2485.193 -> a scoring process or scoring team
2488.17 -> expect certain metrics which you,
2490.33 -> or an analyst, are not aware you need to collect.
2493.72 -> That's process challenge.
2496.3 -> Adjusting to constant model changes:
2499.33 -> the architecture for scoring should be
2502.51 -> designed so that it can accommodate any modeling-related
2506.11 -> changes in the data.
2509.59 -> Flexibility, right?
2511.18 -> Your scoring process should be flexible enough that
2513.91 -> it should be able to handle a real time
2516.55 -> way of scoring a company or a batch way of scoring
2520.87 -> X set of companies.
2524.11 -> Data lineage, as I mentioned,
2526.69 -> should always be there. When you score a company
2528.52 -> and give this data out,
2530.02 -> you obviously just don't want to give the score out, right?
2532.21 -> You wanna give the proof.
2533.5 -> How did you come to the conclusion of that score?
2536.05 -> You wanna make sure that you give all that information
2539.71 -> that supports how you scored a company
2541.99 -> to your investors who are interested in this data.
2545.89 -> Last but not least,
2547.03 -> you should also have flexibility in your architecture to
2549.97 -> rescore a company if there is a need
2553.36 -> or an opportunity, or if required.
2558.16 -> How did we go and solve for it?
2561.1 -> This is primarily leveraging Step Functions
2564.67 -> and Glue, backed by Lambdas.
2568.24 -> So as part of this process,
2570.1 -> what happens is the first step is, like I said,
2573.55 -> validate again, just to make sure you're sure, right?
2577.39 -> Every company that is validated through this process
2580.15 -> goes through the next step,
2581.53 -> which is what we call as derivation.
2584.35 -> As part of the scoring process
2586.12 -> you also derive some new metrics based on the data
2589.12 -> or metrics that you have collected,
2590.62 -> that's what is called as derivation
2592.66 -> and then you eventually score the successful companies.
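The validate, derive, score sequence maps naturally onto a Step Functions state machine with one Lambda-backed task per step. The sketch below writes such a definition in Amazon States Language as a Python dict. The state names, function names, account ID, and region are placeholders, and the real workflow in the talk also involves Glue and is likely more elaborate.

```python
import json

def lambda_arn(name):
    # Placeholder account and region; replace with real values.
    return f"arn:aws:lambda:us-east-1:123456789012:function:{name}"

SCORING_WORKFLOW = {
    "Comment": "Validate, derive, then score (illustrative only)",
    "StartAt": "Validate",
    "States": {
        "Validate": {"Type": "Task", "Resource": lambda_arn("validate"), "Next": "Derive"},
        "Derive": {"Type": "Task", "Resource": lambda_arn("derive"), "Next": "Score"},
        "Score": {"Type": "Task", "Resource": lambda_arn("score"), "End": True},
    },
}

# Serialized form, as a state machine definition would be submitted.
definition = json.dumps(SCORING_WORKFLOW, indent=2)
```

Because a failed task stops the execution before `Next` is followed, only companies that pass validation and derivation reach the scoring step.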
2596.44 -> But the important aspect here is
2599.023 -> when you are scoring a company,
2601.03 -> you're not just taking the publicly disclosed information
2605.38 -> of a given company, you also wanna ensure
2608.59 -> if that given company has any controversy generated.
2612.16 -> It's that company news for any controversy, right?
2615.13 -> Obviously that affects that company score.
2619.3 -> So that also comes in from our data lake,
2622.78 -> which is not depicted here;
2624.49 -> it's primarily shown as like a small S3 bucket,
2627.88 -> which is called S3 data factory here,
2629.59 -> but that is a process by itself.
2633.46 -> What does our data factory look like?
2636.67 -> So it's very, very much modern, built on AWS cloud,
2641.29 -> which is leveraging metadata driven architecture.
2645.55 -> Where you don't have to write a new piece of code
2648.52 -> every time you get a new file
2650.74 -> to ingest in your data factory.
2652.93 -> You just have to define the metadata for it
2656.032 -> and the factory should be in a position to consume it
2659.35 -> automatically all the way through both.
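A metadata-driven ingest means a new feed is onboarded by adding a metadata entry rather than a new parser. A minimal sketch, assuming CSV input and a hypothetical `controversies` feed with made-up column names:

```python
import csv
import io

# One entry per feed: column name -> type to cast to.
# The feed name and columns are hypothetical.
FEED_METADATA = {
    "controversies": {"columns": {"company_id": str, "severity": int}},
}

def ingest(feed_name, raw_text):
    """Parse a CSV feed using only its metadata entry; no feed-specific code."""
    columns = FEED_METADATA[feed_name]["columns"]
    reader = csv.DictReader(io.StringIO(raw_text))
    # Minimal schema check: the header must match the declared columns.
    if reader.fieldnames != list(columns):
        raise ValueError(f"schema mismatch for feed {feed_name!r}")
    return [{col: cast(row[col]) for col, cast in columns.items()} for row in reader]
```

Adding a second feed would mean adding a second key to `FEED_METADATA`, with `ingest` left unchanged.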
2661.57 -> So as part of this data lake,
2663.73 -> what happens is we categorize the data in three buckets,
2667.87 -> which we call it as bronze, silver, and gold.
2672.16 -> The first step is called bronze,
2674.17 -> where it goes through minimal checks to ensure you know,
2677.74 -> your schema is right, your counts are accurate, and so on.
2682.75 -> And the second step of this is silver,
2685.15 -> which happens if there are any transformations involved
2688.69 -> or more sophisticated data validation checks,
2691.84 -> eventually, you know,
2693.16 -> before it comes and lands in gold for distribution.
2697.48 -> Yes, that process also happens.
2700.45 -> The output of this scoring goes through
2703.36 -> that process inside of data factory
2705.61 -> before it goes out for distribution.
2713.59 -> There.
2714.85 -> This concludes all the five steps that we talked about
2717.88 -> in this village.
2720.43 -> But it's not, you know, just,
2724.15 -> it's people, process and technology all coming together
2728.5 -> to create what you're seeing on the screen here.
2732.22 -> A tidbit is, you know, IT can also contribute to ESG, right?
2737.83 -> You know,
2739.9 -> you can write code that is sustainable enough
2743.53 -> that you're not running it X number of times
2746.62 -> and consuming your data center capacity.
2748.81 -> You can also contribute,
2750.25 -> all the engineers here can also contribute to ESG
2753.13 -> when you are designing code.
2756.31 -> But, I'm not done yet.
2759.67 -> This is, yeah, very good.
2763.6 -> What next?
2764.86 -> Where is the resiliency in this picture, right?
2769.6 -> So that's the next question we usually get.
2772.09 -> Where is the resiliency?
2775.96 -> So we set some of the goals for our resiliency here
2780.13 -> where it says, okay, your RPO is
2782.77 -> X less than an hour, whatnot, right?
2785.44 -> And durability of your data that you have committed,
2790.12 -> how does it survive redundancy across your regions
2793.39 -> and how do you test your resiliency
2796.15 -> without impacting your users?
2800.23 -> This slide is just a high level view.
2802.36 -> I probably need another hour or so to talk through
2805.24 -> challenges across the resiliency
2807.61 -> for each of these architectures.
2809.98 -> But what I can state is being on serverless actually
2815.05 -> eliminated some of the complexities.
2817.24 -> I wouldn't say complexities,
2818.35 -> probably it's like additional steps that you have to
2821.83 -> probably implement with the non serverless
2824.95 -> way of building things, right?
2828.25 -> For example, things come out of the box: for DynamoDB,
2832 -> you have global tables; for S3, you replicate it,
2835 -> but with a catch that you know,
2836.74 -> if you have to go to more than two regions
2839.89 -> and you come into a mesh architecture,
2841.45 -> that's where it can get tricky for that replication, right?
2845.08 -> And the stages where we have a front-end
2848.38 -> interface were straightforward, but for the data pipeline,
2852.58 -> which involves step functions and so on,
2854.47 -> you would need some customization
2857.77 -> where you have two options.
2860.08 -> One, either you start from where it has failed,
2862.87 -> that is, build rerunability into your code.
2865.54 -> Or you just start over again from step one, right?
2870.7 -> Two, two choices to go with.
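The first option, resuming from the failed step, can be sketched as a runner that records each completed step and skips it on the next run. Here `completed` is an in-memory set for illustration; a real pipeline would need to persist it somewhere durable, and each step would need to be safe to retry.

```python
def run_pipeline(steps, completed):
    """Run (name, function) steps in order, skipping names already in `completed`."""
    for name, fn in steps:
        if name in completed:
            continue  # finished in an earlier run
        fn()
        completed.add(name)  # record only after the step succeeds
    return completed
```

The second option, starting over from step one, is the same call with an empty `completed` set, which is simpler but repeats work that already succeeded.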
2874.48 -> All right.
2875.38 -> Probably I am getting bored, getting you all bored,
2877.42 -> not sure, but that's all from my part.
2879.94 -> This concludes my session here.
2882.85 -> Hope you enjoyed the session, and I now pass over to Sri
2887.38 -> for product outcome and closure.
2889.24 -> Thank you.
2898.87 -> - I promise this is the last slide, yeah.
2902.35 -> And we can take questions.
2903.37 -> So just to wrap it up,
2908.17 -> this is the 10,000-foot view of what we have accomplished.
2912.22 -> The horizontals are the platform capabilities
2916.33 -> that we have built.
2917.32 -> The verticals or the rows versus the columns
2922.36 -> are the ones which talk about the process
2925.21 -> from inception to scoring and distribution.
2928.87 -> So data factory,
2931.09 -> what we haven't covered is Moody's acquisitions
2934.24 -> and how we consolidate data from the various sources,
2940.12 -> which are not the public documents.
2943.87 -> So that's done in the data factory.
2946.33 -> We haven't covered this in detail other than the scoring.
2950.44 -> And it converts, we consolidate the data, we score it,
2956.14 -> and we distribute the information.
2957.91 -> But it also has history and trending analytics
2961.87 -> of all the ESG data.
2964.75 -> The data capture is nothing but a workflow tool
2967.78 -> with the backbone of microservices,
2970.87 -> which we discussed at length.
2973.51 -> And this helps source hundreds of thousands of documents,
2979.9 -> run the machine learning,
2981.91 -> provide analysts a way to validate this data
2985.93 -> and certify this data at the level that they want,
2990.85 -> which can span between an entity to sector.
2995.44 -> And then
2999.91 -> the same data can help with SFDR reporting as well.
3003.72 -> So the key business outcome is we were able to
3009.03 -> launch the product in Q1 2022,
3015.15 -> and this product contains, you know,
3019.5 -> we can't disclose the number,
3020.76 -> but it's tens of thousands of entities,
3023.4 -> which means processing, you know,
3030.06 -> hundreds of thousands of PDF documents,
3033.18 -> which are the sustainability reports, 10-K, 10-Q,
3036.27 -> and each entity has
3041.1 -> more than 300 metrics that are scanned,
3045.21 -> and a final score is also published.
3049.14 -> With that, we conclude the presentation
3051.87 -> and we really appreciate your time.
3054.04 -> Thank you.

Source: https://www.youtube.com/watch?v=tyM3OHT_0M8