AWS re:Invent 2022 - How Moody’s uses serverless and microservices for ESG scores (FSI205)

Aug 16, 2023

Environment, social, and governance (ESG) considerations are reshaping financial markets and risk-based decision making for investors. Moody’s offers a scalable solution to publish ESG data-driven scores with transparency to support detailed analytics. ESG results are incorporated into client-facing platforms and risk-based models for portfolio management. Doing so requires unifying multiple data sources, cleansing that data, and analyzing it without manual intervention. In this session, learn how Moody’s built a scalable serverless platform on AWS to address this challenge. Moody’s shares how its serverless and microservices architecture allows them to seamlessly publish ESG scores with data lineage and transparency using AWS services.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.

Content

0.45 -> - Thank you for joining us today, welcome everyone.

3.96 -> I know it's been really busy.

6.66 -> It's still the first day.

8.73 -> My name is Sri Sontineni and I have Divya Elaty here.

14.28 -> Divya is the senior Vice President at Moody's ESG,

20.31 -> and she leads the cloud platform engineering team.

24.93 -> My role at AWS is a senior engagement manager,

28.92 -> and I deliver cloud projects for financial services firms

35.46 -> such as Moody's.

36.78 -> And this is one of our projects that we delivered,

41.46 -> and it's around how we use serverless and microservices

47.73 -> to generate ESG data that can be leveraged

52.5 -> by other financial services firms.

56.25 -> So without further ado, starting off with the introduction.

62.52 -> Starting off with the agenda.

65.73 -> What we want to cover today is

68.25 -> give a quick introduction of Moody's

70.26 -> and explain what Moody's business is.

75.72 -> Then give a basic understanding of ESG

80.82 -> and the factors which are used to measure ESG

83.76 -> so that you can connect with the story.

86.52 -> And after that, we'll go a little bit deeper

90.27 -> into the architecture and explain the product,

94.14 -> the various steps and how we achieved the product,

100.32 -> which was released earlier this year

103.2 -> to thousands of companies.

109.68 -> Quickly talking about Moody's.

112.53 -> Moody's is a global risk assessment firm,

116.43 -> and it does two things in a great way,

121.32 -> which is primarily providing credit rating opinions,

125.64 -> and also providing financial risk assessment solutions

130.89 -> that can be leveraged by companies

134.07 -> to do business better and make decisions faster.

141.06 -> Moody's has two entities.

142.71 -> One is Moody's Investors Service (MIS), which does the credit ratings.

147.9 -> And these credit ratings are used by investors and issuers

153.15 -> for their debt instruments.

155.01 -> And most of you might have heard about the credit ratings

158.633 -> and the opinions that Moody's provides.

162.24 -> Moody's or MIS has been doing this

165.72 -> for the last hundred years.

168.93 -> So far, MIS has rated 73 trillion dollars in debt,

174.87 -> and this was done across 35,000 entities and transactions.

183.48 -> Moving on to Moody's Analytics.

185.49 -> Moody's Analytics was founded as a separate entity in 2008,

193.74 -> and Moody's Analytics initially started off

198.03 -> as a licensing of credit rating opinions,

202.32 -> but slowly it spread, and currently it provides

207.51 -> analytics solutions that help with

210.42 -> risk management and growth.

212.64 -> And this can be a licensed product

216.577 -> that can be used for your business

220.17 -> and for gaining insights.

224.37 -> Currently, there are 14,900 customers

227.73 -> who use Moody's Analytics.

230.31 -> And going onto the next slide.

236.13 -> A quick introduction of ESG.

239.577 -> ESG stands for environmental, social, and governance.

242.79 -> And I believe you all know that, that's why you are here.

248.16 -> So diving a little bit deeper into

251.88 -> what constitutes these three pillars.

254.73 -> So for the environmental pillar,

259.95 -> as you already know,

261.93 -> conservation of the natural habitat and environment

265.89 -> forms the crux of it.

267.87 -> And this is measured using carbon emissions, right?

272.13 -> Which is a common term,

273.99 -> I'm sure while you're taking a flight here,

276.84 -> you know exactly what is the carbon emissions

279.51 -> that we are leaving, right?

283.32 -> Apart from that, we do have various other factors

286.77 -> such as water usage.

289.14 -> And I was so happy to hear the news that

293.04 -> AWS is gonna be water positive by 2030, right?

298.26 -> That was the news that came in this morning.

301.29 -> What that means is all of the data centers

303.39 -> will return more water than they consume.

308.34 -> Other factors also include deforestation,

311.73 -> water usage, and many more.

315.9 -> The second pillar is social.

319.2 -> Social talks about the relationships between

323.85 -> the different people businesses interact with.

327.27 -> These can be customers, these can be the employees,

332.91 -> these can be the communities the businesses serve.

336.15 -> So the social is measured using factors

341.43 -> such as workforce diversity.

346.02 -> Workforce diversity is a factor that can help us understand

349.77 -> how underrepresented groups are represented and make sure there is equality.

356.16 -> Gender and equal pay policies, and health and safety, are

360.87 -> some of the other factors that contribute to social.

366.51 -> Finally, we have the governance.

369.21 -> Governance is the set of standards for a company

374.22 -> on how it is run,

378.21 -> and the ethical policies that ensure

381.99 -> that the company is running in the right way.

386.85 -> And this can be primarily measured

392.04 -> using the board structure,

394.5 -> the audit structure to make sure there is transparency.

398.43 -> Executive compensation to just make sure that

402.36 -> it is following standards with lower policy.

408.24 -> Tax transparency are some of the factors

411.27 -> that contribute to this governance pillar.

416.1 -> Now that we know what ESG is,

419.85 -> how many of you all think that ESG is a mandatory regulation

425.73 -> that companies have to do

427.32 -> and it is another reporting that you have to do?

431.49 -> You can raise your hand or nod your head

434.25 -> if you think it's a regulation.

440.49 -> So, I was in the same bucket.

443.64 -> I was thinking it was a regulation, right?

445.59 -> This was a few years back.

448.26 -> But soon I realized it's a mistake, right?

452.43 -> Firms shouldn't treat ESG as a mandatory regulation,

456.3 -> though there are some reporting needs.

459.36 -> It should be treated like a risk.

462.51 -> And primarily for financial services,

464.79 -> it is a reputational risk, right?

467.55 -> The risk, and I can deep dive a little bit into it

471.54 -> because it's close to my heart.

473.58 -> It is a reputational risk because of the V.

479.22 -> Everyone here is making decisions, right?

483.66 -> Three years back,

486.96 -> you probably wouldn't have considered the things that go

492.78 -> into your decision making process now

495.6 -> because of things like Black Lives Matter, right?

499.14 -> So Black Lives Matter is an example of social justice

504.554 -> where equality is at its core, right?

509.58 -> The second example we have is global pandemic,

513.75 -> where we have clearly seen there are

516.42 -> economic fissures across the world.

520.77 -> And finally, we all have seen wildfires, right?

526.2 -> And wildfires across the globe and within the US,

530.4 -> they really changed our mind about climate, right?

534.54 -> So climate change is a current thing,

543.72 -> and we all have to make decisions

545.64 -> so we can reverse that or at least stop the decline.

550.77 -> So with that in mind,

552.18 -> reputational risk is the key word I want you to take away

555.72 -> from this slide.

558.36 -> Now with that in mind,

560.79 -> there are various financial services priorities

565.29 -> that we have.

566.123 -> Some of the examples that we have are

569.04 -> primarily asset management, right?

571.17 -> So an asset management firm or asset fund manager

576 -> wants to create sustainable investment funds

579.6 -> with the help of underlying ESG data.

585.81 -> And secondly, banks.

589.74 -> So banks are the biggest, what do you say?

596.55 -> Taking a step back, banks lend us loans, right?

600.54 -> Banks lend loans to the businesses as well.

603.27 -> So the biggest emissions are on their balance sheets, right?

608.46 -> If you're lending loans to a firm that causes emissions,

613.26 -> banks are asked to start disclosing the emissions

618 -> and these are called financed emissions, right?

620.34 -> The financed emissions need to be disclosed by the banks.

623.58 -> And this slowly but surely is becoming a mandate

626.67 -> because of Scope 3 GHG reporting; according to that,

631.89 -> you have to disclose the financed emissions.

636.03 -> And finally, every firm is reporting its ESG

641.07 -> using sustainability reports

643.17 -> and that also needs benchmarking data.

646.11 -> So the common layer that you see across all the priorities

650.16 -> is this central ESG data that is curated, certified,

659.31 -> and has lineage where you can make sure that everyone knows

666.09 -> where this data is sourced from and how I can use the data

672.06 -> and provide feedback in case

674.67 -> the customers need any additional factors.

683.07 -> Going to the next slide.

685.14 -> So, so Sri, what's the big deal?

687.15 -> Go collect the data.

688.41 -> Every asset fund manager can do it,

690.42 -> every bank can do it, right?

692.16 -> Just go do reporting.

694.08 -> So the key challenge that you have is, first thing,

700.56 -> there is no single standard of reporting.

703.35 -> What that means is the sustainability report of Amazon is different

707.19 -> from the sustainability report that's produced by Moody's.

710.25 -> So when you deal with unstructured data,

712.78 -> there is a lot of manual work involved in it,

715.8 -> and that causes cost, right?

718.35 -> And errors.

719.58 -> So that's the first challenge, right?

722.01 -> And also it's voluminous.

723.21 -> When you think of large cap

725.61 -> in an index fund, right? S&P.

727.86 -> So it's a lot of firms that you have to

731.85 -> get the data for, curate it, make it usable, right?

737.64 -> Forget about insights.

740.64 -> The second challenge that we do have is

744.33 -> primarily infrequent reporting.

747.42 -> So the ESG sustainability report is only done annually.

752.7 -> So if you are doing it annually,

755.88 -> let's take the energy and gas sector

758.25 -> where there is an oil spill.

760.59 -> All of a sudden that oil spill is not reported.

764.49 -> It might even be in the news, but an asset fund manager,

767.88 -> if he's not actively catching up, he wouldn't get that feed.

771.39 -> And what that means is your data, ESG data

775.32 -> is not gonna correlate with your stock price

778.02 -> or the financial information,

780.39 -> and the data will soon become irrelevant, right?

784.08 -> So that data has to be faster and proper and you know,

791.31 -> and should be accessible to the asset managers.

794.76 -> And finally, after all of this,

797.49 -> uncovering the insights from the ESG data.

801.84 -> There are 400 plus factors that you can consider

807.18 -> to make a decision.

808.74 -> Now, what is the weight

810.42 -> for each of the factors that you're gonna use?

813.48 -> This is something that you need to go to an SME,

818.16 -> create a model, build a weighted model to figure out

822.81 -> how you can even start using it.
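
The weighted model described here can be sketched in a few lines. This is a minimal illustration only, not Moody's methodology: the factor names, the 0-100 score scale, and the weights are hypothetical values chosen for the example.

```python
from typing import Dict


def weighted_esg_score(factors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of factor scores.

    Only factors that have a weight are used, and the weights are
    normalized so they do not need to sum to exactly 1.
    """
    used = {name: w for name, w in weights.items() if name in factors}
    total_weight = sum(used.values())
    if total_weight <= 0:
        raise ValueError("no overlapping factors with a positive total weight")
    return sum(factors[name] * w for name, w in used.items()) / total_weight


# Hypothetical factor scores (0-100) and weights chosen by a subject matter expert.
factors = {"carbon_emissions": 40.0, "workforce_diversity": 70.0, "board_structure": 90.0}
weights = {"carbon_emissions": 0.5, "workforce_diversity": 0.3, "board_structure": 0.2}

score = weighted_esg_score(factors, weights)
print(f"weighted score: {score:.1f}")
```

With 400-plus factors the arithmetic stays the same; the hard part the speaker points to is choosing the weights, which is why an SME is needed.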

825.9 -> So these present a challenge to all the asset managers

832.11 -> and 66% of the asset managers who were involved in a survey

839.67 -> asking why they are unable to create

842.31 -> sustainable funds using ESG data,

844.32 -> 66% of them said that the reason is

850.47 -> accessibility of this data, right?

855.48 -> This slide is talking about the financials,

858.6 -> like, well, there is a business case, I agree with you,

861.84 -> but does it make financial sense?

864.93 -> And absolutely it does, is what this slide is talking about.

869.91 -> The first three things that you see here

873.18 -> is just talking about the increase in sustainable funds,

879.84 -> investment funds, and how they are performing, right?

884.97 -> When you compare with other companies

888.75 -> that are not in these sustainable indexes,

892.2 -> the funds are performing much better.

895.11 -> And finally, this is the market capitalization.

900.48 -> If someone starts providing the ESG data tomorrow,

905.07 -> the licensing of that data can fetch you this much.

909.12 -> It's a high level analysis,

911.07 -> but it just brings us to the business case.

914.01 -> Is there a business case?

915.63 -> And this is talking about the business case.

921.09 -> Quickly moving to

922.92 -> why Moody's is interested in this

926.07 -> and what Moody's wants to do.

929.67 -> So as I have introduced at the beginning that

933.42 -> Moody's is in risk management and assessment space,

939.42 -> and it has a hundred years of experience in that space,

943.26 -> so it's only natural that Moody's customers

946.59 -> are going to ask, Hey, why don't you take

948.54 -> this data into your risk models, right?

951.57 -> So that's the inception of it.

957.84 -> Also, there's an increased ask

960.66 -> for primarily regulation needs,

962.76 -> the SFDR reports, which is a European regulation,

968.49 -> so there is a need for the reports.

972.21 -> So also tackling that for Moody's customers.

976.53 -> And then the final thing is creating a new revenue stream

980.52 -> as we have seen there is capital, there is, you know,

986.85 -> new revenue stream that they can tap into.

991.02 -> With that I'll hand over to Divya so that

994.29 -> we can go through the five steps

996.36 -> and deep dive into the architecture.

1003.65 -> - Thank you Sri.

1005.87 -> Hello everyone.

1006.703 -> I'm Divya Elaty, the Senior Vice President at Moody's,

1010.7 -> heading the ESG cloud engineering team for data, right.

1014.93 -> So I'm here today to walk through our journey

1018.92 -> in gathering ESG data, right?

1022.07 -> So what does it involve to gather ESG data?

1025.52 -> So let me break this down into five simple steps.

1029.99 -> Step one, before you start capturing any of the ESG data,

1035.36 -> you would like to, you know,

1037.22 -> lay the foundation for your ESG reference data.

1040.55 -> And I'll walk you through

1041.72 -> details of each of these steps, right?

1044.18 -> So lay that foundation of ESG reference data.

1048.05 -> Step two, once you have that foundation

1050.51 -> where you know what you are after,

1053.54 -> then you go and start collecting

1055.88 -> the publicly disclosed ESG documents by a given company.

1061.94 -> That's step two.

1062.84 -> Once you have the documents collected, right?

1066.11 -> Now, your goal is to extract the ESG related information

1070.01 -> related to your methodology from within these documents.

1074.42 -> That's where AI and machine learning come into the picture.

1079.01 -> And step four, once you have extracted

1082.07 -> the information from these documents,

1084.8 -> now you enrich the data, you vet this data,

1088.55 -> that's where ESG specialized analysts come into the picture

1092.15 -> to review this data and validate it.

1098.12 -> Once it's vetted and validated,

1100.31 -> in the last step, it's now ready for distribution and

1105.56 -> is given certain ESG scores based on your methodology.

1110.27 -> Sounds so simple and straightforward, right?

1113.26 -> So now let's see what the architecture

1116.93 -> looks like if I combine all of these five steps

1120.23 -> together into one picture.

1124.28 -> That's what it looks like.

1126.47 -> Looks like a mini village of its own, right?

1129.83 -> But I will break down this architecture

1133.55 -> in detailed steps in the upcoming slides.

1137.84 -> All right, let's go to step one.

1141.56 -> Gather your ESG reference data.

1146.21 -> So what would gathering ESG reference data mean, right?

1151.19 -> Establish a list of your reference tables

1154.22 -> that might contain, for example,

1157.46 -> what is your ESG methodology that you want to go after?

1162.05 -> And the methodology would involve, you know,

1164.45 -> what are these metrics, ESG related metrics?

1168.05 -> What are its criteria?

1169.58 -> What are its indicators to name a few, right?

1173.03 -> Once you have that foundation laid,

1175.52 -> the next step is obviously, which companies

1178.7 -> are you really interested in capturing this data?

1181.13 -> What does your coverage look like for ESG data, right?

1186.35 -> So these are some of the minimal reference data examples,

1190.52 -> to name a few, right?

1192.17 -> Once you have this, you establish the relationships;

1194.87 -> of course it's reference data, so you tightly

1196.91 -> establish those relationships within these data sets, right?

1202.55 -> We also have a need where we wanna get certain attributes,

1206.48 -> additional attributes associated with these reference datasets

1209.42 -> from our trusted external sources.

1213.68 -> For example, index data or exchange rate data to name a few.

1218.87 -> So you have to have a pipeline to fetch this data.

1222.62 -> The last step, a very important step in this process,

1226.55 -> is reference data management, right?

1229.55 -> So how do you manage this reference data?

1231.86 -> So what does reference data management involve?

1235.37 -> So it could be as simple as a change management

1238.34 -> where you might wanna add a company to your coverage,

1241.22 -> or you might wanna remove a company from the coverage,

1244.4 -> or you might wanna add a metric for your methodology,

1247.88 -> or you wanna remove them from your methodology.

1252.44 -> Sounds simple but if I dig deeper into further scenarios,

1257.12 -> what happens if your company moves sectors?

1261.44 -> From one sector to another sector, right?

1265.22 -> How does it involve or affect your subsequent

1268.61 -> data capturing process journey?

1271.13 -> What happens when there are acquisitions

1273.32 -> or mergers within a company?

1274.97 -> How does that affect your reference data, right?

1279.71 -> And how do you ensure like all of this is tied together

1284.36 -> before going into the next subsequent steps.

1287.21 -> So these are the core foundations

1288.77 -> of capturing your reference data.
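
The change management scenarios above (adding or removing a company, or moving it between sectors) can be sketched as a small in-memory store that records every change. This is a hedged illustration, not the production design: the class name `ReferenceData`, its fields, and the company identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class ReferenceData:
    """Hypothetical coverage store: company_id -> sector, plus an audit trail."""
    companies: Dict[str, str] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def _record(self, action: str, company_id: str, **details) -> None:
        # Every change is logged so later steps can trace lineage.
        self.audit_log.append({
            "action": action,
            "company_id": company_id,
            "at": datetime.now(timezone.utc).isoformat(),
            **details,
        })

    def add_company(self, company_id: str, sector: str) -> None:
        if company_id in self.companies:
            raise ValueError(f"{company_id} is already in coverage")
        self.companies[company_id] = sector
        self._record("add", company_id, sector=sector)

    def remove_company(self, company_id: str) -> None:
        sector = self.companies.pop(company_id)  # raises KeyError if unknown
        self._record("remove", company_id, sector=sector)

    def move_sector(self, company_id: str, new_sector: str) -> None:
        old_sector = self.companies[company_id]  # raises KeyError if unknown
        self.companies[company_id] = new_sector
        self._record("move_sector", company_id,
                     old_sector=old_sector, new_sector=new_sector)


ref = ReferenceData()
ref.add_company("ACME", "Energy")
ref.move_sector("ACME", "Utilities")
print(ref.companies, len(ref.audit_log))
```

Keeping the old and new sector in the log is one way to let downstream steps decide how a sector move affects data that was already captured.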

1292.22 -> Now let me walk you through what are some of our challenges

1295.34 -> that we have faced in handling the reference data.

1300.44 -> Data governance is one of our biggest challenges.

1305.33 -> And it's not just for these five steps of ESG

1308.48 -> centric data gathering process that it is a challenge here.

1312.35 -> No, it's across the enterprise.

1314.99 -> When you look at this data, right?

1317.45 -> We have learned that it is not just for ESG,

1321.98 -> but it is an enterprise level taxonomy

1324.77 -> used across enterprise products and services

1328.4 -> that might include your credit ratings

1330.65 -> or risk assessment products.

1332.72 -> You gotta have this, you know,

1335.18 -> consistent system of record

1337.01 -> across your enterprise data sets.

1339.95 -> So how can this ESG data evolve into core, fundamental

1344.03 -> ESG reference data across an enterprise?

1348.68 -> How did we go about solving for some of these challenges?

1353.48 -> So this is what the high level architecture looks like.

1357.08 -> On my right side you see

1361.37 -> a pipeline that is

1366.56 -> pulling the data from our trusted external vendors.

1371.36 -> We have leveraged AWS Glue for that.

1373.85 -> And on my left side,

1376.79 -> you see an interface for change management, right?

1381.77 -> From a user perspective.

1384.74 -> In the middle, what you see is the microservices.

1388.61 -> Now, these microservices are not just used

1391.1 -> for change management of this reference data alone,

1395.54 -> but they are also integrated with the subsequent five steps

1399.47 -> that I'll walk you through in this ESG data journey.

1403.01 -> This is all reusable, and the reason being

1405.267 -> this microservice gives you two benefits.

1408.5 -> One, you're not duplicating your code.

1411.89 -> Second, you know, this is ensuring your data lineage

1415.82 -> and consistency across your other systems,

1419.3 -> whether those systems are internal to ESG

1422.3 -> or external to your enterprise.

1428.45 -> There.

1429.283 -> So now we have collected our reference data.

1433.34 -> Now we are ready to go after our documents for ESG.

1439.49 -> So most companies disclose ESG data.

1441.41 -> You know, these are publicly available documents

1445.1 -> that we can go and collect.

1447.02 -> Now, what does this document collection process involve?

1451.76 -> So we have global ESG analysts across the world

1456.62 -> who are experts in ESG and who are

1458.99 -> going after these companies and collecting these

1461.81 -> publicly disclosed ESG documents.

1468.59 -> Now, what is important as part of this collection process is

1471.56 -> you cannot just collect a document without

1474.62 -> tagging metadata to the document, right?

1477.44 -> So you wanna have that lineage intact.

1480.77 -> So as part of this collection process,

1483.08 -> you want to tag the document with the metadata associated

1488.21 -> that you have created, or sorry, what we have created

1491.66 -> as part of the step one process, right?

1494.6 -> So tagging that metadata to that document

1497.48 -> is important from data lineage and consistency

1499.88 -> point of view.

1501.68 -> Now, once you have collected these documents,

1504.74 -> you prepare these companies into batches or portfolios

1509.66 -> as we call it internally,

1511.34 -> to feed them into the next step in this journey,

1515.69 -> which is extracting the information

1517.28 -> off of these documents, right?
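
A minimal sketch of tagging documents with reference metadata and grouping them into portfolios, assuming a portfolio is keyed by sector. The `TaggedDocument` fields, the example URLs, and the sector grouping are hypothetical; the talk only says that documents carry step-one metadata and that companies are batched into portfolios.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TaggedDocument:
    """A collected document plus the reference metadata it is tagged with."""
    document_id: str
    company_id: str      # must exist in the step-one reference data
    source_url: str
    reporting_year: int


def build_portfolios(
    documents: Iterable[TaggedDocument],
    company_sector: Dict[str, str],
) -> Dict[str, List[TaggedDocument]]:
    """Group documents into portfolios by sector, rejecting untagged lineage."""
    portfolios: Dict[str, List[TaggedDocument]] = defaultdict(list)
    for doc in documents:
        if doc.company_id not in company_sector:
            raise ValueError(
                f"document {doc.document_id} is tagged with unknown company {doc.company_id}"
            )
        portfolios[company_sector[doc.company_id]].append(doc)
    return dict(portfolios)


# Hypothetical reference data and collected documents.
company_sector = {"ACME": "Energy", "GLOBEX": "Energy", "INITECH": "Technology"}
docs = [
    TaggedDocument("d1", "ACME", "https://example.com/acme-2022.pdf", 2022),
    TaggedDocument("d2", "INITECH", "https://example.com/initech-2022.html", 2022),
    TaggedDocument("d3", "GLOBEX", "https://example.com/globex-2022.pdf", 2022),
]
portfolios = build_portfolios(docs, company_sector)
print({sector: [d.document_id for d in batch] for sector, batch in portfolios.items()})
```

Failing fast on an unknown company is one way to keep lineage intact before anything is handed to the extraction step.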

1520.22 -> Sounds straightforward? Not really.

1523.19 -> Now let's see what are some of the challenges that we faced.

1527.45 -> Data inconsistency and manual collection, right?

1532.4 -> Why data inconsistency?

1534.59 -> As we stated, not all companies

1537.38 -> disclose this information in a standard form.

1540.44 -> There is no standard way of reporting ESG data,

1544.82 -> and every company has its own way and style.

1547.52 -> Some companies give this out in PDFs,

1550.04 -> some companies have their own HTML pages

1553.1 -> and some companies even go a little deeper into these

1557.39 -> where they can have embedded HTML links

1560.12 -> inside the same webpage

1561.41 -> and that level can go as deep as it can, right?

1567.2 -> So that's part of the challenge.

1569.72 -> Now, the other part of the challenge is

1574.19 -> not all companies have the same cycle to report the data.

1577.85 -> Some of them do quarterly, some of them do annually, right?

1581.69 -> So that is also causing a challenge in capturing this data.

1585.26 -> You don't have a set cycle or a set window

1587.9 -> when a company can disclose its information.

1593.03 -> And data lineage, I know you will be hearing me

1594.92 -> talking about data lineage over and over,

1596.84 -> bear with me because it is the most important piece

1599.45 -> in this collection process.

1602.78 -> And of course it's a manual process today

1605.6 -> where analysts are spending so much time

1608.63 -> in capturing this data.

1611.3 -> We are in the process of automating this,

1613.46 -> mostly leveraging machine learning libraries,

1618.05 -> but we are not really there yet in that journey.

1620.99 -> For now, it's semi-automated, as I call it, right?

1625.37 -> So what does the architecture look like, right?

1628.82 -> Same serverless framework as you see in the reference data.

1632.99 -> The only difference is, you know,

1634.52 -> you have a front-end interface to upload this data

1638.12 -> on the left side,

1638.953 -> and on my right side what you see is DynamoDB

1642.44 -> that is used to store the metadata of these documents

1645.44 -> and S3 to store the PDFs or any version of the documents.

1650.51 -> In the middle, again, you see microservices

1653.03 -> and these microservices are not just

1656.3 -> microservices used to capture the documents,

1658.91 -> but they're also tightly integrated

1660.74 -> with the microservices that we have created

1662.72 -> in step one of reference data

1664.85 -> from a system of record point of view, right?

1669.17 -> And bear with me, these microservices are also

1673.01 -> gonna be tightly integrated with the next step

1676.07 -> in this journey, right?
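
The split between DynamoDB for document metadata and S3 for the files themselves can be sketched as a function that derives both records from one upload. This is an assumption-heavy illustration: the key layout, the attribute names, and the use of a content hash as the document ID are hypothetical, and the sketch deliberately makes no AWS calls.

```python
import hashlib
from typing import Dict, Tuple


def document_storage_plan(
    company_id: str,
    reporting_year: int,
    filename: str,
    content: bytes,
) -> Tuple[str, Dict[str, object]]:
    """Return (s3_key, metadata_item) for one uploaded document.

    The S3 object would hold the raw file; the metadata item is the kind of
    record that could be written to a DynamoDB table. Both share the same
    identifiers so the file can always be traced back to its company.
    """
    digest = hashlib.sha256(content).hexdigest()
    s3_key = f"documents/{company_id}/{reporting_year}/{digest[:16]}-{filename}"
    item = {
        "document_id": digest,         # hypothetical partition key
        "company_id": company_id,      # link back to the reference data
        "reporting_year": reporting_year,
        "s3_key": s3_key,              # pointer to the stored file
        "size_bytes": len(content),
    }
    return s3_key, item


key, item = document_storage_plan("ACME", 2022, "sustainability.pdf", b"%PDF-1.7 example")
print(key)
print(item["document_id"][:16], item["size_bytes"])
```

Deriving the ID from the file content means that uploading the same file twice yields the same key, which is one simple way to avoid storing duplicates.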

1680.36 -> Great.

1681.193 -> So now we have captured our reference data,

1684.77 -> collected the documents, what next?

1688.4 -> Now we are ready to extract the information

1691.49 -> from the PDFs or HTMLs or unstructured data.

1696.62 -> So that's where AI and machine learning come into play here.

1703.22 -> What does this involve?

1704.81 -> Now remember, we have created portfolios

1707.42 -> and those portfolios can include

1710.39 -> group of companies that have completed

1712.97 -> the document collection process.

1715.55 -> Or it can also include, you know,

1718.85 -> companies that are grouped under a specific sector

1722.48 -> or any combination.

1724.82 -> Now this portfolio of documents

1727.4 -> is fed into our

1731.27 -> AI machine learning process,

1733.31 -> which will go and extract the information

1737 -> related to our ESG methodology

1740.24 -> and pull these key paragraphs or data out.

1745.16 -> Right.

1747.08 -> Now once it's extracted, we have to prepare this data

1752.57 -> and get it ready for our analysts to come in

1756.02 -> and review this data in the next step.

1759.32 -> Now let's see what are some of the challenges we faced

1763.76 -> during this process.

1765.29 -> Now, with any machine learning model,

1770.12 -> you won't get a hundred percent of accuracy on day one.

1773.84 -> It's bound to have some noise

1776.06 -> or more noise depending, right?

1778.91 -> So some of the challenges

1782.87 -> involve redundancy of the data.

1784.46 -> For example, you know,

1785.87 -> the same information of our methodology

1790.1 -> being found in multiple pages of a single document,

1793.46 -> let's say for example,

1795.53 -> that's one of the challenges.

1797.45 -> Second challenge involves the complexity

1800.63 -> of how companies might embed this data inside the document.

1805.19 -> Let's say, I don't know, maybe another HTML box

1809.09 -> or a table format, something that

1811.25 -> machine learning did not anticipate to pull this

1813.56 -> or expect to have the data inside that, right?

1817.79 -> Or it could be as simple as the basic page number

1821.45 -> where the information was found being missing.

1825.59 -> So for that machine learning to improve its efficiency

1829.64 -> over a period of time, it needs active learning, right?

1833.54 -> So you gotta make sure that the analysts'

1836.39 -> reviews of this data, and any corrections they make,

1839 -> get actively fed back into this process

1843.14 -> and make sure that the model learns using this data.

1851.03 -> And the last but not least is scaling, right?

1855.68 -> The portfolio of companies can include X number.

1860.9 -> It could be big at a point in time depending on the cycle

1863.99 -> at which companies are disclosing information,

1866.42 -> or it could be small, right?

1868.16 -> So your model should be in a position to scale itself

1871.13 -> to accommodate that kind of data.

1875.51 -> Now, in this architecture, I'm gonna walk you through

1881.15 -> what happens after machine learning

1883.13 -> extracts the information, right?

1885.29 -> Once the portfolio of batches

1889.67 -> is sent to our machine learning model

1892.94 -> and the processing is done,

1894.71 -> it sends a signal to the next process.

1898.73 -> And we have leveraged AWS Lambda to scale up or down

1905.69 -> depending on the number of records or

1909.02 -> the amount of data the machine learning model has,

1911.99 -> you know, pulled from that particular portfolio.

1915.8 -> So depending on how big or how small the portfolio is,

1920.36 -> your Lambda just scales up and down

1923.12 -> and consumes or ingests this data

1925.25 -> for the next step to come in.
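
A rough sketch of what a Lambda-style ingestion handler might look like. The `handler(event, context)` signature is the standard one for Python Lambda functions, but everything else is assumed: the event shape (`portfolio_id`, `records`), the batch size, and the `ingest` helper are hypothetical, since the talk does not say how the completion signal is delivered.

```python
from typing import Dict, Iterator, List

BATCH_SIZE = 2  # hypothetical; kept tiny so the example is easy to follow


def chunked(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def ingest(batch: List[dict]) -> int:
    """Hypothetical stand-in for writing a batch to the downstream store."""
    return len(batch)


def handler(event: Dict, context) -> Dict:
    """Entry point invoked once per 'extraction finished' signal.

    Lambda scales by running more invocations in parallel; this code only
    keeps the work inside a single invocation bounded and predictable.
    """
    records = event.get("records", [])
    batches = list(chunked(records, BATCH_SIZE))
    ingested = sum(ingest(batch) for batch in batches)
    return {
        "portfolio_id": event.get("portfolio_id"),
        "batches": len(batches),
        "ingested": ingested,
    }


# Local dry run with a made-up event; no AWS resources are involved.
result = handler(
    {"portfolio_id": "energy-2022", "records": [{"id": i} for i in range(5)]},
    None,
)
print(result)
```

Large and small portfolios go through the same code path; only the number of batches and concurrent invocations changes.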

1931.28 -> Great.

1932.113 -> So we have reference data, we have documents,

1935.3 -> we have extracted information off of these documents.

1938.3 -> What next? Right?

1940.46 -> So this is what we call data enrichment,

1944.81 -> where our analysts come in to review this data, right?

1950.72 -> So let's see what this involves.

1955.19 -> So the data points that we receive from

1957.47 -> the machine learning process

1959.81 -> aren't ready to be consumed by an analyst yet;

1963.26 -> there are certain steps that happen in between.

1966.62 -> So what does that involve? Right?

1969.92 -> So the first and foremost is it standardizes the data.

1973.46 -> What is standardization? Right?

1976.73 -> Standardization is nothing but, you know,

1979.52 -> depending on where the company is located

1981.56 -> or what their convention looks like,

1984.65 -> the metrics can be displayed in its own convention.

1988.01 -> For example, some companies can disclose them in kilos,

1992.81 -> some companies disclose the same information in tons

1995.57 -> depending on where they are

1996.95 -> or what their standard approach is, right?

2000.46 -> So we standardize that data

2003.37 -> into a local convention to make it easy for an analyst to

2008.77 -> review the data so they don't have to do the math

2010.99 -> in their head while they're reviewing the data, right?
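
The unit standardization step can be illustrated with a small converter. This is a sketch under stated assumptions: it treats "tons" as metric tonnes (1 tonne = 1,000 kg), picks tonnes as the target convention, and uses a hypothetical lookup table of unit spellings.

```python
# Conversion factors to kilograms for the unit spellings we expect to see.
# Assumption: "tonnes" means metric tonnes; short or long tons would need
# their own entries with different factors.
KG_PER_UNIT = {
    "kg": 1.0,
    "kilogram": 1.0,
    "kilograms": 1.0,
    "t": 1000.0,
    "tonne": 1000.0,
    "tonnes": 1000.0,
}


def to_tonnes(value: float, unit: str) -> float:
    """Convert a disclosed quantity to metric tonnes."""
    try:
        factor = KG_PER_UNIT[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return value * factor / 1000.0


# Two companies disclosing the same kind of metric in different units.
disclosed = [("ACME", 2500.0, "kg"), ("GLOBEX", 3.0, "Tonnes")]
standardized = {company: to_tonnes(value, unit) for company, value, unit in disclosed}
print(standardized)
```

Rejecting unknown units, rather than guessing, keeps a silent conversion error from reaching the analyst.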

2014.88 -> And the second thing that happens is the data validation.

2019.6 -> As we discussed, there could be some noise

2022.72 -> generated from machine learning, right?

2024.61 -> So you filter that noise as much as possible

2028.42 -> to reduce overhead on an analyst.

2031.33 -> So drop those redundant data sets,

2033.61 -> which you don't need, right?

2036.67 -> So that's one part of the cleansing.

2038.17 -> And the second step that happens is make sure that

2041.8 -> this data is neatly tied with all the previous steps

2046.12 -> that we have described.

2047.32 -> Make sure it's tied with your reference data.

2049.09 -> Make sure it's tied up with your document data

2051.13 -> that you have collected.

2052.63 -> So once you have that system of record nicely organized

2056.47 -> for your analyst to come in and review,

2060.58 -> that's what happens, you know, in all the pre-steps.

2064.69 -> Now it's ready for analysts to review it, right?

2068.47 -> What does the analyst review process look like?

2072.31 -> So it's done in two phases and four steps.

2078.7 -> So that's how rigorous our review process

2081.37 -> is around this data.

2085.36 -> And I'll talk more about the architecture.

2087.37 -> What does that look like

2088.45 -> from an analyst point of view, right?

2090.697 -> And the last step is, you know, finalizing this data

2094.45 -> and making it ready for consumption globally.

2099.43 -> What are some of the challenges

2102.25 -> there?

2103.083 -> Depending again on the size of your portfolio,

2106.42 -> you might have data coming back from the machine learning model:

2111.43 -> millions and millions of data points.

2114.1 -> So the architecture should be scalable enough

2117.82 -> to handle this volume.

2119.2 -> This is per portfolio by the way, right?

2122.29 -> And the second piece is you have to obviously make sure

2125.77 -> your architecture is highly available

2128.47 -> because of the global presence of our analysts

2131.56 -> who are consuming this data.

2135.22 -> The next step is, while we were part of this journey,

2139.36 -> we were still evolving in terms of data requirements.

2143.02 -> So your design should be flexible enough

2147.7 -> to handle those changes or modifications, right?

2152.71 -> And the last piece, obviously,

2154.87 -> which I haven't stressed in my previous steps,

2157.06 -> is data validation,

2158.38 -> which happens at every step from a completeness perspective.

2161.8 -> Hey, did you capture all your documents in your step two?

2165.31 -> Did you capture, you know,

2166.99 -> did machine learning generate or capture

2169.09 -> all the metrics associated to your methodology

2171.76 -> in your step three?

2173.14 -> Step four comes in,

2174.58 -> did your analyst review every single data point

2177.28 -> surrounding your methodology

2178.69 -> from a completeness perspective?

2181.09 -> So, these are some of the data validations.
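
A completeness check of this kind could look roughly like the sketch below. The function name and the idea of a flat set of required metric names are assumptions; the talk only says that each step verifies that everything the methodology needs was captured.

```python
from typing import Dict, Optional, Set


def missing_metrics(required: Set[str],
                    captured: Dict[str, Optional[float]]) -> Set[str]:
    """Return the required metrics that are absent or still empty.

    `required` stands in for the metrics a methodology asks for;
    `captured` stands in for what a step (extraction, review) produced.
    """
    return {name for name in required if captured.get(name) is None}


def is_complete(required: Set[str],
                captured: Dict[str, Optional[float]]) -> bool:
    """A step passes the completeness check only when nothing is missing."""
    return not missing_metrics(required, captured)
```

The same helper could be called after document collection, after extraction, and after analyst review, each time with that step's own notion of what is required.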

2184.9 -> So what does the architecture look like?

2190.48 -> So this is not as straightforward as it looks on paper,

2194.95 -> but when you look at it from left to right,

2200.23 -> on the left side is where we have our user interface,

2204.04 -> where analysts come in to look at this data,

2207.82 -> and on the right side is where we have our backend;

2212.83 -> it's on DynamoDB, obviously.

2215.65 -> And you can also see in this picture streams.

2220.54 -> So DynamoDB streams is what we use to standardize this data.

2224.8 -> We have also leveraged the Athena connector for DynamoDB

2229.09 -> for anyone who is SQL proficient,

2233.23 -> if they want to look at this data,

2235.42 -> we don't have to do anything additional.

2238.27 -> In the middle is where you see the microservices, right?

2243.76 -> Now what are some of the technical checks

2248.41 -> that happen around this,

2250.12 -> this tool here that we are talking about?

2252.76 -> Flexibility, as I said is very important.

2255.28 -> What is important when you are considering the architecture

2258.34 -> is understand your query patterns,

2261.49 -> understand your needs of the data.

2264.25 -> What kind of data an analyst is asking for,

2267.52 -> what kind of flexibility.

2268.78 -> For example, are they looking at the data

2271.57 -> at a criteria level?

2273.07 -> Are they looking at the data at a sector level?

2275.86 -> Are they looking at the data at a company level?

2278.65 -> Right?

2279.483 -> So give that flexibility.

2281.53 -> For you to give that flexibility

2283.45 -> you have to nicely organize the data

2285.37 -> in such a way in your DynamoDB to leverage, you know,

2289.12 -> those indexes and keys nicely organized

2291.73 -> so it can give you that optimal performance and efficiency.
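
To make the point about keys and indexes concrete, here is a small in-memory sketch of one way to lay out items so that company-level, sector-level, and criteria-level lookups are each a single key lookup. The attribute names (`PK`, `SK`, `GSI1PK`, `GSI2PK`) and the key formats are hypothetical; the actual table design was not shown in the talk, and the `query` helper only imitates a key-based query in plain Python.

```python
from typing import Dict, List


def to_item(record: Dict[str, str]) -> Dict[str, str]:
    """Shape a data point so each access pattern has its own key."""
    return {
        # Base table: everything for one company lives under one partition.
        "PK": f"COMPANY#{record['company_id']}",
        "SK": f"CRITERIA#{record['criteria']}#METRIC#{record['metric']}",
        # Hypothetical secondary-index keys for the other two patterns.
        "GSI1PK": f"SECTOR#{record['sector']}",
        "GSI2PK": f"CRITERIA#{record['criteria']}",
        "value": record["value"],
    }


def query(items: List[Dict[str, str]], key_attr: str, key_value: str,
          sk_prefix: str = "") -> List[Dict[str, str]]:
    """In-memory stand-in for a query on a partition key plus sort-key prefix."""
    return [item for item in items
            if item[key_attr] == key_value and item["SK"].startswith(sk_prefix)]
```

The design choice being illustrated is that the query patterns are decided first and the keys are derived from them, rather than the other way around.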

2297.55 -> So flexibility was our biggest task.

2301.39 -> Now it isn't depicted in this picture,

2304.06 -> but you don't want to be waiting

2306.34 -> for analysts to complete the review and then figure out,

2309.25 -> oh you know what, this is not right, this is not right,

2312.52 -> this you have to go and fix,

2313.81 -> this you have to go and fix.

2315.64 -> So the tool has to be flexible enough that

2318.28 -> it does that check as the analyst is reviewing the data

2322 -> and notify them of these issues.

2324.28 -> Let's say, you know, what happens actually

2327.01 -> when an analyst comes into this tool is

2329.8 -> they can do two things.

2331.66 -> They can create some new entries

2334.87 -> where machine learning was not able

2336.46 -> to pull it out of the document, right?

2339.07 -> Or they can modify an existing entry where,

2342.19 -> they think, machine learning did not pick

2343.807 -> the right information.

2347.17 -> So as part of these corrections,

2349.24 -> you have to ensure that the lineage is not broken at all.

2353.77 -> It has to be still consistent across the board.

2357.16 -> So all those checks should actively happen in the tool

2360.94 -> while they're reviewing the data and notify them,

2364.15 -> Hey, go and make this correction.

2368.32 -> Right.

2370.84 -> So that's just naming a few of the data validation checks

2374.26 -> that happen in this tool,

2375.49 -> but there are many more.

2378.07 -> You might also wanna ensure that

2381.31 -> you notify your analysts of any

2384.85 -> outliers or gaps in this data

2387.52 -> as they're correcting.

2388.69 -> And you also don't want to,

2390.46 -> pretty sure you don't want them to

2392.14 -> be looking at the same company, obviously, right?

2394.9 -> Two different analysts

2395.77 -> cannot be working on the same data set.

2398.56 -> So you have to build in all of those

2400.57 -> checks and balances inside the tool

2403.18 -> for them to be able to do their day-to-day jobs.
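
One of the checks mentioned above, that two analysts cannot work on the same data set, amounts to an atomic "claim if unclaimed" operation. The sketch below imitates it in memory; the `ReviewLocks` class is hypothetical and says nothing about how Moody's built it. On DynamoDB, a similar effect is commonly achieved with a conditional write using `attribute_not_exists`.

```python
from typing import Dict


class ReviewLocks:
    """In-memory stand-in for a 'one analyst per company' claim table."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def claim(self, company_id: str, analyst: str) -> bool:
        """Claim the company if it is free; True when `analyst` now owns it."""
        # setdefault only writes when no owner exists, like a conditional put.
        return self._owners.setdefault(company_id, analyst) == analyst

    def release(self, company_id: str, analyst: str) -> None:
        """Release the claim, but only if `analyst` is the current owner."""
        if self._owners.get(company_id) == analyst:
            del self._owners[company_id]
```

A tool built this way can tell the second analyst immediately that the company is already under review instead of letting both edit the same records.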

2408.01 -> That's just to name a few,

2411.4 -> but there is more that really happens in this process.

2415.21 -> Now that the analysts have vetted this data,

2419.41 -> the last step of this is score and distribute.

2423.07 -> We are now ready to roll, right?

2426.01 -> The analyst reviews the completeness of the data

2429.58 -> for an entity, prepares a report, and then says,

2433.75 -> Hey, you know what?

2435.07 -> I'm done with these companies.

2436.87 -> Boom!

2437.703 -> It goes to the next process,

2441.04 -> which generates the scores, but

2444.61 -> I'll talk through the different steps involved

2446.77 -> when they're, you know, scoring a company.

2450.31 -> Obviously, you know,

2451.33 -> validate and score: the company data points are again

2455.05 -> validated in the scoring process

2457.03 -> just to ensure completeness and beyond.

2460.06 -> And the last leg is distribution.

2463.18 -> What are some of the challenges, right?

2468.07 -> So alignment, this is more of a process challenge

2470.5 -> than a technical implementation.

2472.69 -> Alignment across different teams to ensure, you know,

2475.96 -> requirements of how they score a company versus

2479.59 -> how you collect the data should always be consistent.

2484.36 -> You cannot have,

2485.193 -> you cannot have a scoring process or scoring team

2488.17 -> expect certain metrics which you are not,

2490.33 -> or an analyst is not, aware of needing to collect.

2493.72 -> That's a process challenge.

2496.3 -> Adjust to constant model changes:

2499.33 -> the architecture for scoring should be

2502.51 -> built in a way that it can accommodate any

2506.11 -> modeling-related changes in the data.

2509.59 -> Flexibility, right?

2511.18 -> Your scoring process should be flexible enough that

2513.91 -> it should be able to handle a real time

2516.55 -> way of scoring a company or a batch way of scoring

2520.87 -> X set of companies.

2524.11 -> Data lineage, as I mentioned,

2526.69 -> should always be there: when you score a company

2528.52 -> and give this data out,

2530.02 -> you obviously just don't want to give the score out, right?

2532.21 -> You wanna give the proof.

2533.5 -> How did you come to the conclusion of that score?

2536.05 -> You wanna make sure that you give all that information

2539.71 -> that supports how you scored a company

2541.99 -> to your investors who are interested in this data.

2545.89 -> Last but not least,

2547.03 -> you should also have flexibility in your architecture to

2549.97 -> rescore a company if there is a need

2553.36 -> or an opportunity or if required.

2558.16 -> How did we go and solve for it?

2561.1 -> This is primarily leveraging Step Functions

2564.67 -> and Glue, backed by Lambdas.

2568.24 -> So as part of this process,

2570.1 -> what happens is the first step is, like I said,

2573.55 -> validate again, just to make sure you're sure, right?

2577.39 -> Every company that is validated through this process

2580.15 -> goes through the next step,

2581.53 -> which is what we call derivation.

2584.35 -> As part of the scoring process

2586.12 -> you also derive some new metrics based on the data

2589.12 -> or metrics that you have collected,

2590.62 -> that's what is called derivation

2592.66 -> and then you eventually score the successful companies.

2596.44 -> But the important aspect here is

2599.023 -> when you are scoring a company,

2601.03 -> you're not just taking the publicly disclosed information

2605.38 -> of a given company, you also wanna ensure

2608.59 -> if that given company has any controversy generated.

2612.16 -> It's that company's news, for any controversy, right?

2615.13 -> Obviously that affects that company score.

2619.3 -> So that also comes in from our data lake,

2622.78 -> which is not depicted here, it's,

2624.49 -> it's primarily shown as like a small S3 bucket,

2627.88 -> which is called S3 data factory here,

2629.59 -> but that is a process by itself.

2633.46 -> What does our data factory look like?

2636.67 -> So it's very, very much modern, built on AWS cloud,

2641.29 -> leveraging a metadata-driven architecture

2645.55 -> where you don't have to write a new piece of code

2648.52 -> every time you get a new file

2650.74 -> to ingest in your data factory.

2652.93 -> You just have to define the metadata of this

2656.032 -> and the factory should be in a position to consume it

2659.35 -> automatically all the way through.

2661.57 -> So as part of this data lake,

2663.73 -> what happens is we categorize the data into three buckets,

2667.87 -> which we call bronze, silver, and gold.

2672.16 -> The first step is called bronze,

2674.17 -> where it goes through minimal checks to ensure, you know,

2677.74 -> your schema is right, your counts are accurate, and so on.

2682.75 -> And the second step of this is silver,

2685.15 -> which happens if there are any transformations involved

2688.69 -> or more sophisticated data validation checks,

2691.84 -> eventually, you know,

2693.16 -> before it comes and lands in gold for distribution.

2697.48 -> Yes, that process also happens.

2700.45 -> The output of this scoring goes through

2703.36 -> that process inside the data factory

2705.61 -> before it goes out for distribution.
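
The metadata-driven idea and the bronze, silver, and gold stages can be sketched as follows. This is only an illustration under assumptions: the metadata format, the function names, and the specific checks are invented here, and the real data factory is not described at this level in the talk.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Hypothetical metadata for one feed: column name -> type to cast to.
FeedMetadata = Dict[str, Callable[[Any], Any]]


def bronze(rows: List[Row], meta: FeedMetadata, expected_count: int) -> List[Row]:
    """Minimal checks only: row count and schema."""
    if len(rows) != expected_count:
        raise ValueError(f"expected {expected_count} rows, got {len(rows)}")
    for row in rows:
        if set(row) != set(meta):
            raise ValueError(f"schema mismatch: {sorted(row)}")
    return rows


def silver(rows: List[Row], meta: FeedMetadata) -> List[Row]:
    """Transformations driven by metadata: cast each column to its type."""
    return [{col: cast(row[col]) for col, cast in meta.items()} for row in rows]


def ingest(rows: List[Row], meta: FeedMetadata, expected_count: int) -> List[Row]:
    """Run a feed through bronze and silver; the result is what lands in gold."""
    return silver(bronze(rows, meta, expected_count), meta)
```

Onboarding a new file in this sketch means writing a new `FeedMetadata` dictionary, not new pipeline code, which is the property the speaker highlights.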

2713.59 -> There.

2714.85 -> This concludes all the five steps that we talked about

2717.88 -> in this village.

2720.43 -> But it's not, you know, just,

2724.15 -> it's people, process and technology all coming together

2728.5 -> to create what you're seeing on the screen here.

2732.22 -> A tidbit is, you know, IT can also contribute to ESG, right?

2737.83 -> How many times did you, you know,

2739.9 -> you can create code that is sustainable enough

2743.53 -> where you're not running it X number of times

2746.62 -> and consuming your data center capacity.

2748.81 -> You can also contribute,

2750.25 -> all the engineers here can also contribute to ESG

2753.13 -> when you are designing code.

2756.31 -> But, I'm not done yet.

2759.67 -> This is, yeah, very good.

2763.6 -> What next?

2764.86 -> Where is the resiliency in this picture, right?

2769.6 -> So that's the next question we usually get.

2772.09 -> Where is the resiliency?

2775.96 -> So we set some of the goals for our resiliency here

2780.13 -> where it says, okay, your RPO is

2782.77 -> X less than an hour, whatnot, right?

2785.44 -> And durability of your data that you have committed,

2790.12 -> how does it survive redundancy across your regions

2793.39 -> and how do you test your resiliency

2796.15 -> without impacting your users?

2800.23 -> This slide is just a high level view.

2802.36 -> I probably need another hour or so to talk through

2805.24 -> challenges across the resiliency

2807.61 -> for each of these architectures.

2809.98 -> But what I can state is being on serverless actually

2815.05 -> eliminated some of the complexities.

2817.24 -> I wouldn't say complexities,

2818.35 -> probably it's like additional steps that you have to

2821.83 -> probably implement with the non-serverless

2824.95 -> way of building things, right?

2828.25 -> For example, things come out of the box: for DynamoDB,

2832 -> you have global tables; S3, you replicate it,

2835 -> but with a catch that you know,

2836.74 -> if you have to go to more than two regions

2839.89 -> and you come into a mesh architecture,

2841.45 -> that's where it can get tricky for that replication, right?

2845.08 -> And the stages where we have a front-end

2848.38 -> interface were straightforward, but for the data pipeline,

2852.58 -> which involves step functions and so on,

2854.47 -> you would need some customization

2857.77 -> where you have two options.

2860.08 -> One, either you start from where it has failed,

2862.87 -> rerunability: build rerunability into your code.

2865.54 -> Or you just start over again from step one, right?

2870.7 -> Two choices to go with.
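
The first option, resuming from the point of failure, can be sketched with a simple checkpoint set. The `run_pipeline` function and the way completed steps are tracked are assumptions for illustration; in practice the checkpoint would have to be stored durably, and the talk does not say how that was done.

```python
from typing import Callable, List, Set, Tuple

Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], completed: Set[str]) -> Set[str]:
    """Run steps in order, skipping any already recorded in `completed`.

    If a step raises, the exception propagates and `completed` still holds
    every step that finished, so a later call resumes from the failed step.
    Passing an empty set instead corresponds to starting over from step one.
    """
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier run
        action()
        completed.add(name)
    return completed
```

Resuming this way assumes each step is safe to run again after a partial failure, which is the rerunability the speaker says has to be built into the code.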

2874.48 -> All right.

2875.38 -> Probably I am getting you all bored,

2877.42 -> not sure, but that's all from my part.

2879.94 -> This concludes my session here.

2882.85 -> Hope you enjoyed the session, and I now pass over to Sri

2887.38 -> for product outcome and closure.

2889.24 -> Thank you.

2898.87 -> - I promise this is the last slide, yeah.

2902.35 -> And we can take questions.

2903.37 -> So just to wrap it up,

2908.17 -> this is the 10,000-foot level of what we have accomplished.

2912.22 -> The horizontals are the platform capabilities

2916.33 -> that we have built.

2917.32 -> The verticals or the rows versus the columns

2922.36 -> are the ones which talk about the process

2925.21 -> from inception to scoring and distribution.

2928.87 -> So data factory,

2931.09 -> what we haven't covered is Moody's acquisitions

2934.24 -> and how we consolidate data from the various sources,

2940.12 -> which are not the public documents.

2943.87 -> So that's done in the data factory.

2946.33 -> We haven't covered this in detail other than the scoring.

2950.44 -> And it converts, we consolidate the data, we score it,

2956.14 -> and we distribute the information.

2957.91 -> But it also has history and trending analytics

2961.87 -> of all the ESG data.

2964.75 -> The data capture is nothing but a workflow tool

2967.78 -> with the backbone of microservices,

2970.87 -> which we discussed at length.

2973.51 -> And this helps source hundreds and thousands of documents,

2979.9 -> run the machine learning,

2981.91 -> provide analysts a way to validate this data

2985.93 -> and certify this data at the level that they want,

2990.85 -> which can span between an entity to sector.

2995.44 -> And then this also,

2999.91 -> the same data, can help with SFDR reporting as well.

3003.72 -> So the key business outcome is we were able to

3009.03 -> launch the product in Q1 2022

3015.15 -> and this product contains, you know,

3019.5 -> we can't disclose the number,

3020.76 -> but it's tens and thousands of entities,

3023.4 -> which is processing, you know,

3030.06 -> hundreds and thousands of PDF documents,

3033.18 -> which are the sustainability reports, 10-K, 10-Q,

3036.27 -> and each entity has

3041.1 -> more than 300 metrics that are scanned,

3045.21 -> and a final score is also published.

3049.14 -> With that, we conclude the presentation

3051.87 -> and we really appreciate your time.

3054.04 -> Thank you.

Source: https://www.youtube.com/watch?v=tyM3OHT_0M8