AWS re:Invent 2022 - Learn how Black Knight is using AI to accelerate mortgage workflows (AIM214)

Aug 16, 2023

Explore how Black Knight, a premier mortgage analytics company, is adopting AWS artificial intelligence (AI) services for its data and domains to process documents at scale. Learn how they use Amazon Textract to reduce manual processes, mitigate regulatory risks, and deliver significant cost savings to their clients, including many of the largest US lenders. The mortgage processing industry is complex due to the loan lifecycle, regulatory requirements, and the sophisticated data and analytics required to support each process. Hear how Black Knight uses AWS AI services to help clients improve and scale their business processes with automation and a complete solutions ecosystem that supports the entire real estate and mortgage lifecycle.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.

Content

0.12 -> - So we are gonna be talking to you today

2.55 -> about how Black Knight,

5.76 -> one of the leading technology solutions providers

8.94 -> in the mortgage and the home equity lending space,

11.91 -> how they use AI solutions to power those capabilities

16.95 -> for their customers.

19.29 -> My name is Ari Krishnan and I'm the general manager

22.86 -> for Computer Vision Services at AWS.

25.32 -> That includes Amazon Rekognition and Amazon Textract.

30.87 -> And what I'm going to do here today is

34.2 -> provide you with an overview and introduction

38.85 -> to document processing,

40.05 -> and focus a little bit on what makes it a really challenging

43.62 -> and gnarly problem to solve.

46.77 -> I'll share with you what are some of the pain points

50.61 -> that we have observed customers experience

55.17 -> when they try to build these capabilities

57.48 -> by themselves today

59.25 -> with the kinds of offerings that already exist.

62.79 -> I will then transition to share with you

65.43 -> how Amazon Textract,

68.28 -> which is our AI driven capability

71.4 -> for extraction of data from documents of all varieties,

75.69 -> how some of the capabilities over there help customers build

80.19 -> these intelligent document processing solutions.

84.15 -> And then will come the really exciting part,

87.42 -> where Frank Poiesz,

89.76 -> the business strategy director

91.92 -> for mortgage origination technologies at Black Knight,

95.43 -> is going to share with you a deep practitioner's view

99.09 -> of how these solutions are built in the real world.

103.08 -> And hopefully that will give you as developers,

107.88 -> as technology decision makers,

110.55 -> a good sense for how you may want to tackle these problems

114.36 -> as you go through your journey.

120.18 -> Turns out that despite the digitization

125.16 -> and the digitalization that's happening in the world,

128.94 -> there is a heck of a lot of paper

131.34 -> that still powers all sorts of businesses.

135.66 -> And this is happening virtually across all industries.

139.86 -> And barring healthcare,

141.45 -> probably one of the segments where the intensity,

145.2 -> the complexity of document processing is right up there,

150.27 -> is the mortgage and the home equity lending space.

155.88 -> We see that this industry

157.62 -> has some of the most document intensive workflows.

161.97 -> Consider a loan packet.

164.31 -> A loan packet that could have hundreds

166.74 -> if not thousands of pages across dozens of varieties

172.53 -> of different documents contained within it.

175.47 -> That represents all sorts of data.

178.32 -> It can represent income statements, the borrower's,

181.92 -> the co-borrower's history of debt,

184.11 -> their credit history.

186.03 -> It can include,

187.08 -> it does include identity documents of all kinds.

190.17 -> It includes documents that try to make sense

192.78 -> of what the asset is that is being bought.

197.55 -> And this kind of complexity is something that happens

200.94 -> at massive scale every single day.

206.34 -> So how have customers tried to tackle

210.36 -> some of those challenges

211.56 -> when it comes to this incredible volume, variety

215.01 -> and diversity of document types

217.68 -> that they have to process today?

219.72 -> And typically it's fallen into one of three buckets.

225.51 -> The most common one is when customers leverage

229.59 -> Optical Character Recognition or OCR tech.

233.64 -> Legacy OCR tech has been around for a long time now

237.81 -> and it's commonly used across

239.43 -> a variety of these document processing workflows.

242.76 -> Invariably what we have learned is that these technologies,

247.74 -> OCR technologies,

249.21 -> tend to work better on simpler documents

253.56 -> and the extraction process invariably results

256.95 -> in a bag of words that tends to lose a lot

260.16 -> of the inherent context that the document contains.

264.78 -> It can strip away everything from

267.09 -> the structure of paragraphs, the tables,

268.86 -> the lines, the words,

270.33 -> which means that there is a heavy lifting

273.18 -> on the implementer, the developer,

276.12 -> to now start to make sense of what exactly was extracted.

281.01 -> The second big approach is manual processing

285.63 -> via human review.

286.463 -> And to be clear, these are not either or,

289.23 -> but manual processing via humans in the loop

292.5 -> is very pervasive in the industry.

296.76 -> But as you can imagine this is,

300.42 -> humans tend to get tired, we tend to make errors.

305.82 -> And when it comes to managing

308.34 -> that kind of staffing at scale,

311.13 -> it doesn't always follow the demand cycle

316.44 -> that a customer may see when it comes to their end users.

319.11 -> And that's fundamentally challenging.

321.81 -> The third approach is around using rules and templates

329.64 -> to deal with the bag of words that have been extracted.

333.66 -> But what we have discovered then

334.95 -> is that when customers build these rules and templates,

337.38 -> they tend to be brittle.

339.03 -> They tend to be brittle because ultimately

341.73 -> a human has to figure out exactly what template

344.37 -> and what rule to write,

345.84 -> and which may break when a new kind of document shows up

349.41 -> or it may have to be rewritten

351.27 -> if the underlying business workflow is changing.

358.29 -> When customers embark on this journey,

360.15 -> there are two big sets of issues

363.03 -> that are below the surface with these legacy approaches.

368.28 -> One big bucket is really around lost revenue.

372.03 -> And the lost revenue really stems from the fact

374.94 -> that there is,

377.94 -> with legacy systems,

379.98 -> the way they're composed together,

381.45 -> it is invariably hard for them to grow and shrink

384.78 -> in elastic ways.

386.07 -> It is invariably harder for technologists

390.33 -> to build them and compose them

392.88 -> to leverage best-of-breed technologies.

396.9 -> And that fundamentally is a throttle

398.97 -> on your ability to grow effectively.

402.54 -> The second big drawback is really around slowness

406.5 -> of the processing of data,

408.15 -> which means that the ability

409.44 -> to get to a high quality business decision

411.72 -> takes that much longer.

412.89 -> And that percolates through the entire business.

417.93 -> And this ultimately leads

419.46 -> to lower end customer satisfaction.

421.95 -> And which can drive churn.

423.9 -> On the flip side, we notice customers

427.47 -> who still deploy a lot of this legacy tech,

430.11 -> are also facing higher costs.

433.08 -> This can range from the staffing related costs

437.49 -> of managing and maintaining and keeping the lights on.

440.91 -> It can extend all the way to inaccurate extraction,

445.05 -> leading to suboptimal business decisions,

448.65 -> which can cost the business.

450.45 -> And then when you think about

451.62 -> a lot of these legacy technologies being composed together,

454.92 -> there invariably is a lot of scaffolding code,

458.79 -> a lot of tech debt that teams tend to accumulate

461.43 -> over a period of time.

466.23 -> So when we build, from the ground up,

468.84 -> AI and machine learning services,

471.18 -> we do benefit from the fact that we have learned

473.94 -> from a lot of customers in the real world.

477.12 -> And that allows us to then build the kinds of capabilities

480.72 -> that take advantage of advances in computer vision,

486.12 -> natural language processing,

487.62 -> and other machine learning innovations

489.99 -> to go beyond what traditional OCR techniques

493.5 -> have allowed us to do.

497.67 -> The key benefits here are

499.38 -> that we are no longer talking about, you know,

501.48 -> the traditional good old fashioned OCR,

504.78 -> but one where we can now preserve document context

507.87 -> of complex forms, of complex tables,

511.5 -> where we can up level the process of data extraction

517.38 -> to think about things as documents that could be specialized

521.01 -> like identity documents or invoices, and receipts,

523.65 -> and much more.

525.12 -> And the goal here is that by doing so

528.39 -> we are able to get accurate and faster throughput

534.45 -> on that kind of data extraction that preserves the context,

537.3 -> which means developers and technologists

540 -> can then integrate those insights

541.89 -> as part of their business systems

543.51 -> that much more efficiently.

546.09 -> This has invariably the downstream effect

548.73 -> of reducing the total cost of processing,

551.7 -> especially when you think about the scale

553.35 -> at which these document processing workflows run.

559.17 -> So let's look at some of the features within Amazon Textract

565.35 -> that help customers get to more efficient ways

569.04 -> of document processing.

572.67 -> Some of you may already be familiar

575.13 -> with our traditional text extraction

577.5 -> and the OCR extraction capabilities.

579.87 -> But in addition to that there are lots of capabilities

582.33 -> that include handwritten text

585.3 -> as well as the ability to extract more complex structures

590.16 -> such as tables or nested tables,

592.56 -> forms of different kinds.

594.09 -> Being able to yank out key value pairs

598.5 -> that exist within the documents

600.03 -> while still preserving the context.

602.01 -> And this can happen across a wide variety of documents

606.54 -> that we have pre-trained the models on.

609.96 -> Financial statements, loan applications,

611.91 -> identity documents, and many more.
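
To make the key-value extraction described above concrete, here is a minimal Python sketch of how a developer might walk the blocks in a Textract forms response and pair each key with its value. It assumes the documented block layout (KEY_VALUE_SET blocks linked by VALUE and CHILD relationships); the sample blocks and the "Loan Amount" field are made up for illustration, and this is not Black Knight's code.

```python
from typing import Dict, List


def _text_of(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the text of a block's CHILD words and selection marks."""
    parts: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_value_pairs(blocks: List[dict]) -> Dict[str, str]:
    """Map each form key to the text of the value block it points at."""
    by_id = {b["Id"]: b for b in blocks}
    pairs: Dict[str, str] = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(_text_of(by_id[i], by_id) for i in rel["Ids"])
        pairs[_text_of(block, by_id)] = value_text
    return pairs


# Hand-written blocks standing in for the "Blocks" list of a real response.
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Loan"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Amount:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "$250,000"},
]

print(key_value_pairs(sample_blocks))
```

The key text comes back exactly as printed on the form, which is why the label variations discussed later in the talk still need handling downstream.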

615.15 -> The other thing customers told us is that,

616.95 -> okay, we like your, you know, basic OCR,

619.68 -> and it's accurate and cheap and all that.

621.66 -> Great.

622.493 -> But could you specialize this

624.51 -> for specific kinds of documents

626.07 -> that we see commonly in our business workflows?

628.77 -> And here's where we can specialize things

631.14 -> like identity documents.

633.203 -> So for now it's US passports, US driver's licenses,

637.23 -> and so on so forth.

638.31 -> Or invoices and receipts that have very peculiar,

641.79 -> weird ways of representing data.

646.23 -> And customers then use them to accelerate their use cases.

649.53 -> So for example,

650.363 -> if you're using our Analyze Expense capability

654.27 -> for invoices and receipts,

655.62 -> you may have accounts payable, invoicing,

659.07 -> and internal financial processing workloads

663.66 -> that could go a little bit faster.
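
As a hedged illustration of the invoice and receipt capability mentioned above, the Python sketch below pulls summary fields out of an AnalyzeExpense-style response. The response shape (ExpenseDocuments, SummaryFields, Type, ValueDetection) follows the Textract documentation as I understand it; the sample values, the helper names, and the 90.0 confidence cutoff are assumptions chosen for the example.

```python
from typing import Dict, List


def summary_fields(response: dict, min_confidence: float = 0.0) -> List[Dict[str, str]]:
    """Return one {field type: value text} dict per expense document."""
    results: List[Dict[str, str]] = []
    for doc in response.get("ExpenseDocuments", []):
        fields: Dict[str, str] = {}
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text", "OTHER")
            value = field.get("ValueDetection", {})
            # Textract reports confidence on a 0-100 scale.
            if value.get("Confidence", 0.0) >= min_confidence:
                fields[field_type] = value.get("Text", "")
        results.append(fields)
    return results


def analyze_invoice(bucket: str, key: str) -> List[Dict[str, str]]:
    """Call Textract for a document in S3 (needs boto3 and AWS credentials)."""
    import boto3  # third-party AWS SDK, imported lazily so the sketch runs offline

    client = boto3.client("textract")
    response = client.analyze_expense(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return summary_fields(response, min_confidence=90.0)


# Hand-written response fragment used instead of a live call.
sample_response = {"ExpenseDocuments": [{"SummaryFields": [
    {"Type": {"Text": "VENDOR_NAME"},
     "ValueDetection": {"Text": "Example Supply Co.", "Confidence": 98.1}},
    {"Type": {"Text": "TOTAL"},
     "ValueDetection": {"Text": "$1,240.00", "Confidence": 99.2}},
    {"Type": {"Text": "INVOICE_RECEIPT_DATE"},
     "ValueDetection": {"Text": "11/28/2022", "Confidence": 61.0}},
]}]}

print(summary_fields(sample_response, min_confidence=90.0))
```

Fields that fall under the cutoff are simply left out here; in an accounts payable workflow they would more likely be sent to a person for review.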

667.32 -> A place where I'm gonna spend

668.46 -> a little bit more time on today, is with Textract queries.

674.25 -> But before I get in there,

676.14 -> I wanna linger just a little bit to really dial in

679.89 -> on the data extraction challenges

684.66 -> that we know customers face.

688.62 -> And maybe some of this will deeply resonate with you too.

692.46 -> And if you're early in the journey then it serves hopefully

695.25 -> as foreknowledge about the kinds of challenges

699.24 -> you will likely encounter.

702.54 -> The first thing is that data just

705.72 -> exists in so many variations across documents

710.13 -> or a similar stack of documents,

712.47 -> that it's not a given that the system

715.26 -> will auto-magically figure out that date of birth,

719.1 -> DOB, birth date, actually all mean the same thing.

724.83 -> And these are the little things that matter much

727.56 -> when it comes to data extraction accuracy.
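
The label variation problem described above (date of birth, DOB, birth date) is the kind of thing teams typically handle with hand-written post-processing. A minimal Python sketch of that approach follows; the alias table and function names are hypothetical, and a real system would need a far larger table, which is part of why this approach gets expensive to maintain.

```python
import re
from typing import Dict

# Hypothetical alias table: canonical field name -> label spellings seen on documents.
CANONICAL_ALIASES = {
    "date_of_birth": {"date of birth", "dob", "birth date", "birthdate"},
    "social_security_number": {"ssn", "social security number", "social security no"},
}


def normalize_label(label: str) -> str:
    """Map a raw form label to a canonical field name when an alias matches."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", label.lower())   # drop punctuation such as ':' and '.'
    cleaned = re.sub(r"\s+", " ", cleaned).strip()        # collapse repeated whitespace
    for canonical, aliases in CANONICAL_ALIASES.items():
        if cleaned in aliases:
            return canonical
    return cleaned  # unknown labels pass through in cleaned form


def normalize_fields(raw: Dict[str, str]) -> Dict[str, str]:
    """Apply normalize_label to every key of an extracted key-value dict."""
    return {normalize_label(label): value for label, value in raw.items()}


print(normalize_fields({"D.O.B.": "01/02/1980", "Birth Date:": "03/04/1975"}))
```

Note that two different labels on one page collapse to the same canonical key, so a table like this cannot tell a borrower's birth date from a co-borrower's. That gap is what the Queries feature discussed next is meant to close.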

731.37 -> The other thing is that how the data is laid out,

737.25 -> the structure with which it's represented

739.68 -> can also be incredibly complicated and diverse.

743.64 -> Data showing up in tables, nested tables,

747.12 -> forms that have very, very different structures

749.97 -> in terms of how they accept what kind of data,

753.03 -> whether it has fields that are labeled

756.03 -> or fields that are implied.

757.8 -> And all these variations exist within

760.83 -> a set of even related documents

762.81 -> that customers have to process.

765.66 -> And then you've got a bunch of variations

768.84 -> that are around how the data is laid out itself.

772.83 -> The orientation of the text.

775.44 -> How does a table get interspersed

779.1 -> between paragraphs of text.

781.44 -> How do section headers, footers,

784.29 -> how do they make sense to lend extra context

787.56 -> to the document itself?

790.56 -> All of these are real world challenges that customers face

794.1 -> and invariably, what do they have to do?

797.22 -> They end up writing a bunch of post-processing logic.

802.02 -> It's a bunch of code ultimately, that has to be written,

804.66 -> it has to be maintained.

807 -> Sometimes good old fashioned deterministic code

810.63 -> doesn't do the job.

812.85 -> So customers have to build customized

815.82 -> machine learning models.

817.89 -> And now when you start to think about the complexity

820.68 -> that you have to manage,

821.85 -> it gets expensive and time consuming.

827.28 -> So for those kinds of unique issues that exist pervasively,

833.97 -> these are not corner cases, they exist all the time.

837.33 -> We just don't know which document is gonna show up

839.67 -> in what form.

841.08 -> For that reason,

842.31 -> and for particularly gnarly difficult documents

845.91 -> that are inherently difficult to process

848.28 -> through the set of OCR, OCR plus techniques,

851.91 -> we invented Textract queries.

854.97 -> So customers see the benefit of accuracy and speed

858.3 -> using the existing capabilities that we have

860.73 -> on forms and tables and OCR to detect text.

864.96 -> But when they hit against these roadblocks,

867.99 -> they have to figure out, you know, which key value pair,

871.32 -> which table value corresponds

872.91 -> to what information they're actually looking to extract

877.71 -> from a given document.

880.53 -> And so for such customers we built the Queries capability,

885.33 -> which enables them to specify using natural language queries

891.15 -> and extract pieces of information from these documents.

894.78 -> So given a document,

896.46 -> it enables a developer to express what exactly it is

900.27 -> that they're looking for.

901.103 -> What piece of information in a natural language form.

904.08 -> So what is the borrower's tax ID?

907.35 -> What is the social security number of the co-borrower?

912 -> By doing that,

913.89 -> Textract Queries uses a combination of vision,

917.64 -> language, and spatial cues

921.33 -> to extract with much higher accuracy

923.85 -> exactly that piece of information

927.03 -> that the developer has requested

928.92 -> without them having to build any customized model

933.09 -> or presenting new data to train such a model at all.

939.84 -> And because it's a simple Q and A based process,

942.69 -> it makes it feel a lot more natural.

944.97 -> Like you may ask your colleague,

946.747 -> "Hey, can you tell me what this doc says

948.96 -> when it comes to this specific field?"

950.94 -> And because this is part of a, you know,

953.61 -> well formed API response,

955.86 -> this means that a developer can yank the output

959.07 -> and integrate it into whatever next is

961.83 -> in the business workflow.

962.67 -> Maybe it's an insert into a database

964.68 -> or maybe it kicks off a different scheduled workflow,

967.14 -> depending on what the response is.
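
The Queries flow described above can be sketched in Python roughly as follows. The request shape (FeatureTypes=["QUERIES"] with a QueriesConfig) and the QUERY / QUERY_RESULT blocks follow the Textract AnalyzeDocument API as documented; the aliases, the sample answers, and the helper names are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def build_queries_config(questions: Dict[str, str]) -> dict:
    """Turn {alias: natural-language question} into a QueriesConfig dict."""
    return {"Queries": [{"Text": text, "Alias": alias}
                        for alias, text in questions.items()]}


def query_answers(blocks: List[dict]) -> Dict[str, Optional[dict]]:
    """Map each query alias to its answer text and confidence (None if unanswered)."""
    by_id = {b["Id"]: b for b in blocks}
    answers: Dict[str, Optional[dict]] = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias") or block["Query"]["Text"]
        answers[alias] = None
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                result = by_id[rel["Ids"][0]]
                answers[alias] = {"text": result["Text"],
                                  "confidence": result["Confidence"]}
    return answers


def ask_document(bucket: str, key: str, questions: Dict[str, str]):
    """Run the queries against a document in S3 (needs boto3 and AWS credentials)."""
    import boto3  # third-party AWS SDK, imported lazily so the sketch runs offline

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["QUERIES"],
        QueriesConfig=build_queries_config(questions),
    )
    return query_answers(response["Blocks"])


questions = {
    "BORROWER_DOB": "What is the borrower's date of birth?",
    "COBORROWER_DOB": "What is the co-borrower's date of birth?",
}

# Hand-written blocks standing in for a real response; the second query has no answer.
sample_blocks = [
    {"Id": "q1", "BlockType": "QUERY",
     "Query": {"Text": questions["BORROWER_DOB"], "Alias": "BORROWER_DOB"},
     "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}]},
    {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "01/02/1980", "Confidence": 97.0},
    {"Id": "q2", "BlockType": "QUERY",
     "Query": {"Text": questions["COBORROWER_DOB"], "Alias": "COBORROWER_DOB"}},
]

print(query_answers(sample_blocks))
```

Because the answers come back keyed by alias, the next step in the workflow (a database insert, or kicking off another job) can be driven directly from the returned dict.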

970.95 -> Let's look at some examples here.

972.75 -> This is the common Fannie Mae Form 1003.

976.95 -> If you notice this form has a borrower and a co-borrower

982.35 -> that's gonna split halfway through the page.

985.29 -> And traditional OCR will extract the information.

990.06 -> But now as a developer of a business application,

993.15 -> you're trying to figure out whose birthdate is this exactly?

998.13 -> Is it the borrower or the co-borrower?

999.96 -> And it turns out that systems fail

1002.96 -> at that seemingly common simple task for us humans

1007.1 -> when we're looking at it with our eyes.

1009.23 -> And so in this sort of a situation

1011.63 -> where there are these nested sections,

1013.94 -> you could ask Textract Queries,

1016.16 -> what is the borrower's date of birth,

1018.56 -> or what is the co-borrower's date of birth?

1020.33 -> And you could issue many such queries for any given document

1025.01 -> and you will get precisely the answer to the query

1029.42 -> that you can then build into your application.

1034.61 -> Here's another example of a bank statement.

1038.09 -> You know, there are hundreds or thousands of banks,

1041.24 -> plus credit unions that exist at least in the United States.

1045.44 -> And this is an example of an implied field.

1050.18 -> So, highlighted in red is the customer's name and address,

1054.41 -> but notice that there's no field that says name colon,

1057.5 -> or address colon.

1059.57 -> And then adjacent to it, within the box,

1063.71 -> is the bank's details.

1065.09 -> And there is a mailing address in there.

1067.43 -> Now when you try to extract data, you will get an address,

1070.01 -> but you don't know whose address it is.

1073.37 -> And this is where issuing a query

1077.57 -> to say, "What is the customer's name?"

1080.517 -> "What is the customer's address?"

1082.88 -> Is going to give you an accurate response back

1084.83 -> rather than having to decipher that once regular OCR

1088.76 -> has done its job.

1090.26 -> Here's another more interesting example in a way

1093.41 -> of tabular data.

1096.35 -> Now if you notice the table here,

1098.03 -> this is, you know, a pay stub.

1100.25 -> If you notice that the table here

1102.41 -> that's highlighted for gross pay is

1103.88 -> a little different 'cause it's not regular rows and columns.

1107.3 -> Gross pay is offset as a cell value under rate.

1113.12 -> Under the rate column header.

1114.35 -> And then you've got two values for, you know,

1116.48 -> for this period and the year to date gross pay.

1119.15 -> And turns out this again, seemingly simple problem,

1122.36 -> when it comes to actually using OCR

1124.19 -> tends to really complicate matters.

1126.71 -> And so in this example,

1128.36 -> a developer could issue a query that says,

1131.217 -> "What is the gross pay for this period?"

1133.76 -> And what Textract Queries does is, it takes the image,

1138.29 -> it takes the hint from the query

1140.99 -> that the developer has issued to say,

1142.737 -> "I must look for that kind of information."

1145.58 -> And that kind of abstraction,

1147.35 -> those cues are what help us extract that data

1151.46 -> with higher accuracy,

1152.96 -> even though the Textract system may not have seen

1156.41 -> that kind of document before.

1161.42 -> As I wrap up here,

1163.19 -> I do wanna share that we recently launched

1167.96 -> the Analyze Lending API,

1172.07 -> targeted to help customers build

1177.02 -> more efficient mortgage processing documents,

1180.47 -> mortgage document processing workflows.

1182.78 -> And what we observe is that customers repeatedly follow

1186.56 -> a certain set of workflows.

1188.24 -> Given a large set of documents,

1191.48 -> they have to classify them quickly,

1193.94 -> they need to redirect those documents

1196.58 -> to the right kind of ML models

1199.01 -> that are specialized to extract that data from them.

1203 -> They need to do a high quality job of data extraction,

1205.13 -> of course,

1206 -> and then run some validation.

1208.31 -> And since we saw these patterns happen time and time again,

1213.77 -> as we heard from our customers,

1215.03 -> we were able to build out this capability

1217.49 -> as a single well formed set of APIs

1220.97 -> to help customers just advance their journey

1223.22 -> a little bit more, with the Analyze Lending API.
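
The classify, route, extract, validate pattern described above can be sketched generically in Python. This is not the Analyze Lending API; it is a toy pipeline under stated assumptions: the keyword classifier, the "label: value" extractor, the document type names, and the required-field table are all hypothetical stand-ins for the specialized models a real service would use.

```python
from typing import Callable, Dict, List


def classify(page_text: str) -> str:
    """Toy keyword classifier standing in for a real ML classifier."""
    text = page_text.lower()
    if "gross pay" in text:
        return "PAYSLIP"
    if "account number" in text:
        return "BANK_STATEMENT"
    return "UNCLASSIFIED"


def parse_labeled_lines(page_text: str) -> Dict[str, str]:
    """Toy extractor: read 'label: value' lines into a dict."""
    fields: Dict[str, str] = {}
    for line in page_text.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            fields[label.strip().lower()] = value.strip()
    return fields


# Routing table: document type -> extractor specialized for that type.
EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "PAYSLIP": parse_labeled_lines,
    "BANK_STATEMENT": parse_labeled_lines,
}

# Validation rules: fields that must be present for each document type.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "PAYSLIP": ["gross pay"],
    "BANK_STATEMENT": ["account number", "ending balance"],
}


def process_page(page_text: str) -> dict:
    """Classify a page, route it to an extractor, then validate the result."""
    doc_type = classify(page_text)
    extractor = EXTRACTORS.get(doc_type)
    if extractor is None:
        return {"type": doc_type, "fields": {}, "missing": [], "status": "NEEDS_REVIEW"}
    fields = extractor(page_text)
    missing = [name for name in REQUIRED_FIELDS[doc_type] if not fields.get(name)]
    status = "OK" if not missing else "NEEDS_REVIEW"
    return {"type": doc_type, "fields": fields, "missing": missing, "status": status}


for page in ["Gross Pay: 4,200.00\nRate: 52.50",
             "Account Number: 1234\nCustomer: Jane Doe",
             "Thank you for your business"]:
    print(process_page(page))
```

Each stage is a separate function on purpose, so a stage can be swapped for a stronger model without touching the others.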

1231.56 -> I would now love to transition

1233.96 -> to Frank's portion of the presentation,

1237.59 -> where he's gonna talk to us about how Black Knight

1240.53 -> extensively uses AI and ML technologies

1242.69 -> to accelerate mortgage processing workflows.

1245.54 -> - Thank you, Ari.

1247.37 -> Good afternoon everybody.

1249.98 -> So let's talk a little bit about who Black Knight is.

1253.82 -> Most of you don't know Black Knight.

1255.29 -> We are a financial services oriented technology company.

1259.13 -> We provide solutions to mortgage banks, banks,

1262.67 -> and other financial institutions that make mortgage loans

1265.43 -> and service them.

1268.19 -> Ari mentioned the residential mortgage

1270.26 -> and home equity lending markets.

1272.84 -> We do serve those markets in both origination,

1276.56 -> that is the making of the loans,

1278.45 -> as well as the servicing of the loans over their life.

1283.16 -> We have solutions that cover the entire value chain

1287.45 -> that a mortgage consumer experiences.

1290.66 -> So loan origination through our product called Empower,

1294.62 -> it's called a loan origination system.

1296.6 -> It has a series of other capabilities

1298.88 -> that are integrated with it,

1300.74 -> including our document service

1302.6 -> that allow us to handle the entire process for lenders,

1306.71 -> all in one system.

1308.63 -> Our loan servicing platform called MSP, is a market leader

1313.25 -> and services a very large percentage

1315.32 -> of the national mortgage market

1316.97 -> in terms of the loans being serviced over their entire life.

1320.54 -> Their life meaning up to 40 plus years.

1324.38 -> We also support the capital markets.

1326.51 -> The markets that provide funding

1329.06 -> for the people who want to borrow the money

1331.82 -> that finances their home through our Optimal Blue affiliate.

1335.69 -> Optimal Blue provides the loan pricing process

1338.48 -> as well as serves the capital markets needs of lenders

1342.26 -> in order to allow them to make more and more loans

1345.23 -> every year.

1346.7 -> We also have a data and analytics business

1349.19 -> that supports the entire industry

1351.17 -> by gathering real estate related data

1353.69 -> and providing it to professionals and organizations

1356.42 -> that need data to support their businesses.

1359.75 -> I'm gonna be talking today about our AIVA document service.

1364.4 -> AIVA is our artificial intelligence

1366.83 -> virtual assistant product.

1368.75 -> It is an AI and ML based solution

1371.66 -> that includes Textract as you'll see

1374.21 -> as we go through the rest of the presentation.

1378.41 -> Let's talk a little bit about why Ari's comments

1381.38 -> are so important.

1383.63 -> We're talking about, as he mentioned,

1385.37 -> thousands of financial institutions.

1388.46 -> 4,300 plus institutions in the United States

1392.21 -> provided information about loans that they made

1395.93 -> to regulators in 2021.

1401.03 -> Those entities originated $4.5 trillion worth of loans.

1407.6 -> The real estate finance industry

1409.49 -> is a significant driver of our economy.

1412.64 -> If you've been following the news lately,

1414.2 -> you know that these numbers are diminished lately

1417.77 -> because of higher interest rates.

1419.66 -> It's still a significant driver of economic activity.

1424.73 -> 13 and a half million loans were processed

1427.58 -> by organizations originating loans in 2021.

1432.08 -> During that same year,

1433.91 -> 12 and a half trillion dollars worth of financing

1437.09 -> was handled by the industry.

1438.53 -> 54 million loans serviced.

1441.47 -> Each of these loans is built on a stack of paper.

1446.21 -> And that stack of paper persists for up to 47 years.

1451.97 -> So the information that we gather

1454.88 -> in the beginning of the loan origination process

1458.54 -> is gonna persist for quite a long time.

1460.7 -> We have to get it right.

1464.36 -> This is a very complex industry,

1466.19 -> so when you think about all the paper,

1468.26 -> if you've bought a house

1470.06 -> and you've gone through the process

1472.25 -> of getting the loan closed

1474.11 -> and been concerned that there's a stack of paper

1476.45 -> about this high at the end of the process,

1479.48 -> it's largely because all of these entities

1483.05 -> have something to do in the process of financing a home.

1488.3 -> Of course, consumers will.

1489.47 -> They get paper,

1491.42 -> they get disclosures,

1492.62 -> they get information about their property.

1495.29 -> They have documents to sign,

1496.82 -> legal documents that evidence their debt.

1500.75 -> The lender has to get that documentation

1504.35 -> and gather information in order to underwrite the loan

1507.23 -> in order to know that it's a good loan.

1510.8 -> Banks finance these loans

1512.42 -> and put them on their balance sheets.

1514.04 -> They need to manage the risk

1515.54 -> associated with these transactions.

1518.87 -> The regulators, the FHFA, the FFIEC, and others,

1524.24 -> all regulate the institutions that make these loans

1527.54 -> so that consumers are treated fairly.

1530.96 -> And so that the financial system is not as fragile as it was

1536.24 -> prior to the 2008 crisis.

1541.1 -> Going on, there's title services

1542.96 -> and settlement service agents

1544.34 -> that support the process of closing loans.

1547.7 -> Servicers, again, manage the loan throughout its lifetime.

1551.69 -> There are insurers in the process

1554.42 -> who help to provide secure funding.

1557.27 -> They create credit enhancements

1559.49 -> and they monitor the property's insurance.

1563.9 -> And finally, there are investors.

1567.11 -> This is one of the most interesting aspects

1569.12 -> of the mortgage lending process

1570.77 -> that most consumers don't know about.

1573.62 -> Lenders don't keep your loan on their books.

1577.88 -> They sell the loan.

1579.17 -> Many of you have heard from your lender,

1581.427 -> "Oh, we've sold your loan to another lender."

1584.3 -> That's because there is an active securities market

1588.02 -> that is driven by the debt in residential real estate

1593.21 -> in this country.

1594.14 -> It's how the industry works.

1595.91 -> All these stakeholders have an interest in the paperwork.

1602.21 -> Why is it still so crazy?

1604.67 -> There are hundreds of documents that support this process.

1608.57 -> At Black Knight,

1609.403 -> we support more than 800 different document types.

1612.8 -> And we extract over 600 data points from those documents.

1616.58 -> And by the way,

1617.6 -> the demand for additional data is growing every day.

1622.19 -> As Ari mentioned,

1623.03 -> many of these documents are highly variable.

1625.49 -> We're gonna talk about bank statements in particular,

1627.8 -> and why that is such a,

1629.54 -> to use the great term Ari uses,

1631.94 -> such a gnarly problem.

1634.49 -> Many of these documents are not only highly variable,

1638.15 -> but very few of them are actually standardized.

1641 -> And even the ones that are standardized

1642.95 -> can be produced in different ways.

1645.05 -> That 1003 that was used as an example earlier

1649.1 -> is the standard application provided by

1651.89 -> and mandated by some of the regulatory bodies

1654.8 -> that manage the industry.

1656.54 -> Well, they can be handwritten,

1658.37 -> they can be produced by computer systems,

1660.89 -> the data can vary widely from one system to another.

1666.08 -> Most of the documents that support the process

1668.39 -> are originally in paper form.

1672.8 -> They're analog.

1675.92 -> The industry can drive improvement in this and has been

1679.64 -> for, well over 20 years.

1681.62 -> There have been legal abilities to e-sign documents

1684.65 -> and e-deliver documents.

1685.85 -> However, even today, 24% of consumers

1690.32 -> do not want to receive their documents digitally.

1694.52 -> 24% of the time we are printing them out

1697.22 -> and putting them in the mail.

1699.26 -> When they come back from the consumer,

1700.79 -> then we have to scan them and process them.

1704.42 -> And despite that 23 year old law, UETA,

1708.29 -> only 4% of loans are closed digitally,

1711.53 -> meaning the signing of the mortgage

1713.96 -> and all of the transactions that handle

1716.81 -> the moving of the funds from one party to another,

1720.47 -> only 4% are closed digitally.

1722.33 -> There's a lot of reasons for that.

1724.04 -> There's a whole session we could do about that process.

1728.33 -> So bottom line, it's very expensive.

1731.75 -> Just under $11,000 per loan

1735.92 -> is a statistic that the industry is fighting with today.

1739.4 -> Think about that.

1740.423 -> $11,000 per loan and millions of loans originated.

1747.62 -> So, what is our mission?

1750.02 -> AIVA Document Services has a mission of classifying

1752.42 -> those 800 plus document types.

1755.48 -> Classifying means not only knowing,

1757.91 -> okay, this document is a pay stub,

1760.49 -> but also is the second page also a pay stub.

1765.2 -> If there's an appraisal of a property,

1768.17 -> that appraisal can be five pages long or ten pages long.

1772.34 -> Understanding not only what the page is,

1775.52 -> but also what the document is.
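
The distinction drawn above between classifying a page and identifying a whole document can be sketched in Python as a grouping step. This is only an illustration: it assumes a hypothetical page-level classifier that returns a document type plus a flag saying whether the page starts a new document, which is one common way to keep two back-to-back pay stubs from merging. The talk does not say how AIVA does this.

```python
from typing import Dict, List, Tuple


def group_pages(predictions: List[Tuple[str, bool]]) -> List[Dict]:
    """Group per-page predictions into documents.

    Each prediction is (doc_type, starts_new_document) for one page,
    in scan order. Pages are numbered from 1.
    """
    documents: List[Dict] = []
    for page_number, (doc_type, starts_new) in enumerate(predictions, start=1):
        continues_previous = (
            bool(documents)
            and documents[-1]["type"] == doc_type
            and not starts_new
        )
        if continues_previous:
            documents[-1]["pages"].append(page_number)
        else:
            documents.append({"type": doc_type, "pages": [page_number]})
    return documents


# Hypothetical classifier output for a six-page scan.
page_predictions = [
    ("PAY_STUB", True), ("PAY_STUB", False),
    ("APPRAISAL", True), ("APPRAISAL", False), ("APPRAISAL", False),
    ("PAY_STUB", True),
]

for document in group_pages(page_predictions):
    print(document)
```

A change of document type always opens a new document here, even if the start flag is missed, so a single wrong flag cannot glue an appraisal onto a pay stub.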

1781.28 -> What was the solution that we arrived at?

1784.25 -> Black Knight has been working with Amazon

1786.32 -> for just about four years.

1788.66 -> Our CTO's right there, and we can ask him exactly how long,

1793.34 -> but we've been working with Amazon for a long period of time

1796.73 -> trying to slay this particular dragon.

1800.15 -> With a combination of AWS infrastructure,

1802.91 -> we're hosted on AWS,

1804.92 -> Amazon Textract,

1806.39 -> which is the subject of much of this conversation,

1809.54 -> and Black Knight's own proprietary AI and ML bot models,

1813.44 -> software and data,

1815.6 -> we've created an environment

1817.07 -> where we are finally making exceptionally good progress

1820.31 -> at nailing down those 800 documents.

1822.95 -> Classifying them correctly the first time,

1825.14 -> and extracting a large body of data

1828.2 -> that can save our customers money.

1831.65 -> We have a human-in-the-loop process

1834.65 -> because you have to have humans in the loop

1837.08 -> to read the ones you can't read with the computers.

1839.39 -> So if computer vision fails,

1841.58 -> we have to have a human read the document.

1843.77 -> If we don't get a return from Textract

1847.679 -> that gives us the information that we need,

1850.85 -> we have humans to do the reviews.

1852.95 -> So we have exception processes

1854.54 -> and we have processes to detect

1856.79 -> whether our models are starting to drift.

1860.18 -> One of the things we have to do with our AI and ML,

1863.09 -> just like everybody else does, our models can drift.

1866.69 -> We put new models into the wild, they can regress.

1870.35 -> So we watch them like a hawk.

1871.76 -> So we have humans doing essentially three things.

1875 -> On documents that can't be read, they're reading them.

1878.36 -> On documents that may have problems, they're checking them.

1883.28 -> And on documents that are coming through, just to be sure,

1887.99 -> we do sampling to make sure that we're getting it right.
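
The three human roles just described (reading unreadable documents, checking questionable ones, and sampling the rest) could be sketched as a routing function. This is an illustration only: the `Route` names, the `check_threshold` of 0.9, and the `sample_rate` of 5% are assumptions made for the example and do not come from the talk.

```python
import random
from enum import Enum
from typing import Callable


class Route(Enum):
    HUMAN_READ = "human_read"              # machine could not read the document
    HUMAN_CHECK = "human_check"            # machine result looks doubtful
    QA_SAMPLE = "qa_sample"                # random sample to watch for model drift
    STRAIGHT_THROUGH = "straight_through"  # no human touch


def route_document(
    readable: bool,
    confidence: float,
    *,
    check_threshold: float = 0.9,   # assumed value, tune per document type
    sample_rate: float = 0.05,      # assumed value
    rng: Callable[[], float] = random.random,
) -> Route:
    """Pick the review path for one processed document."""
    if not readable:
        return Route.HUMAN_READ
    if confidence < check_threshold:
        return Route.HUMAN_CHECK
    if rng() < sample_rate:
        return Route.QA_SAMPLE
    return Route.STRAIGHT_THROUGH
```

Passing `rng` as a parameter keeps the sampling decision deterministic in tests; comparing the sampled human results against model output over time is one way to notice drift or regression.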

1892.61 -> Our key strategies to solve the problem

1894.86 -> include a classifier, just an open source OCR product.

1900.2 -> Black Knight trained models,

1902.18 -> it's a transformer based natural language processing model

1905.66 -> that we've been using for several years.

1908.24 -> And it gets trained and retrained.

1909.92 -> And by the way, we also,

1911.87 -> for some documents we use Textract.

1916.55 -> We also do extraction using largely

1920.27 -> the Textract knowledge graph.

1922.22 -> The information that comes out of Textract,

1924.53 -> we use all the tools that Ari mentioned,

1928.49 -> and we've picked the best of them,

1930.14 -> but we're still handling 800 plus document types.

1933.77 -> And we have a lot of complexity there

1937.49 -> that we have to solve for our clients

1939.92 -> that just has not been able to be accomplished

1942.56 -> across the industry.

1943.49 -> So we focus hard for our clients

1945.08 -> on getting extraction right.

1947.51 -> So we've got proprietary models,

1949.13 -> we've got the Textract knowledge graph,

1952.25 -> and we have a business rules system to use the data.

1956.3 -> And this is key.

1958.73 -> It's one thing to extract data

1960.5 -> and it's one thing to get back documents.

1963.47 -> You're gonna save time by avoiding keyboarding,

1966.56 -> but what you're going to do the most

1969.08 -> by having this data available,

1971.21 -> is be able to apply business logic to accelerate processing.

1978.38 -> Don't just tell somebody that they don't have to keyboard,

1983.93 -> but bypass steps.

1986.18 -> I have all the W-2s for borrower one.

1990.32 -> Check.

1991.28 -> No human had to look at it.

1993.83 -> That's because we have a document called a W-2,

1996.47 -> we extracted information from it

1998.06 -> that tells us that that's for borrower one.

2001.15 -> We can then pull all the document data,

2002.95 -> all the data about their income and populate the system

2006.31 -> so that a human doesn't have to touch the analysis necessary

2010.66 -> to do income validation.

2014.41 -> The business rules are the key element that drives value.
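
The W-2 example above can be sketched as a small business rule over extracted data. The `ExtractedDoc` fields, the `w2_check` function, and the idea of checking a list of required tax years are assumptions made for illustration; the talk does not describe the actual rule definitions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ExtractedDoc:
    doc_type: str      # classifier output, e.g. "W-2"
    borrower_id: str   # extracted field tying the document to a borrower
    tax_year: int      # extracted field
    wages: float       # extracted field


def w2_check(
    docs: Iterable[ExtractedDoc],
    borrower_id: str,
    required_years: List[int],
) -> Dict[str, object]:
    """Check that a borrower has a W-2 for every required year and
    total the wages per year so income data can be pre-populated."""
    w2s = [d for d in docs if d.doc_type == "W-2" and d.borrower_id == borrower_id]
    years_found = {d.tax_year for d in w2s}
    missing = sorted(set(required_years) - years_found)

    income_by_year: Dict[int, float] = {}
    for d in w2s:
        income_by_year[d.tax_year] = income_by_year.get(d.tax_year, 0.0) + d.wages

    return {
        "complete": not missing,   # True means the step can be bypassed
        "missing_years": missing,
        "income_by_year": income_by_year,
    }


# Usage with made-up data: borrower one has both required W-2s.
docs = [
    ExtractedDoc("W-2", "borrower-1", 2020, 80000.0),
    ExtractedDoc("W-2", "borrower-1", 2021, 85000.0),
    ExtractedDoc("W-2", "borrower-2", 2021, 60000.0),
    ExtractedDoc("pay_stub", "borrower-1", 2021, 7000.0),
]
result = w2_check(docs, "borrower-1", [2020, 2021])
```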

2021.76 -> So let's talk about bank statements for a minute.

2024.49 -> Again, big gnarly problem.

2027.34 -> There are 9,600 plus financial institutions in this country

2031.75 -> that are federally insured.

2034.03 -> So banks, credit unions, thrifts.

2037.54 -> Each one of those has its own bank statement.

2041.2 -> Even though there are a number of providers

2043.15 -> that are the same across banks,

2045.97 -> every bank configures them their own way.

2049.6 -> And every consumer's bank statement is different.

2054.58 -> Some consumers may have one checking account.

2058.39 -> Another consumer may have two checking accounts

2061.45 -> and a savings account.

2063.19 -> Another consumer may have a checking account,

2066.1 -> an investment account, an IRA account,

2068.08 -> and a business account.

2070.81 -> And they may all be on the same statement.

2073.81 -> So we have very, very different documents.

2078.97 -> There's also a complexity that may seem surprising,

2082.24 -> but it's not when you think about it.

2085.33 -> Transaction lists are one of the important things

2089.29 -> that we have to look at in the process.

2091.63 -> So we have to look through a list of transactions to decide

2095.68 -> is there a transaction of interest to an underwriter.

2099.61 -> As I review the document,

2100.99 -> I might be looking for large transactions that are unusual.

2105.16 -> So I have to be able to read every row

2109.27 -> in a table of transactions that may span multiple pages.

2114.55 -> And be able to identify the separate tables

2116.98 -> within that document, and extract the data reliably.

2122.08 -> This is where Textract tables comes in.

2125.23 -> However,

2126.25 -> it's all of that complexity

2128.65 -> and the fact that the tables are oriented differently,

2132.67 -> we had to do more to get it to be reliable

2135.19 -> to the point where we could drive those business rules.
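
As a rough illustration of what reading table rows from Textract involves, the sketch below rebuilds row-and-column grids from an `AnalyzeDocument` response requested with the `TABLES` feature type. It relies only on the documented response shape (`TABLE` blocks whose `CHILD` relationships point to `CELL` blocks, which in turn point to `WORD` blocks). The `table_rows` helper is hypothetical, ignores merged cells, and does not represent the additional work Black Knight and the Textract team did.

```python
from typing import Any, Dict, List

# A response would typically come from a call such as:
#   boto3.client("textract").analyze_document(
#       Document={"Bytes": image_bytes}, FeatureTypes=["TABLES"])


def table_rows(response: Dict[str, Any]) -> List[List[List[str]]]:
    """Return one grid (list of rows of cell text) per TABLE block."""
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def child_ids(block: Dict[str, Any]) -> List[str]:
        return [
            block_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for block_id in rel["Ids"]
        ]

    tables = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in child_ids(block):
            cell = blocks[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                blocks[w]["Text"]
                for w in child_ids(cell)
                if blocks[w]["BlockType"] == "WORD"
            ]
            # RowIndex and ColumnIndex are 1-based in the response.
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not cells:
            continue
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables
```

A statement whose transaction table spans multiple pages would still need the per-page grids stitched together, which this sketch does not attempt.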

2137.74 -> Driving that business value meant

2139.15 -> we had to be able to do things like

2141.28 -> take the beginning balance and the ending balance,

2143.35 -> all the transactions add up to those two numbers.

2147.76 -> We have to be able to reconcile that data

2149.8 -> in order to know we got it right.
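
The reconciliation check described here is simple arithmetic: the beginning balance plus the signed transaction amounts should equal the ending balance. A minimal sketch follows, using `Decimal` to avoid floating-point rounding on currency. The `reconciles` function and its `tolerance` parameter are illustrative assumptions, not the production rule.

```python
from decimal import Decimal
from typing import Iterable


def reconciles(
    beginning: Decimal,
    ending: Decimal,
    amounts: Iterable[Decimal],
    tolerance: Decimal = Decimal("0.00"),
) -> bool:
    """True if beginning balance + signed transactions == ending balance.

    Deposits are positive amounts and withdrawals are negative.
    A failed check suggests a row was missed or misread during extraction.
    """
    computed = beginning + sum(amounts, Decimal("0"))
    return abs(computed - ending) <= tolerance


# Usage with made-up figures: 1000.00 - 250.25 + 500.00 - 49.75 = 1200.00
transactions = [Decimal("-250.25"), Decimal("500.00"), Decimal("-49.75")]
ok = reconciles(Decimal("1000.00"), Decimal("1200.00"), transactions)
```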

2152.71 -> So, what we did is partner with Amazon.

2157.87 -> The Amazon Textract team got together with our team,

2161.44 -> we studied the problem,

2163.63 -> and the Amazon team delivered a model

2168.1 -> and a set of capabilities,

2169.48 -> a set of rules and model adjustments

2173.02 -> that the Black Knight team took delivery of

2175.27 -> early in the year.

2178.24 -> We then started tuning and improving that basic framework.

2184.03 -> And you can see it was a great framework

2185.68 -> because we went from very, very low scores.

2189.55 -> This is, for those of you who are statistics oriented,

2192.041 -> this is F1, a measure of accuracy that we use.

2195.7 -> We had very, very low scores at the beginning of the year.

2197.92 -> It took off very quickly,

2199.45 -> but didn't get where we needed it to for several months.

2202.84 -> But during the time from July until the end of September,

2206.47 -> we were able to achieve a 0.9 F1

2209.98 -> in handling this very complex problem.
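
For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives; the counts in the usage example are invented to show how a score of 0.9 can arise and are not figures from the talk.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if true_pos == 0:
        return 0.0  # no correct extractions, so precision and recall are 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 90 correct, 10 spurious, 10 missed.
score = f1_score(true_pos=90, false_pos=10, false_neg=10)
```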

2214.03 -> This is a great success because what that means is that

2218.26 -> 90% of the time we can do what I told you earlier.

2223.87 -> We can see everything in the bank statement,

2226.24 -> we can identify transactions of interest

2228.85 -> and we can apply business rules

2230.62 -> to help solve our customer's business problem.

2235.84 -> So, what is AIVA's value proposition for our customers?

2240.94 -> What's the ROI to our customers?

2243.52 -> Well, we are able to process now

2245.2 -> about 14 million pages a month.

2247.84 -> We are extracting over 7 million data points every month.

2254.29 -> And our clients are able to generate significant savings

2258.91 -> and the savings vary from customer to customer

2261.43 -> depending on what kinds of loans they make,

2263.8 -> but we're seeing seconds per page.

2265.81 -> Remember 14 million pages.

2268.33 -> That's a lot of time.

2270.04 -> Seconds to minutes for each data point.

2272.89 -> And remember, the minutes can turn into hours

2275.59 -> if we can take a number of data points

2278.14 -> and use them to drive real business process improvement.

2282.64 -> And that's the phase that we're entering now at Black Knight

2286.18 -> to serve our customers in partnership with Amazon.

2292.69 -> Adoption is accelerating.

2295.21 -> We have a lineup of customers

2297.31 -> wanting to adopt this technology.

2299.44 -> And we are seeing real value among our customers

2304.36 -> and other industries as well.

2306.88 -> So it's a very exciting place to be right now.

2312.22 -> And back to Ari.

2316.72 -> Thank you.

2318.25 -> - Thanks, Frank.

2319.15 -> So,

2322.06 -> we understand though that getting through this journey

2324.85 -> can be complicated,

2325.81 -> especially given all the systems customers have to build.

2329.05 -> And so for that,

2329.92 -> there are numerous AWS partners that we work with closely

2334.66 -> to help customers solve

2338.17 -> for their toughest business problems here.

2342.07 -> There's also a nifty little matrix

2345.82 -> to help some of you who are going to embark on this journey,

2349 -> decide how you want to make some of those tradeoffs

2351.67 -> in terms of where you want to invest in-house

2354.28 -> versus where you may want to work with a partner

2358.69 -> to help you build those solutions.

2364.39 -> I do wanna wrap up here with a few links

2367.06 -> and then open up sufficient time for us

2368.83 -> to chat and answer any questions that you may have.

2373 -> But there are a number of different sources of documents,

2379.9 -> blog posts,

2382.15 -> and source code through numerous samples

2384.49 -> that we've put up on GitHub

2386.74 -> for developers to get their hands dirty with.

2389.47 -> To play with the product, play with the capabilities,

2391.9 -> as well as several prototype solutions

2396.28 -> for you to see how that might make sense for you to do.

2399.64 -> And I would encourage you,

2403.72 -> the builders here,

2404.86 -> to give that a shot, should you choose to do so.

2410.17 -> With that,

2411.003 -> we'd like to wrap up kind of the formal presentation here.

2413.89 -> I wanna of course deeply thank Frank

2418.03 -> and the Black Knight team for the partnership.

2421.9 -> We know that we learn deeply from them

2425.08 -> and that pushes us to build better capabilities

2428.59 -> and together we are able to hopefully solve problems

2431.86 -> for as many different kinds of customers

2434.08 -> as we possibly can.

2436.72 -> And of course, you know,

2437.553 -> with Black Knight leading the charge,

2438.97 -> and how they're able to deliver these cutting edge solutions

2441.67 -> for customers in this space,

2442.79 -> we're just delighted that we can be part of that journey

2445.81 -> with them.

2446.643 -> So with that, we have a lot of time I think

2451.06 -> for us to talk about questions if we have any.

Source: https://www.youtube.com/watch?v=OEJ24XGMbEI