AWS re:Invent 2022 - Learn how Black Knight is using AI to accelerate mortgage workflows (AIM214)

Explore how Black Knight, a premier mortgage analytics company, is adopting AWS artificial intelligence (AI) services for its data and domains to process documents at scale. Learn how they use Amazon Textract to reduce manual processes, mitigate regulatory risks, and deliver significant cost savings to their clients, including many of the largest US lenders. The mortgage processing industry is complex due to the loan lifecycle, regulatory requirements, and the sophisticated data and analytics required to support each process. Hear how Black Knight uses AWS AI services to help clients improve and scale their business processes with automation and a complete solutions ecosystem that supports the entire real estate and mortgage lifecycle.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.


Content

0.12 -> - So we are gonna be talking to you today
2.55 -> about how Black Knight,
5.76 -> one of the leading technology solutions providers
8.94 -> in the mortgage and the home equity lending space,
11.91 -> how they use AI solutions to power those capabilities
16.95 -> for their customers.
19.29 -> My name is Ari Krishnan and I'm the general manager
22.86 -> for Computer Vision Services at AWS.
25.32 -> That includes Amazon Rekognition and Amazon Textract.
30.87 -> And what I'm going to do here today is
34.2 -> provide you with an overview and introduction
38.85 -> to document processing,
40.05 -> and focus a little bit on what makes it a really challenging
43.62 -> and gnarly problem to solve.
46.77 -> I'll share with you what are some of the pain points
50.61 -> that we have observed customers experience
55.17 -> when they try to build these capabilities
57.48 -> by themselves today
59.25 -> with the kinds of offerings that already exist.
62.79 -> I will then transition to share with you
65.43 -> how Amazon Textract,
68.28 -> which is our AI driven capability
71.4 -> for extraction of data from documents of all varieties,
75.69 -> how some of the capabilities over there help customers build
80.19 -> for these intelligent document processing solutions.
84.15 -> And then will come the really exciting part,
87.42 -> where Frank Poiesz,
89.76 -> the business strategy director
91.92 -> for mortgage origination technologies at Black Knight,
95.43 -> is going to share with you a deep practitioner's view
99.09 -> of how these solutions are built in the real world.
103.08 -> And hopefully that will allow you as developers,
107.88 -> as technology decision makers,
110.55 -> a good sense for how you may want to tackle these problems
114.36 -> as you go through your journey.
120.18 -> Turns out that despite the digitization
125.16 -> and the digitalization that's happening in the world,
128.94 -> there is a heck of a lot of paper
131.34 -> that still powers all sorts of businesses.
135.66 -> And this is happening virtually across all industries.
139.86 -> And barring healthcare,
141.45 -> probably one of the segments where the intensity,
145.2 -> the complexity of document processing is right up there,
150.27 -> is the mortgage and the home equity lending space.
155.88 -> We see that this industry
157.62 -> has some of the most document intensive workflows.
161.97 -> Consider a loan packet.
164.31 -> A loan packet that could have hundreds
166.74 -> if not thousands of pages across dozens of varieties
172.53 -> of different documents contained within it.
175.47 -> That represents all sorts of data.
178.32 -> It can represent income statements, the borrower,
181.92 -> the core borrower's history of debt,
184.11 -> their credit history.
186.03 -> It can include,
187.08 -> it does include identity documents of all kinds.
190.17 -> It includes documents that try to make sense
192.78 -> of what the asset is that is being bought.
197.55 -> And this kind of complexity is something that happens
200.94 -> at massive scale every single day.
206.34 -> So how have customers tried to tackle
210.36 -> some of those challenges
211.56 -> when it comes to this incredible volume, variety
215.01 -> and diversity of document types
217.68 -> that they have to process today?
219.72 -> And typically it's fallen into one of three buckets.
225.51 -> The most common one is when customers leverage
229.59 -> Optical Character Recognition or OCR tech.
233.64 -> Legacy OCR tech has been around for a long time now
237.81 -> and it's commonly used across
239.43 -> a variety of these document processing workflows.
242.76 -> Invariably what we have learned is that these technologies,
247.74 -> OCR technologies,
249.21 -> tend to work better on simpler documents
253.56 -> and the extraction process invariably results
256.95 -> in a bag of words that tends to lose a lot
260.16 -> of the inherent context that the document contains.
264.78 -> It can strip away everything from
267.09 -> the structure of paragraphs, the tables,
268.86 -> the lines, the words,
270.33 -> which means that there is a heavy lifting
273.18 -> on the implementer, the developer,
276.12 -> to now start to make sense of what exactly was extracted.
281.01 -> The second big approach is manual processing
285.63 -> via human review.
286.463 -> And to be clear, these are not either or,
289.23 -> but manual processing via humans in the loop
292.5 -> is very pervasive in the industry.
296.76 -> But as you can imagine this is,
300.42 -> humans tend to get tired, we tend to make errors.
305.82 -> And when it comes to managing
308.34 -> that kind of staffing at scale,
311.13 -> it doesn't always follow the demand cycle
316.44 -> that a customer may see when it comes to their end users.
319.11 -> And that's fundamentally challenging.
321.81 -> The third approach is around using rules and templates
329.64 -> to deal with the bag of words that have been extracted.
333.66 -> But what we have discovered then
334.95 -> is that when customers build these rules and templates,
337.38 -> they tend to be brittle.
339.03 -> They tend to be brittle because ultimately
341.73 -> a human has to figure out exactly what template
344.37 -> and what rule to write,
345.84 -> and which may break when a new kind of document shows up
349.41 -> or it may have to be rewritten
351.27 -> if the underlying business workflow is changing.
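
The brittleness described above can be shown with a minimal sketch. It assumes a hypothetical template rule that pulls a date of birth out of OCR text; the pattern and the sample strings are made up for illustration and do not come from any real system.

```python
import re

# Hypothetical template rule, written for one specific document layout.
DOB_RULE = re.compile(r"Date of Birth:\s*(\d{2}/\d{2}/\d{4})")


def extract_dob(ocr_text):
    """Return the date of birth if the template matches, else None."""
    match = DOB_RULE.search(ocr_text)
    return match.group(1) if match else None


# Made-up OCR output for two layouts of the same information.
known_layout = "Name: Jane Doe  Date of Birth: 01/02/1980"
new_layout = "Name: Jane Doe  DOB 01/02/1980"

print(extract_dob(known_layout))  # the layout the rule was written for
print(extract_dob(new_layout))    # the rule does not match the new label
```

The rule works only for the label it was written against; a new document variant needs a new rule, which is the maintenance burden the speaker is pointing at.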
358.29 -> When customers embark on this journey,
360.15 -> there are two big sets of issues
363.03 -> that are below the surface with these legacy approaches.
368.28 -> One big bucket is really around lost revenue.
372.03 -> And the lost revenue really stems from the fact
374.94 -> that,
377.94 -> with legacy systems,
379.98 -> the way they're composed together,
381.45 -> it is invariably hard for them to grow and shrink
384.78 -> in elastic ways.
386.07 -> It is invariably harder for technologists
390.33 -> to build them and compose them
392.88 -> to leverage best-of-breed technologies.
396.9 -> And that fundamentally is a throttle
398.97 -> on your ability to grow effectively.
402.54 -> The second big drawback is really around slowness
406.5 -> of the processing of data,
408.15 -> which means that the ability
409.44 -> to get to a high quality business decision
411.72 -> takes that much longer.
412.89 -> And that percolates through the entire business.
417.93 -> And this ultimately leads
419.46 -> to lower end customer satisfaction.
421.95 -> And which can drive churn.
423.9 -> On the flip side, we notice customers
427.47 -> who still deploy a lot of this legacy tech,
430.11 -> are also facing higher costs.
433.08 -> This can range from the staffing related costs
437.49 -> of managing and maintaining and keeping the lights on.
440.91 -> It can extend all the way to inaccurate extraction,
445.05 -> leading to suboptimal business decisions,
448.65 -> which can cost the business.
450.45 -> And then when you think about
451.62 -> a lot of these legacy technologies being composed together,
454.92 -> there invariably is a lot of scaffolding code,
458.79 -> a lot of tech debt that teams tend to accumulate
461.43 -> over a period of time.
466.23 -> So when we build, from the ground up,
468.84 -> AI and machine learning services,
471.18 -> we do benefit from the fact that we have learned
473.94 -> from a lot of customers in the real world.
477.12 -> And that allows us to then build the kinds of capabilities
480.72 -> that take advantage of advances in computer vision,
486.12 -> natural language processing,
487.62 -> and other machine learning innovations
489.99 -> to go beyond what traditional OCR techniques
493.5 -> have allowed us to do.
497.67 -> The key benefits here are
499.38 -> that we are no longer talking about, you know,
501.48 -> the traditional good old fashioned OCR,
504.78 -> but where we can now preserve document context
507.87 -> of complex forms, of complex tables,
511.5 -> where we can up level the process of data extraction
517.38 -> to think about things as documents that could be specialized
521.01 -> like identity documents or invoices, and receipts,
523.65 -> and much more.
525.12 -> And the goal here is that by doing so
528.39 -> we are able to get accurate and faster throughput
534.45 -> on that kind of data extraction that preserves the context,
537.3 -> which means developers and technologists
540 -> can then integrate those insights
541.89 -> as part of their business systems
543.51 -> that much more efficiently.
546.09 -> This has invariably the downstream effect
548.73 -> of reducing the total cost of processing,
551.7 -> especially when you think about the scale
553.35 -> at which these document processing workflows run.
559.17 -> So let's look at some of the features within Amazon Textract
565.35 -> that help customers get to more efficient ways
569.04 -> of document processing.
572.67 -> Some of you may already be familiar
575.13 -> with our traditional text extraction
577.5 -> and the OCR extraction capabilities.
579.87 -> But in addition to that there are lots of capabilities
582.33 -> that include handwritten text
585.3 -> as well as the ability to extract more complex structures
590.16 -> such as tables or nested tables,
592.56 -> forms of different kinds.
594.09 -> Being able to yank out key value pairs
598.5 -> that exist within the documents
600.03 -> while still preserving the context.
602.01 -> And this can happen across a wide variety of documents
606.54 -> that we have pre-trained the models on.
609.96 -> Financial statements, loan applications,
611.91 -> identity documents, and many more.
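
As a rough illustration of what "key value pairs while preserving context" looks like to a developer, here is a minimal sketch that walks the `Blocks` list of a Textract forms response. The block layout (`KEY_VALUE_SET` blocks linked to `WORD` blocks through `CHILD` and `VALUE` relationships) follows the public Textract response format; the helper names and the sample blocks are hand-written assumptions, and checkbox (selection element) values are ignored here.

```python
def block_text(block, by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each form key's text to the text of its linked value block(s)."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(block_text(by_id[value_id], by_id))
        pairs[block_text(block, by_id)] = " ".join(value_parts)
    return pairs


# Hand-written sample in the shape of a forms response (not real output).
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Loan"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Amount:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "250,000"},
]

print(key_value_pairs(sample_blocks))
```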
615.15 -> The other thing customers told us is that,
616.95 -> okay, we like your, you know, basic OCR,
619.68 -> and it's accurate and cheap and all that.
621.66 -> Great.
622.493 -> But could you specialize this
624.51 -> for specific kinds of documents
626.07 -> that we see commonly in our business workflows?
628.77 -> And here's where we can specialize things
631.14 -> like identity documents.
633.203 -> So for now it's US passports, US driver's licenses,
637.23 -> and so on and so forth.
638.31 -> Or invoices and receipts that have very peculiar,
641.79 -> weird ways of representing data.
646.23 -> And customers then use them to accelerate their use cases.
649.53 -> So for example,
650.363 -> if you're using our analyze expense capability
654.27 -> for invoices and receipts,
655.62 -> you may have accounts payable, invoicing,
659.07 -> and internal financial processing workloads
663.66 -> that could go a little bit faster.
667.32 -> A place where I'm gonna spend
668.46 -> a little bit more time on today, is with Textract queries.
674.25 -> But before I get in there,
676.14 -> I wanna linger just a little bit to really dial in
679.89 -> on the data extraction challenges
684.66 -> that we know customers face.
688.62 -> And maybe some of this will deeply resonate with you too.
692.46 -> And if you're early in the journey then it serves hopefully
695.25 -> as foreknowledge about the kinds of challenges
699.24 -> you will likely encounter.
702.54 -> The first thing is that data just
705.72 -> exists in so many variations across documents
710.13 -> or a similar stack of documents,
712.47 -> that it's not a given that the system
715.26 -> will auto-magically figure out that date of birth,
719.1 -> DOB, birth date, actually all mean the same thing.
724.83 -> And these are the little things that matter much
727.56 -> when it comes to data extraction accuracy.
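
One common piece of post-processing for this label variation is a synonym table that maps every observed label to a canonical field name. The sketch below assumes a tiny hypothetical table; a production table would be much larger and would still miss labels nobody anticipated.

```python
import re

# Hypothetical synonym table; the entries are illustrative only.
CANONICAL_LABELS = {
    "date of birth": "date_of_birth",
    "dob": "date_of_birth",
    "birth date": "date_of_birth",
    "birthdate": "date_of_birth",
}


def canonical_label(raw_label):
    """Normalize a raw form label and look it up; None if it is unknown."""
    cleaned = re.sub(r"[^a-z ]", "", raw_label.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return CANONICAL_LABELS.get(cleaned)


for label in ["Date of Birth", "DOB:", "Birth  Date", "D.O.B.", "Born on"]:
    print(label, "->", canonical_label(label))
```

The last label falls through the table, which is exactly the kind of gap that forces teams to keep extending this code.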
731.37 -> The other thing is that how the data is laid out,
737.25 -> the structure with which it's represented
739.68 -> can also be incredibly complicated and diverse.
743.64 -> Data showing up in tables, nested tables,
747.12 -> forms that have very, very different structures
749.97 -> in terms of how they accept what kind of data,
753.03 -> whether it has fields that are labeled
756.03 -> or fields that are implied.
757.8 -> And all these variations exist within
760.83 -> a set of even related documents
762.81 -> that customers have to process.
765.66 -> And then you've got a bunch of variations
768.84 -> that are around how the data is laid out itself.
772.83 -> The orientation of the text.
775.44 -> How does a table get interspersed
779.1 -> between paragraphs of text.
781.44 -> How do section headers, footers,
784.29 -> how do they make sense to lend extra context
787.56 -> to the document itself?
790.56 -> All of these are real world challenges that customers face
794.1 -> and invariably, what do they have to do?
797.22 -> They end up writing a bunch of post-processing logic.
802.02 -> It's a bunch of code ultimately, that has to be written,
804.66 -> it has to be maintained.
807 -> Sometimes good old fashioned deterministic code
810.63 -> doesn't do the job.
812.85 -> So customers have to build customized
815.82 -> machine learning models.
817.89 -> And now when you start to think about the complexity
820.68 -> that you have to manage,
821.85 -> it gets expensive and time consuming.
827.28 -> So for those kinds of unique issues that exist pervasively,
833.97 -> these are not corner cases, they exist all the time.
837.33 -> We just don't know in which document it's gonna show up
839.67 -> in what form.
841.08 -> For that reason,
842.31 -> and for particularly gnarly difficult documents
845.91 -> that are inherently difficult to process
848.28 -> through the set of OCR, OCR plus techniques,
851.91 -> we invented Textract queries.
854.97 -> So customers see the benefit of accuracy and speed
858.3 -> using the existing capabilities that we have
860.73 -> on forms and tables and OCR to detect text.
864.96 -> But when they hit against these roadblocks,
867.99 -> they have to figure out, you know, which key value pair,
871.32 -> which table value corresponds
872.91 -> to what information they're actually looking to extract
877.71 -> from a given document.
880.53 -> And so for such customers we built the Queries capability,
885.33 -> which enables them to specify using natural language queries
891.15 -> and extract pieces of information from these documents.
894.78 -> So given a document,
896.46 -> it enables a developer to express what exactly it is
900.27 -> that they're looking for.
901.103 -> What piece of information in a natural language form.
904.08 -> So what is the borrower's tax id?
907.35 -> What is the social security number of the co-borrower?
912 -> By doing that,
913.89 -> Textract Queries uses a combination of vision,
917.64 -> language, and spatial cues
921.33 -> to extract with much higher accuracy
923.85 -> exactly that piece of information
927.03 -> that the developer has requested
928.92 -> without them having to build any customized model
933.09 -> or presenting new data to train such a model at all.
939.84 -> And because it's a simple Q and A based process,
942.69 -> it makes it feel a lot more natural.
944.97 -> Like you may ask your colleague,
946.747 -> "Hey, can you tell me what this doc says
948.96 -> when it comes to this specific field?"
950.94 -> And because this is part of a, you know,
953.61 -> well formed API response,
955.86 -> this means that a developer can yank the output
959.07 -> and integrate it into whatever is next
961.83 -> in the business workflow.
962.67 -> Maybe it's an insert into a database
964.68 -> or maybe it kicks off a different scheduled workflow,
967.14 -> depending on what the response is.
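
A minimal sketch of how such a request might be put together with the AWS SDK for Python. It assumes the Textract `AnalyzeDocument` operation with the `QUERIES` feature type and a `QueriesConfig` list of `Text`/`Alias` entries; the aliases, the helper names, and the placeholder bytes are our own assumptions. The network call is kept in a separate function so the request builder can be exercised without credentials.

```python
def build_queries_request(document_bytes, questions):
    """Build keyword arguments for AnalyzeDocument with the QUERIES feature.

    `questions` maps a short alias (our own naming) to a natural language
    question about the document.
    """
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias}
                for alias, text in questions.items()
            ]
        },
    }


def run_queries(document_bytes, questions):
    """Send the request; needs boto3, AWS credentials, and network access."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("textract")
    return client.analyze_document(
        **build_queries_request(document_bytes, questions)
    )


# The two questions quoted in the talk, with hypothetical aliases.
request = build_queries_request(
    b"<image bytes>",
    {
        "BORROWER_TAX_ID": "What is the borrower's tax id?",
        "CO_BORROWER_SSN":
            "What is the social security number of the co-borrower?",
    },
)
print(request["QueriesConfig"]["Queries"])
```

Giving each question an alias makes the downstream step (a database insert, a workflow trigger) independent of the exact wording of the question.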
970.95 -> Let's look at some examples here.
972.75 -> This is the common Fannie Mae Form 1003.
976.95 -> If you notice this form has a borrower and a co-borrower
982.35 -> that's gonna split halfway through the page.
985.29 -> And traditional OCR will extract the information.
990.06 -> But now as a developer of a business application,
993.15 -> you're trying to figure out whose birthdate is this exactly?
998.13 -> Is it the borrower or the co-borrower?
999.96 -> And it turns out that systems fail
1002.96 -> at that seemingly common simple task for us humans
1007.1 -> when we're looking at it with our eyes.
1009.23 -> And so in this sort of a situation
1011.63 -> where there are these nested sections,
1013.94 -> you could ask Textract Query,
1016.16 -> what is the borrower's date of birth,
1018.56 -> or what is the co-borrower's date of birth?
1020.33 -> And you could issue many such queries for any given document
1025.01 -> and you will get precisely the answer to the query
1029.42 -> that you can then build into your application.
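
To show how the answers to several queries could be pulled back out of a response, here is a minimal sketch. It assumes the response `Blocks` contain `QUERY` blocks linked to `QUERY_RESULT` blocks through an `ANSWER` relationship, as in the public Textract response format. The function name, the aliases, and the sample values are hand-written for illustration and are not real output.

```python
def query_answers(blocks):
    """Map each query alias (or its text) to (answer, confidence) or None."""
    by_id = {b["Id"]: b for b in blocks}
    answers = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        query = block["Query"]
        name = query.get("Alias") or query["Text"]
        answers[name] = None  # stays None when no answer was found
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER" and rel["Ids"]:
                result = by_id[rel["Ids"][0]]  # keep the first answer only
                answers[name] = (result["Text"], result["Confidence"])
                break
    return answers


# Hand-written sample blocks; the date and confidence are made up.
sample_blocks = [
    {"Id": "q1", "BlockType": "QUERY",
     "Query": {"Text": "What is the borrower's date of birth?",
               "Alias": "BORROWER_DOB"},
     "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}]},
    {"Id": "r1", "BlockType": "QUERY_RESULT",
     "Text": "01/02/1980", "Confidence": 98.0},
    {"Id": "q2", "BlockType": "QUERY",
     "Query": {"Text": "What is the co-borrower's date of birth?",
               "Alias": "CO_BORROWER_DOB"}},
]

print(query_answers(sample_blocks))
```

Keeping unanswered queries in the result as `None` makes it easy to send those fields to a fallback path such as human review.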
1034.61 -> Here's another example of a bank statement.
1038.09 -> You know, there are hundreds or thousands of banks,
1041.24 -> plus credit unions that exist at least in the United States.
1045.44 -> And this is an example of an implied field.
1050.18 -> So, highlighted in red is the customer's name and address,
1054.41 -> but notice that there's no field that says name colon,
1057.5 -> or address colon.
1059.57 -> And then adjacent to it, within the box,
1063.71 -> is the bank's details.
1065.09 -> And there is a mailing address in there.
1067.43 -> Now when you try to extract data, you will get an address,
1070.01 -> but you don't know whose address it is.
1073.37 -> And this is where issuing a query
1077.57 -> to say, "What is the customer's name?"
1080.517 -> "What is the customer's address?"
1082.88 -> Is going to give you an accurate response back
1084.83 -> rather than having to decipher that once regular OCR
1088.76 -> does its job.
1090.26 -> Here's another more interesting example in a way
1093.41 -> of tabular data.
1096.35 -> Now if you notice the table here,
1098.03 -> this is, you know, a Pay stub.
1100.25 -> If you notice that the table here
1102.41 -> that's highlighted for gross pay is a
1103.88 -> little different 'cause it's not regular rows and columns.
1107.3 -> Gross pay is offset as a cell value under rate.
1113.12 -> Under the rate column header.
1114.35 -> And then you've got two values for, you know,
1116.48 -> for this period and the year to date gross pay.
1119.15 -> And turns out this again, seemingly simple problem,
1122.36 -> when it comes to actually using OCR
1124.19 -> tends to really complicate matters.
1126.71 -> And so in this example,
1128.36 -> a developer could issue a query that says,
1131.217 -> "What is the gross pay for this period?"
1133.76 -> And what Textract Queries does is, it takes the image,
1138.29 -> it takes the hint from the query
1140.99 -> that the developer has issued to say,
1142.737 -> "I must look for that kind of information."
1145.58 -> And that kind of abstraction,
1147.35 -> those cues are what help us extract that data
1151.46 -> with higher accuracy,
1152.96 -> even though the Textract system may not have seen
1156.41 -> that kind of document before.
1161.42 -> As I wrap up here,
1163.19 -> I do wanna share that we recently launched
1167.96 -> the Analyze Lending API,
1172.07 -> targeted to help customers build
1177.02 -> more efficient
1180.47 -> mortgage document processing workflows.
1182.78 -> And what we observe is that customers repeatedly follow
1186.56 -> a certain set of workflows.
1188.24 -> Given a large set of documents,
1191.48 -> they have to classify them quickly,
1193.94 -> they need to redirect those documents
1196.58 -> to the right kind of ML models
1199.01 -> that are specialized to extract that data from them.
1203 -> They need to do a high quality job of data extraction,
1205.13 -> of course,
1206 -> and then run some validation.
1208.31 -> And since we saw these patterns happen time and time again,
1213.77 -> as we heard from our customers,
1215.03 -> we were able to build out this capability
1217.49 -> as a single well formed set of APIs
1220.97 -> to help customers just advance their journey
1223.22 -> a little bit more, with the Analyze Lending API.
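
The classify, route, extract, validate pattern described above can be sketched in a few lines. This is not the Analyze Lending API: the keyword classifier, the regex extractors, and the page strings are hypothetical stand-ins for real models, used only to show the shape of the workflow.

```python
import re


def classify(page_text):
    """Toy classifier: keyword lookup standing in for a real model."""
    lowered = page_text.lower()
    if "gross pay" in lowered:
        return "PAYSTUB"
    if "account number" in lowered:
        return "BANK_STATEMENT"
    return "UNKNOWN"


def extract_paystub(text):
    match = re.search(r"gross pay\s+([\d,.]+)", text, re.IGNORECASE)
    return {"gross_pay": match.group(1) if match else None}


def extract_bank_statement(text):
    match = re.search(r"account number\s+(\d+)", text, re.IGNORECASE)
    return {"account_number": match.group(1) if match else None}


# Routing table: each page type goes to its specialized extractor.
EXTRACTORS = {
    "PAYSTUB": extract_paystub,
    "BANK_STATEMENT": extract_bank_statement,
}


def process(pages):
    """Classify each page, route it, extract fields, then validate."""
    results = []
    for text in pages:
        doc_type = classify(text)
        extractor = EXTRACTORS.get(doc_type)
        fields = extractor(text) if extractor else {}
        # Validation here only checks that every expected field was found.
        valid = bool(fields) and all(v is not None for v in fields.values())
        results.append({"type": doc_type, "fields": fields, "valid": valid})
    return results


results = process(
    ["Gross Pay 4,200.00", "Account Number 12345", "Dear customer"]
)
for item in results:
    print(item)
```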
1231.56 -> I would now love to transition
1233.96 -> to Frank's portion of the presentation,
1237.59 -> where he's gonna talk to us about how Black Knight
1240.53 -> extensively uses AI and ML technologies
1242.69 -> to accelerate mortgage processing workflows.
1245.54 -> - Thank you, Ari.
1247.37 -> Good afternoon everybody.
1249.98 -> So let's talk a little bit about who Black Knight is.
1253.82 -> Most of you don't know Black Knight.
1255.29 -> We are a financial services oriented technology company.
1259.13 -> We provide solutions to mortgage banks, banks,
1262.67 -> and other financial institutions that make mortgage loans
1265.43 -> and service them.
1268.19 -> Ari mentioned the residential mortgage
1270.26 -> and home equity lending markets.
1272.84 -> We do serve those markets in both origination,
1276.56 -> that is the making of the loans,
1278.45 -> as well as the servicing of the loans over their life.
1283.16 -> We have solutions that cover the entire value chain
1287.45 -> that a mortgage consumer experiences.
1290.66 -> So loan origination through our product called Empower,
1294.62 -> it's called a loan origination system.
1296.6 -> It has a series of other capabilities
1298.88 -> that are integrated with it,
1300.74 -> including our document service
1302.6 -> that allow us to handle the entire process for lenders,
1306.71 -> all in one system.
1308.63 -> Our loan servicing platform called MSP, is a market leader
1313.25 -> and services a very large percentage
1315.32 -> of the national mortgage market
1316.97 -> in terms of the loans being serviced over their entire life.
1320.54 -> Their life meaning up to 40 plus years.
1324.38 -> We also support the capital markets.
1326.51 -> The markets that provide funding
1329.06 -> for the people who want to borrow the money
1331.82 -> that finances their home through our Optimal Blue affiliate.
1335.69 -> Optimal Blue provides the loan pricing process
1338.48 -> as well as serves the capital markets needs of lenders
1342.26 -> in order to allow them to make more and more loans
1345.23 -> every year.
1346.7 -> We also have a data and analytics business
1349.19 -> that supports the entire industry
1351.17 -> by gathering real estate related data
1353.69 -> and providing it to professionals and organizations
1356.42 -> that need data to support their businesses.
1359.75 -> I'm gonna be talking today about our AIVA document service.
1364.4 -> AIVA is our artificial intelligence
1366.83 -> virtual assistant product.
1368.75 -> It is an AI and ML based solution
1371.66 -> that includes Textract as you'll see
1374.21 -> as we go through the rest of the presentation.
1378.41 -> Let's talk a little bit about why Ari's comments
1381.38 -> are so important.
1383.63 -> We're talking about, as he mentioned,
1385.37 -> thousands of financial institutions.
1388.46 -> 4,300 plus institutions in the United States
1392.21 -> provided information about loans that they made
1395.93 -> to regulators in 2021.
1401.03 -> Those entities originated $4.5 trillion worth of loans.
1407.6 -> The real estate finance industry
1409.49 -> is a significant driver of our economy.
1412.64 -> If you've been following the news lately,
1414.2 -> you know that these numbers are diminished lately
1417.77 -> because of higher interest rates.
1419.66 -> It's still a significant driver of economic activity.
1424.73 -> 13 and a half million loans were processed
1427.58 -> by organizations originating loans in 2021.
1432.08 -> During that same year,
1433.91 -> 12 and a half trillion dollars worth of financing
1437.09 -> was handled by the industry.
1438.53 -> 54 million loans serviced.
1441.47 -> Each of these loans is built on a stack of paper.
1446.21 -> And that stack of paper persists for up to 47 years.
1451.97 -> So the information that we gather
1454.88 -> in the beginning of the loan origination process
1458.54 -> is gonna persist for quite a long time.
1460.7 -> We have to get it right.
1464.36 -> This is a very complex industry,
1466.19 -> so when you think about all the paper,
1468.26 -> if you've bought a house
1470.06 -> and you've gone through the process
1472.25 -> of getting the loan closed
1474.11 -> and been concerned that there's a stack of paper
1476.45 -> about this high at the end of the process,
1479.48 -> it's largely because all of these entities
1483.05 -> have something to do in the process of financing a home.
1488.3 -> Of course, consumers will.
1489.47 -> They get paper,
1491.42 -> they get disclosures,
1492.62 -> they get information about their property.
1495.29 -> They have documents to sign,
1496.82 -> legal documents that evidence their debt.
1500.75 -> The lender has to get that documentation
1504.35 -> and gather information in order to underwrite the loan
1507.23 -> in order to know that it's a good loan.
1510.8 -> Banks finance these loans
1512.42 -> and put them on their balance sheets.
1514.04 -> They need to manage the risk
1515.54 -> associated with these transactions.
1518.87 -> The regulators, the FHFA, the FFIEC, and others,
1524.24 -> all regulate the institutions that make these loans
1527.54 -> so that consumers are treated fairly.
1530.96 -> And so that the financial system is not as fragile as it was
1536.24 -> prior to the 2008 crisis.
1541.1 -> Going on, there's title services
1542.96 -> and settlement service agents
1544.34 -> that support the process of closing loans.
1547.7 -> Servicers, again, manage the loan throughout its lifetime.
1551.69 -> There are insurers in the process
1554.42 -> who help to provide secure funding.
1557.27 -> They create credit enhancements
1559.49 -> and they monitor the property's insurance.
1563.9 -> And finally, there are investors.
1567.11 -> This is one of the most interesting aspects
1569.12 -> of the mortgage lending process
1570.77 -> that most consumers don't know about.
1573.62 -> Lenders don't keep your loan on their books.
1577.88 -> They sell the loan.
1579.17 -> Many of you have heard from your lender,
1581.427 -> "Oh, we've sold your loan to another lender."
1584.3 -> That's because there is an active securities market
1588.02 -> that is driven by the debt in residential real estate
1593.21 -> in this country.
1594.14 -> It's how the industry works.
1595.91 -> All these stakeholders have an interest in the paperwork.
1602.21 -> Why is it still so crazy?
1604.67 -> There are hundreds of documents that support this process.
1608.57 -> At Black Knight,
1609.403 -> we support more than 800 different document types.
1612.8 -> And we extract over 600 data points from those documents.
1616.58 -> And by the way,
1617.6 -> the demand for additional data is growing every day.
1622.19 -> As Ari mentioned,
1623.03 -> many of these documents are highly variable.
1625.49 -> We're gonna talk about bank statements in particular,
1627.8 -> and why that is such a,
1629.54 -> to use the great term Ari uses,
1631.94 -> such a gnarly problem.
1634.49 -> Many of these documents are not only highly variable,
1638.15 -> but very few of them are actually standardized.
1641 -> And even the ones that are standardized
1642.95 -> can be produced in different ways.
1645.05 -> That 1003 that was used as an example earlier
1649.1 -> is the standard application provided by
1651.89 -> and mandated by some of the regulatory bodies
1654.8 -> that manage the industry.
1656.54 -> Well, they can be handwritten,
1658.37 -> they can be produced by computer systems,
1660.89 -> the data can vary widely from one system to another.
1666.08 -> Most of the documents that support the process
1668.39 -> are originally in paper form.
1672.8 -> They're analog.
1675.92 -> The industry can drive improvement in this and has been
1679.64 -> for well over 20 years.
1681.62 -> There have been legal abilities to e-sign documents
1684.65 -> and e-deliver documents.
1685.85 -> However, even today, 24% of consumers
1690.32 -> do not want to receive their documents digitally.
1694.52 -> 24% of the time we are printing them out
1697.22 -> and putting them in the mail.
1699.26 -> When they come back from the consumer,
1700.79 -> then we have to scan them and process them.
1704.42 -> And despite that 23 year old law, UETA,
1708.29 -> only 4% of loans are closed digitally,
1711.53 -> meaning the signing of the mortgage
1713.96 -> and all of the transactions that handle
1716.81 -> the moving of the funds from one party to another,
1720.47 -> only 4% are closed digitally.
1722.33 -> There's a lot of reasons for that.
1724.04 -> There's a whole session we could do about that process.
1728.33 -> So bottom line, it's very expensive.
1731.75 -> Just under $11,000 per loan
1735.92 -> is a statistic that the industry is fighting with today.
1739.4 -> Think about that.
1740.423 -> $11,000 per loan and millions of loans originated.
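
To put the two figures side by side, here is a rough back-of-the-envelope calculation. It assumes the per-loan cost can simply be multiplied by the 2021 origination count quoted earlier in the talk; since the cost was given as "just under" $11,000, the result is an upper bound and not a reported industry total.

```python
cost_per_loan = 11_000         # upper bound: the talk says "just under $11,000"
loans_originated = 13_500_000  # "13 and a half million loans" in 2021

total_cost = cost_per_loan * loans_originated
print(f"${total_cost / 1e9:.1f} billion")  # $148.5 billion
```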
1747.62 -> So, what is our mission?
1750.02 -> AIVA Document Services has a mission of classifying
1752.42 -> those 800 plus document types.
1755.48 -> Classifying means not only knowing,
1757.91 -> okay, this document is a pay stub,
1760.49 -> but also, is the second page also a pay stub?
1765.2 -> If there's an appraisal of a property,
1768.17 -> that appraisal can be five pages long or ten pages long.
1772.34 -> Understanding not only what the page is,
1775.52 -> but also what the document is.
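
The difference between page-level and document-level classification can be sketched as a grouping step. This minimal version assumes each page already has a predicted type and simply merges consecutive pages of the same type. It is a deliberate simplification: two separate pay stubs in a row would be merged, so a real system would also need a first-page signal. The function name and labels are hypothetical.

```python
from itertools import groupby


def group_pages(page_types):
    """Merge consecutive pages of the same type into (type, page numbers)."""
    documents = []
    page_number = 1
    for doc_type, run in groupby(page_types):
        count = len(list(run))
        pages = list(range(page_number, page_number + count))
        documents.append((doc_type, pages))
        page_number += count
    return documents


packet = ["PAYSTUB", "PAYSTUB", "APPRAISAL", "APPRAISAL", "APPRAISAL"]
print(group_pages(packet))
```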
1781.28 -> What was the solution that we arrived at?
1784.25 -> Black Knight has been working with Amazon
1786.32 -> for just about four years.
1788.66 -> Our CTO's right there, and we can ask him exactly how long,
1793.34 -> but we've been working with Amazon for a long period of time
1796.73 -> trying to slay this particular dragon.
1800.15 -> With a combination of AWS infrastructure,
1802.91 -> we're hosted on AWS,
1804.92 -> Amazon Textract,
1806.39 -> which is the subject of much of this conversation,
1809.54 -> and Black Knight's own proprietary AI and ML bot models,
1813.44 -> software and data,
1815.6 -> we've created an environment
1817.07 -> where we are finally making exceptionally good progress
1820.31 -> at nailing down those 800 documents.
1822.95 -> Classifying them correctly the first time,
1825.14 -> and extracting a large body of data
1828.2 -> that can save our customers money.
1831.65 -> We have a humans in the loop process
1834.65 -> because you have to have humans in the loop
1837.08 -> to read the ones you can't read with the computers.
1839.39 -> So if computer vision fails,
1841.58 -> we have to have a human read the document.
1843.77 -> If we don't get a return from Textract
1847.679 -> that gives us the information that we need,
1850.85 -> we have humans to do the reviews.
1852.95 -> So we have exception processes
1854.54 -> and we have processes to detect
1856.79 -> whether our models are starting to drift.
1860.18 -> One of the things we have to do with our AI and ML,
1863.09 -> just like everybody else does, our models can drift.
1866.69 -> We put new models into the wild, they can regress.
1870.35 -> So we watch them like a hawk.
1871.76 -> So we have humans doing essentially three things.
1875 -> On documents that can't be read, they're reading them.
1878.36 -> On documents that may have problems, they're checking them.
1883.28 -> And on documents that are coming through, just to be sure,
1887.99 -> we do sampling to make sure that we're getting it right.
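The three human review paths described above (unreadable documents, possibly problematic documents, and random sampling) could be sketched as a small routing function. This is a minimal sketch under stated assumptions: the function name, the `threshold` and `sample_rate` values, and the queue names are hypothetical and do not come from the talk.

```python
import random


def route_page(ocr_text, confidence, threshold=0.9, sample_rate=0.05, rng=random):
    """Decide where a processed page goes next.

    threshold and sample_rate are illustrative values, not figures from the talk.
    """
    # 1. Nothing readable came back: a person has to read the page.
    if not ocr_text or not ocr_text.strip():
        return "human_read"
    # 2. The result may have problems: a person checks it.
    if confidence < threshold:
        return "human_check"
    # 3. The result looks fine, but a random sample is still reviewed
    #    so that model drift or regressions can be noticed.
    if rng.random() < sample_rate:
        return "human_sample"
    return "auto"


print(route_page("", 0.99))
print(route_page("Gross pay 2,500.00", 0.42))
```

Sampling pages that passed automatically is what gives a baseline for spotting drift, since errors on confident predictions would otherwise never reach a reviewer.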
1892.61 -> Our key strategies to solve the problem
1894.86 -> include a classifier, just an open source OCR product.
1900.2 -> Black Knight trained models,
1902.18 -> it's a transformer based natural language processing model
1905.66 -> that we've been using for several years.
1908.24 -> And it gets trained and retrained.
1909.92 -> And by the way, we also,
1911.87 -> for some documents we use Textract.
1916.55 -> We also do extraction using largely
1920.27 -> the Textract knowledge graph.
1922.22 -> The information that comes out of Textract,
1924.53 -> we use all the tools that Ari mentioned,
1928.49 -> and we've picked the best of them,
1930.14 -> but we're still handling 800 plus document types.
1933.77 -> And we have a lot of complexity there
1937.49 -> that we have to solve for our clients
1939.92 -> that just has not been able to be accomplished
1942.56 -> across the industry.
1943.49 -> So we focus hard for our clients
1945.08 -> on getting extraction right.
1947.51 -> So we've got proprietary models,
1949.13 -> we've got the Textract knowledge graph,
1952.25 -> and we have a business rules system to use the data.
1956.3 -> And this is key.
1958.73 -> It's one thing to extract data
1960.5 -> and it's one thing to get back documents.
1963.47 -> You're gonna save time by avoiding keyboarding,
1966.56 -> but what you're going to do the most
1969.08 -> by having this data available,
1971.21 -> is be able to apply business logic to accelerate processing.
1978.38 -> Don't just tell somebody that they don't have to keyboard,
1983.93 -> but bypass steps.
1986.18 -> I have all the W-2s for borrower one.
1990.32 -> Check.
1991.28 -> No human had to look at it.
1993.83 -> That's because we have a document called a W-2,
1996.47 -> we extracted information from it
1998.06 -> that tells us that that's for borrower one.
2001.15 -> We can then pull all the document data,
2002.95 -> all the data about their income and populate the system
2006.31 -> so that a human doesn't have to touch the analysis necessary
2010.66 -> to do income validation.
2014.41 -> The business rules are the key element that drives value.
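The "I have all the W-2s for borrower one" check can be pictured as a simple rule over extracted data. The sketch below is only an assumption about what such a rule might look like: the record fields (`doc_type`, `borrower_id`, `tax_year`) and the idea of a list of required tax years are hypothetical, not details given in the talk.

```python
def has_all_w2s(documents, borrower_id, required_years):
    """Return True if a W-2 was found for every required tax year for this borrower."""
    found_years = {
        doc.get("tax_year")
        for doc in documents
        if doc.get("doc_type") == "W-2" and doc.get("borrower_id") == borrower_id
    }
    return set(required_years) <= found_years


# Hypothetical extracted records for one loan file.
extracted = [
    {"doc_type": "W-2", "borrower_id": 1, "tax_year": 2020},
    {"doc_type": "W-2", "borrower_id": 1, "tax_year": 2021},
    {"doc_type": "pay_stub", "borrower_id": 1, "tax_year": 2021},
    {"doc_type": "W-2", "borrower_id": 2, "tax_year": 2021},
]

print(has_all_w2s(extracted, 1, [2020, 2021]))  # True
print(has_all_w2s(extracted, 2, [2020, 2021]))  # False
```

When the rule passes, the checklist step can be marked complete without a person opening the documents, which is the "bypass steps" idea from the talk.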
2021.76 -> So let's talk about bank statements for a minute.
2024.49 -> Again, big gnarly problem.
2027.34 -> There are 9,600 plus financial institutions in this country
2031.75 -> that are federally insured.
2034.03 -> So banks, credit unions, thrifts.
2037.54 -> Each one of those has its own bank statement.
2041.2 -> Even though there are a number of providers
2043.15 -> that are the same across banks,
2045.97 -> every bank configures them their own way.
2049.6 -> And every consumer's bank statement is different.
2054.58 -> Some consumers may have one checking account.
2058.39 -> Another consumer may have two checking accounts
2061.45 -> and a savings account.
2063.19 -> Another consumer may have a checking account,
2066.1 -> an investment account, an IRA account,
2068.08 -> and a business account.
2070.81 -> And they may all be on the same statement.
2073.81 -> So we have very, very different documents.
2078.97 -> There's also a complexity that may seem surprising,
2082.24 -> but it's not when you think about it.
2085.33 -> Transaction lists are one of the important things
2089.29 -> that we have to look at in the process.
2091.63 -> So we have to look through a list of transactions to decide
2095.68 -> is there a transaction of interest to an underwriter.
2099.61 -> As I review the document,
2100.99 -> I might be looking for large transactions that are unusual.
2105.16 -> So I have to be able to read every row
2109.27 -> in a table of transactions that may span multiple pages.
2114.55 -> And be able to identify the separate tables
2116.98 -> within that document, and extract the data reliably.
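Once the rows of a transaction table have been extracted, finding "transactions of interest" across a table that spans several pages might look like the sketch below. The row structure, the function name, and the threshold are hypothetical. What counts as an unusual transaction is an underwriting decision that the talk does not specify.

```python
from decimal import Decimal


def flag_large_transactions(pages, threshold):
    """Merge per-page table rows and return those at or above the threshold.

    pages is a list of pages, each a list of row dicts with an "amount" field.
    """
    rows = [row for page in pages for row in page]
    return [row for row in rows if abs(row["amount"]) >= threshold]


# Hypothetical rows from a transaction table that continues onto a second page.
statement_pages = [
    [
        {"date": "2022-07-01", "description": "Payroll", "amount": Decimal("2500.00")},
        {"date": "2022-07-03", "description": "Groceries", "amount": Decimal("-84.17")},
    ],
    [
        {"date": "2022-07-15", "description": "Wire in", "amount": Decimal("15000.00")},
        {"date": "2022-07-20", "description": "Transfer out", "amount": Decimal("-12000.00")},
    ],
]

for row in flag_large_transactions(statement_pages, Decimal("10000")):
    print(row["date"], row["description"], row["amount"])
```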
2122.08 -> This is where Textract tables comes in.
2125.23 -> However,
2126.25 -> it's all of that complexity
2128.65 -> and the fact that the tables are oriented differently,
2132.67 -> we had to do more to get it to be reliable
2135.19 -> to the point where we could drive those business rules.
2137.74 -> Driving that business value meant
2139.15 -> we had to be able to do things like
2141.28 -> take the beginning balance and the ending balance,
2143.35 -> all the transactions add up to those two numbers.
2147.76 -> We have to be able to reconcile that data
2149.8 -> in order to know we got it right.
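The reconciliation check described above, where the beginning balance plus all transactions must equal the ending balance, can be sketched in a few lines. The function name and sample figures are illustrative assumptions. `Decimal` is used so that currency arithmetic is exact.

```python
from decimal import Decimal


def reconciles(beginning_balance, transactions, ending_balance):
    """Return True if the extracted transactions explain the change in balance."""
    return beginning_balance + sum(transactions, Decimal("0")) == ending_balance


beginning = Decimal("1000.00")
ending = Decimal("1249.75")
transactions = [Decimal("-250.25"), Decimal("500.00")]

print(reconciles(beginning, transactions, ending))      # True
print(reconciles(beginning, transactions[:1], ending))  # False: a row was missed
```

A failed check signals that a row was missed or misread somewhere in the table, so the statement can be sent for human review rather than fed to the business rules.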
2152.71 -> So, what we did is partner with Amazon.
2157.87 -> The Amazon Textract team got together with our team,
2161.44 -> we studied the problem,
2163.63 -> and the Amazon team delivered a model
2168.1 -> and a set of capabilities,
2169.48 -> a set of rules and model adjustments
2173.02 -> that the Black Knight team took delivery of
2175.27 -> early in the year.
2178.24 -> We then started tuning and improving that basic framework.
2184.03 -> And you can see it was a great framework
2185.68 -> because we went from very, very low scores.
2189.55 -> This is, for those of you who are statistics oriented,
2192.041 -> this is F1, a measure of accuracy that we use.
2195.7 -> We had very, very low scores at the beginning of the year.
2197.92 -> It took off very quickly,
2199.45 -> but didn't get where we needed it to for several months.
2202.84 -> But during the time from July until the end of September,
2206.47 -> we were able to achieve a 0.9 F1
2209.98 -> in handling this very complex problem.
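For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives. The counts are made up purely to show how a score of 0.9 can arise. They are not Black Knight's data.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Illustrative counts only: 90 fields extracted correctly,
# 10 extracted wrongly, 10 missed.
print(round(f1_score(90, 10, 10), 3))
```

Because F1 balances wrong extractions against missed ones, it is related to, but not the same as, a plain percent-correct figure.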
2214.03 -> This is a great success because what that means is that
2218.26 -> 90% of the time we can do what I told you earlier.
2223.87 -> We can see everything in the bank statement,
2226.24 -> we can identify transactions of interest
2228.85 -> and we can apply business rules
2230.62 -> to help solve our customer's business problem.
2235.84 -> So, what is AIVA's value proposition for our customers?
2240.94 -> What's the ROI to our customers?
2243.52 -> Well, we are able to process now
2245.2 -> about 14 million pages a month.
2247.84 -> We are extracting over 7 million data points every month.
2254.29 -> And our clients are able to generate significant savings
2258.91 -> and the savings vary from customer to customer
2261.43 -> depending on what kinds of loans they make,
2263.8 -> but we're seeing seconds per page.
2265.81 -> Remember 14 million pages.
2268.33 -> That's a lot of time.
2270.04 -> Seconds to minutes for each data point.
2272.89 -> And remember, the minutes can turn into hours
2275.59 -> if we can take a number of data points
2278.14 -> and use them to drive real business process improvement.
2282.64 -> And that's the phase that we're entering now at Black Knight
2286.18 -> to serve our customers in partnership with Amazon.
2292.69 -> Adoption is accelerating.
2295.21 -> We have a lineup of customers
2297.31 -> wanting to adopt this technology.
2299.44 -> And we are seeing real value among our customers
2304.36 -> and other industries as well.
2306.88 -> So it's a very exciting place to be right now.
2312.22 -> And back to Ari.
2316.72 -> Thank you.
2318.25 -> - Thanks, Frank.
2319.15 -> So,
2322.06 -> we understand though that getting through this journey
2324.85 -> can be complicated,
2325.81 -> especially given all the systems customers have to build.
2329.05 -> And so for that,
2329.92 -> there are numerous AWS partners that we work with closely
2334.66 -> to help customers solve
2338.17 -> for their toughest business problems here.
2342.07 -> There's also a nifty little matrix
2345.82 -> to help some of you who are going to embark on this journey,
2349 -> decide how you want to make some of those tradeoffs
2351.67 -> in terms of where you want to invest in-house
2354.28 -> versus where you may want to work with a partner
2358.69 -> to help you build those solutions.
2364.39 -> I do wanna wrap up here with a few links
2367.06 -> and then open up sufficient time for us
2368.83 -> to chat and answer any questions that you may have.
2373 -> But there are a number of different sources of documents,
2379.9 -> blog posts,
2382.15 -> and source code through numerous samples
2384.49 -> that we've put up on GitHub
2386.74 -> for developers to get their hands dirty with.
2389.47 -> To play with the product, play with the capabilities,
2391.9 -> as well as several prototype solutions
2396.28 -> for you to see how that might make sense for you to do.
2399.64 -> And I would encourage you,
2403.72 -> the builders here,
2404.86 -> to give that a shot, should you choose to do so.
2410.17 -> With that,
2411.003 -> we'd like to wrap up kind of the formal presentation here.
2413.89 -> I wanna of course deeply thank Frank
2418.03 -> and the Black Knight team for the partnership.
2421.9 -> We know that we learn deeply from them
2425.08 -> and that pushes us to build better capabilities
2428.59 -> and together we are able to hopefully solve problems
2431.86 -> for as many different kinds of customers
2434.08 -> as we possibly can.
2436.72 -> And of course, you know,
2437.553 -> with Black Knight leading the charge,
2438.97 -> and how they're able to deliver these cutting edge solutions
2441.67 -> for customers in this space,
2442.79 -> we're just delighted that we can be part of that journey
2445.81 -> with them.
2446.643 -> So with that, we have a lot of time I think
2451.06 -> for us to talk about questions if we have any.

Source: https://www.youtube.com/watch?v=OEJ24XGMbEI