Machine learning (ML) is becoming a foundation for a growing range of sectors, functions, and everyday tasks. There are myriad areas to explore within ML, but one that we work in extensively, and that is becoming increasingly interesting, is deep learning applied to document and data capture, also known as Intelligent Document Processing (IDP).
The ABCs of IDP
Let’s start with the basics: According to Bernard Marr at Forbes, “Deep learning is a subset of machine learning where artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data.” IDP involves the capturing, extracting and processing of data from a variety of document types.
Now what’s so “exciting” about document and data capture? There are two types of documents from which you’d capture data: Structured and Unstructured documents.
- Structured: These are standard forms (e.g., government forms) that have fixed fields and a consistent pattern of information.
- Unstructured: These are documents that don’t have any universal pattern. For this article, letters of employment would be good examples. Companies draft them in different formats and with different lengths (and cadence).
With that established, let’s delve into the two stages of data capture:
- Classification: Examining and defining a document
- Extraction: Capturing and recording key data from a document that has already been classified
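To make the two stages concrete, here is a minimal sketch of a classify-then-extract flow. It is an illustration only: the keyword rules and regular expressions are hypothetical stand-ins for trained models, and the names `KEYWORDS`, `FIELDS`, `classify`, `extract`, and `process` are assumptions made for this example.

```python
import re

# Hypothetical keyword rules standing in for a trained classifier.
KEYWORDS = {
    "employment_letter": ("employment", "salary", "position"),
    "tax_form": ("taxpayer", "taxable income"),
}

# Hypothetical per-type extraction patterns; a real system would learn these.
FIELDS = {
    "employment_letter": {"salary": re.compile(r"salary of \$([\d,]+)")},
    "tax_form": {"taxable_income": re.compile(r"taxable income:\s*\$([\d,]+)", re.I)},
}


def classify(text):
    """Stage 1: pick the document type whose keywords appear most often."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(keyword) for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract(text, doc_type):
    """Stage 2: pull the fields defined for an already-classified document."""
    result = {}
    for name, pattern in FIELDS.get(doc_type, {}).items():
        match = pattern.search(text)
        if match:
            result[name] = match.group(1)
    return result


def process(text):
    """Run classification first, then extraction for the detected type."""
    doc_type = classify(text)
    return {"type": doc_type, "fields": extract(text, doc_type)}
```

The point of the sketch is the ordering: extraction only knows which fields to look for once classification has decided what kind of document it is dealing with.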
There are two ways in which documents may be categorized: image processing and Natural Language Processing (NLP).
The Art of Image Processing
Here, machine learning models are trained to look at a document and recognize the visual pattern of what they see. The model isn’t ‘reading’ the document; it uses Convolutional Neural Networks (CNNs) to identify patterns within an image, which in turn inform its classification. We found this works well for more structured documents, where the layout is identical while the content varies. Where we ran into issues, however, was scaling the process.
When you train a machine to categorize documents through the image approach, you are training it to identify specific visual objects. Forms, however, lack the distinctive visual features that image models rely on, so little of what the model learns carries over from one document type to the next. Every time we added a new document type, the model had to be trained from scratch with new data, which took days or even weeks. In addition, once we started working with unstructured documents, we found that we needed a new solution to categorize them effectively and quickly.
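For readers curious what “identifying patterns within an image” means mechanically, the sketch below shows the core operation inside a CNN layer: sliding a small filter over an image. It is not a trained network. The filter `HORIZONTAL_EDGE` is written by hand for illustration (a real CNN learns its filters from data), and the 5x5 `page` is a made-up stand-in for a scanned form.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    This is the cross-correlation that CNN layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-written filter that responds strongly to a horizontal line,
# such as the rule under a form field.
HORIZONTAL_EDGE = [
    [-1, -1, -1],
    [2, 2, 2],
    [-1, -1, -1],
]

# Tiny 5x5 "scan": 1 = ink, 0 = blank paper. The middle row is a ruled line.
page = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

feature_map = convolve2d(page, HORIZONTAL_EDGE)
print(feature_map)  # [[-3, -3, -3], [6, 6, 6], [-3, -3, -3]]
```

The high values in the middle row of the output mark where the line sits on the page. A CNN stacks many such filters, which is why it can tell layouts apart without reading a single word.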
Enter: Natural Language Processing
With NLP, Optical Character Recognition (OCR) and LSTM (long short-term memory) networks are used to extract words, analyze them (determining the context of the words), and then classify the documents. Unlike image processing, NLP with transformer models considers the relationship between all the words in a sentence and determines the weight of each word to interpret its meaning. This reduces the training time for every new document type introduced, making the entire process faster.
Where a new document type may take a week to train a model to classify using image processing, NLP does it in mere seconds.
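The idea of weighing every word against every other word can be sketched in a few lines. This is a simplified version of the attention step used in transformer models: it omits the learned query, key, and value projections a real model has, and the two-dimensional word vectors are invented for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors):
    """For each word vector, return its normalized similarity to every word."""
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        # Scaled dot product between this word and each word in the sentence.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights.append(softmax(scores))
    return weights


# Toy embeddings, made up for this example: "salary" and "pay" point in
# similar directions, while "of" carries little meaning.
words = ["salary", "of", "pay"]
vectors = [[1.0, 0.0], [0.0, 0.1], [0.9, 0.1]]

weights = attention_weights(vectors)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

In this toy example, “salary” gives more weight to “pay” than to “of”, which is the kind of relationship the model uses to interpret meaning.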
Setting up for success
There are two things to consider when one starts a project with NLP to make the process easier:
- Determine goals: What information are you capturing, and why? What types of documents will you need to process?
- Organize data collection: Maintain the quality of data by building workflows or processes that support consistent quality of information and documents collected.
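As one possible shape for such a workflow, the sketch below screens collected samples before they reach training. Everything in it is an assumption made for illustration: the label set `ALLOWED_LABELS`, the `MIN_CHARS` threshold, and the sample layout (a dict with `text` and `label` keys) would all depend on your own project.

```python
ALLOWED_LABELS = {"employment_letter", "tax_form"}  # hypothetical document types
MIN_CHARS = 50  # hypothetical threshold for "enough text to be useful"


def quality_issues(sample):
    """Return a list of problems found in one collected sample (empty means OK)."""
    issues = []
    text = sample.get("text", "").strip()
    if len(text) < MIN_CHARS:
        issues.append("text too short (possible failed OCR)")
    if sample.get("label") not in ALLOWED_LABELS:
        issues.append("missing or unknown label")
    return issues


def split_by_quality(samples):
    """Separate samples that pass every check from those needing review."""
    accepted = [s for s in samples if not quality_issues(s)]
    rejected = [s for s in samples if quality_issues(s)]
    return accepted, rejected
```

Running a check like this at the point of collection keeps poor scans and unlabeled documents from quietly lowering the quality of the training set.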
The Journey Ahead
When you look at algorithms, sentiment analysis and the capacity of machine learning to evolve its capabilities, the term ‘speed reading’ really is taken to a whole new level. Our story doesn’t end here. It is just the beginning: it’s an evolving journey of discoveries, and we look forward to exploring other dimensions of what we have learned, including our next chapter on extraction, with you very soon!
Explore Kapti, our intelligent document processing software, to find out how the power of machine learning and automated document workflows can transform your organization’s loan processing experience.