Companies wrestling with document management have a new option from Amazon Web Services (AWS): Textract, a cloud-based, fully managed service that uses machine learning to read text in many of its forms.
It’s said to go beyond conventional optical character recognition (OCR) and is capable of deconstructing data tables, forms and whole pages containing both words and numbers. The tool is also reported to recognize specific content fields in documents such as contracts, expense reports and mortgage guarantees, helping companies cut down on the time and effort dedicated to quality assurance.
The Textract API supports scanning different kinds of files, such as a PDF or a photo, which are first stored in an Amazon S3 bucket. Each document is picked up, read and returned to the owner through the API as JSON annotated with page numbers, sections, form labels and data types.
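As a rough illustration of that flow, the Python sketch below groups recognized lines of text by page from a Textract-style JSON response. It assumes the response is a dictionary with a "Blocks" list whose entries carry "BlockType", "Page" and "Text" fields. The bucket name, file name and sample data are hypothetical, and the helper names are made up for this example. The boto3 call is shown only for context, is not exercised here, and would need AWS credentials and a supported file type.

```python
from collections import defaultdict


def fetch_analysis(bucket, key):
    """Ask Textract to analyze a document already stored in S3.

    Not called in this sketch: it needs boto3, AWS credentials and a
    real object. The bucket and key are supplied by the caller.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
    )


def lines_by_page(response):
    """Group the text of LINE blocks by their page number."""
    pages = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # Assumption: treat a missing page number as page 1.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


# Hypothetical, heavily trimmed response used purely for illustration.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1", "Page": 1},
        {"BlockType": "LINE", "Id": "l1", "Page": 1, "Text": "Expense Report"},
        {"BlockType": "WORD", "Id": "w1", "Page": 1, "Text": "Expense"},
        {"BlockType": "LINE", "Id": "l2", "Page": 2, "Text": "Total: 42.00"},
    ]
}

if __name__ == "__main__":
    print(lines_by_page(sample_response))
```

In practice the dictionary passed to lines_by_page would come from a call such as fetch_analysis("my-bucket", "scan.png"), where both names are placeholders.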
AWS points out that the new tool also integrates with its database and analytics services, such as Amazon Elasticsearch Service, DynamoDB, and Athena. For post-processing, customers can send their documents to machine-learning services such as Amazon Comprehend, Comprehend Medical, Translate, and SageMaker. AWS says the service is also compatible with third-party cloud platforms for more specialized purposes such as accounting, auditing, and compliance.
“The power of Amazon Textract is that it accurately extracts text and structured data from virtually any document with no machine learning experience required,” said Amazon Machine Learning VP Swami Sivasubramanian. “In addition to the integration with other AWS services, the rich partner community developing around Amazon Textract makes it possible for customers to gain real meaning from their file collections, operate more efficiently, improve security compliance, automate data entry, and facilitate faster business decisions.”
Amazon Web Services has worked hard to become the answer to every business’s needs, from the smallest solopreneur to the largest corporation. Textract represents its latest effort to corner the market on document management in a new way. With machine learning, the web giant can automate document workflows without a human developer having to hard-code configurations for each unique record type.
PricewaterhouseCoopers (PwC), a current AWS customer, has adopted the new text-reading tool with happy results. “At PwC, we work to provide our customers with intelligent automation tools that help transform previously manual processes,” Siddhartha Bhattacharya of PwC said. “We’ve integrated Amazon Textract into our solution for the pharmaceutical industry to automate document processing for various FDA forms like MedWatch and CIOMS.”
“Textract has proven to be the most efficient and accurate OCR solution available for these forms, extracting all of the relevant information for review and processing, and reducing time spent from hours down to minutes.”