Written by Jean-Jacques Bérard, Executive Vice President, Research & Development, Esker
It has already been three years since an artificial intelligence (AI) using deep learning became the world’s best Go player. We were told that the AI revolution was imminent and that AI would take control of our future for better or worse, but what really happened?
Many of the wilder predictions haven’t materialised: cars are still driven by humans, fake news is proliferating without real control and artificial neurons have not yet eradicated pandemics. Even so, we haven’t seen a second AI winter or a new “chasm of disillusion”. The reality is quite different, and the potential of AI's technological advances is far from exhausted. Major AI advancements in business have been taking place, particularly with regard to order management automation.
An old dream
For more than a decade, Esker engineers have struggled with an algorithmic challenge: understanding an order immediately (extracting items, quantities and amounts) without requiring a learning phase or human intervention. The difficulty stems from the immense diversity of order templates. There are almost as many as there are customers. Our self-learning systems are able to “understand” an order by observing input from users, but this takes time, often several weeks. Wouldn’t operational AI from day one be wonderful?
A great thesis subject
There are four essential ingredients to solve such a problem using deep learning:
- A high computational capacity: Easy to acquire by using specialised graphics cards
- Machine-learning software: Google, Apple, Facebook and Amazon (GAFA) provide high-quality open source software
- Large amounts of correctly labelled data: We’ve built a vast repository over 10 years of production
- A “well-trained” human brain to orchestrate it all
As for the last point, we started our first thesis with the University of Lyon in 2017 in order to benefit from all the academic knowledge accumulated over more than 20 years in deep learning. An ideal business/university partnership!
The natural language path
Observation of orders shows that they contain groups of highly condensed text spread all over the surface of the pages. Often the top contains the customer’s name and the shipping address. The item details (identifier, quantity and description) are generally found in a table with a header row in the middle of the document. The totals and other taxes are usually located in the bottom third of the page. This is a kind of “business language” between customers and suppliers that is relatively flexible, concise and international.
Research very quickly pointed towards the networks used for natural language processing (NLP). These have been widely publicised through the voice assistants and translation tools offered by GAFA.
After having carefully selected well-labelled orders, identified the vocabulary and constructed a recurrent word classifier system (BiLSTM), we were able to start training our model. A trial and error approach spread over several months was necessary to refine the parameters and data. Our efforts were rewarded with recognition rates of more than 80% on documents never seen by the AI. This “business language” is therefore an understandable language for an AI.
The cherry on the cake was that this advance in recognition led to our being selected in 2019 at ICDAR, the largest international conference on document research.
To orders & beyond!
This “business language” does not stop with just orders. The same four-ingredient recipe, applied to invoices and expense reports, gives equally spectacular results. Of course, the abundance of well-labelled data is the key!
Finally, so that our users can understand and trust this technology, we have added an unsupervised anomaly detection module. Information proposed by the AI is assessed statistically, and if it falls outside the usual values, visual indicators prompt users to review and correct possible errors manually.
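To make the idea of a statistical check concrete, here is a minimal sketch in Python. It is not a description of Esker's module: the function name `is_anomalous`, the use of a median-based robust score, the threshold of 3.5 and the minimum history size are all assumptions chosen for illustration.

```python
from statistics import median


def is_anomalous(value, history, threshold=3.5, min_history=5):
    """Return True if `value` lies far outside the usual values in `history`.

    Hypothetical helper: uses a robust z-score based on the median and the
    median absolute deviation (MAD), so a few past outliers do not distort
    the notion of "usual". No labelled examples of errors are needed.
    """
    if len(history) < min_history:
        # Too little history to say what is usual: do not flag anything.
        return False
    centre = median(history)
    mad = median(abs(x - centre) for x in history)
    if mad == 0:
        # All past values are (almost) identical: flag any different value.
        return value != centre
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    score = 0.6745 * abs(value - centre) / mad
    return score > threshold


# Quantities previously ordered by one customer for one item (made-up data).
past_quantities = [10, 12, 11, 10, 13, 12, 11]
print(is_anomalous(12, past_quantities))   # a typical quantity
print(is_anomalous(120, past_quantities))  # perhaps a misread extra digit
```

A field flagged this way would not be rejected automatically. It would simply be highlighted so that a person can confirm or correct the value proposed by the AI.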
Rendezvous in three years
There is still a long road ahead to obtain a virtual assistant 100% capable of freeing administrative services from repetitive tasks. By attacking the problems one by one, using the latest technologies, software is inching closer day by day.
An invisible revolution is underway. It is shifting back-office jobs towards operations requiring thought, business knowledge and the ability to communicate.
Moravec’s paradox says that “what is most difficult in robotics is often what is easiest for humans.” This paradox also applies to back-office work. Difficult does not mean impossible, and we are on our way to proving it.
This article is sponsored by Esker