The Business Need
A renowned software service provider to the petroleum supply and trading sector, based in Canada, needed to extract and aggregate about 160 shipping data points from a daily inflow of up to 300 cargo inspection reports involving more than 700 file activities.
The data gathered from these daily reports would enable the company to offer data-driven products and platforms that help shipping companies and oil refineries run their operations effectively and mitigate risk.
Challenges we faced
The sheer volume and scale of the data extraction, combined with the unstructured format of the reports, posed a huge challenge to the customer.
Close to 160 data points had to be extracted daily from the input inspection reports, which arrived in different formats such as XML, email, and PDF. Mobius tackled the demanding situation by developing a proprietary extraction tool customized to understand the customer’s input files.
How we solved the problem
Mobius provided a completely automated solution that extracted all the attributes on time, every day.
The smart extraction tool had two major components: an OCR component that converted the PDF input files into machine-readable documents, and a data extraction component that extracted the expected attributes from each document, including cargo owner information, inspection company details, cargo transfer date, vessel name, port location, type of cargo handled, and report generation date.
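The two-component layout described above can be illustrated with a minimal Python sketch. It is not Mobius's proprietary tool: the function names `to_text` and `extract`, the `ocr` callable, and the regular-expression patterns are all hypothetical, and the regex lookup is only a stand-in for the real extraction component.

```python
import re
from email import message_from_string
from xml.etree import ElementTree


def to_text(raw, fmt, ocr=None):
    """Stage 1: normalize one input report to plain text.

    `fmt` is one of "xml", "email", "pdf". `ocr` is a hypothetical
    callable supplied by the caller that turns PDF content into text.
    """
    if fmt == "xml":
        root = ElementTree.fromstring(raw)
        return " ".join(t.strip() for t in root.itertext() if t.strip())
    if fmt == "email":
        payload = message_from_string(raw).get_payload()
        # Multipart messages return a list; this sketch handles plain bodies only.
        return payload if isinstance(payload, str) else ""
    if fmt == "pdf":
        if ocr is None:
            raise ValueError("PDF input needs an OCR callable")
        return ocr(raw)
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical patterns for two of the attributes named in the text.
PATTERNS = {
    "vessel_name": re.compile(r"Vessel Name:\s*(\w+)"),
    "port_location": re.compile(r"Port:\s*(\w+)"),
}


def extract(text):
    """Stage 2: pull attributes out of the normalized text.

    Attributes that are not found are reported as None.
    """
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

Keeping normalization separate from extraction means the extraction stage sees the same kind of input regardless of whether a report arrived as XML, email, or PDF.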
To boost the overall productivity of our tool, the data extraction component was fortified with machine learning, which simplified the process of spotting the attributes to be extracted in the reports.
The trained machine learning model classified the OCR-processed documents and captured the required data from the inspection reports. The extracted data points were then pushed in JSON format to the client’s API.
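As a rough illustration of the JSON hand-off, the sketch below serializes one extracted record. The `InspectionRecord` class, its field names, and `to_payload` are assumptions made for this example; the client's actual schema and API endpoint are not described here, so the HTTP call itself is left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class InspectionRecord:
    """Hypothetical record holding the attributes named in the text."""
    cargo_owner: Optional[str] = None
    inspection_company: Optional[str] = None
    cargo_transfer_date: Optional[str] = None
    vessel_name: Optional[str] = None
    port_location: Optional[str] = None
    cargo_type: Optional[str] = None
    report_date: Optional[str] = None


def to_payload(record):
    """Serialize a record to a JSON string; missing attributes become null."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = InspectionRecord(vessel_name="Aurora", port_location="Halifax")
    print(to_payload(record))
```

Emitting explicit nulls for attributes that were not found lets the receiving side distinguish "not present in the report" from "field omitted by mistake".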
The automated approach taken by Mobius to handle the large-scale extraction of unstructured data ensured that all the necessary details were extracted in a timely manner with a consistent output quality of 98%. Our advanced machine learning technique cut the turnaround time by more than 60% compared to the rule-based approach and assured an activity coverage of 97%.