The Business Need
A leading U.S.-based information provider connecting people, projects, and products across the construction industry needed to aggregate construction tender information from government and public sources around the world. The client provides market, project, and product information, plans and specifications, data solutions, industry news, trends, and forecasts in the US, serving more than two million customers in the global construction industry through various products and more than 10 regional publications.
They also needed our technical expertise to build a robust database that would deliver construction project information to customers, and to track and aggregate thousands of trusted sources in real time.
Challenges we faced
Data had to be fetched from more than 20,000 websites spanning different geographies, which posed a language challenge. All of these sources had to be tracked constantly, and data had to be fetched in near real time.
The information also appeared in varied data formats, which called for a more refined aggregation approach.
Aggregated data from the websites could also contain repeated information, creating the need for a tight deduplication process.
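To illustrate the deduplication challenge, here is a minimal Python sketch of one common approach: fingerprinting each record on a normalized form of its identifying fields. The field names (title, issuer, deadline) and the normalization rules are assumptions chosen for illustration, not the fields or rules used in the actual project.

```python
import hashlib
import re


def fingerprint(record: dict) -> str:
    """Hash a normalized form of the fields assumed to identify a tender."""
    parts = []
    for field in ("title", "issuer", "deadline"):  # hypothetical field names
        value = str(record.get(field, "")).lower()
        value = re.sub(r"[^\w\s]", "", value)       # drop punctuation
        value = re.sub(r"\s+", " ", value).strip()  # collapse whitespace
        parts.append(value)
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(records: list) -> list:
    """Keep the first record seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = fingerprint(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact-match fingerprints only catch duplicates that differ in case, punctuation, or spacing. Listings that are reworded or translated between sources would need fuzzy matching on top of this.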
How we solved the problem
Mobius built a sophisticated technology framework to harvest data from more than 20,000 websites across the US, Canada, the UK, Germany, and France. The gathered data was then normalized according to general conventions, the client's system-specific conventions, and the client's process information.
Because a large number of data sources were involved, we developed a custom technology architecture on our workflow platform, Worxtream, which made it more efficient to track websites and download construction-related documents in the form of PDFs, feeds, and other formats.
A robust back-end database was put in place to handle the large volume of data processing, and time-zone-specific crawl schedules were set up for data sources in different geolocations that needed unique time windows for data extraction.
Digitized outputs reached accuracy levels of up to 99.56%, well above the contractual standard of 95%, and were delivered quickly and reliably. While 39 digitized records per hour were expected on average, we delivered 47, raising productivity by about 21%. Achieving top-notch digitization quality with an optimally sized, proficient team earned us accolades from our client.
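As a rough illustration of what normalization across sources can look like, the Python sketch below maps source-specific field names and date formats onto one common schema. The source names, raw field labels, date formats, and output schema are all hypothetical; the client's actual conventions are not described here.

```python
from datetime import datetime

# Hypothetical per-source settings: which raw keys feed the common schema,
# and which date format that source publishes.
SOURCE_CONFIGS = {
    "us_portal": {
        "fields": {"title": "Project Name", "deadline": "Bid Date"},
        "date_format": "%m/%d/%Y",
    },
    "de_portal": {
        "fields": {"title": "Titel", "deadline": "Frist"},
        "date_format": "%d.%m.%Y",
    },
}


def normalize(raw: dict, source: str) -> dict:
    """Convert one raw record into the common schema (ISO 8601 dates)."""
    config = SOURCE_CONFIGS[source]
    fields = config["fields"]
    # Collapse runs of whitespace in the title.
    title = " ".join(str(raw[fields["title"]]).split())
    deadline = datetime.strptime(
        raw[fields["deadline"]].strip(), config["date_format"]
    ).date()
    return {"source": source, "title": title, "deadline": deadline.isoformat()}
```

Keeping the date format in per-source configuration, rather than guessing it from the value, avoids misreading ambiguous dates such as 05/01/2024, which means May 1 on a US site and January 5 on a UK one.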
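One simple way to track a website for updates is to compare a hash of the downloaded content against the hash from the previous crawl. The Python sketch below shows that idea only. It is an assumption about how change tracking could work in general, not a description of Worxtream, and the function names and the in-memory `last_seen` store are hypothetical.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a stable digest of a downloaded page, feed, or PDF."""
    return hashlib.sha256(body).hexdigest()


def has_changed(url: str, body: bytes, last_seen: dict) -> bool:
    """Report whether the content differs from the previous crawl.

    `last_seen` maps URL -> digest; a real system would persist it
    in a database rather than keep it in memory.
    """
    digest = content_digest(body)
    changed = last_seen.get(url) != digest
    last_seen[url] = digest
    return changed
```

A source seen for the first time counts as changed, so its documents are downloaded on the initial crawl.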
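Time-zone-specific crawl windows can be sketched in Python with the standard-library `zoneinfo` module, as below. The source names and window times are invented for illustration; the real schedules are not given in this case study.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Hypothetical crawl windows, expressed in each source's local time.
CRAWL_WINDOWS = {
    "us_portal": (ZoneInfo("America/New_York"), time(1, 0), time(5, 0)),
    # This window wraps past midnight (22:00 to 02:00 local time).
    "de_portal": (ZoneInfo("Europe/Berlin"), time(22, 0), time(2, 0)),
}


def in_crawl_window(source: str, now_utc: datetime) -> bool:
    """Check whether `now_utc` falls inside the source's local window."""
    tz, start, end = CRAWL_WINDOWS[source]
    local = now_utc.astimezone(tz).time()
    if start <= end:
        return start <= local < end
    # Window spans midnight: late evening or early morning both qualify.
    return local >= start or local < end


def due_sources(now_utc: datetime) -> list:
    """List the sources whose crawl window is currently open."""
    return [name for name in CRAWL_WINDOWS if in_crawl_window(name, now_utc)]
```

Storing windows in local time and converting a single UTC clock reading keeps the schedule correct across daylight-saving changes, because `zoneinfo` applies the right offset for each date.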
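The productivity and accuracy figures above can be checked with a short calculation. The Python snippet below uses only the numbers quoted in this section; the variable names are illustrative.

```python
expected_per_hour = 39
delivered_per_hour = 47

# Relative productivity gain over the expected rate.
uplift = (delivered_per_hour - expected_per_hour) / expected_per_hour
print(f"Productivity uplift: {uplift:.1%}")  # 20.5%, i.e. about 21%

# Margin over the contractual accuracy standard, in percentage points.
accuracy_margin = round(99.56 - 95.0, 2)
print(f"Accuracy margin: {accuracy_margin} percentage points")  # 4.56
```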