Modern society has, for the most part, moved from paper to electronic documents. This brings a number of advantages: it is now much easier to exchange, copy, edit and save documents. The next step is to extract the important data that the documents contain so that analytical applications can make use of it. Examples:
Manual data extraction is slow, error-prone and labor-intensive. This is the main reason why companies so often leave important data buried in documents, even when using it could make a huge difference to the business. We offer an inexpensive solution that takes the burden of data extraction off you. You only need to tell us which data points you want extracted from each type of document, then send the documents to us by regular email or upload them using our Web Insert page. Within several hours we will return a structure containing all the extractions. Together with the extractions we save links to the original document, so that a client can always verify an extraction by checking the part of the document surrounding it.
Pricing
Technology
In short, DEP is a collection of unique Text Mining and Document Processing solutions, supported by a knowledge base in the form of an ontology of models. The models reflect semantic and formatting dependencies between elements of documents. The second important element of our service is the Data QA team. We hire well-educated experts in every application area, who perform the final validation and cleanup of the results after automatic extraction.
Workflow
Then you would only need to upload the source documents to an FTP site of your choice. We will take each source document, process it and put the result on the same FTP site. Or, even easier, email a source document to us and we will send back the result of the data extraction. The result can be in one of the standard formats (XML, Excel, HTML). See samples of the results below:
Data Extraction Requirements
After you have sent the chart to our representative, we will try to create an extraction template, which is a tree structure serving as a placeholder for the extracted data.
We then make trial extractions and send them to the client for corrections and approval. After several such iterations the requirements are approved, and we start creating term models for automatic extraction and production processing.
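To make the idea of an extraction template more concrete, here is a minimal Python sketch of a tree whose leaves act as empty placeholders until values are extracted. It is an illustration only, not our production format: the `TemplateNode` class, the slash-separated paths and the invoice-style field names and values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TemplateNode:
    """One node of a hypothetical extraction template tree."""
    name: str
    value: Optional[str] = None  # stays None until a value is extracted
    children: List["TemplateNode"] = field(default_factory=list)

    def _path(self, prefix: str) -> str:
        # Build a slash-separated path such as "invoice/vendor/name".
        return f"{prefix}/{self.name}" if prefix else self.name

    def fill(self, extracted: Dict[str, str], prefix: str = "") -> None:
        """Copy extracted values into the matching leaf placeholders."""
        path = self._path(prefix)
        if not self.children:
            self.value = extracted.get(path)
        for child in self.children:
            child.fill(extracted, path)

    def missing(self, prefix: str = "") -> List[str]:
        """Return the paths of all leaves that are still empty."""
        path = self._path(prefix)
        if not self.children:
            return [] if self.value is not None else [path]
        result: List[str] = []
        for child in self.children:
            result.extend(child.missing(path))
        return result


# Hypothetical template for an invoice-like document type.
template = TemplateNode("invoice", children=[
    TemplateNode("number"),
    TemplateNode("vendor", children=[
        TemplateNode("name"),
        TemplateNode("address"),
    ]),
    TemplateNode("total"),
])

# Made-up extraction results, keyed by path within the template.
template.fill({
    "invoice/number": "INV-001",
    "invoice/vendor/name": "Acme Ltd",
    "invoice/total": "125.00",
})

# Leaves that received no value remain visible as unfilled placeholders.
unfilled = template.missing()
```

Keeping unfilled leaves in the tree, rather than dropping them, is one way to make gaps visible during the trial-extraction iterations described above.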
Output Formats
One of the important features of our Data Extraction technology is the ability to link each extraction result to the corresponding location in the original document. This lets us show the user not just an extraction result but also the place in the document that contains the data point. The user therefore does not have to trust our extraction results blindly, but can check every data point simply by clicking on it in one of the visual formats.
We offer two visual formats that can be returned to the client together with the XML structure. The first is PDF. The left panel contains the resulting data tree, and the right panel shows the original document. The user can click on any data value in the left pane and the right pane will scroll to the extraction point in the document. We also highlight the data location in the right pane to make it easier to spot.
The second visual format, which supports similar behavior, is HTML. We developed it for users who do not have Adobe Acrobat installed on their workstations. Here again the user can click on a data value in the leftmost pane, and the middle and right panes will scroll to the corresponding data locations and highlight the data in the document text.
We are not limited to the listed output formats and can add others at a client's request.
How to make the first step
Just tell our sales representative about your needs: sales@ev-soft.com
