Thursday 28 May 2015

Financial Document Normalization




Document normalization is the process of converting an input document (Word, Excel, PDF, PPT, etc.) into an HTML document and then formatting the converted HTML with a set of rules so that it can be used for further processing in a filing.

In the US, financial documents used for filing must follow a set of rules covering their content: the type of document that may be filed, paragraphs, financial tables, lines, document headers and footers, watermarks, images, embedded Excel, PDF, and PPT objects, drawing objects, and other types of objects.

Normalization is required before working on a filing document with activities such as tagging, untagging, commenting, and setting formulas.

There are two technical steps involved in normalization.

Step 1:
Read all allowed input financial documents and convert them into HTML documents using the Aspose library.
Examples of rules applied using Aspose:
·         Identify and mark embedded Excel, PDF, and other objects in the document.
·         Extract text from text boxes and remove the text boxes from the Word document.
·         Remove the password check from secured documents so that they can be processed.
·         Remove objects that are not required, etc.
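
To make the text box rule more concrete, here is a minimal sketch in Python. It does not use Aspose; it is a standard-library stand-in that reads raw WordprocessingML (the XML inside a .docx file) and collects the text found in each text box. The function name extract_textbox_text and the sample XML are assumptions made for illustration, and the sample is simplified: a real document wraps the text box content in additional shape elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:t and w:txbxContent.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Simplified, hypothetical sample: one normal paragraph and one text box.
SAMPLE_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Intro</w:t></w:r></w:p>"
    "<w:p><w:r><w:pict><w:txbxContent>"
    "<w:p><w:r><w:t>Net </w:t></w:r><w:r><w:t>income</w:t></w:r></w:p>"
    "</w:txbxContent></w:pict></w:r></w:p>"
    "</w:body></w:document>"
)


def extract_textbox_text(document_xml: str) -> list[str]:
    """Return one string per text box, joining all text runs inside it."""
    root = ET.fromstring(document_xml)
    texts = []
    # w:txbxContent holds the paragraphs that live inside a text box.
    for box in root.iter(f"{{{W_NS}}}txbxContent"):
        runs = [t.text or "" for t in box.iter(f"{{{W_NS}}}t")]
        texts.append("".join(runs))
    return texts


if __name__ == "__main__":
    print(extract_textbox_text(SAMPLE_XML))
```

A real implementation would also remove the text box from the document and place the extracted text back into the normal flow, which this sketch leaves out.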

Step 2:
Read the converted HTML document and apply a set of formatting rules using the Html Agility Pack library.
Examples of rules applied using Html Agility Pack:
·         Convert embedded Excel objects into HTML.
·         Create the HTML header and footer.
·         Decorate borders.
·         Format financial tables, etc.
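
As an illustration of a Step 2 rule such as decorating borders, the sketch below adds a border style to every table cell. It is written in Python with the standard library rather than Html Agility Pack, so it only shows the idea. The function name decorate_borders and the CELL_BORDER value are hypothetical, and the code assumes the converted HTML is well-formed XHTML; real converted output may need a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical house style for table cells; the real rule set is not given here.
CELL_BORDER = "border: 1px solid #000"


def decorate_borders(xhtml: str) -> str:
    """Append a border declaration to the style of every td and th element."""
    root = ET.fromstring(xhtml)
    for cell in root.iter():
        if cell.tag in ("td", "th"):
            # Keep any existing inline style and add the border after it.
            existing = cell.get("style", "").strip().rstrip(";")
            merged = f"{existing}; {CELL_BORDER}" if existing else CELL_BORDER
            cell.set("style", merged)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    table = (
        "<table><tr><th>Item</th>"
        '<td style="text-align: right;">100</td></tr></table>'
    )
    print(decorate_borders(table))
```

Existing inline styles are kept and the border is appended, so a rule like this can run after other formatting rules without overwriting their output.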





