Classification of documents

Document classification is a service that automatically assigns a document to one of a given set of categories. For example, the service can recognize that an article about the approval of the finance law belongs in the “political” category. Classification (or categorization) relies mainly on statistical techniques (machine learning), but it benefits greatly from the layer of grammatical analysis and named entity extraction provided by Ho2S.
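
To make the statistical idea concrete, here is a minimal sketch of a text classifier trained on a tiny labeled corpus. It is not the Ho2S implementation: it assumes a plain multinomial naive Bayes model over word counts, with no grammatical analysis or named entity extraction, and the class name, the sample corpus and the “sports” category are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()                  # every word seen in training

    def train(self, corpus):
        """Learn from an iterable of (text, category) pairs."""
        for text, category in corpus:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        """Return the category with the highest log-probability."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # category prior
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                # Add-one smoothing avoids log(0) for words unseen in this category.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Illustrative training corpus (invented examples, not real data).
training_corpus = [
    ("Parliament approved the finance law after a long debate", "political"),
    ("The minister presented the new budget to the government", "political"),
    ("The team won the championship final last night", "sports"),
    ("The striker scored twice in the second half of the match", "sports"),
]

classifier = NaiveBayesClassifier()
classifier.train(training_corpus)
print(classifier.classify("The government voted on the finance law"))  # political
```

A production system would train on a far larger corpus and, as described above, could enrich the word counts with grammatical features and named entities.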

Automatic document classification (or categorization) represents a fundamental shift in the editorial management of content, whether news, technical articles or blogs. Beyond online content, it is essential in all “knowledge-intensive” activities, where the volume of text produced calls for automatic archiving. Examples include tickets in CRM, verbatims in marketing analysis, CVs in recruitment, and metadata in digital libraries.

The service already offers a high level of parameterization, since you can specify the training corpus. Ho2S can also build a dedicated service by configuring the learning parameters to suit your requirements. In addition, our engineers can integrate semantic resources such as thesauri and ontologies to make the system even more effective.