Clustering

In information management, we sometimes need to group documents without knowing in advance what the classification criteria should be. The clustering web service solves this problem by automatically analyzing a set of documents and proposing natural groupings, i.e. groupings based on the actual content of the texts. For example, given the following five “documents”:

  1. Problem on the telephone line.
  2. The wifi does not work properly.
  3. I do not hear any noise when I lift the phone.
  4. Internet on my laptop is not working.
  5. I can not watch television.

The clustering algorithm will propose three groups: Phone = [1,3], Wifi = [2,4], TV = [5].
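The grouping above can be reproduced with a toy sketch. It is only an illustration and not the algorithm used by the service: the `CONCEPTS` synonym map, the function names, and the 0.5 threshold are all assumptions made for this example. The hand-written map stands in for real linguistic and semantic analysis, and documents are merged into a group when their concept sets are similar enough (Jaccard similarity, single-link grouping).

```python
import re

DOCS = [
    "Problem on the telephone line.",
    "The wifi does not work properly.",
    "I do not hear any noise when I lift the phone.",
    "Internet on my laptop is not working.",
    "I can not watch television.",
]

# Hypothetical synonym map standing in for real semantic analysis:
# each surface word is mapped to a coarse concept.
CONCEPTS = {
    "telephone": "phone", "phone": "phone",
    "wifi": "internet", "internet": "internet",
    "television": "tv",
}

def concepts(text):
    """Return the set of known concepts mentioned in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {CONCEPTS[w] for w in words if w in CONCEPTS}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.5):
    """Single-link grouping: a document joins (and merges) every cluster
    that contains at least one sufficiently similar document."""
    features = [concepts(d) for d in docs]
    clusters = []  # each cluster is a list of 1-based document numbers
    for i, feat in enumerate(features):
        matches = [
            c for c in clusters
            if any(jaccard(feat, features[j - 1]) >= threshold for j in c)
        ]
        merged = [i + 1]
        for c in matches:
            merged = c + merged
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)

print(cluster(DOCS))  # [[1, 3], [2, 4], [5]]
```

Note that no group label or category is supplied beforehand: the groups emerge from the similarity between documents. A real service would replace the fixed synonym map with features learned from the texts themselves.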

Clustering can handle any case involving a set of documents that cannot be classified with a predefined scheme, or for which the scheme changes continuously. For example, it can be used to group user comments, customer emails, or any type of document for which a specific classification system has not yet been defined.

Clustering can therefore also be used to ease the design of a classification system (a category tree) from a set of existing documents.

Finally, even in systems based on a predefined category tree, clustering can be useful for detecting trends that were not foreseen. For example, in the call center of a public administration, a significant share of calls on some days may relate to the research tax credit, even though no such category exists in the call center's tree structure.

Ho2S specializes in clustering textual documents: the features that characterize a group of documents are derived from linguistic and semantic analysis of the text. These features are then processed by an algorithm inspired by topic models.

Ho2S can provide dedicated access to an instance of the clustering service, configured according to your requirements in terms of business domains and applications.