Generalized Semantic Analysis (HOLMES)

All our applications are built on HOLMES, our semantic analysis engine. HOLMES stands for Hybrid Operable platform for Language Management and Extensible Semantics. It is a platform for analyzing texts in French, Italian and English, covering tokenization, part-of-speech tagging, named entity recognition and extraction, trainable classifiers, dependency parsing and information extraction. HOLMES is written entirely in Java and is designed to keep processing extensible for real-life tasks.

Because it is highly customizable, HOLMES lets us deliver classical information extraction applications with relatively little effort and, as a result, at a lower entry cost for the customer.

  • Uses of HOLMES

    Information extraction is the task of transforming unstructured information contained in texts into structured information that other applications can use. Information extraction applications now exist in many domains, for instance:

    • In the financial domain: extraction from news of the facts that are relevant for analysts, such as people’s specific actions, company mergers and acquisitions, and specific events that can influence economic trends.
    • In the security domain: detection of dangerous events, detection of weak signals, identification of security breaches, and detection of patent violations.
    • In the scientific domain: detection of uses of specific technologies, technology watch, and automatic identification of experimental patterns in scientific texts.
    • In the personal domain: email analysis and automatic identification of relevant events, such as meetings and assigned tasks.

    In general, any application that includes some form of intelligent semantic search relies on information extraction, for instance to identify entities that are pertinent to a given domain, or to characterize the relationships between different entities, event dates, etc.
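
    As an illustration of turning unstructured text into a structured record (entities, their relationship, and an event date), here is a minimal Java sketch. It is not HOLMES code: the class ExtractionSketch, the record type Acquisition, the single regular expression and the sample sentence are all assumptions made up for this example, and a real extractor would rely on linguistic analysis rather than one pattern.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: names and pattern are hypothetical, not part of HOLMES.
class ExtractionSketch {

    /** Structured output: two entities, their relationship, and an optional date. */
    static final class Acquisition {
        final String buyer;
        final String target;
        final String date; // null when the text gives no date

        Acquisition(String buyer, String target, String date) {
            this.buyer = buyer;
            this.target = target;
            this.date = date;
        }

        @Override
        public String toString() {
            return "Acquisition{buyer=" + buyer + ", target=" + target + ", date=" + date + "}";
        }
    }

    // Toy pattern: "<Buyer> acquired <Target>", optionally followed by "on YYYY-MM-DD".
    private static final Pattern ACQUISITION = Pattern.compile(
            "(?<buyer>[A-Z]\\w+) acquired (?<target>[A-Z]\\w+)"
            + "(?: on (?<date>\\d{4}-\\d{2}-\\d{2}))?");

    /** Returns the first acquisition fact found in the text, if any. */
    static Optional<Acquisition> extract(String text) {
        Matcher m = ACQUISITION.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // group("date") is null when the optional date part did not match.
        return Optional.of(new Acquisition(m.group("buyer"), m.group("target"), m.group("date")));
    }

    public static void main(String[] args) {
        // Invented sample sentence, used only to exercise the sketch.
        System.out.println(extract("Yesterday Acme acquired Globex on 2014-03-12."));
    }
}
```

    The resulting record can be handed to another application (a database, a search index) without that application having to read the original text.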

    Recently, information extraction has found two major fields of application: (Open) Linked Data and Big Data. In the former, it is mainly used to automatically link resources and find relations that have not been manually coded. In the latter, information extraction provides central cues for analyzing texts (typically coming from social networks) that, because of their sheer volume, would be difficult for human operators to analyze.

  • HOLMES in detail

    Technically, HOLMES is a framework for natural language processing based on a radically incremental approach. This means that all information added in the processing chain is always available at higher processing levels. Currently, the main processors available in HOLMES are:

    • Tokenization
    • Sentence detection
    • Part-of-speech tagging
    • Morphological analysis
    • Linear pattern rules over sequences of linguistic objects
    • Dependency parsing
    • Ontology look-up
    • Named entity extraction (using Conditional Random Fields)
    • Automatic classification
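
    The incremental principle described above, where each processor adds information and everything added earlier stays visible to later processors, could be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the HOLMES API: the Document and Processor types, the layer names and the three toy processors are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: all types and names here are hypothetical, not part of HOLMES.
class IncrementalPipelineSketch {

    /** Shared document: processors add named annotation layers and never remove any. */
    static final class Document {
        final String text;
        final Map<String, List<String>> layers = new LinkedHashMap<>();

        Document(String text) {
            this.text = text;
        }
    }

    /** Assumed processor contract: read existing layers, add a new one. */
    interface Processor {
        void process(Document doc);
    }

    /** Toy tokenizer: splits the raw text on whitespace. */
    static final class Tokenizer implements Processor {
        public void process(Document doc) {
            List<String> tokens = new ArrayList<>();
            for (String t : doc.text.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            doc.layers.put("tokens", tokens);
        }
    }

    /** Toy tagger: reads the "tokens" layer; capitalized tokens become PROPN. */
    static final class ToyTagger implements Processor {
        public void process(Document doc) {
            List<String> tags = new ArrayList<>();
            for (String token : doc.layers.get("tokens")) {
                tags.add(Character.isUpperCase(token.charAt(0)) ? "PROPN" : "WORD");
            }
            doc.layers.put("pos", tags);
        }
    }

    /** Toy entity step: a higher level that reads both earlier layers. */
    static final class ToyEntityFinder implements Processor {
        public void process(Document doc) {
            List<String> tokens = doc.layers.get("tokens");
            List<String> tags = doc.layers.get("pos");
            List<String> entities = new ArrayList<>();
            for (int i = 0; i < tokens.size(); i++) {
                if (tags.get(i).equals("PROPN")) {
                    entities.add(tokens.get(i));
                }
            }
            doc.layers.put("entities", entities);
        }
    }

    /** Runs the processors in order over one shared document. */
    static Document run(String text, List<Processor> chain) {
        Document doc = new Document(text);
        for (Processor p : chain) {
            p.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("Acme acquires Globex",
                List.of(new Tokenizer(), new ToyTagger(), new ToyEntityFinder()));
        System.out.println(doc.layers);
    }
}
```

    Keeping every layer on one shared document is one simple way to make earlier results available to later levels; other designs (for example, stand-off annotations indexed by character offsets) would serve the same purpose.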

    As this list suggests, the Hybrid aspect of HOLMES lies in a tight integration between symbolic techniques (hand-written semantic rules) and machine learning-based techniques. Results of statistical computation can be accessed at the rule level and, conversely, all machine learning-based components can use the results of any kind of symbolic processing as input features.
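
    The two directions of this integration can be illustrated with a small Java sketch. Everything in it is an assumption made for the example rather than HOLMES code: the class HybridSketch, the hand-written patterns, the feature names, and the fixed weights that stand in for a trained classifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: patterns, feature names and weights are made up.
class HybridSketch {

    // Symbolic side: a hand-written rule that flags acquisition wording.
    private static final Pattern ACQUISITION_RULE = Pattern.compile(
            "\\b(acquires|acquired|merges with|buys)\\b", Pattern.CASE_INSENSITIVE);

    // A second hand-written cue: a company-like suffix somewhere in the sentence.
    private static final Pattern COMPANY_SUFFIX = Pattern.compile("\\b(Inc|Corp|Ltd|SA)\\b");

    static boolean ruleFires(String sentence) {
        return ACQUISITION_RULE.matcher(sentence).find();
    }

    /** Symbolic results become input features of the statistical component. */
    static Map<String, Double> features(String sentence) {
        Map<String, Double> f = new LinkedHashMap<>();
        f.put("rule:acquisition", ruleFires(sentence) ? 1.0 : 0.0);
        f.put("has:company_suffix", COMPANY_SUFFIX.matcher(sentence).find() ? 1.0 : 0.0);
        return f;
    }

    /** Stand-in for a trained classifier: logistic score with invented weights. */
    static double financeScore(Map<String, Double> f) {
        double z = -2.0 + 1.5 * f.get("rule:acquisition") + 1.5 * f.get("has:company_suffix");
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Rule level reading the statistical result: emit the event only if the score is high enough. */
    static boolean emitAcquisitionEvent(String sentence) {
        return ruleFires(sentence) && financeScore(features(sentence)) >= 0.5;
    }

    public static void main(String[] args) {
        System.out.println(emitAcquisitionEvent("Acme Corp acquires Globex Ltd"));
        System.out.println(emitAcquisitionEvent("She acquired a taste for jazz"));
    }
}
```

    In this sketch the rule alone would fire on both sentences; consulting the statistical score lets the rule level discard the second one, while the classifier in turn relies on the rule output as one of its features.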