Lemmatization

Lemmatization is the process of reducing an inflected word to its citation form, or lemma: the form under which it would appear as a dictionary entry. For instance, the word “animals” is converted to its base form “animal”.
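As a minimal illustration of the idea (not the Ho2S service or its interface), lemmatization can be pictured as a lookup from surface forms to lemmas. The `LEXICON` table and the `lemmatize` function below are hypothetical; a real lemmatizer relies on a full morphological dictionary plus rules for words it has never seen.

```python
# Hypothetical toy lexicon: surface form -> citation form (lemma).
# A real lemmatizer uses a complete morphological dictionary and rules.
LEXICON = {
    "animals": "animal",
    "animal": "animal",
    "mice": "mouse",
    "went": "go",
    "goes": "go",
}


def lemmatize(word: str) -> str:
    """Return the lemma of `word`, or the lowercased word itself if unknown."""
    w = word.lower()
    return LEXICON.get(w, w)


print(lemmatize("Animals"))  # animal
print(lemmatize("mice"))     # mouse
```

Falling back to the word itself keeps unknown tokens usable downstream instead of dropping them.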

The lemmatization service can also identify the morphological features of words: in the preceding example, that “animals” is a plural form. Optionally, the service can also disambiguate words according to their context. For example, in the sentences “Tomorrow I will release a new version” and “Tomorrow a new release will be rolled out”, the service recognizes that “release” is a verb in the first case and a noun in the second.
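The sketch below shows, under simplifying assumptions, what a morphological analysis with context-based disambiguation might return. Each surface form maps to one or more candidate analyses (lemma, part of speech, features), and the previous token is used as a cue to pick one. The lexicon, the `VERB_CUES`/`NOUN_CUES` word lists and the `analyze` function are all hypothetical; production disambiguators typically use statistical or neural part-of-speech taggers rather than a one-word context rule.

```python
from typing import NamedTuple, Optional


class Analysis(NamedTuple):
    lemma: str
    pos: str        # part of speech, e.g. "NOUN" or "VERB"
    features: dict  # morphological features, e.g. {"number": "plural"}


# Hypothetical lexicon: a surface form may have several candidate analyses.
LEXICON = {
    "animals": [Analysis("animal", "NOUN", {"number": "plural"})],
    "release": [
        Analysis("release", "VERB", {"form": "base"}),
        Analysis("release", "NOUN", {"number": "singular"}),
    ],
}

# Toy context cues (assumption): the preceding word hints at the part of speech.
VERB_CUES = {"will", "to", "can", "must"}
NOUN_CUES = {"a", "an", "the", "new", "this", "that"}


def analyze(tokens: list[str], i: int) -> Optional[Analysis]:
    """Return the analysis of tokens[i], using the previous token to choose a POS."""
    candidates = LEXICON.get(tokens[i].lower(), [])
    if not candidates:
        return None
    prev = tokens[i - 1].lower() if i > 0 else ""
    if prev in VERB_CUES:
        wanted = "VERB"
    elif prev in NOUN_CUES:
        wanted = "NOUN"
    else:
        wanted = None
    for a in candidates:
        if a.pos == wanted:
            return a
    return candidates[0]  # fall back to the first listed analysis


s1 = "Tomorrow I will release a new version".split()
s2 = "Tomorrow a new release will be rolled out".split()
print(analyze(s1, 3).pos)  # VERB
print(analyze(s2, 3).pos)  # NOUN
```

In both sentences “release” is the fourth token (index 3); only its left context differs, which is exactly what the disambiguation step exploits.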

Lemmatization and morphological analysis are the foundation of all language normalization processes. A basic application is in search engines such as Apache Lucene, where matching queries and documents on lemmas rather than on exact surface forms makes searches more relevant: a query for “animal” can also retrieve documents containing “animals”. They are also essential for business terminology extraction, semantic analysis, machine learning and similar tasks.
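To make the search-engine use case concrete, here is a small sketch of an inverted index keyed by lemma instead of surface form. It is plain Python, not Lucene's API; in Lucene, this kind of normalization would normally be plugged into the analysis chain applied at both indexing and query time. The `LEMMAS` table, `build_index` and `search` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Hypothetical toy lexicon standing in for a real lemmatizer.
LEMMAS = {"animals": "animal", "mice": "mouse",
          "released": "release", "releases": "release"}


def lemmatize(token: str) -> str:
    """Lowercase, strip trailing punctuation, then map to a lemma if known."""
    t = token.lower().strip(".,;:!?")
    return LEMMAS.get(t, t)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: lemma -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[lemmatize(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term, after lemmatization."""
    terms = [lemmatize(t) for t in query.split()]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": "Wild animals live here.",
    "d2": "The animal released its prey.",
    "d3": "No pets allowed.",
}
idx = build_index(docs)
print(sorted(search(idx, "animal")))    # ['d1', 'd2']
print(sorted(search(idx, "releases")))  # ['d2']
```

Because the same `lemmatize` function is applied to documents and queries, “animal” matches “animals” and “releases” matches “released”, which an exact-match index would miss.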

Ho2S can provide dedicated access to an instance of the lemmatization service, tailored to your own business domains and custom applications.