Anonymization

Documents may contain sensitive information that could identify people, places, and companies. Sometimes this information must be hidden so that the documents can be transferred and circulated. Medical records are a typical example, since the identity of the persons involved must be protected. The same need arises in the legal and financial domains, and whenever document processing is outsourced and the privacy of the parties mentioned must be protected.

Our anonymizer, Privacy Guardian (PG), was originally developed for the medical domain, where the strictest privacy-protection policies apply. Its development was supervised by Lyon Hospital, and it was approved by the French data protection authority, the Commission nationale de l’informatique et des libertés (CNIL). It has since been extended to the financial and legal domains and ported from French to English and Italian.

  • Interactivity

    Besides working as a standard Java library, Privacy Guardian comes with an interactive graphical user interface for anonymization. With a few clicks, the user can validate or modify the choices made by Privacy Guardian. The interface is particularly useful in critical cases where human supervision is required regardless.

  • Semantic preservation

    Anonymization is not just the deletion of sensitive information. To support further processing, whether by humans or by information systems, the anonymization process must preserve the meaning of the anonymized information. Along with the anonymized document, PG keeps all metadata about each anonymized entity, e.g. whether it was a person’s name, a date, or a place. For dates, it can also preserve the temporal sequence of events by anchoring them to a specific moment called “time 0”, the reference time from which incomplete dates are calculated. In specialized domains such as medicine, it can also provide finer-grained information, such as whether an anonymized person was a doctor or a patient, or whether a date was a date of birth or the date of a medical intervention.

  • Configurability

    PG is based on a set of linguistic rules written by language experts from available corpora. Our experts can therefore configure the system’s results to fit specific domains and application needs. Customers can also customize the results to some extent by adding their own lists of names of people, places, etc., and by writing special-purpose rules that match application-specific identifiers not covered by the default configuration.

  • Reversibility

    Thanks to an advanced implementation of stand-off annotation, in which annotations are stored separately from the text they describe, anonymized documents can be automatically “de-anonymized” once they return to the protected environment. In other words, the results of the external processing can be integrated into the original, non-anonymized version of the document.
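The “time 0” idea described under Semantic preservation can be illustrated with a minimal Java sketch. It assumes that each date is replaced by a signed number of days from the reference date, written as a pseudonym such as “T0+14d”. The class TimeZeroShifter and the pseudonym format are hypothetical and are not Privacy Guardian’s API or output format.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not PG's API: dates are replaced by signed day offsets
// from a reference date ("time 0"), so the order of events and the intervals
// between them survive anonymization while the absolute dates do not.
final class TimeZeroShifter {
    private final LocalDate timeZero;

    TimeZeroShifter(LocalDate timeZero) {
        this.timeZero = timeZero;
    }

    /** Signed number of days from time 0 to the given date. */
    long offsetDays(LocalDate date) {
        return ChronoUnit.DAYS.between(timeZero, date);
    }

    /** Pseudonym such as "T0+14d" (after time 0) or "T0-3d" (before it). */
    String pseudonym(LocalDate date) {
        long days = offsetDays(date);
        return (days >= 0 ? "T0+" : "T0-") + Math.abs(days) + "d";
    }

    public static void main(String[] args) {
        TimeZeroShifter shifter = new TimeZeroShifter(LocalDate.of(2020, 3, 1));
        // The role label stands in for the entity metadata kept alongside
        // the anonymized value (e.g. "date of a medical intervention").
        Map<String, LocalDate> events = new LinkedHashMap<>();
        events.put("admission", LocalDate.of(2020, 3, 1));
        events.put("surgery", LocalDate.of(2020, 3, 15));
        events.put("discharge", LocalDate.of(2020, 4, 2));
        events.forEach((role, date) ->
            System.out.println(role + ": " + shifter.pseudonym(date)));
    }
}
```

Running main prints “admission: T0+0d”, “surgery: T0+14d” and “discharge: T0+32d” on separate lines. The interval between admission and surgery (14 days) can still be computed, even though the calendar dates are gone.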
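The customer-side customization described under Configurability, meaning extra name lists and special-purpose rules for identifiers, might look roughly like the following sketch. The class CustomRules, the entity type labels, and the identifier format (“PAT-” followed by six digits) are assumptions made for illustration. PG’s actual rule language is not described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not PG's rule language: a customer-supplied word list
// plus one regular-expression rule for an application-specific identifier.
final class CustomRules {
    /** A detected entity: character offsets [start, end) and its type. */
    static final class Match {
        final int start;
        final int end;
        final String type;

        Match(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    private final Map<String, String> wordList; // entry -> entity type
    private final Pattern idRule;
    private final String idType;

    CustomRules(Map<String, String> wordList, Pattern idRule, String idType) {
        this.wordList = wordList;
        this.idRule = idRule;
        this.idType = idType;
    }

    /** Returns all word-list and identifier matches, ordered by position. */
    List<Match> find(String text) {
        List<Match> found = new ArrayList<>();
        // Word-list entries match as whole words only ("Rossi" but not "Rossini").
        for (Map.Entry<String, String> e : wordList.entrySet()) {
            Matcher m = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b").matcher(text);
            while (m.find()) {
                found.add(new Match(m.start(), m.end(), e.getValue()));
            }
        }
        Matcher idMatcher = idRule.matcher(text);
        while (idMatcher.find()) {
            found.add(new Match(idMatcher.start(), idMatcher.end(), idType));
        }
        found.sort(Comparator.comparingInt(x -> x.start));
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> words = new LinkedHashMap<>();
        words.put("Rossi", "PERSON");
        words.put("Lyon", "PLACE");
        // Assumed identifier format for illustration: "PAT-" plus six digits.
        CustomRules rules = new CustomRules(words, Pattern.compile("PAT-\\d{6}"), "PATIENT_ID");
        System.out.println(rules.find("Dr. Rossi saw file PAT-004512 in Lyon."));
    }
}
```

Running main prints “[PERSON[4,9), PATIENT_ID[19,29), PLACE[33,37)]”. Matching list entries as whole words keeps a short surname from triggering inside longer, unrelated words. The matches carry only offsets and a type, so they can feed a stand-off representation directly.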
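The Reversibility point can be illustrated with a simplified sketch of anonymization driven by stand-off annotations. Entity spans are kept apart from the text as character offsets, and a key mapping placeholders to original values stays in the protected environment. Once the processed document comes back, the key restores the original values. The class StandOffAnonymizer and the placeholder format “[TYPE_n]” are hypothetical; this is not a description of PG’s internal format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not PG's format: annotations live outside the text
// (stand-off), and the placeholder -> original-value key never leaves the
// protected environment, so processed documents can be restored later.
final class StandOffAnonymizer {
    /** Stand-off annotation: character offsets [start, end) into the original text. */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Replaces each non-overlapping span with a typed placeholder and records it in key. */
    static String anonymize(String text, List<Span> spans, Map<String, String> key) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(sp -> sp.start));
        Map<String, Integer> counters = new HashMap<>();
        Map<String, String> placeholderFor = new HashMap<>(); // original value -> placeholder
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : sorted) {
            String value = text.substring(s.start, s.end);
            String placeholder = placeholderFor.get(value);
            if (placeholder == null) {
                // Same value, same placeholder: repeated mentions stay linked.
                int n = counters.merge(s.type, 1, Integer::sum);
                placeholder = "[" + s.type + "_" + n + "]";
                placeholderFor.put(value, placeholder);
                key.put(placeholder, value);
            }
            out.append(text, cursor, s.start).append(placeholder);
            cursor = s.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    /** Puts the original values back into a (possibly externally processed) text. */
    static String deanonymize(String text, Map<String, String> key) {
        String result = text;
        for (Map.Entry<String, String> e : key.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String original = "Mario Rossi was operated on in Lyon. Mario Rossi recovered.";
        int lyon = original.indexOf("Lyon");
        int second = original.lastIndexOf("Mario Rossi");
        List<Span> spans = List.of(
            new Span(0, 11, "PERSON"),
            new Span(lyon, lyon + 4, "PLACE"),
            new Span(second, second + 11, "PERSON"));
        Map<String, String> key = new LinkedHashMap<>(); // stays inside the protected environment
        String anonymized = anonymize(original, spans, key);
        System.out.println(anonymized);
        // Simulated external processing that only rewrites non-sensitive text.
        String processed = anonymized.replace("was operated on", "underwent surgery");
        System.out.println(deanonymize(processed, key));
    }
}
```

Running main prints “[PERSON_1] was operated on in [PLACE_1]. [PERSON_1] recovered.” and then “Mario Rossi underwent surgery in Lyon. Mario Rossi recovered.”. Because the placeholders keep the entity type, the external side still knows what kind of entity was removed. A production system would also need to handle placeholders that external processing alters or drops, which this sketch does not attempt.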