Frequently asked questions
- What is REMBRANDT?
- What are SASKIA and RENOIR?
- Why these names, REMBRANDT, SASKIA and RENOIR?
- Can I run REMBRANDT on my computer?
- Can I change REMBRANDT source code for my own use?
- What kinds of entities does REMBRANDT annotate?
- What languages is REMBRANDT prepared for?
- How does REMBRANDT work?
- Is REMBRANDT good? Can it tag EVERYTHING?
- REMBRANDT does not tag entities as I was expecting!
- How do I cite REMBRANDT?
What is REMBRANDT?
REMBRANDT is a named entity recognition tool that identifies and classifies the named entities (NEs) in a text (that is, entity names such as proper names, places or organizations), and detects the relations among those NEs. REMBRANDT is prepared to classify entities that potentially have different meanings, and disambiguates their meaning whenever possible.
REMBRANDT is a software package developed by me, Nuno Cardoso, as part of my PhD work.
My PhD work is related to two projects: 1) Linguateca, located at SINTEF, and 2) GReaSE, from the XLDB team, LaSIGE laboratory, of the Department of Informatics of the Faculty of Sciences, University of Lisbon.
What are SASKIA and RENOIR?
RENOIR is an advanced question parsing and answering module, designed to extract intentions and more elaborate meanings from the NEs in queries, and to reason over them. For instance, RENOIR can understand the question "What is the capital of Portugal?" and, with the help of REMBRANDT and SASKIA, find the correct NE that answers it, "Lisbon".
Why these names, REMBRANDT, SASKIA and RENOIR?
REMBRANDT is an acronym for Reconhecimento de Entidades Mencionadas Baseado em Relações e ANálise Detalhada do Texto (roughly, Named Entity Recognition Based on Relations and Detailed Analysis of Text). A good acronym is the first step for good software. It seems it was also a Dutch painter...
SASKIA is an acronym for SPARQL API Service for Knowledge and Information Access. It seems that there is also a person called Saskia, who married the painter Rembrandt. Coincidences...
RENOIR is an acronym for REMBRANDT's Extended NER On Information Retrieval, until I find a better acronym. Coincidentally, it looks like it's also the name of a French painter...
Can I run REMBRANDT on my computer?
Yes. REMBRANDT is freely available to everyone (please read the disclaimer before using it). REMBRANDT can be downloaded and executed on any machine, as long as it has Java 1.6 installed. You will also need to download other Java packages that REMBRANDT requires, and have access to a database.
As a data source, REMBRANDT needs a local copy of the Wikipedia databases for the language(s) you want to annotate. These databases can be downloaded for free, and the downloads page has links to them. A database server, such as MySQL, is also required, and it is also freely available. In summary, you can run REMBRANDT on your computer for free.
The installation instructions are detailed on REMBRANDT's installation page.
Can I change REMBRANDT source code for my own use?
Yes, the source code is included in the software packages, under a GPL license.
What kinds of entities does REMBRANDT annotate?
The semantic classification consists of a generic category and a specialization in two levels (type and subtype). There are nine main categories:
- PERSON - Includes person names, positions or groups of persons.
- ORGANIZATION - Includes companies, institutions and other administrative entities.
- PLACE - Includes geographic places and virtual places (such as newspapers, TV shows or Internet sites).
- TIME - Includes temporal expressions like time, dates or weekdays.
- VALUE - Includes numeric expressions like quantities and measurements.
- MASTERPIECE - Includes works of art, films, paintings, etc.
- EVENT - Includes past events and relevant happenings.
- THING - Includes entities that refer to objects or object classes.
- ABSTRACTION - Includes abstract concepts such as intellectual movements, research areas, philosophical concepts, etc.
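As an illustration of the three-level classification described above (category, type, subtype), here is a minimal Python sketch. It is not REMBRANDT's own code: the `Classification` class, the `label` helper and the type and subtype names "HUMAN" and "CITY" are assumptions made up for this example. Only the nine category names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """The nine main categories listed above."""
    PERSON = "PERSON"
    ORGANIZATION = "ORGANIZATION"
    PLACE = "PLACE"
    TIME = "TIME"
    VALUE = "VALUE"
    MASTERPIECE = "MASTERPIECE"
    EVENT = "EVENT"
    THING = "THING"
    ABSTRACTION = "ABSTRACTION"


@dataclass(frozen=True)
class Classification:
    """Hypothetical container for one semantic classification."""
    category: Category
    type: Optional[str] = None      # first level of specialization
    subtype: Optional[str] = None   # second level of specialization

    def label(self) -> str:
        # Join only the levels that are actually filled in.
        parts = [self.category.value, self.type, self.subtype]
        return "/".join(p for p in parts if p)


# "HUMAN" and "CITY" are illustrative labels, not taken from this page.
lisbon = Classification(Category.PLACE, "HUMAN", "CITY")
print(lisbon.label())
```

Keeping the type and subtype optional lets an entity carry only the generic category when no finer classification is available.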
What languages is REMBRANDT prepared for?
REMBRANDT is prepared to use annotation rules for several languages, and to tag texts in different languages simultaneously. Nonetheless, the rules for English text are not optimized, and as such the results for English are not great.
While it is not an urgent matter, I hope to create English grammar rules from scratch in a forthcoming release. Stay tuned to the wish-list page of REMBRANDT to know when this is planned to be addressed. In the meantime, you can still tag English texts, although the quality of the results is not comparable to that of the Portuguese annotation results.
How does REMBRANDT work?
REMBRANDT implements two main strategies for named entity recognition: i) it uses grammar rules for each language, namely for detecting internal and external evidence, such as the presence of "Mr." preceding a person's name; ii) it extracts information from Wikipedia, to obtain knowledge about each name and the different meanings associated with it.
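To make the two strategies more concrete, here is a toy Python sketch. It is an illustration only, not REMBRANDT's actual rules or data: the `TITLE_RULE` pattern, the `KNOWN_MEANINGS` dictionary (a tiny stand-in for knowledge extracted from Wikipedia) and the `annotate` function are all assumptions invented for this example.

```python
import re

# Strategy i) external evidence: a title such as "Mr." right before a
# capitalized name suggests a PERSON. Hypothetical rule, English only.
TITLE_RULE = re.compile(
    r"\b(?:Mr|Mrs|Dr)\.\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)

# Strategy ii) a toy stand-in for knowledge extracted from Wikipedia:
# each known name maps to the meanings it may have.
KNOWN_MEANINGS = {
    "Lisbon": ["PLACE"],
    "Portugal": ["PLACE", "ORGANIZATION", "PERSON"],
}


def annotate(text):
    """Return (name, candidate categories, source) tuples found in text."""
    found = []
    # Grammar rule: names introduced by a title.
    for match in TITLE_RULE.finditer(text):
        found.append((match.group(1), ["PERSON"], "rule"))
    # Knowledge lookup: names known to the toy knowledge base.
    for name, meanings in KNOWN_MEANINGS.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, meanings, "knowledge"))
    return found


for entity in annotate("Mr. Silva moved to Lisbon."):
    print(entity)
```

In this sketch the rule finds "Silva" without any prior knowledge of the name, while the lookup contributes the possible meanings of names it already knows.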
Please see the published papers and presentation for more information about REMBRANDT.
Is REMBRANDT good? Can it tag EVERYTHING?
REMBRANDT is not an oracle, and it fails like any tool made by a human. Version 0.7 of REMBRANDT participated in the Second HAREM, an evaluation contest for Portuguese named entity recognition systems, organized by Linguateca in April 2008. Among 10 systems, it achieved second place on the overall NER task, with an F-measure of 0.567. In the scenario with only PLACE entities, it achieved first place among 8 systems, with an F-measure of 0.625. On the entity relation detection task, REMBRANDT achieved first place among three participating systems.
In summary, REMBRANDT is pretty good, but I wish it could be better. That's why I have a page to collect error reports and tagging problems, so that I can improve REMBRANDT and its grammar rules, and include cases that I've overlooked.
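For readers unfamiliar with the F-measure, the Python sketch below computes it from precision and recall, assuming the usual balanced definition (the harmonic mean of the two). The exact scoring rules used in the Second HAREM may differ in detail, and the precision and recall values in the example are made up for illustration. They are not REMBRANDT's results.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure of precision and recall; beta=1 gives the balanced F1."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when nothing was correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up values, for illustration only.
print(round(f_measure(0.6, 0.4), 3))
```

Because it is a harmonic mean, the F-measure is pulled towards the lower of the two values, so a system cannot score well by being good at only precision or only recall.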
REMBRANDT does not tag entities as I was expecting!
This can happen for many reasons, but first, keep the following in mind: REMBRANDT annotates named entities within context, that is, it tries to assign the meaning that the entity has in the sentence, not the most common meaning associated with that entity name.
In other words, always tagging 'Portugal' as a country is not REMBRANDT's goal: the name 'Portugal' can have other roles that depend on the context, such as a group of persons (in the case of a sports team), or an organization (in the case of a governmental decision). Could that be what is happening?
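The idea of tagging within context can be sketched in Python as follows. This is a deliberately naive illustration and not how REMBRANDT decides: the cue words in `CONTEXT_CUES`, the `DEFAULT` fallback and the `classify_in_context` function are assumptions invented for this example.

```python
import re

# Hypothetical cue words that hint at a non-default role for a country name.
CONTEXT_CUES = {
    "PERSON": {"team", "players", "match", "scored"},            # sports team
    "ORGANIZATION": {"government", "signed", "treaty", "decided"},
}
DEFAULT = "PLACE"  # fallback when no cue word is present


def classify_in_context(sentence):
    """Pick a category for a country name from the words around it."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # Count how many cue words of each category appear in the sentence.
    scores = {cat: len(words & cues) for cat, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT


print(classify_in_context("Portugal scored twice in the match."))
print(classify_in_context("Portugal signed the treaty."))
print(classify_in_context("I flew to Portugal last summer."))
```

The same name receives three different categories here, depending only on the surrounding words.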
How do I cite REMBRANDT?
Please cite REMBRANDT with the following reference:
Nuno Cardoso, REMBRANDT - Reconhecimento de Entidades Mencionadas Baseado em Relações e ANálise Detalhada do Texto. In Cristina Mota & Diana Santos (eds.). Desafios na avaliação conjunta do reconhecimento de entidades mencionadas: O Segundo HAREM. Linguateca. 2008. In Portuguese.