Linguateca-XLDB in English



This page is also available in Portuguese.


About the XLDB Node of Linguateca

The XLDB Node of Linguateca - Centro de Recursos Distribuído para a Língua Portuguesa ran from January 2004 to December 2008 with the following objectives:

  • organize, maintain and document text collections in Portuguese for Information Retrieval (IR) evaluation contests;
  • establish routine measurement and tracking of the part of the Web related to the Portuguese language;
  • develop programs that integrate the activities of the XLDB Team and Linguateca, such as adding linguistic features to the tumba! search engine;
  • organize HAREM, the evaluation contest for named entity recognition systems for Portuguese.


Co-organization of HAREM:

  • Identification of HAREM requirements (work done by Cristina Mota, Linguateca node at LabEL, with modifications by Nuno Cardoso, XLDB node of Linguateca).
  • Recognition of geographic named entities, on behalf of the GReaSE project.
  • Generation and public release of geographic ontologies, such as Geo-Net-PT 01 and Geo-Net-PT 02.

Generation and release of crawls of the Portuguese web:

  • WPT 03 - crawl of the Portuguese web (the .PT top-level domain, plus subdomains with Portuguese content under the .COM, .NET, .ORG and .TV top-level domains): 11 GB of metadata and document text, including PDF, DOC and PS files and excluding images.
  • WPT 05 - To be announced soon.



Linguateca was supported by FCT, project POSC/339/1.3/C/NAC.
