Linguateca-XLDB in English
This page is also available in Portuguese.
About the XLDB Node of Linguateca
The XLDB Node of Linguateca - Centro de Recursos Distribuído para a Língua Portuguesa ran from January 2004 to December 2008 with the following objectives:
- organise, maintain and document text collections in Portuguese, for Information Retrieval (IR) evaluation contests;
- establish routine measurements and tracking of the portion of the Web related to the Portuguese language;
- develop programs that integrate activities of the XLDB Team and Linguateca, like adding linguistic features to the tumba! search engine;
- organise HAREM, the evaluation contest for named entity recognition systems for Portuguese.
Co-organization of HAREM:
- Identification of HAREM requirements (work done by Cristina Mota, Linguateca node at LabEL, with modifications by Nuno Cardoso, XLDB node of Linguateca).
- Recognition of geographic named entities, on behalf of the GReaSE project.
- Generation and public release of geographic ontologies, such as Geo-Net-PT 01 and Geo-Net-PT 02.
Generation and release of crawls of the Portuguese web:
- WPT 03 - crawl of the Portuguese web (.PT top-level domain, plus subdomains with Portuguese content on the .COM, .NET, .ORG and .TV top-level domains) - 11 GB of metadata and document text, including PDF, DOC and PS files, and excluding images.
- WPT 05 - To be announced soon.
Linguateca was supported by FCT, project POSC/339/1.3/C/NAC.