Blog
Stay tuned for the most recent posts, not only about REMBRANDT but also about the named entity recognition task. Subscribe to the REMBRANDT RSS feed for the latest updates.
Posted: Saturday, 3 July, 2010, 05:00 PM
REMBRANDT 1.1 released, and refreshed REMBRANDT website
REMBRANDT has been updated to version 1.1, and the project is now hosted on Google Code, at http://rembrandt.googlecode.com. There you can download tar.gz packages, browse the source code and report issues. Also, the REMBRANDT website was completely revamped, to allow better interaction in the future with the web services for tagging, searching and collection management that I am currently developing.
Posted: Wednesday, 2 December, 2009, 10:00 AM
This is getting better and better. The next release will probably be the 1.0 final, as I'm not planning to add more features, just to solve some minor issues. I'm tagging a 300K English collection and have solved many DB sync issues: workers can now tag pools of 10K documents without crashing. When the DB I/O is full, they can wait and retry until it manages to distribute more untagged documents to them. Also, GeoPlanet WOEIDs are now being used to ground geographic entities, and I'm now generating document geographic signatures.
Here is the list of changes
For the 1.0 final version, I plan to finish the time signatures and just solve any minor issues while tagging the collection, and work on the manual.
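The wait-and-retry behaviour of the workers mentioned in this post can be sketched roughly as follows. This is only an illustration, not REMBRANDT's actual code: `PoolBusy`, `fetch_with_retry` and `fetch_batch` are hypothetical names, and the doubling delay between attempts is an assumption (the post only says that workers wait and retry).

```python
import time


class PoolBusy(Exception):
    """Hypothetical error: the DB cannot hand out untagged documents right now."""


def fetch_with_retry(fetch_batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch_batch() until it succeeds, waiting longer after each failure.

    The delay doubles on every failed attempt (an assumed policy).
    The last failure is re-raised so the caller can decide what to do.
    """
    for attempt in range(max_attempts):
        try:
            return fetch_batch()
        except PoolBusy:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A worker would wrap its request for the next pool of documents in `fetch_with_retry`, so that a busy database slows the worker down instead of crashing it.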
Posted: Tuesday, 24 November, 2009, 10:00 AM
After long weeks in a 'leave-no-class-untouched' revision approach, I proudly present the first beta version of REMBRANDT 1.0. It brings many exciting features, but the major one is that it can be up to 10x faster than the 0.8 version!
The main changes include:
- REMBRANDT Cores and patterns are all now in UTF-8
- Indexes in documents and NE lists, to speed up rule match
- Pre-optimization of rule/sentence pairs for first-clauses of rules
- Pre-compilation of patterns; gazetteers are now static and final
- Detector subclasses with pre-determined actions
- Better DB sync
- Separation of the HAREM classification from the internal REMBRANDT classification
- Gazetteer and pattern re-organization
- Memory usage output
- Rewritten Detector and MatcherObject, which can now call actions from rules and apply actions to more than one NE at the same time
- Wikipedia and DBpedia references are now linked to NE classifications, not just the NE
- External rules can now disambiguate and filter Wikipedia and DBpedia grounding references
- NE splitting tests are now in their own Rule class, after external evidence detection
- Several improvements to the Reader and Writer classes; REMBRANDT can now read already tagged documents, with several document/tag styles
- A simpler and better NE classification comparison engine
- Entity detection rewritten, much faster now
- NE history tracking rewritten, now it's printed for log.trace in NamedEntity logger
- Courthouse now gives verdicts (not actions), and ListOfNE executes verdicts in a smarter way
- Term count reviewed, better support for hidden terms
- Wikipedia Category mining now uses only plural evidence
- Laws revised, to solve some precedence problems
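To give an idea of why indexes and first-clause pre-optimization can speed up rule matching, here is a small sketch. It is not taken from the REMBRANDT source: the function names and the sample sentences are made up for illustration, and the real index structure may be quite different.

```python
from collections import defaultdict


def build_term_index(sentences):
    """Map each lowercased term to the set of sentence numbers that contain it."""
    index = defaultdict(set)
    for number, sentence in enumerate(sentences):
        for term in sentence.split():
            index[term.lower()].add(number)
    return index


def candidate_sentences(index, first_clause_term):
    """Return only the sentences where a rule's first clause could match."""
    return sorted(index.get(first_clause_term.lower(), set()))


# Toy data, for illustration only.
sentences = ["the city of Lisbon", "a river in Porto", "the city of Porto"]
index = build_term_index(sentences)
print(candidate_sentences(index, "city"))  # prints [0, 2]
```

With such an index, a rule whose first clause requires the term "city" is only tried against two of the three sentences, instead of being tested against every sentence in the document.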
Posted: Tuesday, 23 June, 2009, 06:00 PM
New formatted text, with balloon tooltips.
Now, the annotated texts can be displayed with boxes of several colors, one for each NE category. If you click on the NE, it displays a balloon tooltip with additional information about the NE.
The REMBRANDT web service now has a patched version of 0.8.6; the unpatched one prevented access to DBpedia. In other words, only now is the service actually using DBpedia. Have fun creating and killing balloon tooltips!
Posted: Monday, 22 June, 2009, 10:00 AM
REMBRANDT 0.8.6 is now available for download. Hopefully, it does not have serious bugs (remember, it's still a 0.X version), and it solves some of the issues with the language and with DB connections, which sometimes killed the web service.
Note that it now includes the Saskia and Renoir packages, but it also depends on other jars, which can be found in HP's Jena/ARQ package, a SPARQL storage/query interface required for DBpedia queries.
I'll be working now on the 0.8.7 version. Feedback is always welcome.
Meanwhile, there's now a Twitter account for REMBRANDT, to get update news out faster and to collect your complaints and suggestions.
Posted: Monday, 15 June, 2009, 10:00 AM
REMBRANDT 0.8.6 nearly out, uses DBpedia now.
I'll be releasing REMBRANDT 0.8.6 soon, as a download package and as a web service.
It'll use DBpedia's ontology classifications before Wikipedia, which dramatically improves the precision of the results on English texts. It's a really nice improvement over 0.8.5, but it'll be an unstable release. I'll announce it soon; I still have a few bugs to kill.
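The "DBpedia before Wikipedia" order can be pictured as a simple fallback chain. The sketch below is only an illustration under assumptions: `classify`, the two lookup callables and the toy category labels are hypothetical, and the real service queries DBpedia and Wikipedia rather than in-memory dictionaries.

```python
def classify(entity, dbpedia_lookup, wikipedia_lookup):
    """Try the DBpedia ontology class first; fall back to Wikipedia evidence.

    Returns a (category, source) pair, or (None, None) if neither source
    knows the entity.
    """
    category = dbpedia_lookup(entity)
    if category is not None:
        return category, "dbpedia"
    category = wikipedia_lookup(entity)
    if category is not None:
        return category, "wikipedia"
    return None, None


# Toy stand-ins for the two knowledge sources (made-up data).
dbpedia = {"Lisbon": "PLACE"}
wikipedia = {"Lisbon": "PLACE", "Rembrandt": "PERSON"}

print(classify("Lisbon", dbpedia.get, wikipedia.get))     # ('PLACE', 'dbpedia')
print(classify("Rembrandt", dbpedia.get, wikipedia.get))  # ('PERSON', 'wikipedia')
```

Returning the source together with the category makes it easy to check later which classifications came from the ontology and which came from the Wikipedia fallback.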
Posted: Tuesday, 19 May, 2009, 03:11 PM
REMBRANDT now tags in English, and the website is now also in English.
REMBRANDT's site is now in English, and the REMBRANDT service can now tag English texts, using the English Wikipedia. There are now two RSS feeds: Portuguese news and English news.
Note that REMBRANDT's English grammar rules are, at the moment, mere translations of the Portuguese rules, and as such REMBRANDT's performance on English text is not so good. The size of the English Wikipedia (more than 5 times the size of the Portuguese Wikipedia) may lead to longer waits while tagging.
Nonetheless, I prefer to make REMBRANDT available right now for English texts, so that I can check the service stability over a much bigger Wikipedia database, and to get early feedback from eager users.
Posted: Tuesday, 19 May, 2009, 11:34 AM
The REMBRANDT service was down last weekend due to a problem caused by PHP versions, which is now solved. One more bug down towards a more stable service. :)
Meanwhile, I'll leave the search service on for tests, even though Yahoo! has only indexed two pages so far. So don't expect good search results yet.
Posted: Sunday, 10 May, 2009, 02:39 PM
English version of REMBRANDT website is ready.
The REMBRANDT website is now translated into English, and the service is ready to annotate English text.