Open Source NLP Toolkit

Tuesday, March 8th, 2005...4:20 am

Mark Watson has released KBtextmaster, his natural language processing toolkit, under the GPL. This Java library performs indexing, summarization, and part-of-speech tagging (i.e., identifying nouns, verbs, conjunctions, etc.), and also extracts the names of people and places (which is pretty cool). It understands OpenOffice.org, PDF, plain text, HTML, and Word formats.

Input text: President George Bush met with the president of Mexico. President Bush and President Fox talked about foreign trade issues. They then went shopping and bought silk shirts made in India and radios made in China.

Summary: They{President Bush, President Fox} then went shopping and bought silk shirts made in India and radios made in China.

KBtextmaster User Guide [pdf]
