I am looking at uimaFIT and am having some trouble adding a Dictionary Annotator to an analysis engine.
So far, this is my best attempt:
    public class LocationAnnotator extends JCasAnnotator_ImplBase {

        public static final String RES_DICTIONARY = "dictionary";

        @ExternalResource(key = RES_DICTIONARY)
        private DataResource resource;

        private Dictionary dictionary;

        @Override
        public void initialize(UimaContext context) throws ResourceInitializationException {
            super.initialize(context);
            try {
                DictionaryBuilder dictBuilder = new HashMapDictionaryBuilder();
                // create dictionary file parser
                DictionaryFileParserImpl fileParser = new DictionaryFileParserImpl();
                fileParser.parseDictionaryFile(resource.getUri().getPath(), resource.getInputStream(), dictBuilder);
                dictionary = dictBuilder.getDictionary();
            } catch (IOException e) {
                // keep the underlying I/O failure as the cause
                throw new ResourceInitializationException(e);
            }
        }

        @Override
        public void process(JCas cas) throws AnalysisEngineProcessException {
            String docText = cas.getDocumentText();
            // running offset, so a repeated word maps to its own position
            // instead of always resolving to the first occurrence
            int searchFrom = 0;
            for (String line : docText.split("\n")) {
                for (String word : line.split(" ")) {
                    if (word.isEmpty()) {
                        continue; // consecutive spaces produce empty tokens
                    }
                    int pos = docText.indexOf(word, searchFrom);
                    searchFrom = pos + word.length();
                    if (dictionary.contains(word)) {
                        Location annotation = new Location(cas, pos, pos + word.length());
                        annotation.addToIndexes();
                    }
                }
            }
        }
    }
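The offset bookkeeping in process() can be tried outside UIMA. Below is a minimal sketch, assuming a plain `Set<String>` as a stand-in for the dictionary; the class `DictionaryOffsetSketch` and the method `findSpans` are hypothetical names for illustration and are not part of UIMA or uimaFIT. A plain `indexOf(word)` always returns the first occurrence of a word, so the sketch keeps a running offset and searches from there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a UIMA class: shows only the offset tracking.
class DictionaryOffsetSketch {

    /** Returns {begin, end} offsets of every whitespace-separated token found in the dictionary. */
    static List<int[]> findSpans(String text, Set<String> dictionary) {
        List<int[]> spans = new ArrayList<>();
        int searchFrom = 0; // end offset of the previous token
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields one empty token
            }
            // Search from the running offset so repeated words get distinct spans.
            int begin = text.indexOf(word, searchFrom);
            int end = begin + word.length();
            searchFrom = end;
            if (dictionary.contains(word)) {
                spans.add(new int[] {begin, end});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Paris is nice\nI love Paris";
        for (int[] span : findSpans(text, Collections.singleton("Paris"))) {
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]));
        }
    }
}
```

For the sample text in main, the two occurrences of "Paris" come out as the spans 0-5 and 21-26 rather than the same span twice.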
I run the engine like this:
    CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
            CvReader.class, CvReader.PARAM_INPUT_FILE, "docs/simple-doc.txt");
    AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(LocationAnnotator.class);
    ExternalResourceFactory.bindResource(
            tokenizer, LocationAnnotator.RES_DICTIONARY, "META-INF/dictionaries/location.dict.xml");

    for (JCas cas : SimplePipeline.iteratePipeline(reader, tokenizer)) {
        for (Location location : JCasUtil.select(cas, Location.class)) {
            System.out.println("Found location: " + location.getCoveredText());
        }
    }
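Isn't there a more elegant way? I don't like the initialize() override. I would expect the dictionary to be initialized through the @ExternalResource annotation.
If anyone could give me a simpler example, I would be grateful. Thanks!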