Dictionary example based on uimaFIT code

Time: 2014-09-21 09:50:39

Tags: java uima

I am looking into uimaFIT and I am having some trouble adding a Dictionary Annotator to the analysis engine.

So far, this is my closest attempt:

public class LocationAnnotator extends JCasAnnotator_ImplBase {

    public static final String RES_DICTIONARY = "dictionary";

    @ExternalResource(key = RES_DICTIONARY)
    private DataResource resource;
    private Dictionary dictionary;

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        super.initialize(context);
        try {
            DictionaryBuilder dictBuilder = new HashMapDictionaryBuilder();
            // create dictionary file parser
            DictionaryFileParserImpl fileParser = new DictionaryFileParserImpl();
            fileParser.parseDictionaryFile(resource.getUri().getPath(), resource.getInputStream(), dictBuilder);
            dictionary = dictBuilder.getDictionary();
        } catch (IOException e) {
            // pass the cause along so the underlying I/O failure is not lost
            throw new ResourceInitializationException(e);
        }
    }

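    @Override
    public void process(JCas cas) throws AnalysisEngineProcessException {
        String docText = cas.getDocumentText();
        // offset just past the previous token; searching from here keeps repeated
        // words from all being annotated at the position of their first occurrence
        int searchFrom = 0;
        for (String line : docText.split("\n")) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // consecutive spaces produce empty tokens
                }
                int pos = docText.indexOf(word, searchFrom);
                searchFrom = pos + word.length();
                if (dictionary.contains(word)) {
                    Location annotation = new Location(cas, pos, pos + word.length());
                    annotation.addToIndexes();
                }
            }
        }
    }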
}
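
A side note on the offsets: each dictionary hit needs its own begin/end position, and a plain indexOf(word) would always return the first occurrence of a repeated word. Below is a minimal, standard-library-only sketch of tracking offsets while splitting the text the same way as above. It is only an illustration under assumptions: a Set<String> stands in for the UIMA Dictionary, and the names DictionaryOffsets and findMatches are hypothetical, not part of uimaFIT or the Dictionary Annotator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class DictionaryOffsets {

    /** Returns a {begin, end} pair for every token of text found in dictionary. */
    static List<int[]> findMatches(String text, Set<String> dictionary) {
        List<int[]> spans = new ArrayList<>();
        int searchFrom = 0; // offset just past the previous token
        for (String line : text.split("\n")) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // consecutive spaces produce empty tokens
                }
                // Only separators lie between searchFrom and the token,
                // so this lookup lands on the token itself.
                int begin = text.indexOf(word, searchFrom);
                int end = begin + word.length();
                searchFrom = end;
                if (dictionary.contains(word)) {
                    spans.add(new int[] {begin, end});
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Paris is big\nI love Paris";
        // Set<String> is a stand-in for the real dictionary (assumption).
        for (int[] span : findMatches(text, Collections.singleton("Paris"))) {
            System.out.println(span[0] + "-" + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

With the sample text, the two occurrences of "Paris" are reported at 0-5 and 20-25 instead of both at 0-5.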

I am running the engine like this:

CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(CvReader.class, CvReader.PARAM_INPUT_FILE, "docs/simple-doc.txt");

AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(LocationAnnotator.class);
ExternalResourceFactory.bindResource(tokenizer, LocationAnnotator.RES_DICTIONARY, "META-INF/dictionaries/location.dict.xml");

for (JCas cas : SimplePipeline.iteratePipeline(reader, tokenizer)) {
    for (Location location : JCasUtil.select(cas, Location.class)) {
        System.out.println("Found location: " + location.getCoveredText());
    }
}

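Is there no more elegant way? I don't like the initialization in initialize(); I would expect the dictionary itself to be initialized through the @ExternalResource annotation.

I would be grateful if someone could provide me with a simpler example. Thanks!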

0 answers:

No answers