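Extracting CD-tagged tokens with UIMA

Time: 2014-06-24 20:44:30

Tags: uima

I wrote an annotator that extracts all tokens whose part-of-speech tag is CD. The code looks like this: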

public class WeightAnnotator extends JCasAnnotator_ImplBase {

    private Logger logger = Logger.getLogger(getClass().getName());
    public static List<Token> weightTokens = new ArrayList<Token>();

    public static final String PARAM_STRING = "stringParam";
    @ConfigurationParameter(name = PARAM_STRING)
    private String stringParam;

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        super.initialize(context);
    }

    @Override
    public void process(JCas jCas) throws AnalysisEngineProcessException {
        logger.info("Starting processing.");

        for (Sentence sentence : JCasUtil.select(jCas, Sentence.class)) {
            List<Token> tokens = JCasUtil.selectCovered(jCas, Token.class, sentence);
            for (Token token : tokens) {
                int begin = token.getBegin();
                int end = token.getEnd();
                if (token.getPos().equals(PARAM_STRING)) {
                    WeightAnnotation ann = new WeightAnnotation(jCas, begin, end);
                    ann.addToIndexes();
                    System.out.println("Token: " + token.getCoveredText());
                }
            }
        }
    }

}

But when I try to create an iterator in the pipeline, the iterator returns null. This is what my pipeline looks like:

AnalysisEngineDescription weightDescriptor = AnalysisEngineFactory.createEngineDescription(
        WeightAnnotator.class,
        WeightAnnotator.PARAM_STRING, "CD");

AggregateBuilder builder = new AggregateBuilder();
builder.add(sentenceDetectorDescription);
builder.add(tokenDescriptor);
builder.add(posDescriptor);
builder.add(weightDescriptor);
builder.add(writer);

for (JCas jcas : SimplePipeline.iteratePipeline(reader, builder.createAggregateDescription())) {

    Iterator iterator1 = JCasUtil.iterator(jcas, WeightAnnotation.class);

    while (iterator1.hasNext()) {
        WeightAnnotation weights = (WeightAnnotation) iterator1.next();
        logger.info("Token: " + weights.getCoveredText());
    }
}

I generated WeightAnnotation and WeightAnnotation_Type using JCasGen. I have debugged the whole code, but I cannot see where I went wrong. Any ideas on how to fix this would be appreciated.
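
Judging only from the code shown above, one likely cause is the condition in process(): it compares the tag with PARAM_STRING, which is the parameter's name ("stringParam"), not with the stringParam field that receives the configured value "CD". If so, no WeightAnnotation is ever added to the index and the iterator in the pipeline has nothing to return. The sketch below reproduces that difference with plain strings and no UIMA dependency. The class name ParamNameVsValue, its two methods, and the sample tag list are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParamNameVsValue {

    // Name of the configuration parameter, as declared in the question.
    public static final String PARAM_STRING = "stringParam";

    // Stands in for the field that would receive the configured value ("CD").
    private final String stringParam;

    public ParamNameVsValue(String stringParam) {
        this.stringParam = stringParam;
    }

    // Mirrors the check in the question: each tag is compared with the parameter NAME.
    public List<String> matchAgainstName(List<String> posTags) {
        List<String> hits = new ArrayList<>();
        for (String pos : posTags) {
            if (pos.equals(PARAM_STRING)) {
                hits.add(pos);
            }
        }
        return hits;
    }

    // Compares each tag with the configured VALUE instead.
    public List<String> matchAgainstValue(List<String> posTags) {
        List<String> hits = new ArrayList<>();
        for (String pos : posTags) {
            if (stringParam.equals(pos)) {
                hits.add(pos);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up tag sequence for illustration only.
        List<String> posTags = Arrays.asList("DT", "CD", "NNS", "CD");
        ParamNameVsValue demo = new ParamNameVsValue("CD");

        System.out.println(demo.matchAgainstName(posTags).size());  // prints 0
        System.out.println(demo.matchAgainstValue(posTags).size()); // prints 2
    }
}
```

In the annotator itself, the corresponding change would presumably be to test stringParam.equals(token.getPos()) rather than token.getPos().equals(PARAM_STRING). This assumes that getPos() returns the tag as a String in the type system being used. If it returns an annotation object instead, the tag string would have to be read from that object first.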

0 answers:

No answers