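Hi, I have been working on information retrieval and have run into some difficulties. I recently downloaded StandAloneAnnie.java from the following link:
http://gate.ac.uk/wiki/code-repository/src/sheffield/examples/StandAloneAnnie.java Although I have been able to run it and see the output, I have a question or two.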
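This program annotates persons and locations. Where are the grammars used to annotate these entities stored?
How can I write my own simple grammar to extract some data and use it in my copy of StandAloneAnnie.java?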
Previous posts: Hundreds of RegEx on one string; New to NLP, Question about annotation
Answer 0 (score: 3)
Here is a simple grammar that annotates a person's height:
Phase: Measurements
Input: Token Number
Options: control=appelt debug=true

Rule: Height
(
  ({Number})
  ({Token.string==~"[Ff]t"} | {Token.string==~"[Ii]n"} | {Token.string==~"[Cc]m"})
):height
-->
:height.Height = {value = :height.Number.value, unit = :height.Token.string}
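To see what the Height rule is meant to pick out without running GATE, here is a plain-Java approximation using java.util.regex. It is only a sketch: it works on raw text instead of on Token and Number annotations, it only recognises plain digit numbers, and the class name HeightRegex and its record are hypothetical. The `\b` after the unit makes it accept whole units only (so "3 inches" is not matched), similar to a whole-token match on Token.string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: approximates the JAPE Height rule with a plain regex.
final class HeightRegex {
    // A number (the {Number} part) followed by whitespace and a whole unit
    // ft, in or cm, with the first letter in either case, as in the grammar.
    private static final Pattern HEIGHT =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s+([Ff]t|[Ii]n|[Cc]m)\\b");

    // Mirrors the features the rule puts on a Height annotation, plus offsets.
    record Height(double value, String unit, int start, int end) {}

    static List<Height> find(String text) {
        List<Height> result = new ArrayList<>();
        Matcher m = HEIGHT.matcher(text);
        while (m.find()) {
            result.add(new Height(Double.parseDouble(m.group(1)), m.group(2), m.start(), m.end()));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Height is 60 in. Weight is 150 lbs pulse rate 90 Pulse rate 90";
        for (Height h : find(str)) {
            // prints: 60 in value=60.0 unit=in
            System.out.println(str.substring(h.start(), h.end()) + " value=" + h.value() + " unit=" + h.unit());
        }
    }
}
```

A regex like this is fine for a quick check, but the JAPE version benefits from the tokeniser and the Numbers tagger (for example, numbers written as words), which is why the answer builds it as a grammar.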
Here is the main code that runs it:
import gate.*;
import gate.creole.ANNIEConstants;
import gate.creole.SerialAnalyserController;
import java.io.File;

public class HeightExample {
    public static void main(String[] args) throws Exception {
        // Gate.init() needs to know where GATE is installed (e.g. -Dgate.home=...).
        Gate.init();

        // You need to register a plugin before you can load its resources.
        Gate.getCreoleRegister().registerDirectories(
                new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURI().toURL());
        // Tagger_Numbers sits next to ANNIE in the plugins directory; change this path if yours differs.
        Gate.getCreoleRegister().registerDirectories(
                new File(Gate.getPluginsHome(), "Tagger_Numbers").toURI().toURL());

        // The string to be annotated.
        String str = "Height is 60 in. Weight is 150 lbs pulse rate 90 Pulse rate 90";
        Document doc = Factory.newDocument(str);
        Corpus corpus = Factory.newCorpus("corpus");
        corpus.add(doc);

        // Load the processing resources. Refer to http://gate.ac.uk/gate/doc/plugins.html
        // for the class each plugin provides.
        ProcessingResource token = (ProcessingResource) Factory.createResource(
                "gate.creole.tokeniser.DefaultTokeniser", Factory.newFeatureMap());
        ProcessingResource splitter = (ProcessingResource) Factory.createResource(
                "gate.creole.splitter.SentenceSplitter", Factory.newFeatureMap());
        ProcessingResource number = (ProcessingResource) Factory.createResource(
                "gate.creole.numbers.NumbersTagger", Factory.newFeatureMap());
        // JAPE transducer running the Height grammar (height.jape in the working directory; change this path).
        FeatureMap japeParams = Factory.newFeatureMap();
        japeParams.put("grammarURL", new File("height.jape").toURI().toURL());
        ProcessingResource height = (ProcessingResource) Factory.createResource(
                "gate.creole.Transducer", japeParams);

        /* The pipeline is an application that runs the resources loaded above.
           Resources must be added in order: e.g. 'number' requires the document
           to be tokenised, and the Height grammar needs the Number annotations. */
        SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource(
                "gate.creole.SerialAnalyserController", Factory.newFeatureMap(),
                Factory.newFeatureMap(), "ANNIE");
        pipeline.setCorpus(corpus);
        pipeline.add(token);
        pipeline.add(splitter);
        pipeline.add(number);
        pipeline.add(height);
        pipeline.execute();

        // Extract the Height annotations from the annotated document.
        // Use the document content, not doc.toString(), when taking substrings.
        String content = doc.getContent().toString();
        for (Annotation annotation : doc.getAnnotations().get("Height")) {
            long start = annotation.getStartNode().getOffset();
            long end = annotation.getEndNode().getOffset();
            System.out.println(content.substring((int) start, (int) end) + " " + annotation.getFeatures());
        }
    }
}
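Note: in the code above, the Height grammar is written in a .jape file. You need to run this grammar with a JAPE (or JAPE Plus) transducer that is part of the application ('pipeline'); the main code then only has to execute the pipeline. You can find a tutorial on writing JAPE at gate.ac.uk/sale/tao.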
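The ordering requirement (the Numbers tagger needs Tokens, and the Height grammar's `Input: Token Number` needs both Token and Number annotations) can be illustrated without GATE. This is a toy model, not GATE's API: each stage lists the annotation types it needs and produces, and the check reports the first stage whose inputs have not been produced yet. The class PipelineOrder and the stage requirement lists are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model of a GATE-style pipeline: not GATE's API, only the ordering idea.
final class PipelineOrder {
    record Stage(String name, Set<String> needs, Set<String> produces) {}

    // Returns the name of the first stage whose required annotation types
    // were not produced by an earlier stage, or empty if the order works.
    static Optional<String> firstUnsatisfied(List<Stage> stages) {
        Set<String> available = new HashSet<>();
        for (Stage s : stages) {
            if (!available.containsAll(s.needs())) {
                return Optional.of(s.name());
            }
            available.addAll(s.produces());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Stage token = new Stage("tokeniser", Set.of(), Set.of("Token"));
        Stage number = new Stage("numbers", Set.of("Token"), Set.of("Number"));
        Stage height = new Stage("height.jape", Set.of("Token", "Number"), Set.of("Height"));
        System.out.println(firstUnsatisfied(List.of(token, number, height))); // prints Optional.empty
        System.out.println(firstUnsatisfied(List.of(token, height, number))); // prints Optional[height.jape]
    }
}
```

In the real pipeline a misordered stage does not raise an error like this; the grammar simply finds no Number annotations and produces no Height annotations, which is why the order of the `pipeline.add(...)` calls matters.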
Answer 1 (score: 0)
There is an Introduction to Annie PowerPoint that explains how the grammars are stored: they are in JAPE files.
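To find those JAPE files on disk, a small stdlib-only sketch can list every `.jape` file below a directory. It assumes a GATE 8.x-style installation where ANNIE's named-entity grammars live under `plugins/ANNIE/resources/NE` (with `main.jape` listing the phases); the class name JapeFiles and the command-line handling are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds every .jape grammar file below a directory.
final class JapeFiles {
    static List<Path> list(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return List.of(); // nothing to list (e.g. a wrong GATE path)
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jape"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument (or the gate.home system property) is the
        // GATE installation directory, laid out as in GATE 8.x and earlier.
        String gateHome = args.length > 0 ? args[0] : System.getProperty("gate.home", ".");
        Path neDir = Paths.get(gateHome, "plugins", "ANNIE", "resources", "NE");
        for (Path p : list(neDir)) {
            System.out.println(p);
        }
    }
}
```

Reading a few of these files next to the Height grammar above is a practical way to learn the JAPE idioms ANNIE uses for persons and locations.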