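I am using the GATE API from Java code and trying to run a known JAPE rule over a document's text, but unfortunately I cannot get the expected result. My code is as follows: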
public void initAnnie() throws GateException, IOException {
Out.prln("Initialising ANNIE...");
// load the ANNIE application from the saved state in plugins/ANNIE
File pluginsHome = Gate.getPluginsHome();
File anniePlugin = new File(pluginsHome, "ANNIE");
File annieGapp = new File(anniePlugin, "ANNIE_with_defaults.gapp");
annieController = (CorpusController) PersistenceManager
.loadObjectFromFile(annieGapp);
Out.prln("...ANNIE loaded");
} // initAnnie()
/** Tell ANNIE's controller about the corpus you want to run on */
public void setCorpus(Corpus corpus) {
annieController.setCorpus(corpus);
} // setCorpus
/** Run ANNIE */
public void execute() throws GateException {
Out.prln("Running ANNIE...");
annieController.execute();
Out.prln("...ANNIE complete");
} // execute()
/**
* Run from the command-line, with a list of URLs as argument.
* <P>
* <B>NOTE:</B><BR>
* This code will run with all the documents in memory - if you want to
* unload each from memory after use, add code to store the corpus in a
* DataStore.
*/
public static void main(String args[]) throws GateException, IOException {
// initialise the GATE library
Out.prln("Initialising GATE...");
Gate.init();
Out.prln("...GATE initialised");
// load ANNIE plugin - you must do this before you can create tokeniser
// or JAPE transducer resources.
Gate.getCreoleRegister().registerDirectories(
new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());
// Build the pipeline
SerialAnalyserController pipeline =
(SerialAnalyserController)Factory.createResource(
"gate.creole.SerialAnalyserController");
LanguageAnalyser tokeniser = (LanguageAnalyser)Factory.createResource(
"gate.creole.tokeniser.DefaultTokeniser");
LanguageAnalyser jape = (LanguageAnalyser)Factory.createResource(
"gate.creole.Transducer", gate.Utils.featureMap(
"grammarURL", new File("C:path/to/univerity_rules.jape").toURI().toURL(),
"encoding", "UTF-8")); // ensure this matches the file
pipeline.add(tokeniser);
pipeline.add(jape);
// create document and corpus
// create a GATE corpus and add a document for each command-line
// argument
Corpus corpus = Factory.newCorpus("JAPE corpus");
URL u = new URL("file:/path/to/Document.txt");
FeatureMap params = Factory.newFeatureMap();
params.put("sourceUrl", u);
params.put("preserveOriginalContent", Boolean.TRUE);
params.put("collectRepositioningInfo", Boolean.TRUE);
Out.prln("Creating doc for " + u);
Document doc = (Document)
Factory.createResource("gate.corpora.DocumentImpl", params);
corpus.add(doc);
pipeline.setCorpus(corpus);
// run it
pipeline.execute();
// extract results
System.out.println("Found annotations of the following types: " +
doc.getAnnotations().getAllTypes());
} // main
}
The JAPE rule I am using is:
Phase:firstpass
Input: Lookup Token
//note that we are using Lookup and Token both inside our rules.
Options: control = appelt
Rule: University1
Priority: 20
(
{Token.string == "University"}
{Token.string == "of"}
{Lookup.minorType == city}
):orgName
-->
:orgName.Organisation =
{kind = "university", rule = "University1"}
The output I get is:
Initialising GATE...
log4j:WARN No appenders could be found for logger (gate.Gate).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
...GATE initialised
Creating doc for file:path/to/Document.txt
Found annotations of the following types: [SpaceToken, Token]
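Any help would be appreciated.
Answer 0 (score: 2)
The problem is that the document has no Lookup annotations, which your JAPE grammar tries to match.
You need to add two extra processing resources: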
LanguageAnalyser gazetter = (LanguageAnalyser)Factory.createResource(
"gate.creole.gazetteer.DefaultGazetteer");
LanguageAnalyser splitter = (LanguageAnalyser)Factory.createResource(
"gate.creole.splitter.SentenceSplitter");
Your processing resources should run in the following order:
pipeline.add(tokeniser);
pipeline.add(gazetter);
pipeline.add(splitter);
pipeline.add(jape);
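The gazetteer creates the Lookup annotations.
The sentence splitter prevents Organisation annotations from being created across two sentences.
I tested this and it works for me: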
...GATE initialised
Creating doc for file:/Users/andreyshafirin/tmp/testdoc.txt
Found annotations of the following types: [Lookup, Organisation, Token, Split, SpaceToken, Sentence]
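To make the reasoning concrete, here is a toy model in plain Java of why the rule could not fire with only a tokeniser in the pipeline. It is a sketch and not how GATE or JAPE is implemented: the `Ann` class, the one-word-per-position layout, and the one-entry city list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LookupRuleSketch {

    /** Toy annotation: a type ("Token" or "Lookup") plus a single feature value. */
    static final class Ann {
        final String type;
        final String value;
        Ann(String type, String value) { this.type = type; this.value = value; }
    }

    /** Stand-in for the tokeniser: one Token per whitespace-separated word. */
    static List<List<Ann>> tokenise(String text) {
        List<List<Ann>> positions = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            List<Ann> here = new ArrayList<>();
            here.add(new Ann("Token", word));
            positions.add(here);
        }
        return positions;
    }

    /** Stand-in for the gazetteer: adds Lookup(city) to words found in a hypothetical city list. */
    static void gazetteer(List<List<Ann>> positions, List<String> cities) {
        for (List<Ann> here : positions) {
            if (cities.contains(here.get(0).value)) {
                here.add(new Ann("Lookup", "city"));
            }
        }
    }

    /** True if the position carries an annotation with the given type and value. */
    static boolean has(List<Ann> here, String type, String value) {
        for (Ann a : here) {
            if (a.type.equals(type) && a.value.equals(value)) return true;
        }
        return false;
    }

    /** Mimics the University1 pattern: Token "University", Token "of", Lookup city. */
    static boolean university1Matches(List<List<Ann>> positions) {
        for (int i = 0; i + 2 < positions.size(); i++) {
            if (has(positions.get(i), "Token", "University")
                    && has(positions.get(i + 1), "Token", "of")
                    && has(positions.get(i + 2), "Lookup", "city")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Sheffield"); // hypothetical gazetteer list

        // Tokeniser only: the third pattern element has nothing to match.
        List<List<Ann>> tokensOnly = tokenise("University of Sheffield");
        System.out.println(university1Matches(tokensOnly)); // prints false

        // Tokeniser followed by gazetteer: the Lookup is present, so the rule matches.
        List<List<Ann>> withLookup = tokenise("University of Sheffield");
        gazetteer(withLookup, cities);
        System.out.println(university1Matches(withLookup)); // prints true
    }
}
```

The only difference between the two runs is the gazetteer step, which mirrors the difference between your original pipeline and the corrected one.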
PS:
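I think there is a better way to use GATE from Java code. You can build the application in GATE Developer, customise it, and save it to a file (the GATE documentation explains how). You can then load the saved application from Java code (the GATE documentation and its code examples show how). That way you do not have to deal with the many details of processing-resource parameters and features in code, because you define and change them in the GUI.
Good luck with GATE.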
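A minimal sketch of that approach, assuming the application was saved from GATE Developer and that it produces Organisation annotations in the default annotation set. The class name and the two command-line arguments (the saved application file and the text file to process) are hypothetical. `PersistenceManager.loadObjectFromFile` is the same call your `initAnnie()` method already uses. It needs the GATE libraries on the classpath and a GATE installation that `Gate.init()` can find.

```java
import gate.Annotation;
import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.Gate;
import gate.Utils;
import gate.util.persistence.PersistenceManager;
import java.io.File;

public class RunSavedApplication {
    public static void main(String[] args) throws Exception {
        File savedApp = new File(args[0]); // hypothetical: application saved from GATE Developer
        File input = new File(args[1]);    // hypothetical: plain-text document to annotate

        Gate.init();

        // Restore the whole pipeline, including its processing resources and their parameters.
        CorpusController app =
            (CorpusController) PersistenceManager.loadObjectFromFile(savedApp);

        Corpus corpus = Factory.newCorpus("corpus");
        Document doc = Factory.newDocument(input.toURI().toURL());
        corpus.add(doc);

        app.setCorpus(corpus);
        app.execute();

        // Print the text and features of every Organisation annotation that was created.
        for (Annotation org : doc.getAnnotations().get("Organisation")) {
            System.out.println(Utils.stringFor(doc, org) + " " + org.getFeatures());
        }

        // Release GATE resources once they are no longer needed.
        Factory.deleteResource(doc);
        Factory.deleteResource(corpus);
        Factory.deleteResource(app);
    }
}
```

With this layout, adding or reordering processing resources is done in the saved application rather than in Java code.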