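Adding a custom JAPE file in the GATE source code

Date: 2013-02-23 14:53:10

Tags: bigdata gate named-entity-extraction

Can someone guide me on how to create a custom JAPE file and configure it using the GATE source code? I tried the following code and get exceptions while the grammar is being parsed, such as "Error:" and "Neither grammarURL or binaryGrammarURL parameters are set!"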

    try{
        Document doc = new DocumentImpl();
        String str = "This is test.";
        DocumentContentImpl impl = new DocumentContentImpl(str);
        doc.setContent(impl);
        System.setProperty("gate.home", "C:\\Program Files\\GATE_Developer_7.1");
        Gate.init();
        gate.Corpus corpus = (Corpus) Factory
            .createResource("gate.corpora.CorpusImpl");
        File gateHome = Gate.getGateHome();
        File pluginsHome = new File(gateHome, "plugins");
        Gate.getCreoleRegister().registerDirectories(new File(pluginsHome, "ANNIE").toURI().toURL());

        Transducer transducer = new Transducer();
        transducer.setDocument(doc);
        transducer.setGrammarURL(new URL("file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape"));
        transducer.setBinaryGrammarURL(new URL("file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape"));

        LanguageAnalyser jape = (LanguageAnalyser)Factory.createResource(
            "gate.creole.Transducer", gate.Utils.featureMap(
                "grammarURL", "D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape",
                "encoding", "UTF-8"));

3 answers:

Answer 0 (score: 3)

You need to load the ANNIE plugin:

    Gate.getCreoleRegister().registerDirectories(
      new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());

Then create an instance of gate.creole.Transducer with the correct parameters:

    LanguageAnalyser jape = (LanguageAnalyser)Factory.createResource(
      "gate.creole.Transducer", gate.Utils.featureMap(
          "grammarURL", new URL("file:///D:/path/to/my-grammar.jape"),
          "encoding", "UTF-8")); // ensure this matches the file
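The grammarURL value has to be a genuine URL, not a file-system path. As a small GATE-free illustration (the class GrammarUrls and its methods are hypothetical helpers, not part of GATE), a bare Windows path such as D:/path/to/my-grammar.jape parses as a URI whose scheme is the drive letter D, which may explain why a plain path string is not accepted. Building a file: URI from the path instead percent-encodes characters such as the space in "Program Files". The path is written in URI form ("/C:/..."), which is assumed to match what new File(...).toURI() gives on Windows.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helpers showing why grammarURL needs a proper file: URL. */
public class GrammarUrls {

    // A bare Windows path is not a URL: the drive letter is read as the scheme.
    static String schemeOf(String spec) {
        return URI.create(spec).getScheme();
    }

    // The multi-argument URI constructor quotes illegal characters (e.g. spaces),
    // which a hand-written "file:///C://Program Files//..." string does not.
    // The path must be absolute in URI form, e.g. "/C:/Program Files/...".
    static URL toFileUrl(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable file path: " + absolutePath, e);
        }
    }

    public static void main(String[] args) {
        // prints D
        System.out.println(schemeOf("D:/path/to/my-grammar.jape"));
        // prints file:/C:/Program%20Files/gate-7.1/plugins/ANNIE/resources/NE/opensource.jape
        System.out.println(toFileUrl(
            "/C:/Program Files/gate-7.1/plugins/ANNIE/resources/NE/opensource.jape"));
    }
}
```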

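However, the approach we usually recommend is to assemble and configure the whole pipeline the way you want it in GATE Developer, using whatever standard components you need plus your own grammar, and then save the application state to a file. You can then reload the entire application from your code with a single line:

    CorpusController app = (CorpusController) PersistenceManager.loadObjectFromFile(savedAppFile);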

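EDIT: The code you have added to the question has several fundamental problems. First, you must call Gate.init() before you do anything else with GATE; in particular, it must come before you create the Document. Second, you must never call the constructor of a Resource class directly; always use the Factory. Likewise, you never need to call init() directly, because that happens as part of Factory.createResource. For example: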

    import java.io.File;

    import gate.Corpus;
    import gate.Document;
    import gate.Factory;
    import gate.Gate;
    import gate.LanguageAnalyser;
    import gate.creole.SerialAnalyserController;

    public class RunCustomJape {
      public static void main(String[] args) throws Exception {
        // initialise GATE
        Gate.setGateHome(new File("C:\\Program Files\\GATE_Developer_7.1"));
        Gate.init();

        // load ANNIE plugin - you must do this before you can create tokeniser
        // or JAPE transducer resources.
        Gate.getCreoleRegister().registerDirectories(
            new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());

        // Build the pipeline
        SerialAnalyserController pipeline =
            (SerialAnalyserController) Factory.createResource(
                "gate.creole.SerialAnalyserController");
        LanguageAnalyser tokeniser = (LanguageAnalyser) Factory.createResource(
            "gate.creole.tokeniser.DefaultTokeniser");
        LanguageAnalyser jape = (LanguageAnalyser) Factory.createResource(
            "gate.creole.Transducer", gate.Utils.featureMap(
                "grammarURL", new File("D:\\path\\to\\my-grammar.jape").toURI().toURL(),
                "encoding", "UTF-8")); // ensure this matches the file
        pipeline.add(tokeniser);
        pipeline.add(jape);

        // create document and corpus
        Corpus corpus = Factory.newCorpus(null);
        Document doc = Factory.newDocument("This is test.");
        corpus.add(doc);
        pipeline.setCorpus(corpus);

        // run it
        pipeline.execute();

        // extract results
        System.out.println("Found annotations of the following types: " +
            doc.getAnnotations().getAllTypes());
      }
    }
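To make the Factory point concrete, here is a hypothetical, GATE-free model of the lifecycle described in the edit (FactoryLifecycleDemo, ToyTransducer and createResource are stand-ins, not GATE classes). The toy factory constructs the resource, applies its parameters, and only then calls init(), which is where a transducer would check for and parse its grammar. An object built with a bare constructor never goes through that step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of a create, configure, then init() resource lifecycle. */
public class FactoryLifecycleDemo {

    interface Resource {
        void setParameter(String name, Object value);
        void init();
    }

    // Stand-in for a transducer that needs a grammar parameter before init() succeeds.
    static final class ToyTransducer implements Resource {
        private final Map<String, Object> params = new HashMap<>();
        private boolean initialised = false;

        public void setParameter(String name, Object value) {
            params.put(name, value);
        }

        public void init() {
            if (!params.containsKey("grammarURL") && !params.containsKey("binaryGrammarURL")) {
                throw new IllegalStateException("no grammar parameter set");
            }
            initialised = true; // a real transducer would parse the grammar here
        }

        boolean isInitialised() {
            return initialised;
        }
    }

    // Stand-in for a factory: construct, apply every parameter, then call init().
    static <T extends Resource> T createResource(Supplier<T> constructor, Map<String, Object> params) {
        T resource = constructor.get();
        for (Map.Entry<String, Object> e : params.entrySet()) {
            resource.setParameter(e.getKey(), e.getValue());
        }
        resource.init();
        return resource;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("grammarURL", "file:/D:/path/to/my-grammar.jape");
        ToyTransducer viaFactory = createResource(ToyTransducer::new, params);
        System.out.println(viaFactory.isInitialised()); // prints true

        // Calling the constructor directly skips init(), so nothing is set up.
        ToyTransducer direct = new ToyTransducer();
        System.out.println(direct.isInitialised()); // prints false
    }
}
```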

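If you haven't already, I strongly recommend that you work through at least module 5 of the training course materials, which shows the correct way to load documents and run processing resources over them.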

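Answer 1 (score: 1)

Thank you, Ian. The training course materials were helpful, but my problem was different, and I have solved it. The following code snippet shows how to use a custom JAPE file in GATE; with it, my custom JAPE file is now able to generate new annotations.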

    System.setProperty("gate.home", "C:\\Program Files\\GATE_Developer_7.1");
    Gate.init();

    // Register the ANNIE plugin first: the tokeniser and the JAPE
    // transducer cannot be created until it is loaded.
    File gateHome = Gate.getGateHome();
    File pluginsHome = new File(gateHome, "plugins");
    Gate.getCreoleRegister().registerDirectories(
        new File(pluginsHome, "ANNIE").toURI().toURL());

    ProcessingResource token = (ProcessingResource) Factory.createResource(
        "gate.creole.tokeniser.DefaultTokeniser", Factory.newFeatureMap());

    String str = "This is a test. Myself Abhijit Nag sport";
    Document doc = Factory.newDocument(str);

    gate.Corpus corpus = (Corpus) Factory.createResource("gate.corpora.CorpusImpl");
    corpus.add(doc);

    // Factory.createResource already calls init(), which parses the grammar,
    // so the transducer can be added to the pipeline as-is.
    LanguageAnalyser jape = (LanguageAnalyser) Factory.createResource(
        "gate.creole.Transducer", gate.Utils.featureMap(
            "grammarURL", new URL("file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/ANNIE/resources/NE/SportsCategory.jape"),
            "encoding", "UTF-8"));

    // The tokeniser runs before the grammar, so any Token annotations
    // the grammar matches on already exist.
    SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource(
        "gate.creole.SerialAnalyserController",
        Factory.newFeatureMap(), Factory.newFeatureMap(), "ANNIE");
    pipeline.setCorpus(corpus);
    pipeline.add(token);
    pipeline.add(jape);
    pipeline.execute();

    AnnotationSet ann = doc.getAnnotations();
    System.out.println(" ...Total annotation " + ann.getAllTypes());

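Answer 2 (score: 0)

If you want to extend the existing ANNIE pipeline, here is another option.

  1. First, get the list of default/existing processing resources in the pipeline.
  2. Create an instance of the JAPE transducer for your rules.
  3. Iterate over the list of existing processing resources, adding each one to a new collection, then add your own custom JAPE transducer to this collection.
  4. When you execute the ANNIE pipeline, your JAPE rules are picked up automatically, so there is no need to specify the document or execute the grammar separately.
  5. Sample code: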

    File pluginsHome = Gate.getPluginsHome();
    File anniePlugin = new File(pluginsHome, "ANNIE");
    File annieGapp = new File(anniePlugin, "ANNIE_with_defaults.gapp");
    CorpusController annieController =
        (CorpusController) PersistenceManager.loadObjectFromFile(annieGapp);

    // Factory.createResource already initialises the transducer,
    // so there is no need to call init() on it again.
    LanguageAnalyser jape = (LanguageAnalyser) Factory.createResource(
        "gate.creole.Transducer", gate.Utils.featureMap(
            "grammarURL", new File("C:\\Program Files\\gate-7.1\\plugins\\ANNIE\\resources\\NE\\opensource.jape").toURI().toURL(),
            "encoding", "UTF-8"));

    // Copy the existing PRs into a new collection and append the custom
    // grammar, so it runs after all of the standard ANNIE PRs.
    Collection<ProcessingResource> newPRS = new ArrayList<ProcessingResource>();
    Collection<ProcessingResource> prs = annieController.getPRs();
    for (ProcessingResource resource : prs) {
        newPRS.add(resource);
    }
    newPRS.add(jape);
    annieController.setPRs(newPRS);
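Steps 1 to 4 can be modelled without GATE, as a rough sketch. Here Stage and Pipeline are hypothetical stand-ins for a processing resource and a serial controller, getStages()/setStages() play the role of getPRs()/setPRs(), and the stage names are illustrative only. The point it shows is that copying the existing list and appending the custom stage keeps the original order, so the new stage runs last on every execute() call without being run separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical, GATE-free model of appending a custom stage to an existing pipeline. */
public class AppendStageDemo {

    // Stand-in for a processing resource: it only records that it ran.
    static final class Stage {
        final String name;

        Stage(String name) {
            this.name = name;
        }

        void execute(List<String> log) {
            log.add(name);
        }
    }

    // Stand-in for a serial controller: runs its stages in list order.
    static final class Pipeline {
        private final List<Stage> stages = new ArrayList<>();

        Collection<Stage> getStages() {
            return new ArrayList<>(stages);
        }

        void setStages(Collection<Stage> newStages) {
            stages.clear();
            stages.addAll(newStages);
        }

        List<String> execute() {
            List<String> log = new ArrayList<>();
            for (Stage s : stages) {
                s.execute(log);
            }
            return log;
        }
    }

    // Steps 1-3: copy the existing stages, append the custom one last, store the result.
    static void appendStage(Pipeline pipeline, Stage custom) {
        Collection<Stage> updated = new ArrayList<>(pipeline.getStages());
        updated.add(custom);
        pipeline.setStages(updated);
    }

    public static void main(String[] args) {
        Pipeline annie = new Pipeline();
        annie.setStages(Arrays.asList(
            new Stage("reset"), new Stage("tokeniser"), new Stage("NE transducer")));
        appendStage(annie, new Stage("opensource.jape"));
        // Step 4: the custom stage runs as part of the normal execute() call.
        System.out.println(annie.execute()); // prints [reset, tokeniser, NE transducer, opensource.jape]
    }
}
```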