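So, I've started learning GATE. My question is about how to measure the performance of my tagging engine (Java-based).
With UIMA, I would usually dump all the system annotations to an XMI file and then use Java code to compare them against the human (gold-standard) annotations to compute precision, recall, and F-measure.
However, I'm still struggling to find something similar for GATE. After going through GATE's Annotation Diff and the other information on that page, I feel there must be a simple way to do this in Java, but I can't figure out how. I thought I'd post the question here in case someone has already worked this out.
Please let me know if you need more specific or detailed information.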
Answer 0 (score: 0)
This code seems to help with writing the annotations out to an XML file: http://gate.ac.uk/wiki/code-repository/src/sheffield/examples/BatchProcessApp.java
    String docXMLString = null;
    // if we want to just write out specific annotation types, we must
    // extract the annotations into a Set
    if(annotTypesToWrite != null) {
      // Create a temporary Set to hold the annotations we wish to write out
      Set annotationsToWrite = new HashSet();

      // we only extract annotations from the default (unnamed) AnnotationSet
      // in this example
      AnnotationSet defaultAnnots = doc.getAnnotations();
      Iterator annotTypesIt = annotTypesToWrite.iterator();
      while(annotTypesIt.hasNext()) {
        // extract all the annotations of each requested type and add them to
        // the temporary set
        AnnotationSet annotsOfThisType =
            defaultAnnots.get((String)annotTypesIt.next());
        if(annotsOfThisType != null) {
          annotationsToWrite.addAll(annotsOfThisType);
        }
      }

      // create the XML string using these annotations
      docXMLString = doc.toXml(annotationsToWrite);
    }
    // otherwise, just write out the whole document as GateXML
    else {
      docXMLString = doc.toXml();
    }

    // Release the document, as it is no longer needed
    Factory.deleteResource(doc);

    // output the XML to <inputFile>.out.xml
    String outputFileName = docFile.getName() + ".out.xml";
    File outputFile = new File(docFile.getParentFile(), outputFileName);

    // Write output files using the same encoding as the original
    FileOutputStream fos = new FileOutputStream(outputFile);
    BufferedOutputStream bos = new BufferedOutputStream(fos);
    OutputStreamWriter out;
    if(encoding == null) {
      out = new OutputStreamWriter(bos);
    }
    else {
      out = new OutputStreamWriter(bos, encoding);
    }

    out.write(docXMLString);
    out.close();
    System.out.println("done");
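The snippet above only gets the annotations out of GATE; it does not do the scoring itself. Once the system and gold annotations are available as (type, start offset, end offset) triples, the comparison the question describes can be done in plain Java. Below is a minimal sketch under that assumption. The names SpanEval, Span, and score are made up for this illustration and are not part of GATE, and only exact ("strict") matches count as correct. GATE also ships a gate.util.AnnotationDiffer class that, as far as I know, backs the Annotation Diff tool; this sketch does not depend on it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper (not a GATE class): strict precision/recall/F1 over spans. */
final class SpanEval {

    /** An annotation reduced to its type and character offsets (an assumption of this sketch). */
    static final class Span {
        final String type;
        final long start;
        final long end;

        Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Span)) return false;
            Span s = (Span) o;
            return start == s.start && end == s.end && type.equals(s.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, start, end);
        }
    }

    /** Returns {precision, recall, f1}; a system span is correct only on an exact match. */
    static double[] score(Set<Span> gold, Set<Span> system) {
        // True positives: system spans that also appear in the gold standard.
        Set<Span> correct = new HashSet<>(system);
        correct.retainAll(gold);
        double tp = correct.size();

        // Guard against division by zero when either set is empty.
        double precision = system.isEmpty() ? 0.0 : tp / system.size();
        double recall = gold.isEmpty() ? 0.0 : tp / gold.size();
        double f1 = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        Set<Span> gold = new HashSet<>();
        gold.add(new Span("Person", 0, 5));
        gold.add(new Span("Location", 10, 16));
        gold.add(new Span("Person", 20, 27));

        Set<Span> system = new HashSet<>();
        system.add(new Span("Person", 0, 5));      // exact match
        system.add(new Span("Location", 10, 15));  // end offset differs, so it is an error

        double[] prf = score(gold, system);
        System.out.println("P=" + prf[0] + " R=" + prf[1] + " F1=" + prf[2]);
    }
}
```

In the toy data in main, one of the two system spans matches one of the three gold spans, so precision is 1/2, recall is 1/3, and F1 is 0.4. To use this with GATE you would fill the two sets from the document's annotations (type plus start and end offsets) instead of hard-coding them. If partial overlaps should count, the exact-match test in equals would need to be replaced with an overlap check.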