I am trying to understand how to use TreeTagger
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
together with TT4J
http://reckart.github.io/tt4j/
to chunk some text.
I cannot find any tutorial.
Thanks for your help.
Answer 0 (score: 0)
The tt4j page has been updated with useful code:

    import org.annolab.tt4j.*;
    import static java.util.Arrays.asList;

    public class Example {
        public static void main(String[] args) throws Exception {
            // Point TT4J to the TreeTagger installation directory. The executable is expected
            // in the "bin" subdirectory - in this example at "/opt/treetagger/bin/tree-tagger"
            System.setProperty("treetagger.home", "/opt/treetagger");
            // Declare the variable with its type parameter so the handler below type-checks
            // without raw-type warnings.
            TreeTaggerWrapper<String> tt = new TreeTaggerWrapper<String>();
            try {
                // Model file and its encoding, separated by a colon.
                tt.setModel("/opt/treetagger/models/english.par:iso8859-1");
                // The handler is called once per token with its POS tag and lemma.
                tt.setHandler(new TokenHandler<String>() {
                    public void token(String token, String pos, String lemma) {
                        System.out.println(token + "\t" + pos + "\t" + lemma);
                    }
                });
                // The input must already be split into tokens.
                tt.process(asList("This", "is", "a", "test", "."));
            }
            finally {
                // Always shut down the external TreeTagger process.
                tt.destroy();
            }
        }
    }

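Note that `tt.process(...)` is given a list of tokens, not raw text, so the text has to be tokenized first. The following is only a minimal sketch of a naive regex-based tokenizer in plain Java. The class name `NaiveTokenizer` and the pattern are my own assumptions for illustration; they are not part of TT4J or TreeTagger, and a real tokenizer handles abbreviations, clitics, and so on much more carefully.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TT4J: splits raw text into word and punctuation tokens.
public class NaiveTokenizer {
    // Either a run of letters/digits (optionally joined by inner apostrophes or hyphens),
    // or a single character that is neither whitespace, a letter, nor a digit.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*|[^\\s\\p{L}\\p{N}]");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        // find() skips whitespace because it matches neither alternative.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [This, is, a, test, .]
        System.out.println(tokenize("This is a test."));
    }
}
```

The resulting list could then be passed to `tt.process(...)` in place of the hard-coded token list.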
A pom.xml (Maven) like this should be enough to make it work:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>gk2go</groupId>
      <artifactId>gk</artifactId>
      <version>1.0-SNAPSHOT</version>
      <packaging>jar</packaging>
      <name>gk</name>
      <url>http://maven.apache.org</url>
      <dependencies>
        <dependency>
          <groupId>org.annolab.tt4j</groupId>
          <artifactId>org.annolab.tt4j</artifactId>
          <version>1.1.0</version>
          <type>jar</type>
        </dependency>
      </dependencies>
    </project>

All of the code above has been adapted, so it has not been tested exactly as shown.
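Since the question asks about chunking and the example only yields part-of-speech tags, here is a rough sketch of one way to group tagged tokens into noun chunks afterwards. To be clear, this is a naive rule-based stand-in, not TreeTagger's own chunker and not a TT4J API. The class `NaiveNounChunker` is hypothetical, the tags are assumed to be Penn-Treebank-style (`DT`, `JJ`, `NN...`), and the tags in `main` are written by hand for illustration rather than taken from real tagger output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: groups maximal runs of determiner/adjective/noun tokens
// into chunks and keeps only the runs that contain at least one noun.
public class NaiveNounChunker {
    // Assumption: Penn-Treebank-style tag prefixes.
    private static boolean inNounPhrase(String pos) {
        return pos.startsWith("DT") || pos.startsWith("JJ") || pos.startsWith("NN");
    }

    public static List<String> chunk(List<String> tokens, List<String> tags) {
        List<String> chunks = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean hasNoun = false;
        // Iterate one step past the end so a run ending at the last token is flushed.
        for (int i = 0; i <= tokens.size(); i++) {
            boolean inside = i < tokens.size() && inNounPhrase(tags.get(i));
            if (inside) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens.get(i));
                hasNoun |= tags.get(i).startsWith("NN");
            } else {
                if (hasNoun) {
                    chunks.add(current.toString());
                }
                current.setLength(0);
                hasNoun = false;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("This", "is", "a", "test", ".");
        // Hand-written tags for illustration only.
        List<String> tags = Arrays.asList("DT", "VBZ", "DT", "NN", "SENT");
        // prints [a test]
        System.out.println(chunk(tokens, tags));
    }
}
```

In practice the `tokens` and `tags` lists would be filled inside the `TokenHandler` callback instead of being printed.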