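I am attempting to use the open-nlp Ruby gem to access the Java OpenNLP processor through RJB (Ruby Java Bridge). I am not a Java programmer, so I don't know how to solve this. Any recommendations for resolving it, debugging it, or collecting more information would be appreciated.
The environment is Windows 8, Ruby 1.9.3p448, Rails 4.0.0, JDK 1.7.0-40 x586. The gems are rjb 1.4.8 and louismullie/open-nlp 0.1.4. For the record, this file runs in JRuby, but I ran into other problems in that environment and would prefer to stay with native Ruby for now.
In brief, the open-nlp gem fails with java.lang.NullPointerException on the Java side and a method_missing error on the Ruby side. I hesitate to say why this is happening, because I don't know, but it appears to me that the dynamically loaded object opennlp.tools.postag.POSTaggerME@1b5080a cannot be accessed, perhaps because OpenNLP::Bindings::Utils.tagWithArrayList isn't being set up correctly. OpenNLP::Bindings is Ruby. Utils and its methods are Java. Utils is also registered as a "default" jar and class, which may be important.
What am I doing wrong here? Thanks!
The code I am running is copied straight from github/open-nlp. My copy of the code is: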
class OpennlpTryer
$DEBUG=false
# From https://github.com/louismullie/open-nlp
# Hints: Dir.pwd; File.expand_path('../../Gemfile', __FILE__);
# Load the module
require 'open-nlp'
#require 'jruby-jars'
=begin
# Alias "write" to "print" to monkeypatch the NoMethod write error
java_import java.io.PrintStream
class PrintStream
java_alias(:write, :print, [java.lang.String])
end
=end
=begin
# Display path of jruby-jars jars...
puts JRubyJars.core_jar_path # => path to jruby-core-VERSION.jar
puts JRubyJars.stdlib_jar_path # => path to jruby-stdlib-VERSION.jar
=end
puts ENV['CLASSPATH']
# Set an alternative path to look for the JAR files.
# Default is gem's bin folder.
# OpenNLP.jar_path = '/path_to_jars/'
OpenNLP.jar_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
puts OpenNLP.jar_path
# Set an alternative path to look for the model files.
# Default is gem's bin folder.
# OpenNLP.model_path = '/path_to_models/'
OpenNLP.model_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
puts OpenNLP.model_path
# Pass some alternative arguments to the Java VM.
# Default is ['-Xms512M', '-Xmx1024M'].
# OpenNLP.jvm_args = ['-option1', '-option2']
OpenNLP.jvm_args = ['-Xms512M', '-Xmx1024M']
# Redirect VM output to log.txt
OpenNLP.log_file = 'log.txt'
# Set default models for a language.
# OpenNLP.use :language
OpenNLP.use :english # Make sure this is lower case!!!!
# Simple tokenizer
OpenNLP.load
sent = "The death of the poet was kept from his poems."
tokenizer = OpenNLP::SimpleTokenizer.new
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
puts "Tokenize #{tokens}"
# Maximum entropy tokenizer, chunker and POS tagger
OpenNLP.load
chunker = OpenNLP::ChunkerME.new
tokenizer = OpenNLP::TokenizerME.new
tagger = OpenNLP::POSTaggerME.new
sent = "The death of the poet was kept from his poems."
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
puts "Tokenize #{tokens}"
tags = tagger.tag(tokens).to_a
# => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]
puts "Tags #{tags}"
chunks = chunker.chunk(tokens, tags).to_a
# => %w[B-NP I-NP B-PP B-NP I-NP B-VP I-VP B-PP B-NP I-NP O]
puts "Chunks #{chunks}"
# Abstract Bottom-Up Parser
OpenNLP.load
sent = "The death of the poet was kept from his poems."
parser = OpenNLP::Parser.new
parse = parser.parse(sent)
=begin
parse.get_text.should eql sent
parse.get_span.get_start.should eql 0
parse.get_span.get_end.should eql 46
parse.get_child_count.should eql 1
=end
child = parse.get_children[0]
child.text # => "The death of the poet was kept from his poems."
child.get_child_count # => 3
child.get_head_index #=> 5
child.get_type # => "S"
puts "Child: #{child}"
# Maximum Entropy Name Finder*
OpenNLP.load
# puts File.expand_path('.', __FILE__)
text = File.read('./spec/sample.txt').gsub("\n", "") # gsub, not gsub!: gsub! returns nil when nothing is replaced
tokenizer = OpenNLP::TokenizerME.new
segmenter = OpenNLP::SentenceDetectorME.new
puts "Tokenizer: #{tokenizer}"
puts "Segmenter: #{segmenter}"
ner_models = ['person', 'time', 'money']
ner_finders = ner_models.map do |model|
OpenNLP::NameFinderME.new("en-ner-#{model}.bin")
end
puts "NER Finders: #{ner_finders}"
sentences = segmenter.sent_detect(text)
puts "Sentences: #{sentences}"
named_entities = []
sentences.each do |sentence|
tokens = tokenizer.tokenize(sentence)
ner_models.each_with_index do |model, i|
finder = ner_finders[i]
name_spans = finder.find(tokens)
name_spans.each do |name_span|
start = name_span.get_start
stop = name_span.get_end-1
slice = tokens[start..stop].to_a
named_entities << [slice, model]
end
end
end
puts "Named Entities: #{named_entities}"
# Loading specific models
# Just pass the name of the model file to the constructor. The gem will search for the file in the OpenNLP.model_path folder.
OpenNLP.load
tokenizer = OpenNLP::TokenizerME.new('en-token.bin')
tagger = OpenNLP::POSTaggerME.new('en-pos-perceptron.bin')
name_finder = OpenNLP::NameFinderME.new('en-ner-person.bin')
# etc.
puts "Tokenizer: #{tokenizer}"
puts "Tagger: #{tagger}"
puts "Name Finder: #{name_finder}"
# Loading specific classes
# You may want to load specific classes from the OpenNLP library that are not loaded by default. The gem provides an API to do this:
# Default base class is opennlp.tools.
OpenNLP.load_class('SomeClassName')
# => OpenNLP::SomeClassName
# Here, we specify another base class.
OpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')
# => OpenNLP::SomeOtherClass
end
The line which fails is line 73 (tokens is the tokenized sentence being processed):
tags = tagger.tag(tokens).to_a #
# => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]
tagger.tag calls open-nlp/classes.rb line 13, which is where the error is thrown. The code there is:
class OpenNLP::POSTaggerME < OpenNLP::Base
unless RUBY_PLATFORM =~ /java/
def tag(*args)
OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0]) # <== Line 13
end
end
end
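The Ruby error thrown at this point is: `method_missing': unknown exception (NullPointerException). Debugging this, I found the underlying error to be java.lang.NullPointerException. args[0] is the sentence being processed. @proxy_inst is opennlp.tools.postag.POSTaggerME@1b5080a.
OpenNLP::Bindings sets up the Java environment. For example, it sets up the jars to be loaded and the classes within those jars. At line 54 it sets up the defaults for RJB, which should set up OpenNLP::Bindings::Utils and its methods as follows: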
# Add in Rjb workarounds.
unless RUBY_PLATFORM =~ /java/
self.default_jars << 'utils.jar'
self.default_classes << ['Utils', '']
end
utils.jar and Utils.java are in the CLASSPATH alongside the other jars being loaded. They are being accessed; I verified this because the other jars throw error messages if they are not present. The CLASSPATH is:
.;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
The application's jars are in D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin and, again, if they are not there I get error messages about the other jars. The jars and Java files in ...\bin include:
jwnl-1.3.3.jar
opennlp-maxent-3.0.2-incubating.jar
opennlp-tools-1.5.2-incubating.jar
opennlp-uima-1.5.2-incubating.jar
utils.jar
Utils.java
Utils.java is as follows:
import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
return posTagger.tag(getStringArray(objectArray));
}
public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
return nameFinder.find(getStringArray(tokens));
}
public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
return chunker.chunk(getStringArray(tokens), getStringArray(tags));
}
public static String[] getStringArray(ArrayList[] objectArray) {
String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
return stringArray;
}
}
So it should define tagWithArrayList and import opennlp.tools.postag.POSTagger. (By the way, just to try it, I changed the occurrences of POSTagger to POSTaggerME in this file. It changed nothing...)
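One possible source of the NullPointerException, offered only as a guess: if the bridge cannot convert the Ruby array into the ArrayList[] parameter and hands the method a null instead, the first thing getStringArray does is read objectArray.length, which fails on null. The sketch below reproduces only that step in plain Java. The class name NullArgumentDemo and the helper outcome are made up for illustration, and Object[] stands in for the parameter type; whether RJB really passes null here is an assumption, not something the trace confirms.

```java
import java.util.Arrays;

public class NullArgumentDemo {
    // Same body as Utils.getStringArray; Object[] is used here as a stand-in parameter type.
    static String[] getStringArray(Object[] objectArray) {
        return Arrays.copyOf(objectArray, objectArray.length, String[].class);
    }

    // Hypothetical helper: reports the simple name of the exception thrown, or "none".
    static String outcome(Object[] input) {
        try {
            getStringArray(input);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // An array of Strings converts without an exception.
        System.out.println(outcome(new Object[] {"The", "death"}));
        // A null argument fails as soon as .length is read.
        System.out.println(outcome(null));
    }
}
```

If this is what is happening, the exception would come from inside Utils rather than from POSTaggerME itself, which would fit the tagger object looking valid on the Ruby side.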
The tools jar file, opennlp-tools-1.5.2-incubating.jar, includes the postag/POSTagger and POSTaggerME class files, as expected.
The error messages are:
D:\BitNami\rubystack-1.9.3-12\ruby\bin\ruby.exe -e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb
.;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `method_missing': unknown exception (NullPointerException)
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:73:in `<class:OpennlpTryer>'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
Modified Utils.java:
import java.util.Arrays;
import java.util.Object;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
public static String[] tagWithArrayList(POSTagger posTagger, Object[] objectArray) {
return posTagger.tag(getStringArray(objectArray));
}
public static Object[] findWithArrayList(NameFinderME nameFinder, Object[] tokens) {
return nameFinder.find(getStringArray(tokens));
}
public static Object[] chunkWithArrays(ChunkerME chunker, Object[] tokens, Object[] tags) {
return chunker.chunk(getStringArray(tokens), getStringArray(tags));
}
public static String[] getStringArray(Object[] objectArray) {
String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
return stringArray;
}
}
Error messages after the modification:
Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
Error messages after changing Utils.java to use "import java.lang.Object;":
Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
With the rescue removed from OpennlpTryer, the error is shown being caught in classes.rb:
Uncaught exception: uninitialized constant OpenNLP::POSTaggerME::ArrayStoreException
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:16:in `rescue in tag'
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
The same error, but with all rescues removed, so it is "native Ruby":
Uncaught exception: unknown exception
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `method_missing'
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `tag'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
Revised Utils.java:
import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
// Log the runtime type of the incoming array before tagging.
System.out.println("Tokens: ("+objectArray.getClass().getSimpleName()+"): \n"+objectArray);
return posTagger.tag(getStringArray(objectArray));
}
public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
return nameFinder.find(getStringArray(tokens));
}
public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
return chunker.chunk(getStringArray(tokens), getStringArray(tags));
}
public static String[] getStringArray(ArrayList[] objectArray) {
String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
return stringArray;
}
}
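I ran cavaj on the Utils.class that I unzipped from utils.jar, and this is what I found. It differs from Utils.java quite a bit. Both come installed with the open-nlp 0.1.4 gem. I don't know whether this is the root cause of the problem, but this file is at the core of where things break, and we have a major discrepancy. Which one should we use?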
import java.util.ArrayList;
import java.util.Arrays;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.postag.POSTagger;
public class Utils
{
public Utils()
{
}
public static String[] tagWithArrayList(POSTagger postagger, ArrayList aarraylist[])
{
return postagger.tag(getStringArray(aarraylist));
}
public static Object[] findWithArrayList(NameFinderME namefinderme, ArrayList aarraylist[])
{
return namefinderme.find(getStringArray(aarraylist));
}
public static Object[] chunkWithArrays(ChunkerME chunkerme, ArrayList aarraylist[], ArrayList aarraylist1[])
{
return chunkerme.chunk(getStringArray(aarraylist), getStringArray(aarraylist1));
}
public static String[] getStringArray(ArrayList aarraylist[])
{
String as[] = (String[])Arrays.copyOf(aarraylist, aarraylist.length, [Ljava/lang/String;);
return as;
}
}
Utils.java in use as of 10/07 (October 7), compiled and compressed into utils.jar:
import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
return posTagger.tag(getStringArray(objectArray));
}
public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
return nameFinder.find(getStringArray(tokens));
}
public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
return chunker.chunk(getStringArray(tokens), getStringArray(tags));
}
public static String[] getStringArray(ArrayList[] objectArray) {
String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
return stringArray;
}
}
The failure occurs in BindIt::Binding::load_klass, at line 110 here:
# Private function to load classes.
# Doesn't check if initialized.
def load_klass(klass, base, name=nil)
base += '.' unless base == ''
fqcn = "#{base}#{klass}"
name ||= klass
if RUBY_PLATFORM =~ /java/
rb_class = java_import(fqcn)
if name != klass
if rb_class.is_a?(Array)
rb_class = rb_class.first
end
const_set(name.intern, rb_class)
end
else
rb_class = Rjb::import(fqcn) # <== This is line 110
const_set(name.intern, rb_class)
end
end
The messages are as follows, but they are inconsistent as to which class is named. Each run may report a different one, including POSTagger, ChunkerME, or NameFinderME.
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `import': opennlp/tools/namefind/NameFinderME (NoClassDefFoundError)
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `load_klass'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:89:in `block in load_default_classes'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `each'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `load_default_classes'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:56:in `bind'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp.rb:14:in `load'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:54:in `<class:OpennlpTryer>'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
What is interesting about these errors is that they originate at OpennlpTryer line 54:
OpenNLP.load
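At this point, OpenNLP fires up RJB, which uses BindIt to load the jars and classes. This happens before the errors I was seeing at the beginning of this question. However, I can't help but think it is all related. I really don't understand the inconsistency of these errors at all.
I was able to add the logging function to Utils.java, compile it after adding "import java.io.*", and compress it. However, I pulled it out because of these errors, as I don't know whether it is involved. I don't think so. In any case, because these errors occur during the load, the method is never called, so logging there won't help...
For each of the other jars, the jar is loaded and then each class is imported using RJB. Utils is handled differently and is specified as the "default". From what I can tell, Utils.class is executed to load its own classes?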
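Since the NoClassDefFoundError is raised while classes are being imported, one way to take Ruby out of the picture is to ask the JVM directly whether each class can be loaded from the current class path. The following is a minimal sketch of such a check. The class ClasspathCheck and its isLoadable helper are hypothetical names, and the assumption is that you run it with the same CLASSPATH that the Ruby process sees.

```java
public class ClasspathCheck {
    // Returns true if the named class can be found and linked, without initializing it.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here too.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        String[] names = {
            "java.util.ArrayList",
            "opennlp.tools.postag.POSTaggerME",
            "opennlp.tools.chunker.ChunkerME",
            "opennlp.tools.namefind.NameFinderME",
            "Utils"
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

If the OpenNLP classes load reliably here but fail intermittently under RJB, that would point toward how the bridge or bind-it sets up the class path rather than toward the jars themselves.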
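Later update on 10/10:
Here is where I think I am. First, I have had some problems replacing Utils.java, as I described earlier today. That problem will probably need to be solved before I can install a fix.
Second, I now understand the difference between POSTagger and POSTaggerME: the ME stands for Maximum Entropy. The test code tries to call POSTaggerME, but it looks like Utils.java, as implemented, supports POSTagger. I tried changing the test code to call POSTagger, but it said it couldn't find an initializer. Looking at the source for each of these, and I am guessing here, I think the only purpose of POSTagger is to serve as the interface that POSTaggerME implements.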
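The "couldn't find an initializer" message is what one would expect if POSTagger is a Java interface: an interface has no constructors, while a method declared to take the interface still accepts any class that implements it. The sketch below illustrates this with two made-up stand-ins, Tagger and TaggerME, which are not the OpenNLP types; the placeholder tag "X" is likewise invented for the example.

```java
import java.util.Arrays;

public class InterfaceDemo {
    // Hypothetical stand-in for an interface such as POSTagger.
    interface Tagger {
        String[] tag(String[] sentence);
    }

    // Hypothetical stand-in for an implementing class such as POSTaggerME.
    static class TaggerME implements Tagger {
        public TaggerME() {}

        @Override
        public String[] tag(String[] sentence) {
            String[] tags = new String[sentence.length];
            Arrays.fill(tags, "X"); // placeholder tag, not a real model output
            return tags;
        }
    }

    // A parameter typed as the interface accepts the implementation.
    static String[] tagWith(Tagger tagger, String[] tokens) {
        return tagger.tag(tokens);
    }

    static int publicConstructorCount(Class<?> type) {
        return type.getConstructors().length;
    }

    public static void main(String[] args) {
        System.out.println("Tagger is interface: " + Tagger.class.isInterface());
        System.out.println("Tagger constructors: " + publicConstructorCount(Tagger.class));
        System.out.println("TaggerME constructors: " + publicConstructorCount(TaggerME.class));
        System.out.println(Arrays.toString(tagWith(new TaggerME(), new String[] {"The", "death"})));
    }
}
```

Under that reading, Utils.tagWithArrayList declaring a POSTagger parameter is not a mismatch with a POSTaggerME instance, which is consistent with the observation that swapping the type names in Utils.java changed nothing.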
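The source is in the opennlp-tools file opennlp-tools-1.5.2-incubating-sources.jar.
What I don't get is the reason for Utils in the first place. Why aren't the jars and classes provided in bindings.rb enough? This feels like a bad monkeypatch. I mean, look at what bindings.rb does to begin with: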
# Default JARs to load.
self.default_jars = [
'jwnl-1.3.3.jar',
'opennlp-tools-1.5.2-incubating.jar',
'opennlp-maxent-3.0.2-incubating.jar',
'opennlp-uima-1.5.2-incubating.jar'
]
# Default namespace.
self.default_namespace = 'opennlp.tools'
# Default classes.
self.default_classes = [
# OpenNLP classes.
['AbstractBottomUpParser', 'opennlp.tools.parser'],
['DocumentCategorizerME', 'opennlp.tools.doccat'],
['ChunkerME', 'opennlp.tools.chunker'],
['DictionaryDetokenizer', 'opennlp.tools.tokenize'],
['NameFinderME', 'opennlp.tools.namefind'],
['Parser', 'opennlp.tools.parser.chunking'],
['Parse', 'opennlp.tools.parser'],
['ParserFactory', 'opennlp.tools.parser'],
['POSTaggerME', 'opennlp.tools.postag'],
['SentenceDetectorME', 'opennlp.tools.sentdetect'],
['SimpleTokenizer', 'opennlp.tools.tokenize'],
['Span', 'opennlp.tools.util'],
['TokenizerME', 'opennlp.tools.tokenize'],
# Generic Java classes.
['FileInputStream', 'java.io'],
['String', 'java.lang'],
['ArrayList', 'java.util']
]
# Add in Rjb workarounds.
unless RUBY_PLATFORM =~ /java/
self.default_jars << 'utils.jar'
self.default_classes << ['Utils', '']
end
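Answer 0 (score: 3)
I don't think you're doing anything wrong at all. You're also not the only one with this problem. It looks like a bug in Utils. Creating an ArrayList[] in Java doesn't make much sense. It is technically legal, but it would be an array of ArrayLists, which a) is just plain odd, b) is terrible practice with regard to Java's generics, and c) won't cast to String[] the way the author intends in getStringArray().
Given the way the utility is written, and the fact that OpenNLP does in fact expect to receive a String[] as input to its tag() method, my best guess is that the original author meant to have Object[] where they have ArrayList[] in the Utils class.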
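To make point c) concrete, here is a small sketch of what Arrays.copyOf does with the two kinds of input. The class CopyOfDemo and its helpers are illustrative names only. The conversion succeeds when every element is a String and throws ArrayStoreException when an element is an ArrayList, which matches the exception name that shows up in the question's later traces.

```java
import java.util.ArrayList;
import java.util.Arrays;

public class CopyOfDemo {
    // Same conversion as Utils.getStringArray, but declared against Object[].
    static String[] toStrings(Object[] source) {
        return Arrays.copyOf(source, source.length, String[].class);
    }

    // Hypothetical helper: true if the conversion succeeds, false on ArrayStoreException.
    static boolean converts(Object[] source) {
        try {
            toStrings(source);
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An Object[] whose elements are all Strings copies cleanly into a String[].
        Object[] tokens = {"The", "death", "of", "the", "poet"};
        System.out.println("Object[] of Strings converts: " + converts(tokens));

        // An array whose elements are ArrayLists cannot be stored in a String[].
        ArrayList<?>[] lists = new ArrayList<?>[] {new ArrayList<String>()};
        System.out.println("ArrayList[] converts: " + converts(lists));
    }
}
```

Note that the check happens per element at run time, so the declared parameter type alone does not decide the outcome; what matters is what the bridge actually puts in the array.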
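Update
To output to a file in the root of your project directory, try adjusting the logging like this (I added another line to print the contents of the input array):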
try {
File log = new File("log.txt");
FileWriter fileWriter = new FileWriter(log);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
bufferedWriter.write("Tokens ("+objectArray.getClass().getSimpleName()+"): \r\n"+objectArray.toString()+"\r\n");
bufferedWriter.write(Arrays.toString(objectArray));
bufferedWriter.close();
}
catch (Exception e) {
e.printStackTrace();
}
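Answer 1 (score: 3)
SEE THE FULL CODE AT THE END FOR THE COMPLETE, CORRECTED CLASSES.RB MODULE
I ran into the same problem today. I didn't quite understand why the Utils class was being used, so I modified the classes.rb file in the following way: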
unless RUBY_PLATFORM =~ /java/
def tag(*args)
@proxy_inst.tag(args[0])
#OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])
end
end
This way I can get the following test to pass:
sent = "The death of the poet was kept from his poems."
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
tags = tagger.tag(tokens).to_a
# => ["prop", "prp", "n", "v-fin", "n", "adj", "prop", "v-fin", "n", "adj", "punc"]
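Edit by R_G: I tested that change and it eliminated the error. I will have to do more testing to make sure the results are what should be expected. However, following the same pattern, I also made the following changes in classes.rb: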
def chunk(tokens, tags)
chunks = @proxy_inst.chunk(tokens, tags)
# chunks = OpenNLP::Bindings::Utils.chunkWithArrays(@proxy_inst, tokens,tags)
chunks.map { |c| c.to_s }
end
...
class OpenNLP::NameFinderME < OpenNLP::Base
unless RUBY_PLATFORM =~ /java/
def find(*args)
@proxy_inst.find(args[0])
# OpenNLP::Bindings::Utils.findWithArrayList(@proxy_inst, args[0])
end
end
end
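This allowed the entire sample test to execute without failure. I will provide a further update once the results have been verified.
FINAL EDIT AND UPDATED CLASSES.RB, per Space Pope and R_G:
As it turns out, this answer was key to the desired solution. However, the results were inconsistent with the correction as it stood. We continued to drill down and implemented strong typing on the calls, as specified by RJB. This converts each call to use the _invoke method, whose parameters are the desired method name, the strong type signature, and then the actual arguments. Andre's recommendation was key to the solution, so kudos to him. Here is the complete module. It eliminates the need for the Utils.class that was attempting to make these calls but failing. We plan to issue a GitHub pull request for the open-nlp gem to update this module: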
require 'open-nlp/base'
class OpenNLP::SentenceDetectorME < OpenNLP::Base; end
class OpenNLP::SimpleTokenizer < OpenNLP::Base; end
class OpenNLP::TokenizerME < OpenNLP::Base; end
class OpenNLP::POSTaggerME < OpenNLP::Base
unless RUBY_PLATFORM =~ /java/
def tag(*args)
@proxy_inst._invoke("tag", "[Ljava.lang.String;", args[0])
end
end
end
class OpenNLP::ChunkerME < OpenNLP::Base
if RUBY_PLATFORM =~ /java/
def chunk(tokens, tags)
if !tokens.is_a?(Array)
tokens = tokens.to_a
tags = tags.to_a
end
tokens = tokens.to_java(:String)
tags = tags.to_java(:String)
@proxy_inst.chunk(tokens,tags).to_a
end
else
def chunk(tokens, tags)
chunks = @proxy_inst._invoke("chunk", "[Ljava.lang.String;[Ljava.lang.String;", tokens, tags)
chunks.map { |c| c.to_s }
end
end
end
class OpenNLP::Parser < OpenNLP::Base
def parse(text)
tokenizer = OpenNLP::TokenizerME.new
full_span = OpenNLP::Bindings::Span.new(0, text.size)
parse_obj = OpenNLP::Bindings::Parse.new(
text, full_span, "INC", 1, 0)
tokens = tokenizer.tokenize_pos(text)
tokens.each_with_index do |tok,i|
start, stop = tok.get_start, tok.get_end
token = text[start..stop-1]
span = OpenNLP::Bindings::Span.new(start, stop)
parse = OpenNLP::Bindings::Parse.new(text, span, "TK", 0, i)
parse_obj.insert(parse)
end
@proxy_inst.parse(parse_obj)
end
end
class OpenNLP::NameFinderME < OpenNLP::Base
unless RUBY_PLATFORM =~ /java/
def find(*args)
@proxy_inst._invoke("find", "[Ljava.lang.String;", args[0])
end
end
end
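The signature strings passed to _invoke in the module above, such as "[Ljava.lang.String;", are the names the JVM itself uses for array types. As a sketch of where such a string comes from, the following plain Java prints the name reported for String[] and a few other array classes. The class ArrayTypeNames is an illustrative name, and nothing here depends on RJB.

```java
public class ArrayTypeNames {
    // Returns the JVM's name for a type, as reported by Class.getName().
    static String jvmName(Class<?> type) {
        return type.getName();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(jvmName(String[].class));  // [Ljava.lang.String;
        System.out.println(jvmName(Object[].class));  // [Ljava.lang.Object;
        System.out.println(jvmName(int[].class));     // [I

        // The same string can be turned back into the array class.
        Class<?> roundTrip = Class.forName("[Ljava.lang.String;");
        System.out.println(roundTrip == String[].class); // true
    }
}
```

The two-argument chunk call in the module simply places two of these names back to back, one per parameter; that reading is taken from the code shown here rather than from RJB's documentation, so treat it as an observation.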