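I want to pass a string of options to Weka. The option string contains a Weka tokenizer string, and the tokenizer string in turn contains a delimiters option string. I get the error message "No value given for -delimiters option." How should I format the string?
Here is my code: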
String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector "
+ "-R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer "
+ "-stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "
+ "\"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
StringToWordVector remove = new StringToWordVector();
This question did not solve my problem.
Answer 0 (score: 3)
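The error message says that no value was found after the -delimiters option. This happens because Weka sees the string end with a double quote immediately after the -delimiters option. The root cause is a rogue quotation mark in front of weka.core.tokenizers.NGramTokenizer, which is the value of the -tokenizer option: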
String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector "
+ "-R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer "
+ "-stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "
+ "\"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
(The rogue quotation mark is the \" immediately before weka.core.tokenizers.NGramTokenizer on the last line.)
Change the string to the following and everything works:
String[] options =
weka.core.Utils.splitOptions(
"weka.filters.unsupervised.attribute.StringToWordVector "
+ "-R first-last -W 1000 -prune-rate -1.0 -N 0 "
+ "-stemmer weka.core.stemmers.NullStemmer "
+ "-stopwords-handler weka.core.stopwords.Null -M 1 "
+ "-tokenizer weka.core.tokenizers.NGramTokenizer -max 5 -min 1 "
+ "-delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");
Answer 1 (score: 1)
The content of the string you are passing to splitOptions, once the Java escapes are resolved, is:
weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters " \r\n\t.,;:\'\"()?!"
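I am not sure what the argument to -tokenizer is supposed to be, but the string being passed to it ends with a -delimiters flag that has no value, which is consistent with the error you reported.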
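To see where the quoted -tokenizer argument ends, here is a minimal sketch of a quote-aware scan. It is a simplified stand-in, not Weka's actual splitOptions code; the class QuoteScanDemo and the method firstQuotedToken are made up for illustration, and it assumes a backslash inside quotes escapes the next character.

```java
public class QuoteScanDemo {
    // Simplified stand-in for a quote-aware option splitter (NOT Weka's code):
    // returns the text between the first double quote and the next unescaped one.
    static String firstQuotedToken(String s) {
        int start = s.indexOf('"');
        if (start < 0) {
            return null; // no quoted section at all
        }
        int i = start + 1;
        while (i < s.length() && s.charAt(i) != '"') {
            if (s.charAt(i) == '\\') {
                i++; // skip the character that the backslash escapes
            }
            i++;
        }
        if (i >= s.length()) {
            throw new IllegalArgumentException("unterminated quote");
        }
        return s.substring(start + 1, i);
    }

    public static void main(String[] args) {
        // Same Java literal as the last line of the question's code.
        String tail = "-tokenizer "
            + "\"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"";
        // prints [weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters ]
        System.out.println("[" + firstQuotedToken(tail) + "]");
    }
}
```

The token stops right after -delimiters because the next double quote is not escaped at the string-content level, so the flag is left without a value.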
Perhaps you meant to pass this to -tokenizer:
"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""
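Here the argument to -delimiters is itself a quoted string, so its quotes and backslashes are escaped one level deeper than the tokenizer string around it. Note that this is the string content; inside a Java string literal, every backslash and double quote has to be escaped once more.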
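Counting backslashes by hand is error-prone, so one option is to write the tokenizer spec at its own level and add the outer layer of escaping in code. The sketch below assumes the parser treats a backslash inside quotes as an escape character; NestedQuoteDemo and quoteForNesting are hypothetical names and not part of Weka.

```java
public class NestedQuoteDemo {
    // Hypothetical helper: escape backslashes and double quotes, then wrap the
    // result in double quotes, so it survives one level of quote-aware splitting.
    static String quoteForNesting(String spec) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : spec.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\'); // add one escaping level
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Tokenizer spec as it should look after the outer quotes are stripped:
        // the -delimiters value is quoted, and \r, \n, \t are written as escapes.
        String inner = "weka.core.tokenizers.NGramTokenizer -max 5 -min 1 "
            + "-delimiters \" \\r\\n\\t.,;:\\'\\\"()?!\"";
        // Build the -tokenizer part of the option string with the extra escaping.
        System.out.println("-tokenizer " + quoteForNesting(inner));
    }
}
```

The printed text is what would be appended to the rest of the option string before calling splitOptions. Whether Weka accepts the result still needs to be checked against your Weka version.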
Answer 2 (score: 1)
Maybe try escaping the quotes with \:
String[] options = weka.core.Utils.splitOptions("\"weka.filters.unsupervised.attribute.StringToWordVector\"" + "\"-R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer\""+ "\"-stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer\""+ "\"weka.core.tokenizers.NGramTokenizer -max 5 -min 1 -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");