I am trying to write code using the Stanford parser from the following link:
https://gist.github.com/a34729t/2562754
I have included all the jar files, but it shows errors on the import statements, such as:
import edu.stanford.nlp.fsm.ExactGrammarCompactor;
Can anyone tell me how to fix this? I have added all the jar files, but I still cannot figure out what the real problem is. Running the program gives:
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
at pkg.stanford.Stan.main(Stan.java:39)
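As a quick sanity check that the Stanford parser jar is actually on the classpath, a minimal sketch like the one below (the class ClasspathCheck is just an illustration; the class it looks up is the LexicalizedParser class from the imports in my code) should print the class name rather than throw a ClassNotFoundException:

public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        // If this throws ClassNotFoundException, the Stanford parser jar
        // is missing from the runtime classpath.
        Class<?> c = Class.forName("edu.stanford.nlp.parser.lexparser.LexicalizedParser");
        System.out.println("Found parser class: " + c.getName());
    }
}

The full code, taken from the gist above, is: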
import edu.stanford.nlp.fsm.ExactGrammarCompactor;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.io.NumberRangeFileFilter;
import edu.stanford.nlp.io.NumberRangesFileFilter;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.objectbank.TokenizerFactory;
import edu.stanford.nlp.parser.ViterbiParser;
import edu.stanford.nlp.parser.KBestViterbiParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.util.Function;
import edu.stanford.nlp.process.WhitespaceTokenizer;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.trees.international.arabic.ArabicTreebankLanguagePack;
import edu.stanford.nlp.util.Generics;
import edu.stanford.nlp.util.Numberer;
import edu.stanford.nlp.util.Pair;
import edu.stanford.nlp.util.Timing;
import edu.stanford.nlp.util.ScoredObject;
import java.io.*;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.*;
import java.util.zip.GZIPOutputStream;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.PTBTokenizer;
public class RunStanfordParser {
    public static void main(String[] args) throws Exception {
        // input format: data directory, and output directory
        String parserFileOrUrl = args[0];
        String fileToParse = args[1];

        LexicalizedParser lp = new LexicalizedParser(parserFileOrUrl);
        //lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"}); // set max sentence length if you want

        // Call parser on files, and tokenize the contents
        FileInputStream fstream = new FileInputStream(fileToParse);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));

        StringReader sr;
        PTBTokenizer tkzr;                  // tokenizer object
        WordStemmer ls = new WordStemmer(); // stemmer/lemmatizer object

        // Read File Line By Line
        String strLine;
        while ((strLine = br.readLine()) != null) {
            System.out.println("Tokenizing and Parsing: " + strLine);

            sr = new StringReader(strLine);
            tkzr = PTBTokenizer.newPTBTokenizer(sr);
            List toks = tkzr.tokenize();
            System.out.println("tokens: " + toks);

            Tree parse = (Tree) lp.apply(toks);

            ArrayList<String> words = new ArrayList();
            ArrayList<String> stems = new ArrayList();
            ArrayList<String> tags = new ArrayList();

            // Get words and Tags
            for (TaggedWord tw : parse.taggedYield()) {
                words.add(tw.word());
                tags.add(tw.tag());
            }

            // Get stems
            ls.visitTree(parse); // apply the stemmer to the tree
            for (TaggedWord tw : parse.taggedYield()) {
                stems.add(tw.word());
            }

            // Get dependency tree
            TreebankLanguagePack tlp = new PennTreebankLanguagePack();
            GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
            GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
            Collection tdl = gs.typedDependenciesCollapsed();

            // And print!
            System.out.println("words: " + words);
            System.out.println("POStags: " + tags);
            System.out.println("stemmedWordsAndTags: " + stems);
            System.out.println("typedDependencies: " + tdl);
            System.out.println(); // separate output lines
        }
    }
}
Answer 0 (score: 1)
You should pass two input arguments to your program, because it needs them. The problem is that
String parserFileOrUrl=args[0];
String fileToParse=args[1];
expects both args[0] and args[1] to be provided. Check the length of the array before running the rest of the code, for example:
String parserFileOrUrl = null;
String fileToParse = null;
if (args.length == 2) {
    parserFileOrUrl = args[0];
    fileToParse = args[1];
} else {
    System.exit(1);
}
It will exit when the two inputs are not provided to the program.
Note: I have set the System.exit code to 1, which indicates that an error occurred.
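As a small extension (the usage text below is just a suggestion, not part of the original answer), printing a usage message before exiting makes it clearer why the program stopped:

if (args.length != 2) {
    // Tell the user what was expected instead of exiting silently.
    System.err.println("Usage: java RunStanfordParser <grammarFileOrUrl> <fileToParse>");
    System.exit(1);
}
String parserFileOrUrl = args[0];
String fileToParse = args[1];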