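I'm using the Lucene library. I want to index some documents and generate TermVectors for them. I wrote an Indexer class to create the indexed fields, but this code returns an empty field.
My indexing class is: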
public class Indexer {
    private static File sourceDirectory;
    private static File indexDirectory;
    private String fieldtitle, fieldbody;

    public Indexer() {
        this.sourceDirectory = new File(LuceneConstants.dataDir);
        this.indexDirectory = new File(LuceneConstants.indexDir);
        fieldtitle = LuceneConstants.CONTENTS1;
        fieldbody = LuceneConstants.CONTENTS2;
    }

    public void index() throws CorruptIndexException,
            LockObtainFailedException, IOException {
        Directory dir = FSDirectory.open(indexDirectory.toPath());
        Analyzer analyzer = new StandardAnalyzer(StandardAnalyzer.STOP_WORDS_SET); // using stop words
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        if (indexDirectory.exists()) {
            iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        } else {
            // Add new documents to an existing index:
            iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        }
        IndexWriter writer = new IndexWriter(dir, iwc);
        for (File f : sourceDirectory.listFiles()) {
            Document doc = new Document();
            String[] linetext = getAllText(f);
            String title = linetext[1];
            String body = linetext[2];
            doc.add(new Field(fieldtitle, title, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
            doc.add(new Field(fieldbody, body, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
            writer.addDocument(doc);
        }
        writer.close();
    }

    public String[] getAllText(File f) throws FileNotFoundException, IOException {
        String textFileContent = "";
        String[] ar = null;
        try {
            BufferedReader in = new BufferedReader(new FileReader(f));
            for (String str : Files.readAllLines(Paths.get(f.getAbsolutePath()))) {
                textFileContent += str;
                ar = textFileContent.split("--");
            }
            in.close();
        } catch (IOException e) {
            System.out.println("File Read Error");
        }
        return ar;
    }
}
And the debugger shows:
doc Document #534
fields ArrayList "size=0"
Static
linetext String[] #535(length=4)
title String "how ...."
body String "I created ...."
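I also ran into another error while debugging:
non-static method "toString" cannot be referenced from a static context.
This error occurs on the file path.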
Answer 0 (score: 0)
It sounds like you either have an empty file or are running into an IOException. See this part of your code:
String[] ar = null;
try {
    //Do Stuff
} catch (IOException e) {
    System.out.println("File Read Error");
}
return ar;
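On an IOException you don't really handle it: you print a message and return null, which effectively guarantees that you hit another exception immediately afterward (a NullPointerException at linetext[1]). Likewise, if getAllText returns an array of length 1 or 2, reading linetext[1] or linetext[2] throws an ArrayIndexOutOfBoundsException. You need to figure out how to handle both cases.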
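One possible way to handle both cases is sketched below, using only the JDK. The class name SafeTextReader and the constant MIN_PARTS are hypothetical, and the sketch assumes the files are UTF-8 and that every valid file contains at least three "--"-separated parts, since the indexer reads linetext[1] and linetext[2]. Rather than swallowing the IOException and returning null, it lets the exception propagate and rejects files with too few parts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Lucene or of the original Indexer class.
class SafeTextReader {
    // Assumption: the indexer reads parts[1] and parts[2], so three parts are required.
    private static final int MIN_PARTS = 3;

    /**
     * Reads the whole file, concatenates its lines (no separator, as in the
     * original loop), and splits the result on "--".
     * An IOException from reading is propagated to the caller instead of
     * being caught here, so this method never returns null.
     */
    static String[] getAllText(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        String[] parts = String.join("", lines).split("--");
        if (parts.length < MIN_PARTS) {
            // Covers empty files and files with fewer than two "--" separators.
            throw new IOException("Expected at least " + MIN_PARTS
                    + " parts in " + file + " but found " + parts.length);
        }
        return parts;
    }
}
```

The indexing loop can then catch the IOException per file and decide whether to skip that file or abort the whole run, instead of failing later with an unrelated exception.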
Also, it's not the problem you're currently running into, but this is almost certainly backwards:
if (indexDirectory.exists()) {
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
    // Add new documents to an existing index:
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}
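In any case, you don't need it at all. That is what CREATE_OR_APPEND is for: it writes to the existing index, or creates one if none exists. Just replace the whole block with: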
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);