Index fields are empty

Time: 2015-07-05 07:30:17

Tags: indexing lucene

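I am using the Lucene library. I want to index some documents and generate TermVectors for them. I wrote an Indexer class to create the indexed fields, but this code produces empty fields.

My indexer class is: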

public class Indexer {

private static File sourceDirectory;
private static File indexDirectory;
private String fieldtitle,fieldbody;

public Indexer() {
    this.sourceDirectory = new File(LuceneConstants.dataDir);
    this.indexDirectory = new File(LuceneConstants.indexDir);
    fieldtitle = LuceneConstants.CONTENTS1;
    fieldbody= LuceneConstants.CONTENTS2;
}

public void index() throws CorruptIndexException,
        LockObtainFailedException, IOException {
    Directory dir = FSDirectory.open(indexDirectory.toPath());
    Analyzer analyzer = new StandardAnalyzer(StandardAnalyzer.STOP_WORDS_SET);  // using stop words
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);

    if (indexDirectory.exists()) {
        iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    } else {
        // Add new documents to an existing index:
        iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
    }

    IndexWriter writer = new IndexWriter(dir, iwc);
    for (File f : sourceDirectory.listFiles()) {
        Document doc = new Document();
        String[] linetext=getAllText(f);
        String title=linetext[1];
        String body=linetext[2];

        doc.add(new Field(fieldtitle, title, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field(fieldbody, body, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.addDocument(doc);
    }
    writer.close();
}

public String[] getAllText(File f) throws FileNotFoundException, IOException {
    String textFileContent = "";
    String[] ar = null;

    try {
    BufferedReader in = new BufferedReader(new FileReader(f));
    for (String str : Files.readAllLines(Paths.get(f.getAbsolutePath()))) {
         textFileContent += str;
            ar=textFileContent.split("--");

    }
    in.close();
} catch (IOException e) {
    System.out.println("File Read Error");
}
    return ar;
}
}

And this is what the debugger shows:

doc     Document    #534    
fields  ArrayList   "size=0"    
Static          
linetext    String[]    #535(length=4)  
title   String          "how ...."  
body    String          "I created ...."    

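I also ran into another error while debugging:

non-static method "toString" cannot be referenced from a static context.

This error occurs for the file path.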

1 answer:

Answer 0 (score: 0)

It sounds like you either have an empty file or are hitting an IOException. Look at this part of your code:

String[] ar = null;

try {
    //Do Stuff
} catch (IOException e) {
    System.out.println("File Read Error");
}
return ar;

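When an IOException occurs, the code does not really handle it: it prints a message and returns null, which all but guarantees a second exception immediately afterwards (a NullPointerException at linetext[1]). Whether you hit an IOException or getAllText returns an array of length 1 or 2, you need to decide how to handle that case.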
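
One possible way to handle both cases, as a minimal sketch rather than a definitive fix: let the IOException propagate to the caller, and reject files that do not contain enough "--"-separated sections before anyone indexes into the array. The names `SectionReader`, `readSections`, and `minSections` are hypothetical and do not appear in the question; the sketch assumes, like the original code, that the lines are concatenated without separators and then split on "--".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SectionReader {

    /**
     * Reads all lines of the file, concatenates them (as the original
     * getAllText does), and splits the result on "--".
     * An IOException from reading is not swallowed; it reaches the caller.
     * minSections is a hypothetical parameter: the smallest array length
     * the caller is prepared to index into.
     */
    public static String[] readSections(Path file, int minSections) throws IOException {
        String content = String.join("", Files.readAllLines(file));
        String[] parts = content.split("--");
        if (parts.length < minSections) {
            // Fail with a clear message instead of a later
            // ArrayIndexOutOfBoundsException at linetext[1] or linetext[2].
            throw new IOException("Expected at least " + minSections
                    + " sections in " + file + " but found " + parts.length);
        }
        return parts;
    }
}
```

In index(), the caller could then call something like readSections(f.toPath(), 3) and either catch the IOException per file and skip that document, or let it abort the whole run; which one is right depends on your data.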

Also, this is not the problem you are seeing right now, but this is almost certainly backwards:

if (indexDirectory.exists()) {
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
    // Add new documents to an existing index:
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}

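It isn't needed at all anyway. That is exactly what CREATE_OR_APPEND is for: it writes to an existing index, or creates one if none exists. Just replace that whole block with:
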
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);