Loading a Lucene index previously written to HDFS into a RAMDirectory

Time: 2014-07-08 15:46:09

Tags: java apache hadoop lucene

Here is the error message:

Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in RAMDirectory@1cff1d4a lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@2ddf0c3: files: [/prod/hdfs/LUCENE/index/140601/_0.cfe, /prod/hdfs/LUCENE/index/140601/segments_2, /prod/hdfs/LUCENE/index/140601/_0.si, /prod/hdfs/LUCENE/index/140601/segments.gen, /prod/hdfs/LUCENE/index/140601/_0.cfs]
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:801)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)

I have committed and closed the index writer correctly.

Here is the searcher code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SearchFiles {

    private SearchFiles() {}

    public static void main(String[] args) throws Exception {

        String filenm = "";
        // Create a FileSystem object to work with HDFS.
        Configuration config = new Configuration();
        config.set("fs.defaultFS", "hdfs://127.0.0.1:9000/");
        config.addResource(new Path("/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/core-site.xml"));
        FileSystem dfs = FileSystem.get(config);
        // List the index files present in the HDFS directory.
        FileStatus[] status = dfs.listStatus(new Path("/prod/hdfs/LUCENE/index/140601"));

        // Create a RAMDirectory to hold the index in memory.
        RAMDirectory rdir = new RAMDirectory();

        FSDataInputStream filereader = null;

        for (int i = 0; i < status.length; i++) {
            // Open the index file on HDFS.
            filereader = dfs.open(status[i].getPath());
            int size = filereader.available();

            // Read the file's data into a byte array.
            byte[] bytarr = new byte[size];
            filereader.read(bytarr, 0, size);

            // Create a file in the RAMDirectory, named after the index file in HDFS.
            filenm = new String(status[i].getPath().toString());
            String sSplitValue = filenm.substring(21, filenm.length());
            System.out.println(sSplitValue);

            IndexOutput indxout = rdir.createOutput((sSplitValue), null);

            // Write the byte array to the file in the RAMDirectory.
            indxout.writeBytes(bytarr, bytarr.length);
            indxout.flush();
            indxout.close();
        }
        filereader.close();
        // IndexReader indexReader = IndexReader.open(rdir);

        IndexReader indexReader = DirectoryReader.open(rdir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        QueryParser parser = new QueryParser(Version.LUCENE_47, "FUNDG_SRCE_CD", analyzer);
        Query query = parser.parse("D");
        TopDocs results = searcher.search(query, 1000);

        int numTotalHits = results.totalHits;
        TopDocs topDocs = searcher.search(query, 1000);
        ScoreDoc[] hits = topDocs.scoreDocs;

        // Print the number of documents that match the search query.
        System.out.println("Total Hits = " + numTotalHits);
        for (int j = 0; j < hits.length; j++) {
            int docId = hits[j].doc;

            Document d = searcher.doc(docId);

            System.out.println(d.get("FUNDG_SRCE_CD") + " " + d.get("ACCT_NUM"));
        }
    }
}

1 answer:

Answer 0 (score: 0)

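I don't believe you should pass null for the IOContext argument of createOutput. Try IOContext.DEFAULT instead. I really don't know whether that alone will make this work, but it may be a step in the right direction.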
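The file list in the error message points to something else worth checking in that same createOutput call: every name stored in the RAMDirectory carries the full HDFS path (for example /prod/hdfs/LUCENE/index/140601/segments_2), whereas Lucene looks files up by their bare names (segments_2, _0.cfs, and so on). The following is a minimal, standard-library-only sketch of that mismatch. Both helpers are hypothetical, and looksLikeSegmentsFile is only a simplified stand-in for Lucene's commit lookup, not its actual implementation.

```java
public class SegmentNameCheck {

    // Hypothetical helper: keep only the final component of a slash-separated path.
    static String bareName(String fullPath) {
        int slash = fullPath.lastIndexOf('/');
        return slash < 0 ? fullPath : fullPath.substring(slash + 1);
    }

    // Simplified stand-in (an assumption, not Lucene's code) for how a commit
    // file is recognised: the name begins with "segments" and is not segments.gen.
    static boolean looksLikeSegmentsFile(String name) {
        return name.startsWith("segments") && !name.equals("segments.gen");
    }

    public static void main(String[] args) {
        // Names exactly as they appear in the error message above.
        String[] stored = {
            "/prod/hdfs/LUCENE/index/140601/_0.cfe",
            "/prod/hdfs/LUCENE/index/140601/segments_2",
            "/prod/hdfs/LUCENE/index/140601/_0.si",
            "/prod/hdfs/LUCENE/index/140601/segments.gen",
            "/prod/hdfs/LUCENE/index/140601/_0.cfs"
        };
        for (String name : stored) {
            String bare = bareName(name);
            System.out.println(bare
                + " | matches with full path: " + looksLikeSegmentsFile(name)
                + " | matches with bare name: " + looksLikeSegmentsFile(bare));
        }
    }
}
```

In the question's loop, the equivalent change would presumably be to pass status[i].getPath().getName() to createOutput instead of the substring(21, ...) result, since Hadoop's Path.getName() returns the final path component.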

Why not make it easier, though? You can use the appropriate RAMDirectory constructor to copy the index:

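public static void main(String[] args) throws Exception {
    // FSDirectory reads from the local file system, so this path must be reachable locally.
    Directory oldDirectory = FSDirectory.open(new File("/prod/hdfs/LUCENE/index/140601"));
    // Copy every file of the existing index into memory.
    Directory rdir = new RAMDirectory(oldDirectory, IOContext.DEFAULT);
    IndexReader indexReader = DirectoryReader.open(rdir);
    //etc.
}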
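If you do keep a manual copy loop, one more thing may be worth guarding against: InputStream.available() is only an estimate of how many bytes can be read without blocking, and a single read() call is allowed to return fewer bytes than requested, so a file can end up truncated. Below is a minimal sketch of reading a stream to the end, using only the standard library. ReadFully, readAll, and shortReads are hypothetical names, and a ByteArrayInputStream stands in for the HDFS stream (FSDataInputStream is itself an InputStream).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFully {

    // Read until end of stream instead of trusting available() or a single read().
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Test double: hands back at most 100 bytes per read to simulate short reads.
    static InputStream shortReads(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 100));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[20000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        byte[] copy = readAll(shortReads(new ByteArrayInputStream(source)));
        System.out.println("copied " + copy.length + " bytes, identical = "
            + Arrays.equals(source, copy));
    }
}
```

With the real HDFS stream, the size could instead be taken from FileStatus.getLen() and the buffer filled with readFully, which FSDataInputStream inherits from DataInputStream.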