Lucene - Creating an index with FSDirectory

Posted: 2014-08-13 21:47:53

Tags: java asp.net lucene

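First-time poster, long-time reader. I apologize if this has already been asked here (I'm also new to Lucene!). I've done a lot of research, but I haven't been able to find a good explanation or example for my question.

First, I used IKVM.NET to convert Lucene 4.9 (Java) so I could include it in my .NET application. I chose this route so that I could use the most recent version of Lucene. No problems there.

I'm trying to build a basic example to start learning Lucene and apply it to my application. I've done countless Google searches and read plenty of articles, Apache's site, and so on. My code mostly follows this example: http://www.lucenetutorial.com/lucene-in-5-minutes.html

My question is: I don't believe I want to use RAMDirectory, right? I'll be indexing a database and letting users search it through a website. I opted for FSDirectory because I don't think the whole index should be held in memory.

When the IndexWriter is created, new files are created every time (.cfe, .cfs, .si, segments.gen, write.lock, etc.). It seems to me that you would create these files once and then keep using them until the index needs to be rebuilt?

So how do I create an IndexWriter without recreating the index files?

Code: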

StandardAnalyzer analyzer;
Directory directory;
protected void Page_Load(object sender, EventArgs e)
{
  var version = org.apache.lucene.util.Version.LUCENE_CURRENT;
  analyzer = new StandardAnalyzer(version);

  if (directory == null)
  {
    directory = FSDirectory.open(new java.io.File(HttpContext.Current.Request.PhysicalApplicationPath + "/indexes"));
  }

        IndexWriterConfig config = new IndexWriterConfig(version, analyzer);

        //i found setting the open mode will overwrite the files but still creates new each time
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        IndexWriter w = new IndexWriter(directory, config);
        addDoc(w, "test", "1234");
        addDoc(w, "test1", "1234");
        addDoc(w, "test2", "1234");
        addDoc(w, "test3", "1234");
        w.close(); 

}


private static void addDoc(IndexWriter w, String _keyword, String _keywordid)
    {
        Document doc = new Document();
        doc.add(new TextField("Keyword", _keyword, Field.Store.YES));
        doc.add(new StringField("KeywordID", _keywordid, Field.Store.YES));
        w.addDocument(doc);
    }

protected void searchButton_Click(object sender, EventArgs e)
{
        String querystr = "";
        String results=""; 


        querystr = searchTextBox.Text.ToString();

        Query q = new QueryParser(org.apache.lucene.util.Version.LUCENE_4_0, "Keyword", analyzer).parse(querystr);

        int hitsPerPage = 100;
        DirectoryReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);

        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        if (hits.Length == 0)
        {
           label.Text = "Nothing was found.";
        }
        else
           {
             for (int i = 0; i < hits.Length; ++i)
              {
               int docID = hits[i].doc;
               Document d = searcher.doc(docID);

               results += "<br />" + (i + 1) + ". " + d.get("KeywordID") + "\t" + d.get("Keyword") +   " Hit Score: " + hits[i].score.ToString() + "<br />";

               }
               label.Text = results;
               reader.close(); 
            }
  }

2 Answers:

Answer 0: (score: 3)

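Yes, RAMDirectory is great for quick, on-the-fly tests and tutorials, but in production you will usually want to store the index on the file system through an FSDirectory.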

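The reason it rewrites the index every time you open the writer is that you are setting the OpenMode to IndexWriterConfig.OpenMode.CREATE. CREATE means you want to remove any existing index at that location and start from scratch. You probably want IndexWriterConfig.OpenMode.CREATE_OR_APPEND, which opens the existing index if one is found and creates a new one otherwise.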
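The difference between the two modes can be illustrated without Lucene at all. The following is only an analogy built on plain java.nio file options, not Lucene code: the class and method names are hypothetical, and a single text file stands in for the index directory. Truncating on open plays the role of CREATE, while create-if-missing plus append plays the role of CREATE_OR_APPEND.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical analogy: one text file stands in for an index directory.
public class OpenModeAnalogy {

    // Like OpenMode.CREATE: whatever was there is discarded on every open.
    static void writeCreate(Path file, String line) throws IOException {
        Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    // Like OpenMode.CREATE_OR_APPEND: create if missing, otherwise keep and add.
    static void writeCreateOrAppend(Path file, String line) throws IOException {
        Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    static int countLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("open-mode-analogy");
        Path file = dir.resolve("index.txt");

        writeCreateOrAppend(file, "doc 1");
        writeCreateOrAppend(file, "doc 2");
        System.out.println(countLines(file)); // earlier content was kept

        writeCreate(file, "doc 3");
        System.out.println(countLines(file)); // earlier content was discarded

        Files.delete(file);
        Files.delete(dir);
    }
}
```

In the question's code, the corresponding change would be to the argument passed to config.setOpenMode; everything else about the writer setup can stay as it is.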


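A side note:

You shouldn't use LUCENE_CURRENT (it is deprecated); use a real version constant instead. You are also using LUCENE_4_0 in your QueryParser. Neither of these will cause any major problems, but it is good to be consistent anyway.

Answer 1: (score: 0)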

When we use RAMDirectory, it loads the whole index, or large parts of it, into "memory", which really means virtual memory. Because physical memory is limited, the operating system may decide to swap out a large RAMDirectory. So RAMDirectory is not a good way to optimize index loading times.

On the other hand, if we don't use RAMDirectory to buffer the index and use NIOFSDirectory or SimpleFSDirectory instead, we pay a different price: our code has to make many syscalls to the OS kernel to copy blocks of data between the disk or filesystem cache and our buffers on the Java heap. This happens on every search request, over and over again.

To address both of these issues, MMapDirectory uses virtual memory and a kernel feature called "mmap" to access the files on disk.
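To make the "mmap" idea concrete, here is a minimal sketch of memory-mapping a file with the standard java.nio API. It is not how MMapDirectory itself is implemented, and the class and method names are hypothetical. It only shows the underlying mechanism: the file's bytes become addressable through a buffer backed by the OS page cache instead of being read through explicit read() calls.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: read a file through a memory mapping.
public class MmapSketch {

    static String readViaMmap(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the whole file read-only; pages are loaded lazily by the OS.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Copied to the heap here only so the result can be returned as a String;
            // a real reader would access the mapped buffer directly.
            byte[] bytes = new byte[mapped.remaining()];
            mapped.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "segment data".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaMmap(file));
    }
}
```

With Lucene on a 64-bit JVM, FSDirectory.open generally picks a suitable implementation for the platform, so the question's code may already be getting a memory-mapped directory without asking for one explicitly.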

Check this link also.