Question

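I am building a Lucene index for roughly 400,000 documents contained in one directory. My buildIndex() method works fine on small subsets of the directory, but, as you might expect, when I build the index for the entire directory my code throws an out-of-memory exception. This is my current method: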

    public static void BuildIndex()
    {
        FSDirectory directory = FSDirectory.Open(new System.IO.DirectoryInfo(@"Z:\Indexer\index"));

        Analyzer analyzer = new SimpleAnalyzer();

        IndexWriter indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

        foreach(string file in System.IO.Directory.EnumerateFiles(Index.topDirectoryPath, "*", System.IO.SearchOption.AllDirectories))
        {
            indexWriter.AddDocument(TfsDocument.GetLuceneDocument(file));
        }

        indexWriter.Dispose();
    }

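Note that I am not optimizing for speed; I just want the index to build at a reasonable rate without running out of memory. Could indexWriter.AddDocument() or my TfsDocument.GetLuceneDocument() be leaving something behind in memory? I can't figure out how to apply a using() block to Lucene Documents.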

Also, here is my TfsDocument.GetLuceneDocument() code for reference:

    public static Document GetLuceneDocument(string path)
    {
        Document document = new Document();

        using (StreamReader reader = new StreamReader(path))
        {
           document.Add(new Field("Word", reader.ReadToEnd(), Field.Store.YES, Field.Index.ANALYZED));
        }

        document.Add(new Field("Path", path, Field.Store.YES, Field.Index.ANALYZED));

        return document;
    }
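One thing worth checking is that `ReadToEnd()` loads each file into a single string, so a few very large files can dominate memory. A `Document` itself is not `IDisposable`; what can be disposed is the `TextReader` feeding it, provided it stays open until `AddDocument` has consumed it. The sketch below assumes Lucene.Net 3.0.x (which the `IndexWriter.MaxFieldLength` and `Dispose()` calls above suggest) and streams the content through the `Field(string, TextReader)` constructor. The class name `StreamingIndexer` and the `indexPath`/`topDirectoryPath` parameters are hypothetical. Trade-off: a reader-based field is indexed and tokenized but not stored, so "Word" can no longer be retrieved from search hits.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper; a sketch assuming Lucene.Net 3.0.x.
public static class StreamingIndexer
{
    public static void BuildIndex(string indexPath, string topDirectoryPath)
    {
        using (FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexPath)))
        using (var writer = new IndexWriter(directory, new SimpleAnalyzer(), true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Fully qualified to avoid a clash with Lucene.Net.Store.Directory.
            foreach (string file in System.IO.Directory.EnumerateFiles(
                         topDirectoryPath, "*", SearchOption.AllDirectories))
            {
                // The reader must outlive AddDocument, so it is disposed here
                // rather than inside a document-factory method.
                using (var reader = new StreamReader(file))
                {
                    var document = new Document();
                    // Streamed field: indexed and tokenized, but NOT stored.
                    document.Add(new Field("Word", reader));
                    // A path is an identifier, so it is indexed as a single term.
                    document.Add(new Field("Path", file, Field.Store.YES,
                                           Field.Index.NOT_ANALYZED));
                    writer.AddDocument(document);
                }
            }
        }
    }
}
```

If the full text must remain retrievable, another option under the same assumptions is to keep `Field.Store.YES` but skip or truncate files above some size threshold before calling `ReadToEnd()`.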

Any help is appreciated.

How can I optimize my Lucene BuildIndex() method to avoid an OutOfMemoryException?

0 answers: