Question

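My table contains more than 12 million rows.

I need to index these rows with Lucene.NET (I need to perform the initial indexing).

So I tried to index them in batches, reading the rows from SQL in batch packets of 1,000 rows each.

Here is what it looks like: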

public void BuildInitialBookSearchIndex()
{
            FSDirectory directory = null;
            IndexWriter writer = null;

            var type = typeof(Book);

            var info = new DirectoryInfo(GetIndexDirectory());

            //if (info.Exists)
            //{
            //    info.Delete(true);
            //}

            try
            {
                directory = FSDirectory.GetDirectory(Path.Combine(info.FullName, type.Name), true);
                writer = new IndexWriter(directory, new StandardAnalyzer(), true);
            }
            finally
            {
                if (directory != null)
                {
                    directory.Close();
                }

                if (writer != null)
                {
                    writer.Close();
                }
            }

            var fullTextSession = Search.CreateFullTextSession(Session);

            var currentIndex = 0;
            const int batchSize = 1000;

            while (true)
            {
                var entities = Session
                    .CreateCriteria<BookAdditionalInfo>()
                    .CreateAlias("Book", "b")
                    .SetFirstResult(currentIndex)
                    .SetMaxResults(batchSize)
                    .List();

                using (var tx = Session.BeginTransaction())
                {
                    foreach (var entity in entities)
                    {
                        fullTextSession.Index(entity);
                    }

                    currentIndex += batchSize;

                    Session.Flush();
                    tx.Commit();
                    Session.Clear();
                }

                if (entities.Count < batchSize)
                    break;
            }
}

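However, once currentIndex exceeds 6-7 million, the operation times out: the NHibernate paging query throws a timeout.

Any suggestions? Is there any other way in NHibernate to index these 12 million rows?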

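Edit

I will probably implement the most primitive solution.

Because BookId is the clustered index on my table and selecting by BookId is very fast, I am going to find the maximum BookId, walk through all the records, and index each of them.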

for (long bookId = 0; bookId <= maxBookId; bookId++)
{
   // get book by bookId
   // if the book exists, index it
}

If you have any other suggestions, please answer this question.

Answer 1

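Instead of paging through the whole data set, you could try to divide and conquer it. You said you have an index on the book ID, so just change your criteria to return batches of books bounded by BookId: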

var entities = Session
    .CreateCriteria<BookAdditionalInfo>()
    .CreateAlias("Book", "b")
    .Add(Restrictions.Gte("BookId", low))
    .Add(Restrictions.Lt("BookId", high))
    .List();

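Here low and high are set to 0 and 1000, then 1000 and 2000, and so on. Because the upper bound uses Lt it is exclusive, so each batch must start where the previous one ended or the boundary ID is skipped.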
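As a rough sketch of how those bounds could be generated, the helper below yields consecutive half-open [low, high) windows up to the maximum ID. The names BookIdRanges and Split are hypothetical and not part of NHibernate or Lucene.NET, and the tuple syntax assumes C# 7 or later; the query and indexing calls are only indicated in comments because they depend on the session setup shown in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an NHibernate API): splits the ID space into
// half-open [Low, High) windows matching Restrictions.Gte / Restrictions.Lt.
public static class BookIdRanges
{
    public static IEnumerable<(long Low, long High)> Split(long maxId, long batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        // The last window may extend past maxId; that is harmless because
        // no rows exist beyond the maximum ID.
        for (long low = 0; low <= maxId; low += batchSize)
        {
            yield return (low, low + batchSize);
        }
    }
}

// Assumed usage, with maxBookId obtained from a separate MAX(BookId) query:
//
// foreach (var (low, high) in BookIdRanges.Split(maxBookId, 1000))
// {
//     // run the criteria above with Gte("BookId", low) and Lt("BookId", high),
//     // index the returned entities, commit, then call Session.Clear()
// }
```

Each batch is then a range seek on the clustered index, so in principle its cost should not grow with the offset the way SetFirstResult paging does. Sparse ID ranges simply return smaller or empty batches.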
