Question

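To my surprise, I found that indexing documents into the full-text search engine in H2 is relatively slow, and I would like to speed it up.

I am using the in-memory version of H2, which makes this especially surprising.

Some benchmarks with 100,000 small documents (just a title and a few tags):

With org.h2.fulltext.FullTextLucene.init, indexing takes about 15 seconds.
With org.h2.fulltext.FullText.init, the result is the same.
SQL inserts alone (i.e. with full-text indexing disabled) take only 1 second.
With Elasticsearch (using bulk indexing), I would expect this volume to be processed and searchable within 3 seconds, and that is even with the data persisted to disk.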
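
For reference, timings like the ones above could be taken with a small harness along these lines. This is only a sketch: the class name IndexingBenchmark and the method timeMillis are made up for illustration, and the callback stands in for whatever indexes one document (for example, a call to index(...) as shown further down).

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not part of H2: measures wall-clock time for
// running an indexing callback once per document.
final class IndexingBenchmark {

    private IndexingBenchmark() {
    }

    /**
     * Calls indexOne for i = 0 .. documentCount - 1 and returns the
     * elapsed wall-clock time in milliseconds.
     */
    static long timeMillis(final int documentCount, final IntConsumer indexOne) {
        final long start = System.nanoTime();
        for (int i = 0; i < documentCount; i++) {
            indexOne.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

Running the same callback once with the full-text index created and once without it would give the two numbers being compared. A single run like this includes JIT warm-up, so the figures are rough.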

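Some additional information that might help:

The connection is reused.
No stop words are used (but given the document size, that should not make much difference). EDIT_2: I added a large set of stop words (> 100). This made it less than 10% faster (from 15 seconds to 14 seconds).
SQL inserts alone (i.e. with full-text indexing disabled) take only 1 second, so the problem should lie in the full-text indexing.
The tutorial on the official website and the page on performance do not seem to offer a solution.
There seems to be no way to do bulk indexing as in Elasticsearch.
EDIT_1: I also tried creating the SQL table and doing the inserts FIRST (which takes 1 second), and only THEN creating the full-text search index and running FullTextLucene.reindex(). But this makes the process even slower.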

In case it helps, here is the code that creates the index and performs the inserts:

Creating the index:

private void createTablesAndLuceneIndex() {
    // try-with-resources closes the statement even if one of the calls throws
    try (Statement statement = this.createStatement()) {
        statement.execute("CREATE ALIAS IF NOT EXISTS FT_INIT FOR \"org.h2.fulltext.FullTextLucene.init\"");
        statement.execute("CALL FT_INIT()");
        //  FullTextLucene.setIgnoreList(this.conn, "to,this"); // Do we need stop words?
        FullTextLucene.setWhitespaceChars(this.conn, " ,.-");

        // Set up SQL table & Lucene index
        statement.execute("CREATE TABLE " + PNS_VIDEOS + "(ID INT PRIMARY KEY, TITLE VARCHAR, TAGS VARCHAR, ACTORS VARCHAR)");
        statement.execute("CALL FT_CREATE_INDEX('PUBLIC', '" + PNS_VIDEOS + "', NULL)");
    } catch (final SQLException e) {
        throw new SqlTableCreationException(e); // todo logging?!
    }
}

Indexing documents:

public void index(final PnsVideo pnsVideo) {
    try (PreparedStatement statement = this.conn.prepareStatement("INSERT INTO " + PNS_VIDEOS + " VALUES(?, ?, ?, ?)")) {

        statement.setInt(1, this.autoKey.getAndIncrement());
        statement.setString(2, pnsVideo.getTitle());
        statement.setString(3, Joiner.on(",").join(pnsVideo.getTags()));
        statement.setString(4, Joiner.on(",").join(pnsVideo.getActors()));
        statement.execute();

    } catch (final SQLException e) {
        throw new FTSearchIndexException(e); // todo logging?!
    }
}
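
One variant I could still try is sketched here. The index method prepares a new statement and (with auto-commit on) commits once per document. The sketch reuses a single PreparedStatement, sends rows through JDBC batching (addBatch / executeBatch), and commits once at the end. The names BatchInsertSketch, Row, firstId and batchSize are hypothetical and only for illustration. Note the assumption: this reduces overhead on the SQL side only. The full-text index is still updated once per inserted row, so I do not know whether this changes the 15 seconds; it would have to be measured.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch: batched inserts over plain JDBC, one commit at the end.
final class BatchInsertSketch {

    /** Hypothetical stand-in for PnsVideo, with tags and actors already comma-joined. */
    static final class Row {
        final String title;
        final String tags;
        final String actors;

        Row(final String title, final String tags, final String actors) {
            this.title = title;
            this.tags = tags;
            this.actors = actors;
        }
    }

    private BatchInsertSketch() {
    }

    /** Inserts all rows with IDs starting at firstId and returns the number of rows added. */
    static int insertAll(final Connection conn, final String table, final List<Row> rows,
                         final int firstId, final int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction instead of one commit per row
        int inserted = 0;
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO " + table + " VALUES(?, ?, ?, ?)")) {
            for (final Row row : rows) {
                ps.setInt(1, firstId + inserted);
                ps.setString(2, row.title);
                ps.setString(3, row.tags);
                ps.setString(4, row.actors);
                ps.addBatch();
                inserted++;
                if (inserted % batchSize == 0) {
                    ps.executeBatch(); // flush a full batch
                }
            }
            if (inserted % batchSize != 0) {
                ps.executeBatch(); // flush the remaining partial batch
            }
            conn.commit();
        } catch (final SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return inserted;
    }
}
```

It would be called once for the whole document list, e.g. BatchInsertSketch.insertAll(conn, PNS_VIDEOS, rows, 0, 1000), instead of calling index(...) once per document.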

Thanks for any advice!

Is there a way to speed up indexing for the in-memory full-text search?

0 answers: