Question

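I am building a full-text index using the inverted file method: it extracts all the words in a document and inserts them one by one into my MySQL table.

So far my program works fine, but I have been thinking about how to optimize it further to reduce the time it takes to insert into the database. I know that a drawback of inverted files is that building the index table is slow.

Here is my code: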

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlAccessIndex {
    public Connection connect = null;
    public Statement statement = null;
    public PreparedStatement preparedStatement = null;
    public ResultSet resultSet = null;

    public MysqlAccessIndex() throws Exception {
        // Load the MySQL JDBC driver and open the connection
        Class.forName("com.mysql.jdbc.Driver");
        connect = DriverManager
                .getConnection("jdbc:mysql://126.32.3.178/fulltext_ltat?"
                        + "user=root&password=root123");
      //  statement = connect.createStatement();
        System.out.print("Connected");
    }

    // Inserts a single (path, word) row; called once for every word in a document
    public void readDataBase(String path, String word) throws Exception {
        try {
            preparedStatement = connect
                    .prepareStatement("insert IGNORE into  fulltext_ltat.test_text values (?, ?) ");

            preparedStatement.setString(1, path);
            preparedStatement.setString(2, word);
            preparedStatement.executeUpdate();

        } catch (Exception e) {
            throw e;
        } finally {
            close();
        }
    }

    // close() and any other members are not shown in the post
}

MySQL connection:

{{1}}

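Is it possible to use some kind of multithreading, for example inserting three words into three rows at the same time, or some other technique, to speed up the insertion process? Any suggestions would be appreciated.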

Answer 1

I think the solution to your problem is to use batch inserts. You can try something like this:

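// Requires: import java.sql.PreparedStatement; import java.util.HashSet;
public void readDataBase(String path, HashSet<String> uniqueWords) throws Exception {

    String compiledQuery = "insert IGNORE into  fulltext_ltat.test_text values (?, ?) ";

    // try-with-resources closes the statement even if the batch fails
    try (PreparedStatement preparedStatement = connect.prepareStatement(compiledQuery)) {

        // Queue one insert per unique word instead of executing each one immediately
        for (String word : uniqueWords) {
            preparedStatement.setString(1, path);
            preparedStatement.setString(2, word);
            preparedStatement.addBatch();
        }

        // Send all queued inserts to the server together and time the call
        long start = System.currentTimeMillis();
        int[] inserted = preparedStatement.executeBatch();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Executed " + inserted.length + " statements in " + elapsed + " ms");

    } finally {
        close();
    }
}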

Modify your readDataBase method so that it takes a HashSet<String> uniqueWords parameter.
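
As a rough illustration of how such a set might be built before calling readDataBase, here is a minimal sketch. The class name UniqueWords, the extract method, the tokenization rule (split on anything that is not a letter or digit), and the lower-casing are all assumptions made for this example; they are not taken from the question's code.

```java
import java.util.HashSet;
import java.util.Locale;

// Hypothetical helper, not part of the original post.
final class UniqueWords {

    // Splits the text on runs of non-letter, non-digit characters and
    // returns the distinct words, lower-cased (an assumption of this sketch).
    static HashSet<String> extract(String text) {
        HashSet<String> words = new HashSet<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first token when the text starts with a delimiter
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        HashSet<String> words = extract("The quick fox. The lazy dog!");
        System.out.println(words.size()); // prints 5 ("the" is counted once)
    }
}
```

Deduplicating in memory first means repeated words in the same document never reach the database at all, rather than being sent and then discarded by insert IGNORE.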

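After that, call preparedStatement.addBatch() after setting the parameters for each item you want to insert, and call preparedStatement.executeBatch() once at the end instead of calling preparedStatement.executeUpdate() for every word.

I hope this helps.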
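
For very large documents, one possible variation is to call executeBatch() every N words rather than once at the very end, so that the pending batch stays bounded. The sketch below shows only the chunking step; BatchChunks, partition, and the chunk size are hypothetical names and values chosen for illustration, and each resulting chunk would be passed through the addBatch()/executeBatch() loop described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of the original post.
final class BatchChunks {

    // Splits items into consecutive lists of at most chunkSize elements.
    static <T> List<List<T>> partition(Collection<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>(chunkSize);
        for (T item : items) {
            current.add(item);
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new ArrayList<>(chunkSize);
            }
        }
        // Keep the final, partially filled chunk
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(partition(words, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

A suitable chunk size depends on the driver and server settings, so it is best treated as a value to measure rather than a fixed rule.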

Java - Improving the performance of building an index table
