Question

I am building a web crawler, and I am looking for the best way to handle the requests and connections between my threads and the database (MySQL).

I have two types of threads:

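Fetchers: they crawl websites. They generate URLs and add them to two tables: table_url and table_file. They select from table_url to continue crawling, and they update table_url to set visited = 1 once a URL has been read, or visited = -1 while they are reading it. They can delete rows.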

Downloaders: they download files. They select from table_file and update table_file to change the Downloaded column. They never insert anything.

Right now I am using this: I built a connection pool based on c3p0. Each target (website) has these variables:

private Connection connection_downloader;
private Connection connection_fetcher;

When I instantiate the website, I create the connections only once. Each thread then uses the connection that matches its target.

Each thread has these variables:

private Statement statement;
private ResultSet resultSet;

Before each query, I open a SQL statement:

public static Statement openSqlStatement(Connection connection){
    try {
        return connection.createStatement();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return null;
}

After each query, I close the SQL statement and the resultSet using:

public static  void closeSqlStatement(ResultSet resultSet, Statement statement){
    if (resultSet != null) try { resultSet.close(); } catch (SQLException e) {e.printStackTrace();}
    if (statement != null) try { statement.close(); } catch (SQLException e) {e.printStackTrace();}
}

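Right now my select query only works for a single value (I don't need to select several yet, but that will change soon) and is defined as follows: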

public static  String sqlSelect(String Query, Connection connection, Statement statement, ResultSet resultSet){
    String result = null;
    try {
        resultSet = statement.executeQuery(Query);
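        // Read the first column of the first row, if there is one
        // (resultSet.toString() would not return the row's data)
        if (resultSet.next()) {
            result = resultSet.getString(1);
        }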
    } catch (SQLException e) {
        e.printStackTrace();
    }
    closeSqlStatement(resultSet, statement);
    return result;
}

Insert, delete, and update queries use this function:

public static int sqlExec(String Query, Connection connection, Statement statement){
    // Number of affected rows, or -1 if the query failed
    int affectedRows = -1;
    try {
        affectedRows = statement.executeUpdate(Query);
    } catch (SQLException e) {
        e.printStackTrace();
    }
    // An update produces no ResultSet, so only the statement needs closing
    closeSqlStatement(null, statement);
    return affectedRows;
}

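My question is simple: can this be improved to run faster? I am worried that mutual exclusion will block one thread from updating a link while another thread is doing the same.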

Answer 1

I believe your design is flawed. Dedicating a full-time connection to each website will severely limit your overall throughput.

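Since you have already set up a connection pool, it is perfectly fine to fetch a connection right before you use it (and return it afterwards).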

Likewise, using try-with-resources to close every ResultSet and Statement will make the code more readable, and using PreparedStatement instead of Statement would not hurt either.

An example (using a static dataSource() call to access your pool):

public static String sqlSelect(String id) throws SQLException {
    // Borrow a connection from the pool only for the duration of this query
    try (Connection con = dataSource().getConnection();
         PreparedStatement ps = con.prepareStatement("SELECT row FROM table WHERE key = ?")) {
        ps.setString(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                return rs.getString(1);
            } else {
                throw new SQLException("Nothing found");
            }
        }
    } catch (SQLException e) {
        e.printStackTrace();
        throw e;
    }
}

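Following the same pattern, I suggest you create one method for each distinct insert/update/select your application needs, each of which holds a connection only briefly, for the duration of the DB logic.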
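As a rough illustration of that pattern applied to an update, here is a minimal sketch. The class name UrlDao, the method markVisited, and the column name url are hypothetical and do not come from the question; only table_url and visited do. The DataSource is assumed to be the c3p0 pool.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical helper class; the name is not from the question.
final class UrlDao {
    // Assumed to be the c3p0 pool, passed in as a plain DataSource
    private final DataSource pool;

    UrlDao(DataSource pool) {
        this.pool = pool;
    }

    // Marks one URL as visited and returns the number of rows changed.
    // The "url" column name is an assumption about the table_url schema.
    int markVisited(String url) throws SQLException {
        String sql = "UPDATE table_url SET visited = 1 WHERE url = ?";
        // The connection is borrowed here and goes back to the pool
        // as soon as the try block ends, even if an exception is thrown.
        try (Connection con = pool.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, url);
            return ps.executeUpdate();
        }
    }
}
```

A fetcher thread would then call something like new UrlDao(pool).markVisited(currentUrl) and never hold a Connection field of its own.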

Answer 2

I can't see a real advantage in having all the database stuff inside your webcrawler threads.

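Why not use a static class with the sqlSelect and sqlExec methods, but without the Connection and ResultSet parameters? The two connection objects would also be static. Make sure the connection objects are valid when you use them.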
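One possible reading of this suggestion, sketched for sqlExec only with a single shared connection. The class name Db, the ConnectionFactory hook, and the 2-second validity timeout are all assumptions made for illustration. The method is synchronized because the connection is shared, which also means threads take turns instead of running queries in parallel.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical static helper; none of these names come from the question.
final class Db {
    // Hypothetical hook that opens a fresh connection (for example from the pool)
    interface ConnectionFactory {
        Connection open() throws SQLException;
    }

    private static ConnectionFactory factory;
    private static Connection connection;

    private Db() {}

    static synchronized void init(ConnectionFactory f) {
        factory = f;
    }

    // Reopens the shared connection if it is missing or no longer valid.
    private static Connection validConnection() throws SQLException {
        if (connection == null || !connection.isValid(2)) { // timeout in seconds, an assumption
            connection = factory.open();
        }
        return connection;
    }

    // synchronized: only one thread uses the shared connection at a time.
    static synchronized int sqlExec(String query) throws SQLException {
        try (Statement st = validConnection().createStatement()) {
            return st.executeUpdate(query);
        }
    }
}
```

Callers would run Db.init(...) once at startup and then call Db.sqlExec(query) from any thread, without passing a Connection or ResultSet around.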

JDBC: optimizing MySQL requests with multiple threads
