Question

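I need to load more than 100 million rows from a MySQL database into memory. My Java program fails with java.lang.OutOfMemoryError: Java heap space. My machine has 8 GB of RAM, and I passed -Xmx6144m in the JVM options.

This is my code: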

public List<Record> loadTrainingDataSet() {

    ArrayList<Record> records = new ArrayList<Record>();
    try {
        Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
        ResultSet rs = s.getResultSet();
        int count = 0;
        while (rs.next()) {

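Any idea how to overcome this problem?

Update

I came across this post and, based on the comments below, updated my code. I now seem to be able to load the data into memory with the same -Xmx6144m setting, but it takes a long time.

Here is my code.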

...
import org.apache.mahout.math.SparseMatrix;
...

@Override
public SparseMatrix loadTrainingDataSet() {
    long t1 = System.currentTimeMillis();
    SparseMatrix ratings = new SparseMatrix(NUM_ROWS,NUM_COLS);
    int REC_START = 0;
    int REC_END = 0;

    try {
        for (int i = 1; i <= 101; i++) {
            long t11 = System.currentTimeMillis();
            REC_END = 1000000 * i;
            Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                    java.sql.ResultSet.CONCUR_READ_ONLY);
            s.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT " + REC_START + "," + REC_END);//100480507
            while (rs.next()) {
                int movieId = rs.getInt("movie_id");
                int customerId = rs.getInt("customer_id");
                byte rating = (byte) rs.getInt("rating");
                ratings.set(customerId,movieId,rating);
            }
            long t22 = System.currentTimeMillis();
            System.out.println("Round " + i + " completed " + (t22 - t11) / 1000 + " seconds");
            rs.close();
            s.close();
        }

    } catch (Exception e) {
        System.err.println("Cannot connect to database server " + e);
    } finally {
        if (conn != null) {
            try {
                conn.close();
                System.out.println("Database connection terminated");
            } catch (Exception e) { /* ignore close errors */ }
        }
    }
    long t2 = System.currentTimeMillis();
    System.out.println(" Took " + (t2 - t1) / 1000 + " seconds");
    return ratings;
}

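Loading the first 100,000 rows takes 2 seconds. Loading the 29th batch of 100,000 rows takes 46 seconds. I stopped the process midway because it was taking too long. Are these times acceptable? Is there any way to improve the performance of this code? I am running it on a 64-bit Windows machine with 8 GB of RAM.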

Answer 1

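A hundred million records means that each record may take up at most 50 bytes in order to fit within 6 GB, with some extra space left for other allocations. In Java, 50 bytes is nothing; a mere Object[] takes 32 bytes per element. You must find a way to use the results immediately inside your while (rs.next()) loop instead of retaining them all.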
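
The arithmetic behind that estimate can be checked directly. The sketch below assumes the 6144 MiB heap from the question and the row count 100480507 that appears as a comment in the question's code; the class and method names are made up for illustration.

```java
// Rough per-row heap budget for the numbers in the question (a sketch, not a measurement).
final class HeapBudget {

    /** Bytes available per row if the whole heap held nothing but rows. */
    static long bytesPerRow(long heapBytes, long rows) {
        return heapBytes / rows;
    }

    /** Raw size of the three columns for all rows: two ints plus one byte, no object overhead. */
    static long rawPayloadBytes(long rows) {
        return rows * (Integer.BYTES + Integer.BYTES + Byte.BYTES);
    }

    public static void main(String[] args) {
        long heap = 6144L * 1024 * 1024; // -Xmx6144m
        long rows = 100_480_507L;        // row count taken from the comment in the question's code
        System.out.println("budget per row: " + bytesPerRow(heap, rows) + " bytes");
        System.out.println("raw payload: " + rawPayloadBytes(rows) / (1024 * 1024) + " MiB");
    }
}
```

With these inputs the budget works out to 64 bytes per row before any headroom is reserved, while the three column values alone need 9 bytes per row (about 862 MiB in total).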

Answer 2

"The problem is that I get the java.lang.OutOfMemoryError at the s.executeQuery( line itself"

You can split the query into several smaller ones:

    s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT 0,300"); // returns the first 300 rows
    // process this first result
    s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT 300,300"); // returns 300 rows starting at offset 300 (MySQL syntax is LIMIT offset,row_count)
    // process this second result
    // etc.

You can put this in a while loop that stops when no more results are found.
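
The paging idea can be sketched as a loop that advances the offset by a fixed page size and stops at the first short page. This is only an illustration under stated assumptions: LimitPager, pageQuery, loadAll and jdbcRunner are hypothetical names, and the ORDER BY clause assumes that (movie_id, customer_id) identifies a row, which the question does not confirm. Some ordering is needed for pages to be stable between queries.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.ToIntFunction;

final class LimitPager {

    /** Builds one page query. MySQL's two-argument form is LIMIT offset,row_count. */
    static String pageQuery(long offset, int pageSize) {
        return "SELECT movie_id,customer_id,rating FROM ratings"
             + " ORDER BY movie_id,customer_id" // assumed ordering, see the note above
             + " LIMIT " + offset + "," + pageSize;
    }

    /**
     * Runs page queries until one returns fewer rows than requested.
     * The runner executes a query and reports how many rows it processed.
     * Returns the total number of rows seen.
     */
    static long loadAll(int pageSize, ToIntFunction<String> runner) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long offset = 0;
        while (true) {
            int rows = runner.applyAsInt(pageQuery(offset, pageSize));
            offset += rows;
            if (rows < pageSize) {
                return offset; // short (or empty) page: nothing left to read
            }
        }
    }

    /** A JDBC-backed runner; each row should be processed here rather than stored. */
    static ToIntFunction<String> jdbcRunner(Connection conn) {
        return sql -> {
            try (Statement s = conn.createStatement();
                 ResultSet rs = s.executeQuery(sql)) {
                int rows = 0;
                while (rs.next()) {
                    rows++; // process the current row here
                }
                return rows;
            } catch (SQLException e) {
                throw new IllegalStateException("page query failed: " + sql, e);
            }
        };
    }
}
```

A call such as LimitPager.loadAll(1_000_000, LimitPager.jdbcRunner(conn)) would walk the whole table one page at a time. Two caveats: the offset has to advance while the row count stays fixed (in the question's updated code REC_START stays at 0 and the second LIMIT argument grows each round, so every round re-reads from the first row), and large offsets tend to get slower because the server still has to read and discard the skipped rows.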

Answer 3

You can call stmt.setFetchSize(50); and conn.setAutoCommit(false); to avoid reading the entire ResultSet into memory.

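Here is what the documentation says:

Getting results based on a cursor

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets, so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

A small number of rows are cached on the client side of the connection and, when exhausted, the next block of rows is retrieved by repositioning the cursor.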

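Note:

Cursor-based ResultSets cannot be used in all situations. There are a number of restrictions which will make the driver silently fall back to fetching the whole ResultSet at once.
- The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.
- The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.
- The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code needs to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.
- The query given must be a single statement, not multiple statements strung together with semicolons.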

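Example: setting the fetch size to turn cursors on and off.

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).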

Class.forName("com.mysql.jdbc.Driver");
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test?useCursorFetch=true&user=root");
// make sure autocommit is off 
conn.setAutoCommit(false); 
Statement st = conn.createStatement();

// Turn use of the cursor on. 
st.setFetchSize(50);
ResultSet rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
   System.out.print("a row was returned.");
} 
rs.close();

// Turn the cursor off. 
st.setFetchSize(0);
rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
   System.out.print("many rows were returned.");
} 
rs.close();

// Close the statement. 
st.close();

Answer 4

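You will have to redesign and load the data into memory in chunks.

Example

1) Load the first million records from the database using appropriate SQL (a query that selects only one million rows) and process them.
2) Load another chunk in the same way.

setFetchSize alone will not solve this problem.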
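
One way to read the steps above is that each chunk is folded into a compact result and then discarded, so memory use does not grow with the number of rows. The sketch below uses a running mean of the rating column purely as a stand-in for whatever processing is actually needed; RatingStats and consumeChunk are hypothetical names, and the chunk query is supplied by the caller.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class RatingStats {
    private long count;
    private long sum;

    /** Folds a single rating into the running totals. */
    void accept(int rating) {
        count++;
        sum += rating;
    }

    long count() {
        return count;
    }

    /** Mean of all ratings seen so far, or 0.0 if none were seen. */
    double mean() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    /** Reads one chunk and updates the totals; the rows themselves are not retained. */
    static void consumeChunk(Connection conn, String chunkSql, RatingStats stats)
            throws SQLException {
        try (Statement s = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = s.executeQuery(chunkSql)) {
            while (rs.next()) {
                stats.accept(rs.getInt("rating"));
            }
        }
    }
}
```

The same RatingStats instance would be passed to consumeChunk once per chunk, so only two long fields survive between chunks regardless of how many rows have been read.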

How to load 100 million rows into memory
