Android - Reading large (~30MB) text files and compressing them

Posted: 2015-12-22 13:15:12

Tags: java android file text compression

I am trying to read two files, join their contents, and GZIP the result so it can be sent to a server.

Everything works fine when the files are only a few MB each, but when one of them is around 30MB (the expected size in production), trying to read it fails with an `Out of memory on a 43628012-byte allocation` error. I don't know what I'm doing wrong, since the same code works for smaller files.

Here is the code I use to read a text file:

    private String getTextFromFile(File fileName) {
        StringBuilder logsHolder = new StringBuilder();
        BufferedReader input;
        try {
            input =  new BufferedReader(new FileReader(fileName));
            String line = null;
            String lineSeparator = System.getProperty("line.separator");
            while ((line = input.readLine()) != null){
                logsHolder.append(line);
                logsHolder.append(lineSeparator);
            }
            input.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return logsHolder.toString();
    }

After several thousand lines have been read, the error is thrown on the `logsHolder.append(line);` line. Here is the LogCat output:

            01-04 09:54:25.852: D/dalvikvm(888): GC_FOR_ALLOC freed 1223K, 29% free 6002K/8364K, paused 21ms, total 21ms
            01-04 09:54:25.892: D/dalvikvm(888): GC_FOR_ALLOC freed 1022K, 30% free 6235K/8860K, paused 16ms, total 17ms
            01-04 09:54:25.932: D/dalvikvm(888): GC_FOR_ALLOC freed 884K, 27% free 6481K/8860K, paused 18ms, total 19ms
            01-04 09:54:25.932: I/dalvikvm-heap(888): Grow heap (frag case) to 8.521MB for 1134874-byte allocation
            01-04 09:54:25.952: D/dalvikvm(888): GC_FOR_ALLOC freed 738K, 32% free 6851K/9972K, paused 18ms, total 18ms
            01-04 09:54:26.012: D/dalvikvm(888): GC_FOR_ALLOC freed 586K, 32% free 6851K/9972K, paused 18ms, total 18ms
            01-04 09:54:26.012: I/dalvikvm-heap(888): Grow heap (frag case) to 9.422MB for 1702306-byte allocation
            01-04 09:54:26.042: D/dalvikvm(888): GC_FOR_ALLOC freed 1108K, 37% free 7405K/11636K, paused 20ms, total 20ms
            01-04 09:54:26.122: D/dalvikvm(888): GC_FOR_ALLOC freed 878K, 37% free 7405K/11636K, paused 21ms, total 21ms
            01-04 09:54:26.122: I/dalvikvm-heap(888): Grow heap (frag case) to 10.776MB for 2553454-byte allocation
            01-04 09:54:26.152: D/dalvikvm(888): GC_CONCURRENT freed 0K, 30% free 9899K/14132K, paused 10ms+2ms, total 27ms
            01-04 09:54:26.152: D/dalvikvm(888): WAIT_FOR_CONCURRENT_GC blocked 17ms
            01-04 09:54:26.242: D/dalvikvm(888): GC_FOR_ALLOC freed 2980K, 42% free 8236K/14132K, paused 16ms, total 16ms
            01-04 09:54:26.252: I/dalvikvm-heap(888): Grow heap (frag case) to 12.805MB for 3830176-byte allocation
            01-04 09:54:26.282: D/dalvikvm(888): GC_CONCURRENT freed 0K, 33% free 11977K/17876K, paused 10ms+3ms, total 27ms
            01-04 09:54:26.282: D/dalvikvm(888): WAIT_FOR_CONCURRENT_GC blocked 8ms
            01-04 09:54:26.432: D/dalvikvm(888): GC_FOR_ALLOC freed 4470K, 47% free 9483K/17876K, paused 17ms, total 17ms
            01-04 09:54:26.442: I/dalvikvm-heap(888): Grow heap (frag case) to 15.849MB for 5745260-byte allocation
            01-04 09:54:26.472: D/dalvikvm(888): GC_CONCURRENT freed 0K, 36% free 15094K/23488K, paused 17ms+2ms, total 33ms
            01-04 09:54:26.472: D/dalvikvm(888): WAIT_FOR_CONCURRENT_GC blocked 15ms
            01-04 09:54:26.663: D/dalvikvm(888): GC_FOR_ALLOC freed 6704K, 52% free 11353K/23488K, paused 20ms, total 20ms
            01-04 09:54:26.683: I/dalvikvm-heap(888): Grow heap (frag case) to 20.415MB for 8617886-byte allocation
            01-04 09:54:26.713: D/dalvikvm(888): GC_CONCURRENT freed 0K, 39% free 19769K/31904K, paused 17ms+2ms, total 32ms
            01-04 09:54:26.713: D/dalvikvm(888): WAIT_FOR_CONCURRENT_GC blocked 14ms
            01-04 09:54:27.033: D/dalvikvm(888): GC_FOR_ALLOC freed 10057K, 56% free 14158K/31904K, paused 31ms, total 31ms
            01-04 09:54:27.053: I/dalvikvm-heap(888): Grow heap (frag case) to 27.264MB for 12926824-byte allocation
            01-04 09:54:27.093: D/dalvikvm(888): GC_CONCURRENT freed 8415K, 59% free 18366K/44528K, paused 17ms+2ms, total 32ms
            01-04 09:54:27.093: D/dalvikvm(888): WAIT_FOR_CONCURRENT_GC blocked 15ms
            01-04 09:54:27.333: D/dalvikvm(888): GC_CONCURRENT freed 4324K, 59% free 18367K/44528K, paused 1ms+3ms, total 29ms
            01-04 09:54:27.333: D/dalvikvm(888): WAIT_FOR_CONCURRENT_GC blocked 22ms
            01-04 09:54:27.493: D/dalvikvm(888): GC_FOR_ALLOC freed 2345K, 59% free 18366K/44528K, paused 19ms, total 19ms
            01-04 09:54:27.513: I/dalvikvm-heap(888): Grow heap (frag case) to 37.537MB for 19390232-byte allocation
            01-04 09:54:27.563: D/dalvikvm(888): GC_CONCURRENT freed 0K, 42% free 37302K/63464K, paused 34ms+4ms, total 51ms
            01-04 09:54:27.563: D/dalvikvm(888): WAIT_FOR_CONCURRENT_GC blocked 15ms
            01-04 09:54:28.094: D/dalvikvm(888): GC_FOR_ALLOC freed 20815K, 62% free 24678K/63464K, paused 40ms, total 40ms
            01-04 09:54:28.234: D/dalvikvm(888): GC_FOR_ALLOC freed 1814K, 62% free 24678K/63464K, paused 22ms, total 23ms
            01-04 09:54:28.284: I/dalvikvm-heap(888): Grow heap (frag case) to 52.947MB for 29085344-byte allocation
            01-04 09:54:28.344: D/dalvikvm(888): GC_FOR_ALLOC freed 18935K, 63% free 34146K/91868K, paused 21ms, total 21ms
            01-04 09:54:29.245: D/dalvikvm(888): GC_FOR_ALLOC freed 8191K, 63% free 34146K/91868K, paused 50ms, total 51ms
            01-04 09:54:29.846: D/dalvikvm(888): GC_FOR_ALLOC freed 6821K, 63% free 34138K/91868K, paused 33ms, total 33ms
            01-04 09:54:29.846: I/dalvikvm-heap(888): Forcing collection of SoftReferences for 43628012-byte allocation
            01-04 09:54:29.866: D/dalvikvm(888): GC_BEFORE_OOM freed 76K, 63% free 34061K/91868K, paused 27ms, total 27ms
            01-04 09:54:29.866: E/dalvikvm-heap(888): Out of memory on a 43628012-byte allocation.

I don't know whether the compression step will even work on such a large buffer, but for now my only problem is reading the huge text file.

I hope you can help me figure out why this fails and what I should change to make it work.

EDIT

Here is the code I use to compress the joined contents of the two files:

        File previousLog = SystemEventsReceiver.getPreviousLog();
        if (previousLog.exists()) {
            logsHolder.append(getTextFromFile(previousLog));
        }

        File currentLog = SystemEventsReceiver.getCurrentLog();
        if (currentLog.exists()) {
            logsHolder.append(getTextFromFile(currentLog));
        }

        Log.v("MyApp", "uncompressed logs: " + logsHolder.toString().getBytes().length);
        // Compress logs.
        byte[] compressedLogs = null;
        try {
            ByteArrayOutputStream os = new ByteArrayOutputStream(logsHolder.length());
            GZIPOutputStream gos = new GZIPOutputStream(os);
            gos.write(logsHolder.toString().getBytes());
            gos.close();
            compressedLogs = os.toByteArray();
            os.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Log.v("MyApp", "compressed logs: " + compressedLogs.length);
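A rough back-of-the-envelope estimate (illustrative arithmetic, not measured on a device) shows why this approach runs out of memory: the log text is held in memory several times at once.

```java
// Rough peak-memory estimate for reading a 30 MB ASCII log with the
// StringBuilder approach above (illustrative numbers only):
public class HeapEstimate {
    public static void main(String[] args) {
        long fileBytes = 30L * 1024 * 1024;
        long builder = fileBytes * 2; // StringBuilder backs a char[] (2 bytes per char)
        long string  = fileBytes * 2; // toString() copies into a new String
        long utf8    = fileBytes;     // getBytes() allocates yet another byte[]
        // Not counted: transient copies made while the StringBuilder doubles
        // its backing array, and the presized ByteArrayOutputStream.
        long peak = builder + string + utf8;
        System.out.println((peak / (1024 * 1024)) + " MB");
    }
}
```

The 43628012-byte allocation in the LogCat output fits this pattern: it is roughly a 21.8-million-char `char[]` being doubled as the StringBuilder grows, which easily exceeds a small Dalvik heap.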

1 Answer:

Answer 0 (score: 0)

Thanks to @PhilW's comment, I came up with the following solution:

    Log.v("MyApp", "reading logs");
    // Get logs.
    File previousLog = SystemEventsReceiver.getPreviousLog();
    File currentLog = SystemEventsReceiver.getCurrentLog();

    int completeLogSize = (int) (currentLog.length() + previousLog.length());
    Log.v("MyApp", "uncompressed logs: " + completeLogSize);
    // Compress logs.
    byte[] compressedLogs = null;
    try {
        ByteArrayOutputStream os = new ByteArrayOutputStream(completeLogSize);
        GZIPOutputStream gzipOS = new GZIPOutputStream(os);

        if (previousLog.exists()) {
            addLogToGZIP(previousLog, gzipOS);
        }
        if (currentLog.exists()) {
            addLogToGZIP(currentLog, gzipOS);
        }

        gzipOS.close();
        compressedLogs = os.toByteArray();
        os.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Log.v("MyApp", "compressed logs: " + compressedLogs.length);



    private void addLogToGZIP(File logFile, GZIPOutputStream gzipOS) {
        byte[] bytes = new byte[1024];

        try {
            BufferedInputStream buffer = new BufferedInputStream(new FileInputStream(logFile));
            int read;
            // Write only the bytes actually read; the final chunk is usually
            // shorter than the buffer, and writing the whole array would append
            // stale bytes left over from the previous read.
            while ((read = buffer.read(bytes, 0, bytes.length)) != -1) {
                gzipOS.write(bytes, 0, read);
            }
            buffer.close();
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException, so one catch suffices.
            e.printStackTrace();
        }
    }

I read the bytes from each log file and write them directly to the GZIPOutputStream. This works for a 55MB file (~1,100,000 lines) and even a 100MB file (~2,200,000 lines).
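The same streaming technique can be verified end-to-end off-device. The sketch below (file names and sizes are illustrative) compresses a generated file in 1 KB chunks and round-trips it through `GZIPInputStream` to confirm nothing is lost:

```java
import java.io.*;
import java.util.zip.*;

// Minimal sketch: stream file bytes straight into a GZIPOutputStream in
// fixed-size chunks instead of building one giant String first.
public class StreamGzip {
    static void addLogToGzip(File logFile, GZIPOutputStream gzipOS) throws IOException {
        byte[] buf = new byte[1024];
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(logFile))) {
            int read;
            while ((read = in.read(buf, 0, buf.length)) != -1) {
                gzipOS.write(buf, 0, read); // write only the bytes actually read
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small sample "log" file.
        File log = File.createTempFile("log", ".txt");
        try (Writer w = new FileWriter(log)) {
            for (int i = 0; i < 1000; i++) w.write("line " + i + "\n");
        }

        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOS = new GZIPOutputStream(os)) {
            addLogToGzip(log, gzipOS);
        }
        byte[] compressed = os.toByteArray();

        // Round-trip check: decompressed size must equal the original file size.
        ByteArrayOutputStream back = new ByteArrayOutputStream();
        try (GZIPInputStream gin = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[1024];
            int read;
            while ((read = gin.read(buf)) != -1) back.write(buf, 0, read);
        }
        System.out.println(back.size() == log.length());
        System.out.println(compressed.length < log.length());
        log.delete();
    }
}
```

Because the data only ever exists as 1 KB chunks plus the compressed output, peak memory stays close to the size of the compressed result rather than the size of the uncompressed logs.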