java.lang.OutOfMemoryError | Reading and posting a large amount of data

Asked: 2017-03-31 10:04:03

Tags: java csv jackson out-of-memory

I have been scratching my head over this for a while now. I have a huge CSV file with hundreds of billions of records.

The task at hand is simple: create JSON from this CSV file and post it to a server, and I want to get it done as quickly as possible. So far, my code for reading the CSV looks like this:

protected void readIdentityCsvDynamicFetch() {

    String csvFile = pathOfIdentities;
    CSVReader reader = null;
    PayloadEngine payloadEngine = new PayloadEngine();
    long counter = 0;

    List<IdentityJmPojo> identityJmList = new ArrayList<IdentityJmPojo>();

    try {
        // Three worker threads post the prepared JSON payloads to the server.
        ExecutorService uploaderPoolService = Executors.newFixedThreadPool(3);

        long lineCount = lineCount(pathOfIdentities);
        logger.info("Line Count: " + lineCount);

        // ',' separator, '\'' quote character, skip the first OFFSET lines.
        reader = new CSVReader(new BufferedReader(new FileReader(csvFile)), ',', '\'', OFFSET);

        String[] line;
        long startTime = System.currentTimeMillis();

        while ((line = reader.readNext()) != null) {
            IdentityJmPojo identityJmPojo = new IdentityJmPojo();
            identityJmPojo.setIdentity(line[0]);
            // Fall back to the default jsonValue when the row has no second column.
            identityJmPojo.setJM(line.length > 1 ? line[1] : jsonValue);
            identityJmList.add(identityJmPojo);

            // Hand off a batch of STEP records and start a fresh list.
            if (identityJmList.size() == STEP) {
                counter += STEP;
                payloadEngine.prepareJson(identityJmList, uploaderPoolService, jsonKey);
                identityJmList = new ArrayList<IdentityJmPojo>();

                long elapsedTime = System.currentTimeMillis() - startTime;
                logger.info("=================== Time taken to read " + STEP + " records from CSV: "
                        + elapsedTime + " and total records read: " + counter + " ===================");
                startTime = System.currentTimeMillis(); // reset so each batch is timed on its own
            }
        }

        // Submit whatever is left over after the last full batch.
        if (identityJmList.size() > 0) {
            logger.info("=================== Executing Last Loop - Payload Size: " + identityJmList.size() + " ================= ");
            payloadEngine.prepareJson(identityJmList, uploaderPoolService, jsonKey);
        }
        uploaderPoolService.shutdown();

    } catch (Throwable e) {
        logger.error("CsvReader || readIdentityCsvDynamicFetch method ", e);
    } finally {
        try {
            if (reader != null)
                reader.close();
        } catch (IOException e) {
            logger.error("CsvReader || readIdentityCsvDynamicFetch method ", e);
        }
    }
}

Now, I use a ThreadPool executor service, and in its run() method I have an Apache HttpClient set up to post the JSON to the server. (I use a connection pool and a keep-alive strategy, so the connection is opened and closed only once.)
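For context, the client inside SendPushNotification is set up roughly like this (a minimal sketch against Apache HttpClient 4.x, using org.apache.http.* classes; the pool size and the 30-second keep-alive fallback are illustrative placeholders, not my exact values):

// Pooled connection manager shared by all uploader threads.
PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
connManager.setMaxTotal(3);             // one connection per uploader thread
connManager.setDefaultMaxPerRoute(3);

// Honour the server's Keep-Alive header, otherwise keep connections for 30 s.
ConnectionKeepAliveStrategy keepAlive = new ConnectionKeepAliveStrategy() {
    public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
        HeaderElementIterator it = new BasicHeaderElementIterator(
                response.headerIterator(HTTP.CONN_KEEP_ALIVE));
        while (it.hasNext()) {
            HeaderElement he = it.nextElement();
            if ("timeout".equalsIgnoreCase(he.getName()) && he.getValue() != null) {
                return Long.parseLong(he.getValue()) * 1000;
            }
        }
        return 30 * 1000;
    }
};

CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(connManager)
        .setKeepAliveStrategy(keepAlive)
        .build();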

I create & post my JSON as follows:

public void prepareJson(List<IdentityJmPojo> identities, ExecutorService notificationService, String key) {
    try {
        // Build the JSON payload and hand it off to an uploader thread.
        notificationService.submit(new SendPushNotification(prepareLowLevelJson(identities, key)));
    } catch (Exception e) {
        logger.error("PayloadEngine || prepareJson method ", e);
    }
}


private ObjectNode prepareLowLevelJson(List<IdentityJmPojo> identities, String key) {
    long startTime = System.currentTimeMillis();
    ObjectNode mainJacksonObject = JsonNodeFactory.instance.objectNode();
    ArrayNode dJacksonArray = JsonNodeFactory.instance.arrayNode();

    for (IdentityJmPojo identityJmPojo : identities) {
        ObjectNode dSingleObject = JsonNodeFactory.instance.objectNode();
        ObjectNode dProfileInnerObject = JsonNodeFactory.instance.objectNode();

        dSingleObject.put("identity", identityJmPojo.getIdentity());
        dSingleObject.put("ts", ts);
        dSingleObject.put("type", "profile");

        dProfileInnerObject.put(key, identityJmPojo.getJM());
        dSingleObject.set("profileData", dProfileInnerObject);

        dJacksonArray.add(dSingleObject);
    }

    mainJacksonObject.set("d", dJacksonArray);

    long elapsedTime = System.currentTimeMillis() - startTime;
    logger.info("=================== Time to create JSON: " + elapsedTime + " ===================");
    return mainJacksonObject;
}

Now comes the strange part. When I comment out the notification service call:

// notificationService.submit(new SendPushNotification(prepareLowLevelJson(identities, key)));

everything runs smoothly, and I can read the CSV and prepare the JSON in under 29,000 ms.

But when the actual task has to run, it fails and I get an out-of-memory error, so I think there is a design flaw here. How can I process this large amount of data quickly? Any hints would be greatly appreciated.
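One thing I suspect (an assumption on my part, not something I have verified): Executors.newFixedThreadPool backs its threads with an unbounded LinkedBlockingQueue, so if the CSV reader outruns the three uploader threads, every queued SendPushNotification task keeps its entire Jackson object tree alive on the heap until it is posted. A bounded queue with back-pressure would look roughly like this (the queue capacity of 10 is an arbitrary placeholder):

// Sketch: a bounded replacement for Executors.newFixedThreadPool(3).
// CallerRunsPolicy makes the CSV-reading thread run the upload itself when
// the queue is full, throttling reading to the speed of the uploads.
ExecutorService uploaderPoolService = new ThreadPoolExecutor(
        3, 3,                                  // fixed pool of 3 threads
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(10),  // at most 10 pending payloads
        new ThreadPoolExecutor.CallerRunsPolicy());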

I also think that creating the JSON objects and arrays inside the for loop takes a lot of memory, but I can't seem to find an alternative.
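The only alternative I can think of (again just a sketch, untested in my setup, with 'out' standing in for wherever the bytes should go) is Jackson's streaming JsonGenerator, which writes each record straight to an OutputStream instead of holding the whole tree in memory:

// Sketch: stream the payload with Jackson's JsonGenerator instead of
// building an ObjectNode tree; only one record is in memory at a time.
private void writeLowLevelJson(List<IdentityJmPojo> identities, String key,
                               OutputStream out) throws IOException {
    JsonGenerator gen = new JsonFactory().createGenerator(out);
    gen.writeStartObject();
    gen.writeArrayFieldStart("d");
    for (IdentityJmPojo identityJmPojo : identities) {
        gen.writeStartObject();
        gen.writeStringField("identity", identityJmPojo.getIdentity());
        gen.writeNumberField("ts", ts); // assuming ts is numeric; use writeStringField otherwise
        gen.writeStringField("type", "profile");
        gen.writeObjectFieldStart("profileData");
        gen.writeStringField(key, identityJmPojo.getJM());
        gen.writeEndObject(); // profileData
        gen.writeEndObject(); // one record
    }
    gen.writeEndArray();
    gen.writeEndObject();
    gen.close(); // flushes everything to 'out'
}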

Here is the stack trace:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.LinkedHashMap.createEntry(LinkedHashMap.java:442)
    at java.util.HashMap.addEntry(HashMap.java:884)
    at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
    at java.util.HashMap.put(HashMap.java:505)
    at com.fasterxml.jackson.databind.node.ObjectNode._put(ObjectNode.java:861)
    at com.fasterxml.jackson.databind.node.ObjectNode.put(ObjectNode.java:769)
    at uploader.PayloadEngine.prepareLowLevelJson(PayloadEngine.java:50)
    at uploader.PayloadEngine.prepareJson(PayloadEngine.java:24)
    at uploader.CsvReader.readIdentityCsvDynamicFetch(CsvReader.java:97)
    at uploader.Main.main(Main.java:30)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
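"GC overhead limit exceeded" means the JVM was spending nearly all of its time in garbage collection while reclaiming almost nothing, which would fit the queue-buildup theory above. To confirm where the memory goes, I plan to capture a heap dump on the next failure with the standard HotSpot flags (the heap size and dump path here are placeholders):

java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/uploader.hprof uploader.Main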

0 Answers