Unable to process large files in Camel

Time: 2015-09-04 06:09:24

Tags: apache-camel

I am trying to do a simple transformation on a CSV file, but my program hangs without producing any output, and the console prints the following.

22:38:02.001 [main] INFO  o.a.camel.impl.DefaultCamelContext - Apache Camel 2.15.2 (CamelContext: camel-1) is shutting down
22:38:02.135 [main] INFO  o.a.c.impl.DefaultShutdownStrategy - Starting to graceful shutdown 1 routes (timeout 300 seconds)
22:38:02.167 [main] DEBUG o.a.c.i.DefaultExecutorServiceManager - Created new ThreadPool for source: org.apache.camel.impl.DefaultShutdownStrategy@65ead16a with name: ShutdownTask. -> org.apache.camel.util.concurrent.RejectableThreadPoolExecutor@52c0a65f[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0][ShutdownTask]
22:38:02.173 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 routes to shutdown
22:38:02.177 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - Route: route1 suspended and shutdown deferred, was consuming from: Endpoint[file:///home/cloudera/Desktop/camelinput/?delay=15m&noop=true]
22:38:02.177 [Camel (camel-1) thread #1 - ShutdownTask] INFO  o.a.c.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 300 seconds.
22:38:02.179 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 inflight exchanges:
    InflightExchange: [exchangeId=ID-quickstart-cloudera-40574-1441345060577-0-2, fromRouteId=route1, routeId=route1, nodeId=unmarshal1, elapsed=10787, duration=10791]
22:38:05.436 [Camel (camel-1) thread #1 - ShutdownTask] INFO  o.a.c.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 299 seconds.
22:38:05.437 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 inflight exchanges:
    InflightExchange: [exchangeId=ID-quickstart-cloudera-40574-1441345060577-0-2, fromRouteId=route1, routeId=route1, nodeId=unmarshal1, elapsed=14045, duration=14049]
22:38:08.201 [Camel (camel-1) thread #1 - ShutdownTask] INFO  o.a.c.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 298 seconds.
22:38:08.202 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 inflight exchanges:
    InflightExchange: [exchangeId=ID-quickstart-cloudera-40574-1441345060577-0-2, fromRouteId=route1, routeId=route1, nodeId=unmarshal1, elapsed=16810, duration=16814]

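The same program works for small files, but I hit this problem when I try a large file. I think it may be related to threads. Please help me resolve it. My program is below.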

Main class

CamelContext myContext = new DefaultCamelContext();

         TestRouter myRoute=new TestRouter();

        HDFSTransfer hdfsTransfer=new HDFSTransfer();
        String copy=hdfsTransfer.copyToLocal("hdfs://localhost:8020", "/user/cloudera/input/CamelTestIn.csv", "/home/cloudera/Desktop/camelinput/");
        boolean flag=false;
        if("SUCCESS".equals(copy)){
        myContext.addRoutes(myRoute);

        // Launching the context
        myContext.start();

        // Pausing to let the route do its work
        Thread.sleep(10000);

        myContext.stop();
        flag=true;
        }
        if(flag){
            hdfsTransfer.moveFile("hdfs://localhost:8020", "file:/home/cloudera/Desktop/camelout/out.csv", "/user/cloudera/output/");
        }


    }

RouteBuilder class

        CsvDataFormat csv = new CsvDataFormat();

        from("file:/home/cloudera/Desktop/camelinput/?noop=true&delay=15m")

            .unmarshal(csv)
            .convertBodyTo(List.class)
            .process(new Processor() {

                @Override
                public void process(Exchange msg) throws Exception {
                    List<List<String>> data = (List<List<String>>) msg.getIn().getBody();
                    for (List<String> line : data) {
                        // Checks if column two contains text STANDARD 
                        // and alters its value to DELUXE.
                        // System.out.println("line "+line);
                         /*if("Aug-04".equalsIgnoreCase(line.get(6))){
                             line.set(6, "04-August");
                         }*/


                    }
                }
            }).marshal(csv)

            .to("file:/home/cloudera/Desktop/camelout/?fileName=out.csv")

            .log("done.").end();
    }

Thanks in advance.

1 Answer:

Answer 0 (score: 1)

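If you have a bigger file, you need to sleep for more than 10 seconds to give the route time to process it. With Thread.sleep(10000), the context is stopped while the exchange is still in the unmarshal step, which is what the shutdown log shows.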
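A longer sleep still has to be guessed. One alternative, sketched here in plain Java with no Camel dependency, is to block until the work signals that it has finished, with a timeout as a safety net. The class WaitForCompletion, the method runAndWait, and the stand-in workload are hypothetical names used only for illustration; they are not part of Camel or of the question's code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: wait for a completion signal instead of sleeping for a guessed time. */
final class WaitForCompletion {

    /**
     * Starts the work on a background thread and blocks until it finishes
     * or the timeout expires. Returns true if the work finished in time.
     */
    static boolean runAndWait(Runnable work, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                done.countDown(); // signal completion even if the work throws
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive after a timeout
        worker.start();
        return done.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the route processing one file (assumption, for illustration only).
        Runnable fakeFileProcessing = () -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean finished = runAndWait(fakeFileProcessing, 5, TimeUnit.MINUTES);
        System.out.println(finished ? "finished" : "timed out");
    }
}
```

In the program from the question, the same idea would presumably mean counting the latch down from a final step of the route, after the output file has been written, and awaiting it in the main class before calling myContext.stop().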

Also note that your current route reads the entire file into memory, which becomes a problem when the file is very large.

See the lazyLoad option at http://camel.apache.org/csv.html

Also, if all your route does is change some text in a big file, there are better and faster ways to do that than a Camel route.
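As a rough illustration of that last point, the sketch below streams the file line by line in plain Java, so only one row is held in memory at a time. The class CsvColumnRewriter and its rewrite method are hypothetical names, not part of any library. The column index 6 and the values "Aug-04" and "04-August" are taken from the commented-out code in the question. It assumes a plain comma-separated file with no quoted fields or embedded commas; a real CSV parser would be needed otherwise.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Camel or of the question's code. */
final class CsvColumnRewriter {

    /**
     * Copies a CSV file line by line, replacing oldValue with newValue in one column.
     * Assumes simple comma-separated rows without quoted fields.
     * Returns the number of rows that were changed.
     */
    static long rewrite(Path in, Path out, int column, String oldValue, String newValue)
            throws IOException {
        long changed = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields, so untouched rows round-trip unchanged.
                String[] fields = line.split(",", -1);
                if (column < fields.length && oldValue.equalsIgnoreCase(fields[column])) {
                    fields[column] = newValue;
                    changed++;
                }
                writer.write(String.join(",", fields));
                writer.newLine();
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Paths as they appear in the question.
        long changed = rewrite(
                Paths.get("/home/cloudera/Desktop/camelinput/CamelTestIn.csv"),
                Paths.get("/home/cloudera/Desktop/camelout/out.csv"),
                6, "Aug-04", "04-August");
        System.out.println("changed " + changed + " rows");
    }
}
```

Because rows are written as soon as they are read, memory use stays flat regardless of file size, which is the property the route in the question lacks.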