Spark Cassandra Connector saveToCassandra() is sending data to the driver and causing an OOM exception

Asked: 2015-09-29 21:18:00

Tags: cassandra apache-spark driver spark-cassandra-connector

I am trying to use the Spark Cassandra Connector.

Here is my code:

    // Imports used by this fragment (ConfigStore and UserStatistics are my own classes).
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;
    import com.datastax.spark.connector.japi.CassandraJavaUtil;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

    JavaRDD<UserStatistics> rdd = CassandraJavaUtil.javaFunctions(sparkContext)
            .cassandraTable(ConfigStore.read("cassandra", "keyspace"), "user_activity_" + type)
            .where("bucket = ?", date)
            .select("user_id", "code")
            // Emit (user_id, 1) for every row, then sum the ones per user.
            .mapToPair(row -> new Tuple2<String, Integer>(row.getString("user_id"), 1))
            .reduceByKey((value1, value2) -> value1 + value2)
            // Turn each (user_id, count) pair into a UserStatistics row.
            .map(s -> {
                UserStatistics userStatistic = new UserStatistics();
                userStatistic.setUser_id(s._1);
                userStatistic.setStatistics_type(type);
                long total = s._2;
                int failureCount = 0;  // s._2._2().iterator().next();
                int selectedCount = 0; // s._2._2().iterator().next();
                userStatistic.setTotal_count((int) total);
                userStatistic.setFailure_count(failureCount);
                userStatistic.setSelected_count(selectedCount);
                return userStatistic;
            });

    // Write the aggregated rows to the user_statistics table.
    CassandraJavaUtil.javaFunctions(rdd)
            .writerBuilder(ConfigStore.read("cassandra", "keyspace"), "user_statistics",
                    mapToRow(UserStatistics.class))
            .saveToCassandra();

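After running this, the output below is printed. Eventually the driver throws an OOM exception. I am not sure why it is trying to send data to the driver.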

Executor: Finished task 1007.0 in stage 0.0 (TID 1007). 84821 bytes result sent to driver
15/09/29 13:57:32 INFO TaskSetManager: Starting task 1016.0 in stage 0.0 (TID 1016, localhost, NODE_LOCAL, 2096 bytes)
15/09/29 13:57:32 INFO TaskSetManager: Finished task 1007.0 in stage 0.0 (TID 1007) in 78 ms on localhost (1009/640442)
15/09/29 13:57:32 INFO Executor: Running task 1016.0 in stage 0.0 (TID 1016)
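
For a sense of scale, the numbers in this log can be multiplied out. The sketch below is only a back-of-the-envelope estimate: it assumes that every one of the 640442 tasks in stage 0.0 returns a result about as large as the 84821 bytes reported for task 1007, and that the driver keeps all of those results in memory at once. Neither assumption is confirmed by the log, and the class and method names are hypothetical.

```java
// Rough estimate of how much task-result data the driver would receive
// if every task in the stage returned a result of the same size.
// Both inputs come from the log above; the "all tasks are alike"
// assumption is a guess, not something the log confirms.
final class DriverResultEstimate {

    // Total bytes = number of tasks * bytes returned per task.
    static long totalBytes(long taskCount, long bytesPerTask) {
        return Math.multiplyExact(taskCount, bytesPerTask);
    }

    // Convert bytes to GiB (2^30 bytes) for readability.
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long taskCount = 640442L;   // from "(1009/640442)"
        long bytesPerTask = 84821L; // from "84821 bytes result sent to driver"
        long total = totalBytes(taskCount, bytesPerTask);
        System.out.printf("%d bytes (about %.1f GiB)%n", total, toGiB(total));
    }
}
```

Under those assumptions the total is on the order of tens of gigabytes, so the sheer number of tasks in the stage may matter more here than the size of any single result.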

0 answers:

No answers yet