I'm trying to use the Spark Cassandra Connector.
Here is my code:
// Count the rows per user_id in the given bucket, then build one UserStatistics per user.
JavaRDD<UserStatistics> rdd = CassandraJavaUtil.javaFunctions(sparkContext)
        .cassandraTable(ConfigStore.read("cassandra", "keyspace"), "user_activity_" + type)
        .where("bucket = ?", date)
        .select("user_id", "code")
        .mapToPair(row -> new Tuple2<String, Integer>(row.getString("user_id"), 1))
        .reduceByKey((value1, value2) -> value1 + value2)
        .map(s -> {
            UserStatistics userStatistic = new UserStatistics();
            userStatistic.setUser_id(s._1);
            userStatistic.setStatistics_type(type);
            int failureCount = 0;  // s._2._2().iterator().next();
            int selectedCount = 0; // s._2._2().iterator().next();
            userStatistic.setTotal_count(s._2); // rows counted for this user
            userStatistic.setFailure_count(failureCount);
            userStatistic.setSelected_count(selectedCount);
            return userStatistic;
        });

// Write the per-user statistics back to the user_statistics table.
CassandraJavaUtil.javaFunctions(rdd)
        .writerBuilder(ConfigStore.read("cassandra", "keyspace"), "user_statistics",
                mapToRow(UserStatistics.class))
        .saveToCassandra();
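When I run this, it prints the output below and eventually throws an OOM exception on the driver. I'm not sure why it is trying to send data to the driver.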
Executor: Finished task 1007.0 in stage 0.0 (TID 1007). 84821 bytes result sent to driver
15/09/29 13:57:32 INFO TaskSetManager: Starting task 1016.0 in stage 0.0 (TID 1016, localhost, NODE_LOCAL, 2096 bytes)
15/09/29 13:57:32 INFO TaskSetManager: Finished task 1007.0 in stage 0.0 (TID 1007) in 78 ms on localhost (1009/640442)
15/09/29 13:57:32 INFO Executor: Running task 1016.0 in stage 0.0 (TID 1016)
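To put the numbers from the log in perspective, here is a rough back-of-the-envelope sketch. It assumes every one of the 640442 tasks in stage 0.0 sends back a result about the same size as task 1007 (84821 bytes), which I have not verified. The class and method names are made up for illustration. Under that assumption the total comes to roughly 54 GB.

```java
public class DriverResultEstimate {

    // Total bytes the driver would receive if every task returned a
    // result of the same size (an assumption, not a measured value).
    static long totalResultBytes(long bytesPerTask, long taskCount) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(bytesPerTask, taskCount);
    }

    public static void main(String[] args) {
        long bytesPerTask = 84_821L; // from "84821 bytes result sent to driver"
        long taskCount = 640_442L;   // from "(1009/640442)"

        long total = totalResultBytes(bytesPerTask, taskCount);
        System.out.println("Estimated bytes sent to driver: " + total);
        System.out.println("Estimated GB sent to driver: " + (total / 1_000_000_000L));
    }
}
```

If that estimate is in the right ballpark, the sheer number of tasks, rather than the size of any single result, would be what fills up the driver.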