Map-reduce with Spark and job server to perform group by and sum in Cassandra

Time: 2016-05-06 09:50:35

Tags: scala apache-spark cassandra spark-jobserver

I am creating a Spark job server that connects to Cassandra. After getting the records, I want to perform a simple group by and sum. I am able to retrieve the data, but I cannot print the output. I have tried googling for hours and have posted in the Cassandra Google group. My current code is below, and collect throws an error.

    override def runJob(sc: SparkContext, config: Config): Any = {
      // Printing each record succeeds:
      // sc.cassandraTable("store", "transaction").select("terminalid", "transdate", "storeid", "amountpaid").toArray().foreach(println)
      val rdd = sc.cassandraTable("POSDATA", "transaction")
        .select("terminalid", "transdate", "storeid", "amountpaid")
      // Key on (terminalid, transdate, storeid) and sum amountpaid; the getter
      // indices follow the select order above
      val map1 = rdd
        .map(x => (x.getInt(0), x.getDate(1), x.getInt(2)) -> x.getDouble(3))
        .reduceByKey((x, y) => x + y)
      println(map1)
      // prints only: ShuffledRDD[3] at reduceByKey at Daily.scala:34
      map1.collect
      // map1.collectAsMap().map(println(_))
      // throws java.lang.ClassNotFoundException: transaction.Daily$$anonfun$2
    }
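
For reference, the intended group-by-and-sum can be reproduced on plain in-memory data. A minimal sketch with hypothetical sample rows standing in for the Cassandra table (the real schema is only partially shown above):

    // Hypothetical sample rows: (terminalid, transdate, storeid) -> amountpaid
    val sample = sc.parallelize(Seq(
      ((1, "2016-05-01", 10), 25.0),
      ((1, "2016-05-01", 10), 15.0),
      ((2, "2016-05-02", 11), 30.0)
    ))
    // reduceByKey sums the paid amounts per composite key
    val totals = sample.reduceByKey(_ + _)
    totals.collect().foreach(println)
    // ((1,2016-05-01,10),40.0)
    // ((2,2016-05-02,11),30.0)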

2 Answers:

Answer 0 (score: 0):

Your map1 is an RDD. You can try the following:

    map1.foreach(r => println(r))
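
Note that foreach runs on the executors, so on a real cluster the println output ends up in the executor logs rather than the driver console. If the aggregated result is small, collecting first prints it on the driver:

    // Bring the (small) aggregated result to the driver, then print it there
    map1.collect().foreach(println)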

Answer 1 (score: 0):

Spark evaluates RDDs lazily: println(map1) only prints the RDD's description because nothing has forced the computation yet. So try an action:

    map1.take(10).foreach(println)
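
In spark-jobserver specifically, whatever runJob returns is serialized back to the HTTP caller, so another option is to return the aggregated result instead of printing it. The ClassNotFoundException on transaction.Daily$$anonfun$2 usually indicates that the uploaded job jar (which contains the anonymous closure classes) is not on the executors' classpath, so make sure the jar you upload to the job server actually contains the Daily class. A minimal sketch, assuming the column types match the select order:

    // Requires: import com.datastax.spark.connector._  (for cassandraTable)
    //           import com.typesafe.config.Config
    // Sketch: return the result from runJob so the job server delivers it to
    // the caller (keys stringified for easy JSON serialization).
    override def runJob(sc: SparkContext, config: Config): Any = {
      sc.cassandraTable("POSDATA", "transaction")
        .select("terminalid", "transdate", "storeid", "amountpaid")
        .map(x => (x.getInt(0), x.getDate(1), x.getInt(2)) -> x.getDouble(3))
        .reduceByKey(_ + _)
        .collect()
        .map { case (key, total) => key.toString -> total }
        .toMap
    }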