I am creating a Spark job server that connects to Cassandra. After fetching the records, I want to perform a simple group-by and sum. I can retrieve the data, but I cannot print the output. I have tried googling for hours and have also posted to the Cassandra Google group. My current code is below; the errors I get are noted in the comments.
override def runJob(sc: SparkContext, config: Config): Any = {
  // sc.cassandraTable("store", "transaction").select("terminalid","transdate","storeid","amountpaid").toArray().foreach(println)
  // Printing each record this way is successful.
  val rdd = sc.cassandraTable("POSDATA", "transaction").select("terminalid", "transdate", "storeid", "amountpaid")
  // Column order follows the select: transdate (index 1) is a date, storeid (index 2) is an int.
  val map1 = rdd.map(x => (x.getInt(0), x.getDate(1), x.getInt(2)) -> x.getDouble(3)).reduceByKey((x, y) => x + y)
  println(map1)
  // Output is: ShuffledRDD[3] at reduceByKey at Daily.scala:34
  map1.collect
  // map1.collectAsMap().map(println(_))
  // Throws: java.lang.ClassNotFoundException: transaction.Daily$$anonfun$2
}
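The group-and-sum the question is after can be sketched with plain Scala collections. This is a hypothetical stand-in for the RDD (the rows and values below are invented), meant only to show the key/value shape that `reduceByKey` operates on:

```scala
object GroupSumSketch {
  // Hypothetical rows standing in for Cassandra records:
  // (terminalid, transdate, storeid, amountpaid)
  type Row = (Int, String, Int, Double)

  // Same shape as the Spark code: key by (terminalid, transdate, storeid),
  // then sum amountpaid per key. groupBy + sum here plays the role of reduceByKey.
  def groupAndSum(rows: List[Row]): Map[(Int, String, Int), Double] =
    rows
      .map { case (t, d, s, amt) => (t, d, s) -> amt }
      .groupBy(_._1)
      .map { case (k, vs) => k -> vs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val rows = List(
      (1, "2016-01-01", 10, 25.0),
      (1, "2016-01-01", 10, 15.0),
      (2, "2016-01-02", 11, 40.0)
    )
    groupAndSum(rows).foreach(println)
  }
}
```

Unlike the RDD version, this local map is materialized eagerly, so printing it works directly.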
Answer 0 (score: 0)
Your map1 is an RDD. You can try the following (note that foreach on an RDD runs on the executors, so in a cluster the output goes to the worker logs rather than the driver console):
map1.foreach(r => println(r))
Answer 1 (score: 0)
Spark evaluates RDDs lazily: transformations like map and reduceByKey build a plan but run nothing, which is why println(map1) only prints the RDD's description. Try an action, for example:
map1.take(10).foreach(println)
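The laziness this answer describes can be illustrated without Spark using Scala's lazy collection views (a sketch by analogy, not Spark itself): the mapping function does not run when the view is built, only when something forces it, just as an RDD's map runs only when an action is called.

```scala
object LazinessSketch {
  // Returns (callsBeforeAction, callsAfterAction): how many times the mapping
  // function had actually run before and after forcing the lazy view.
  def demo(): (Int, Int) = {
    var calls = 0
    val mapped = (1 to 5).view.map { x => calls += 1; x * 2 } // like rdd.map: deferred
    val beforeAction = calls              // still 0: nothing has been evaluated
    mapped.take(3).toList                 // like take(10).foreach(...): forces evaluation
    (beforeAction, calls)
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = demo()
    println(s"mapper ran $before times before the action, $after times after")
  }
}
```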