Exception in Scala/Spark with org.apache.spark.rdd.RDD[(scala.collection.immutable.Map[String,Any], Int)]

Asked: 2016-07-12 15:44:12

Tags: scala apache-spark spark-streaming twitter4j rdd

Using the code below, I get tweets for a particular filter:

val topCounts60 = tweetMap.map((_, 1)).
  reduceByKeyAndWindow(_ + _, Seconds(60 * 60))
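For context, the question never shows how tweetMap is built. Below is a minimal, hypothetical sketch of one way such a DStream[Map[String, Any]] could be produced with the spark-streaming-twitter connector; the app name, batch interval, and field names are assumptions, not taken from the question:

// Hypothetical setup (not shown in the question): build a DStream of
// tweet-field maps from the live Twitter stream.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

val sparkConf = new SparkConf().setAppName("TweetTopCounts").setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// With auth = None, twitter4j reads OAuth credentials from system
// properties such as twitter4j.oauth.consumerKey.
val tweetMap = TwitterUtils.createStream(ssc, None).map { status =>
  Map[String, Any](
    "UserLang"       -> status.getUser.getLang,
    "UserScreenName" -> status.getUser.getScreenName,
    "Text"           -> status.getText)
}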

One sample element of topCounts60, printed with topCounts60.print(), looks like this:

(Map(UserLang -> en, UserName -> Harmeet Singh, UserScreenName -> harmeetsingh060, HashTags -> , UserVerification -> false, Spam -> true, UserFollowersCount -> 44, UserLocation -> भारत, UserStatusCount -> 50, UserCreated -> 2016-07-04T06:32:49.000+0530, UserDescription -> Punjabi Music, TextLength -> 118, Text -> RT @PMOIndia: The Prime Minister is chairing a high level meeting on the situation in Kashmir, UserFollowersRatio -> 0.32116788625717163, UserFavouritesCount -> 67, UserFriendsCount -> 137, StatusCreatedAt -> 2016-07-12T21:07:30.000+0530, UserID -> 749770405867556865),1)

Now I am trying to print each key-value pair like this:

for ((k,v) <- topCounts60) printf("key: %s, value: %s\n", k, v)

but I get the following exception:

Error:(125, 10) constructor cannot be instantiated to expected type;
 found   : (T1, T2)
 required: org.apache.spark.rdd.RDD[(scala.collection.immutable.Map[String,Any], Int)]
for ((k,v) <- topCounts60) printf("key: %s, value: %s\n", k, v)

How can I get output like this instead:

UserLang -> en,

UserName -> Harmeet Singh

I am a beginner in Scala and do not know how to print all the values separately. Please help me solve this.

2 Answers:

Answer 0 (score: 0):

Use foreach with string interpolation:

rdd.collect().foreach { case (k, v) => println(s"key: $k, value: $v") }
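Note that topCounts60 itself is a DStream, not an RDD, so this snippet can only run on the RDDs the stream delivers, i.e. inside foreachRDD as shown in the next answer. Also keep in mind that collect() brings every element of the RDD to the driver, which is fine for printing small windowed counts but not for large RDDs.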

Answer 1 (score: 0):

Try:

topCounts60.foreachRDD { rdd =>
  for ((k, v) <- rdd.collect) printf("key: %s, value: %s\n", k, v)
}
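This works because foreachRDD hands the closure one RDD per batch, and collect turns that RDD into a local array of plain tuples that the (k, v) pattern can match. The original for ((k,v) <- topCounts60) most likely failed because it desugars to topCounts60.foreach { case (k, v) => ... }, and DStream.foreach (an old alias of foreachRDD) passes a whole RDD to its closure, which is exactly the "required: org.apache.spark.rdd.RDD[...]" in the compiler error.

To get the "UserLang -> en" style output the question asks for, iterate over the inner Map as well. A minimal sketch, assuming topCounts60 is a DStream[(Map[String, Any], Int)]:

topCounts60.foreachRDD { rdd =>
  rdd.collect().foreach { case (tweetFields, count) =>
    // Print each field of the tweet map on its own line, e.g. "UserLang -> en"
    tweetFields.foreach { case (field, value) => println(s"$field -> $value") }
    println(s"count: $count")
  }
}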