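I need to find the maximum key in each RDD, but when I use reduce(), all I get is the single largest element in the whole DStream.
For example, in this stream, I want (2,"b"), (2,"d"), (3,"f"), but I only get (3,"f").
How can I get (2,"b"), (2,"d"), (3,"f")?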
Answer 0 (score: 0)
This:
stream = ssc.queueStream([sc.parallelize([(1,"a"), (2,"b"),(1,"c"),(2,"d"),
(1,"e"),(3,"f")],3)])
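creates a stream with only a single batch, and that one batch has (at least) 3 partitions. Partitions are not batches: reduce() works on each batch (RDD) as a whole, so all six elements are reduced together. I think you want: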
stream = ssc.queueStream([
sc.parallelize([(1,"a"), (2,"b")]),
sc.parallelize([(1,"c"), (2,"d")]),
sc.parallelize([(1,"e"), (3,"f")]),
])
This will give you the expected result:
stream.reduce(max).pprint()
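To see why the two setups behave differently without starting a Spark context, here is a minimal plain-Python sketch. It assumes each inner list stands in for one batch (one RDD) of the queue, and `per_batch_max` is a hypothetical helper that mimics what `stream.reduce(max)` does per batch; it is not part of the Spark API.

```python
from functools import reduce

def per_batch_max(batches):
    """Reduce each batch separately with max, as DStream.reduce does per RDD.

    `batches` is a list of lists; each inner list is a stand-in for one RDD.
    """
    return [reduce(max, batch) for batch in batches]

# One batch holding all six elements (the original queueStream call).
# The partition count does not matter here: the whole RDD is reduced at once.
one_batch = [[(1, "a"), (2, "b"), (1, "c"), (2, "d"), (1, "e"), (3, "f")]]

# Three batches of two elements each (the suggested queueStream call).
three_batches = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (2, "d")],
    [(1, "e"), (3, "f")],
]

print(per_batch_max(one_batch))
print(per_batch_max(three_batches))
```

Tuples compare element by element, so `max` picks the pair with the largest key and uses the value only to break ties between equal keys.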