I have an RDD
[playerID, gameID, amount_played]
I want to group by key on playerID, keeping at most 50 entries per player ID.
RDD.aggregateByKey(
    0,  # initial value for an accumulator
    lambda r, v: r + v,  # function that adds a value to an accumulator
    lambda r1, r2: r1 + r2  # function that merges/combines two accumulators
).take(1)
Answer 0 (score: 0)
You can use combineByKey:
def appender(a, b):
    # add a single value to an existing list
    a.append(b)
    return a

def extender(a, b):
    # merge two existing lists
    a.extend(b)
    return a

recommendRDD.combineByKey(
    lambda movieId: [movieId],  # make a list from the initial value
    appender,  # adds a movie to a pre-created list
    extender  # combines two pre-created lists
).take(1)
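To make the roles of the three functions concrete, here is a minimal plain-Python sketch of how they are applied, first within each partition and then across partitions. This is only an illustration, not how Spark is implemented internally; combine_partition, merge_partitions, and the sample data are hypothetical names and values introduced for the example.

```python
def appender(a, b):
    a.append(b)
    return a

def extender(a, b):
    a.extend(b)
    return a

def combine_partition(pairs):
    # Hypothetical helper: mimics the per-partition step.
    combined = {}
    for key, value in pairs:
        if key in combined:
            combined[key] = appender(combined[key], value)
        else:
            combined[key] = [value]  # same as: lambda movieId: [movieId]
    return combined

def merge_partitions(left, right):
    # Hypothetical helper: mimics merging per-partition results.
    merged = dict(left)
    for key, acc in right.items():
        merged[key] = extender(merged[key], acc) if key in merged else acc
    return merged

# Made-up sample data split into two "partitions".
partition_a = [("u1", "m1"), ("u2", "m2"), ("u1", "m3")]
partition_b = [("u1", "m4")]

result = merge_partitions(combine_partition(partition_a),
                          combine_partition(partition_b))
print(result)
```

The first value seen for a key in a partition goes through the list-creating lambda, later values in the same partition go through appender, and lists built in different partitions are joined by extender.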
If you need to limit the number of movies, just add the logic to the appender and extender functions:
def appender(a, b):
    a.append(b)
    return a[:10]  # keep at most 10 items

def extender(a, b):
    a.extend(b)
    return a[:10]  # keep at most 10 items
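But be careful with the limit: the slice keeps whichever items were added first, so you may exclude the most highly recommended movies.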
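One possible way around this, offered as a sketch and not as part of the original answer: if each value carries its score, for example a (score, movieId) tuple, keep the N largest items instead of the first N. The names LIMIT, create, merge_value, and merge_combiners, the tuple layout, and the sample data are all assumptions made for illustration. The functions are plain Python, so they can be checked locally without Spark.

```python
import heapq
from functools import reduce

LIMIT = 3  # hypothetical cap for the demo; the question asks for 50

def create(value):
    # start an accumulator from the first value seen for a key
    return [value]

def merge_value(acc, value):
    # add one value, then keep only the LIMIT largest
    return heapq.nlargest(LIMIT, acc + [value])

def merge_combiners(acc1, acc2):
    # join two accumulators, then keep only the LIMIT largest
    return heapq.nlargest(LIMIT, acc1 + acc2)

# Local check with made-up (score, movieId) values for a single key,
# split across two "partitions".
part1 = [(0.9, "m1"), (0.1, "m2"), (0.5, "m3")]
part2 = [(0.7, "m4"), (0.95, "m5")]

acc1 = reduce(merge_value, part1[1:], create(part1[0]))
acc2 = reduce(merge_value, part2[1:], create(part2[0]))
top = merge_combiners(acc1, acc2)
print(top)
```

Assuming the RDD values really are (score, movieId) tuples, the same three functions could be passed as recommendRDD.combineByKey(create, merge_value, merge_combiners). Because each accumulator is trimmed to LIMIT items at every step, the per-key lists stay small while the highest-scoring items are retained.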