I have an RDD input in the following format:
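[('2002', ['cougar', 1]),
 ('2002', ['the', 10]),
 ('2002', ['network', 4]),
 ('2002', ['is', 1]),
 ('2002', ['database', 13])]
'2002' is the key, so each record is a key-value pair of the form:
('year', ['word', count])
where count is an integer. I would like to use reduceByKey to get the following result:
[('2002', [['cougar', 1], ['the', 10], ['network', 4], ['is', 1], ['database', 13]])]
I am struggling to get the nested list above; building the nesting itself is the main problem. For example, I have three lists a, b and c:
a = ['cougar', 1]
b = ['the', 10]
c = ['network', 4]
Then
a.append(b)
leaves a as
['cougar', 1, ['the', 10]]
Separately, starting again from the original a,
x = []
x.append(a)
x.append(b)
leaves x as
[['cougar', 1], ['the', 10]]
However, if I then run
c.append(x)
c becomes
['network', 4, [['cougar', 1], ['the', 10]]]
None of these operations gives me the result I want.
What I want is
[('2002', [[word1, c1], [word2, c2], [word3, c3], ...]),
 ('2003', [[w1, count1], [w2, count2], [w3, count3], ...])]
i.e. for each key the nested list should be
[a, b, c]
where a, b and c are themselves two-element lists.
I hope the question is clear. Any suggestions?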
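The behaviour described in the question comes from the fact that list.append() adds its argument as one single element, so appending a pair to a pair nests it. A minimal plain-Python sketch (no Spark involved) of how to build a list of pairs instead, either by starting from a list that already wraps the first pair or by concatenating lists of pairs with `+`:

```python
a = ['cougar', 1]
b = ['the', 10]
c = ['network', 4]

# append() inserts its argument as ONE element, so the pair ends up nested inside a.
nested = list(a)          # copy so the original a stays untouched
nested.append(b)
print(nested)             # ['cougar', 1, ['the', 10]]

# To get a list OF pairs, start from a list that already wraps the first pair...
pairs = [a]
pairs.append(b)           # adds b as one more pair
pairs.extend([c])         # extend() adds each element of the given list, here just c
print(pairs)              # [['cougar', 1], ['the', 10], ['network', 4]]

# ...or concatenate one-element lists of pairs with +.
merged = [a] + [b] + [c]
print(merged == [a, b, c])  # True
```

The `+` form is the one that carries over to a reduce step: if every value is already a list of pairs, merging two values is just concatenation.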
Answer 0 (score: 1)
I came up with the following solution:
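def wagg(a, b):
    # Each value is either a single pair ['word', count] (its first element is a str)
    # or an already accumulated list of pairs (its first element is itself a list).
    if isinstance(a[0], list):
        if isinstance(b[0], list):
            a.extend(b)   # both accumulated: merge the two lists of pairs
        else:
            a.append(b)   # a accumulated, b is a single pair
        w = a
    elif isinstance(b[0], list):
        b.append(a)       # a is a single pair, b accumulated
        w = b
    else:
        w = [a, b]        # two single pairs: start a new list of pairs
    return w

# Note: a key with only one value never reaches wagg, so its value stays ['word', count].
rdd2 = rdd1.reduceByKey(wagg)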
Does anyone have a better solution?
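A commonly used alternative that stays with reduceByKey is to wrap every value in a one-element list first, e.g. `rdd1.mapValues(lambda v: [v]).reduceByKey(lambda x, y: x + y)`. Every value then already has the "list of pairs" shape, so the merge function only concatenates, no type checks are needed, and keys with a single value (which reduceByKey never passes to the merge function) still come out as `[['word', count]]`. Because running that needs a Spark context, the sketch below emulates the two steps in plain Python; `reduce_by_key` is a hypothetical local helper, not a Spark API, and the `'2003'` record is made-up sample data.

```python
from functools import reduce

records = [('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2002', ['network', 4]),
           ('2002', ['is', 1]), ('2002', ['database', 13]),
           ('2003', ['spark', 2])]  # hypothetical extra key with a single value

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey: merges the values of each key with func."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return [(key, reduce(func, values)) for key, values in grouped.items()]

# Step 1 (like rdd.mapValues(lambda v: [v])): wrap each pair in a one-element list.
wrapped = [(key, [value]) for key, value in records]

# Step 2 (like .reduceByKey(lambda x, y: x + y)): concatenate the lists of pairs.
result = reduce_by_key(wrapped, lambda x, y: x + y)
print(result)
```

In this local version the pairs keep their input order; in Spark the order in which partial results are merged is not guaranteed, so the order of pairs inside each list may differ.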
Answer 1 (score: 1)
You don't need reduceByKey for this problem.
from pyspark import SparkContext
sc = SparkContext.getOrCreate()  # in the pyspark shell, sc already exists
rdd = sc.parallelize([('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2002', ['network', 4]),
                      ('2002', ['is', 1]), ('2002', ['database', 13])])
rdd.collect()
# [('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2002', ['network', 4]), ('2002', ['is', 1]), ('2002', ['database', 13])]
# Group all values of each key, then turn each group's iterable into a plain list.
rdd_nested = rdd.groupByKey().mapValues(list)
rdd_nested.collect()
# [('2002', [['cougar', 1], ['the', 10], ['network', 4], ['is', 1], ['database', 13]])]
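To check the shape that `groupByKey().mapValues(list)` produces without starting Spark, the same grouping can be mimicked locally with `collections.defaultdict`. This is only a plain-Python sketch of the idea, not how Spark executes it (Spark shuffles the values of each key across the cluster, and it does not guarantee the order of values within a group).

```python
from collections import defaultdict

records = [('2002', ['cougar', 1]), ('2002', ['the', 10]), ('2002', ['network', 4]),
           ('2002', ['is', 1]), ('2002', ['database', 13])]

# Collect every value under its key (the role of groupByKey);
# storing them in lists plays the role of mapValues(list).
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)

nested = list(groups.items())
print(nested)
# [('2002', [['cougar', 1], ['the', 10], ['network', 4], ['is', 1], ['database', 13]])]
```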