I am new to Spark, and I need to create an RDD in which each key is paired with just two elements.
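My input is:
Array1 = ((1,1),(1,2),(1,3),(2,1),(2,2),(2,3))
When I run groupByKey, the output is:
((1,(1,2,3)),(2,(1,2,3)))
But I need the output to contain only two-value pairs for each key, and I am not sure how to get that.
Expected Output = ((1,(1,2)),(1,(1,3)),(1,(2,3)),(2,(1,2)),(2,(1,3)),(2,(2,3)))
Each pair of values should be printed only once: there should be (1,2) but not also (2,1), and likewise (2,3) should appear only once.
Thanks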
Answer 0 (score: 3)
You can get the desired result as follows:
// Prior to doing the `groupBy`, you have an RDD[(Int, Int)], x, containing:
// (1,1),(1,2),(1,3),(2,1),(2,2),(2,3)
//
// Can simply map values as below. Result is a RDD[(Int, (Int, Int))].
import org.apache.spark.rdd.RDD // Needed for the RDD type annotations below
val x: RDD[(Int, Int)] = sc.parallelize(Seq((1,1),(1,2),(1,3),(2,1),(2,2),(2,3)))
val y: RDD[(Int, (Int, Int))] = x.map(t => (t._1, t)) // Map first value in pair tuple to the tuple
y.collect // Get result as an array
// res0: Array[(Int, (Int, Int))] = Array((1,(1,1)), (1,(1,2)), (1,(1,3)), (2,(2,1)), (2,(2,2)), (2,(2,3)))
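That is, the result is a pair RDD that associates the key (the first value of each pair) with the pair itself (as a tuple). Do not use groupBy, because in this case it will not give you what you want.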
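To make the difference between mapping and grouping concrete, here is a minimal sketch that uses plain Scala collections in place of an RDD, so it runs without a Spark context. The object name `KeyedPairsSketch` is hypothetical, and the local `Seq` only approximates RDD behaviour.

```scala
// Plain Scala collections stand in for the RDD here (an assumption for illustration).
object KeyedPairsSketch {
  val pairs: Seq[(Int, Int)] = Seq((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3))

  // Same shape as the RDD `map` above: key each pair by its first element.
  val keyed: Seq[(Int, (Int, Int))] = pairs.map(t => (t._1, t))

  // Grouping instead collapses all pairs that share a key into one collection.
  val grouped: Map[Int, Seq[(Int, Int)]] = pairs.groupBy(_._1)

  def main(args: Array[String]): Unit = {
    println(keyed)   // one output element per input pair
    println(grouped) // one entry per distinct key
  }
}
```

Mapping keeps one output element per input pair, while grouping yields one entry per key, which is why the two approaches give differently shaped results.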
Answer 1 (score: 0)
If I understand your requirement correctly, you can use groupByKey and flatMapValues to flatten the 2-combinations of the grouped values, as shown below:
val rdd = sc.parallelize(Seq(
  (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)
))
rdd.groupByKey.flatMapValues(_.toList.combinations(2)).
  map{ case (k, v) => (k, (v(0), v(1))) }.
  collect
// res1: Array[(Int, (Int, Int))] =
// Array((1,(1,2)), (1,(1,3)), (1,(2,3)), (2,(1,2)), (2,(1,3)), (2,(2,3)))
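The same idea can be tried without a cluster. The sketch below is an illustration on plain Scala collections, not the Spark code itself: `PairCombinationsSketch` and `pairsPerKey` are hypothetical names, and the explicit sort by key is an added assumption to make the local output order deterministic.

```scala
object PairCombinationsSketch {
  // Hypothetical helper: every unordered 2-combination of the values for each key.
  def pairsPerKey(data: Seq[(Int, Int)]): Seq[(Int, (Int, Int))] =
    data
      .groupBy(_._1)   // key -> all (key, value) pairs with that key
      .toSeq
      .sortBy(_._1)    // Map iteration order is unspecified, so sort by key
      .flatMap { case (k, kvs) =>
        // combinations(2) yields each unordered pair of values exactly once
        kvs.map(_._2).toList.combinations(2).map(c => (k, (c(0), c(1))))
      }

  def main(args: Array[String]): Unit = {
    val data = Seq((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3))
    pairsPerKey(data).foreach(println)
  }
}
```

Because `combinations(2)` produces each pair once, in the order the values appear, (1,2) is emitted but (2,1) is not. Note that `combinations` treats equal elements as interchangeable, so duplicate values under one key would yield fewer pairs.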