Here are two data files:
spark16/file1.txt
1,9,5
2,7,4
3,8,3
spark16/file2.txt
1,g,h
2,i,j
3,k,l
After the join, I have:
(1, ((9,5),(g,h)) )
(2, ((7,4),(i,j)) )
(3, ((8,3),(k,l)) )
I need to get the sum of (5, 4, 3), which is 12.
I am stuck here:
val file1 = sc.textFile("spark16/file1.txt").map(x => (x.split(",")(0).toInt, (x.split(",")(1), x.split(",")(2).toInt)))
val file2 = sc.textFile("spark16/file2.txt").map(x => (x.split(",")(0).toInt, (x.split(",")(1), x.split(",")(2))))
val joined = file1.join(file2)
val sorted = joined.sortByKey()
val first = sorted.first
res4: (Int, ((String, Int), (String, String))) = (1,((9,5),(g,h)))
scala> joined.reduce(_._2._1._2 + _._2._1._2)
:34: error: type mismatch;
 found   : Int
 required: (Int, ((String, Int), (String, String)))
       joined.reduce(_._2._1._2 + _._2._1._2)
How do I get the sum over `_._2._1._2`?
Thank you very much.
Answer 0 (score: 0)
If this is what you have after the join:
(1, ((9,5),(g,h)) )
(2, ((7,4),(i,j)) )
(3, ((8,3),(k,l)) )
then select the column you need and reduce:
joined.map(_._2._1._2).reduce(_ + _)
This should give 12 for (5, 4, 3).
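A minimal plain-Scala sketch of that pipeline (a `List` stands in for the joined RDD, so no SparkContext is needed; the data is taken from the join output above):

```scala
// The joined data, modelled with plain Scala collections for illustration.
val joined: List[(Int, ((String, Int), (String, String)))] = List(
  (1, (("9", 5), ("g", "h"))),
  (2, (("7", 4), ("i", "j"))),
  (3, (("8", 3), ("k", "l")))
)

// Select only the Int column (5, 4, 3), then reduce over plain Ints.
val total = joined.map(_._2._1._2).reduce(_ + _)
println(total)  // 12
```

`map` and `reduce` have the same shape on Scala collections and on RDDs, so the same chain works on the real `joined` RDD.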
`reduce` must return the same dataType as the one passed in.
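That type constraint is exactly why the original attempt fails: `reduce` on elements of type `T` needs a function `(T, T) => T`, and here `T` is the whole `(Int, ((String, Int), (String, String)))` tuple, not `Int`. A sketch of an alternative that skips the `map` step by folding with an `Int` zero value (again on plain Scala collections for illustration):

```scala
val joined: List[(Int, ((String, Int), (String, String)))] = List(
  (1, (("9", 5), ("g", "h"))),
  (2, (("7", 4), ("i", "j"))),
  (3, (("8", 3), ("k", "l")))
)

// foldLeft's accumulator may have a different type (Int) than the
// elements, so no prior map is needed.
val total = joined.foldLeft(0)((acc, kv) => acc + kv._2._1._2)
println(total)  // 12
```

RDDs have no `foldLeft`; the closest RDD analogue would be `joined.aggregate(0)((acc, kv) => acc + kv._2._1._2, _ + _)`, whose combiner merges per-partition sums.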
Hope this helps!