想象一下,我有一个三重奏:
function compareUsrvsCom(){
var response = "";
if (number1 === compChoice1) response += "pico ";
else if (number1 === compChoice2 || number1 === compChoice3) response += "fermi ";
if (number2 === compChoice2) response += "pico ";
else if (number2 === compChoice1 || number2 === compChoice3) response += "fermi ";
if (number3 === compChoice3) response += "pico ";
else if (number3 === compChoice1 || number3 === compChoice2) response += "fermi ";
if (number1 === compChoice1 && number2 === compChoice2 && number3 === compChoice3) response += "You win";
else if (response == "" ) return ("beagls ")
return response;
}
如何通过前两个元素有效地对它们进行分组并按第三个元素排序?比如说:
val RecordRDD : RDD[Int, String, Int] = {
(5 , "x1", 100),
(3 , "x2", 200),
(3 , "x4", 300),
(5 , "x1", 150),
(3 , "x2", 160),
(5 , "x1", 400)
}
我正在寻找一种有效的方法。
我应该将其设为DataFrame并使用GroupBy(Col1,Col2)和SortBy(Col3)吗?
这会比Spark RDD的groupBy更有效吗?
AggregateByKey可以同时聚合2个键吗?
*你可以假设这个RDD非常大!提前致谢。
答案 0 :(得分:2)
您没有提到您正在运行的Spark版本,但使用RDD执行此操作的一种方法是这样的:
val result = RecordRDD
.map{case(x, y, z) => ((x,y), List(z))}
.reduceByKey(_++_)
.map{case(key, list) => (key._1, Map((key._2 -> list.sorted)))}
.reduceByKey(_++_)
我不知道它是 最有效的方式,但它非常有效;)