Reduce by key using another map

Date: 2019-04-28 18:41:50

Tags: scala dictionary

So I have a map:

key: timestamp, value: (IP, seconds)
(1421927423,(59.166.0.9,0.011))
(1421927423,(59.166.0.3,0.011))
(1421927423,(59.45.0.2,27.203556))
(1421927423,(59.166.0.8,0.018))
(1421927423,(59.166.0.8,1.256667))
(1421927423,(175.45.176.2,27.203556))
(1421927424,(59.166.0.8,0.018))
(1421927426,(59.166.0.8,0.018))

From it I then build another map that holds the maximum seconds value (the second element of x._2) for each key:

(1421927423,(175.45.176.2,27.203556))
(1421927426,(59.166.0.8,0.018))

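Then I want to go through map 1 and, wherever an entry's key and seconds match the maximum seconds for that key, add the entry to a new map.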
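
To make the goal concrete, here is a minimal sketch using plain Scala collections rather than Spark. The object name `MaxPerKey` and the method `keepMaxPerKey` are hypothetical, and the timestamps are assumed to fit in a `Long`; the sample entries are the ones listed above. Note that two IPs share the largest value for 1421927423, so a filter like this keeps both of them.

```scala
object MaxPerKey {
  // (timestamp, (ip, seconds)), the same shape as the entries in map 1
  type Entry = (Long, (String, Double))

  // Sample entries copied from the question
  val map1: Seq[Entry] = Seq(
    (1421927423L, ("59.166.0.9", 0.011)),
    (1421927423L, ("59.166.0.3", 0.011)),
    (1421927423L, ("59.45.0.2", 27.203556)),
    (1421927423L, ("59.166.0.8", 0.018)),
    (1421927423L, ("59.166.0.8", 1.256667)),
    (1421927423L, ("175.45.176.2", 27.203556)),
    (1421927424L, ("59.166.0.8", 0.018)),
    (1421927426L, ("59.166.0.8", 0.018))
  )

  // Keep every entry whose seconds equal the largest seconds seen for its timestamp
  def keepMaxPerKey(entries: Seq[Entry]): Seq[Entry] = {
    // Step 1: largest seconds per timestamp
    val maxByKey: Map[Long, Double] =
      entries.groupBy(_._1).map { case (ts, group) =>
        ts -> group.map(_._2._2).reduce(_ max _)
      }
    // Step 2: keep the entries that match that maximum (ties are all kept)
    entries.filter { case (ts, (_, seconds)) => seconds == maxByKey(ts) }
  }
}
```

Calling `MaxPerKey.keepMaxPerKey(MaxPerKey.map1)` returns the entries in their original order, one or more per timestamp.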

1 answer:

Answer 0 (score: 0)

So, I went a different route...

val file = sc.textFile("UNSW-NB15_1.csv")

val splitfile = file.map(x => x.split(","))

// Key = start time, Value = (IP, best arrival time, i.e. the larger of columns 30 and 31)
val keyval = splitfile.map(x => (x(28), (x(0), math.max(x(30).toDouble, x(31).toDouble))))

// Key = start time, Value = best arrival time (input for finding the max)
val newResult = splitfile.map(x => (x(28), math.max(x(30).toDouble, x(31).toDouble)))
// Max time for each key: Key = start time, Value = max time
val findMax = newResult.reduceByKey((a, b) => math.max(a, b))

// Join the two key/value RDDs: Key = start time, Value = ((IP, arrival time), max time)
val data = keyval.join(findMax)
// Compare the arrival time with the max time for each key and keep only the rows that match.
// Exact equality is fine here because the max is one of the original values.
val findIPs = data.filter { case (_, ((_, arrival), max)) => arrival == max }
findIPs.collect.foreach(println)

It's pretty clunky, but it got me there...
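
For trying the same steps without a Spark context, the following is a rough local sketch, not the answer's actual code. It uses ordinary Scala collections as stand-ins for `reduceByKey`, `join`, and `filter`. The object `LocalPipeline`, the helper `fakeRow`, and any rows passed to it are hypothetical; only the column positions (0, 28, 30, 31) come from the answer.

```scala
object LocalPipeline {
  // Hypothetical helper: builds a 32-column row in which only the columns the
  // answer reads (0, 28, 30, 31) carry data; every other column is a placeholder.
  def fakeRow(ip: String, start: String, t30: String, t31: String): Array[String] = {
    val row = Array.fill(32)("-")
    row(0) = ip
    row(28) = start
    row(30) = t30
    row(31) = t31
    row
  }

  // Same extraction as keyval: (start time, (IP, larger of columns 30 and 31))
  def toKeyVal(x: Array[String]): (String, (String, Double)) =
    (x(28), (x(0), math.max(x(30).toDouble, x(31).toDouble)))

  // Local stand-ins for the reduceByKey, join and filter steps
  def maxRows(rows: Seq[Array[String]]): Seq[(String, ((String, Double), Double))] = {
    val keyval = rows.map(toKeyVal)
    // Per-key maximum, playing the role of findMax
    val findMax: Map[String, Double] =
      keyval.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2._2).reduce(_ max _) }
    // Attach the per-key maximum to every row, playing the role of the join
    val data = keyval.map { case (k, v) => (k, (v, findMax(k))) }
    // Keep only the rows whose time equals the maximum for their key
    data.filter { case (_, ((_, arrival), max)) => arrival == max }
  }
}
```

Each result has the same shape as the joined rows in the Spark version: the key, then the (IP, time) pair, then the per-key maximum.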