How can I count duplicate values?
Format: USER,ITEM,EVENT
I want to count how many times each item appears.
Here is some sample data:
US50137,IT1548,7), (US42215,IT6298,7), (US98606,IT5305,7), (US34696,IT5914,7), (US74972,IT2796,7), (US1729,IT7696,7), (US76310,IT9790,7), (US49102,IT6487,7), (US25430,IT7901,7), (US50600,IT4156,7), (US65972,IT9830,7), (US50879,IT1902,7), (US36024,IT6484,7), (US46284,IT3281,7), (US55565,IT5303,7), (US18932,IT2025,7), (US39467,IT8677,7), (US12477,IT9678,7), (US94819,IT8427,7), (US19956,IT1402,7), (US41507,IT3624,7), (US845,IT4823,7), (US18860,IT7860,7), (US68784,IT4759,7), (US79752,IT421,7), (US18563,IT5329,7), (US79628,IT2351,7), (US83729,IT6082,7), (US61097,IT9643,7), (US69368,IT3162,7), (US59566,IT814,7), (US9726,IT7519,7), (US1157,IT5908,7), (US1176,IT3981,7), (US79409,IT8578,7), (US11786,IT5147,7), (US88604,IT8501,7), (US6857,IT2333,7), (US82349,IT6143,7), (US27666,IT9085,7), (US90508,IT352,7), (US48578,IT4503,7), (US14526,IT9551,7), (US29031,IT1992,7), (US57012,IT4353,7), (US97235,IT77,7), (US88666,IT2715,7), (US31035,IT7865,7), (US45054,IT6664,7), (US92069,IT9951,7), (US27175,IT913,7), (US60402,IT8480,7), (US28426,IT9309,7), (US23641,IT4518,7), (US10889,IT7348,7), (US16950,IT6087,7), (US68766,IT683,7), (US87726,IT7594,7), (US63638,IT8101,7), (US78079,IT4344,7), (US47257,IT3315,7), (US3915,IT8971,7), (US59440,IT3441,7), (US64466,IT3980,7), (US79624,IT3502,7), (US29356,IT6778,7)
I based my attempt on this question:
Scala how can I count the number of occurrences in a list
My code:
val RATING_SPLITER = N1.map(
  {
    baris => (
      baris(0),
      baris(1),
      baris(2) match {
        case "read" => 10
        case "play" => 6
        case "share" => 7
      }
    )
  }
).take(1000)
val MM = RATING_SPLITER.groupBy(kk => kk._2).map(x1 => (x1._2))
MM.foreach(println)
The output looks like this:
[Lscala.Tuple3;@fd53053
[Lscala.Tuple3;@4527f70a
[Lscala.Tuple3;@707b1a44
[Lscala.Tuple3;@7132a9dc
[Lscala.Tuple3;@57435801
[Lscala.Tuple3;@2da66a44
[Lscala.Tuple3;@527fc8e
[Lscala.Tuple3;@61bfc9bf
[Lscala.Tuple3;@2c7106d9
[Lscala.Tuple3;@329bad59
Any idea why the output looks like that? Does my code count the duplicates correctly?
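Answer 0 (score: 2)
You should map the values produced by groupBy to their sizes. groupBy creates key-value pairs in which each value is the collection of all items sharing that key, and you are only interested in the size of that collection: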
// sample data:
val RATING_SPLITER = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7))
val result: Map[String,Int] = RATING_SPLITER.groupBy(_._2).mapValues(_.size)
result.foreach(println)
// prints:
// (e,1)
// (b,2)
// (c,1)
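As for why the original output looks like [Lscala.Tuple3;@fd53053: that is the default Java toString of an array of Tuple3, so each group was an Array, and .map(x1 => (x1._2)) kept only those arrays while discarding the item keys. Below is a minimal sketch of the same counting idea that also prints the groups readably. The names ItemCounts, Row, and countByItem and the three sample rows are made up for illustration. It uses a plain map instead of mapValues because, on Scala 2.13, mapValues returns a lazy MapView rather than a Map.

```scala
object ItemCounts {
  // (user, item, rating) triples, mirroring the tuples built in the question.
  type Row = (String, String, Int)

  // Count how many rows mention each item.
  // groupBy yields item -> all rows for that item; only the group size is kept.
  // A plain map returns a strict Map on both Scala 2.12 and 2.13.
  def countByItem(rows: Seq[Row]): Map[String, Int] =
    rows.groupBy(_._2).map { case (item, group) => item -> group.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows, held in an Array like the groups in the question.
    val rows: Array[Row] = Array(
      ("US1", "IT1", 7), ("US2", "IT1", 10), ("US3", "IT2", 6)
    )

    // Array inherits Java's toString, hence output such as [Lscala.Tuple3;@fd53053.
    // mkString renders the elements of each group instead.
    rows.groupBy(_._2).values.foreach(group => println(group.mkString(", ")))

    // Sort by item so the printed order does not depend on Map iteration order.
    countByItem(rows.toList).toSeq.sortBy(_._1).foreach(println)
  }
}
```

With the question's data, something like ItemCounts.countByItem(RATING_SPLITER.toList) should give one (item, count) pair per distinct item.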