I am trying to get the count of the duplicate values in a String ArrayList. I have almost completed the task, but not quite: I am able to get the counts of the elements of the ArrayList, but the problem is that the order of the ArrayList gets destroyed while getting the occurrences of the elements.
Here is my code:

Map<String, Integer> counts = new HashMap<String, Integer>();
for (String str : t.courseName) {
if (counts.containsKey(str)) {
counts.put(str, counts.get(str) + 1);
} else {
counts.put(str, 1);
}
}
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
System.out.println(entry.getKey() + " = " + entry.getValue());
}
This code works fine for getting the occurrences, but as noted, it destroys the order of the list. What I want is to get the occurrences of the elements while the original insertion order of the ArrayList
is also not destroyed.
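For example, with a hypothetical input list (my real data comes from t.courseName), the difference would look like this:

List<String> courseName = Arrays.asList("Math", "Science", "Math", "English", "Science", "Math");

// What the HashMap version may print (iteration order is unspecified):
// English = 1
// Science = 2
// Math = 3

// What I want printed (order of first appearance preserved):
// Math = 3
// Science = 2
// English = 1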
Answer 0 (score: 1)
Use a LinkedHashMap instead of a HashMap to retain the insertion order.

LinkedHashMap is a combination of a hash table and a linked list. It has a predictable iteration order (like a linked list), yet the retrieval speed is that of a HashMap. The order of iteration is determined by the insertion order, so you will get the keys/values back in the order in which they were added to the Map.
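As a minimal sketch (the class name and the sample courseName list below are hypothetical, standing in for the t.courseName field from the question), the only change to the original loop is the map implementation:

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountDuplicatesInOrder {
    public static void main(String[] args) {
        // Hypothetical stand-in for t.courseName
        List<String> courseName = Arrays.asList("Math", "Science", "Math", "English", "Science", "Math");

        // LinkedHashMap iterates keys in the order they were first inserted
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String str : courseName) {
            if (counts.containsKey(str)) {
                counts.put(str, counts.get(str) + 1);
            } else {
                counts.put(str, 1);
            }
        }

        // Prints: Math = 3, Science = 2, English = 1 (first-appearance order)
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}

On Java 8 and later, the if/else inside the loop can also be replaced by counts.merge(str, 1, Integer::sum), which keeps the same LinkedHashMap ordering.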