Get the count of duplicate values in an ArrayList in Java without affecting the order of the ArrayList

Time: 2016-12-29 13:00:21

Tags: java arraylist hashmap find-occurrences

I am trying to get the count of duplicate values in a String ArrayList. I have completed the task, but not entirely: I am able to get the counts of the elements of the ArrayList, but the problem is that the order of the ArrayList is lost when I print the occurrences.

Here is my code:

    import java.util.HashMap;
    import java.util.Map;

    // Count how many times each course name occurs.
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String str : t.courseName) {
        if (counts.containsKey(str)) {
            counts.put(str, counts.get(str) + 1);
        } else {
            counts.put(str, 1);
        }
    }

    // HashMap iteration order is unspecified, so the entries are not
    // printed in the order in which they appear in the list.
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
        System.out.println(entry.getKey() + " = " + entry.getValue());
    }

This code works for getting the occurrences, but note that it destroys the order. What I want is for the order not to be destroyed either.

1 answer:

Answer 0 (score: 1)

Use LinkedHashMap instead of HashMap to preserve insertion order.

  

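LinkedHashMap is a combination of a hash table and a linked list. It has a predictable iteration order (provided by the linked list), while lookups are as fast as in a HashMap. The iteration order is determined by insertion order, so you get the key/value pairs back in the order in which their keys were first added to the map. Updating the value of a key that is already present does not change its position.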
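As a minimal sketch of that suggestion, the counting loop from the question can be kept as it is with only the map type changed. The class name `OccurrenceCounter`, the method `countInOrder`, and the sample course names are hypothetical stand-ins for illustration; the question does not show what `t.courseName` actually contains. `Map.merge` is used here as a shorter equivalent of the `containsKey`/`put` branch and assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OccurrenceCounter {

    // Counts each string; keys keep the order of their first appearance in the input.
    static Map<String, Integer> countInOrder(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            // merge stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for t.courseName
        List<String> courseNames =
                Arrays.asList("Math", "Physics", "Math", "Chemistry", "Physics", "Math");
        for (Map.Entry<String, Integer> entry : countInOrder(courseNames).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

With this sample list the entries are printed as `Math = 3`, `Physics = 2`, `Chemistry = 1`, which is the order in which each name first appears in the list.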