How do I check a map value in every Map of an RDD using Scala?

Asked: 2014-11-21 09:29:02

Tags: scala collections

I want to check a value in every Map of an RDD. My problem is as follows.

Let examples: RDD[Map[Int, String]]

examples = 
Map(0 -> sunny, 1 -> hot, 2 -> high, 3 -> FALSE, 4 -> no)
Map(0 -> sunny, 1 -> hot, 2 -> high, 3 -> TRUE, 4 -> no)
Map(0 -> overcast, 1 -> hot, 2 -> high, 3 -> FALSE, 4 -> yes)
Map(0 -> rainy, 1 -> mild, 2 -> high, 3 -> FALSE, 4 -> yes)
Map(0 -> rainy, 1 -> cool, 2 -> normal, 3 -> FALSE, 4 -> yes)

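I want to check the value in the last key-value pair of each Map. Here the last key-value pairs are 4 -> no, 4 -> no, 4 -> yes, and so on, so the values to check are no, no, yes, yes, ... If all of them are "no", return "no"; otherwise return "yes".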

1 Answer:

Answer 0 (score: 0)

val examples = List(
  Map(0 -> "sunny", 1 -> "hot", 2 -> "high", 3 -> "FALSE", 4 -> "no"),
  Map(0 -> "sunny", 1 -> "hot", 2 -> "high", 3 -> "TRUE", 4 -> "no"),
  Map(0 -> "overcast", 1 -> "hot", 2 -> "high", 3 -> "FALSE", 4 -> "yes"),
  Map(0 -> "rainy", 1 -> "mild", 2 -> "high", 3 -> "FALSE", 4 -> "yes"),
  Map(0 -> "rainy", 1 -> "cool", 2 -> "normal", 3 -> "FALSE", 4 -> "yes"))

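// Rule from the question: "no" only if every map's last value is "no".
// m(m.size - 1) relies on the keys being exactly 0 .. size - 1.
if (examples.forall(m => m(m.size - 1) == "no"))
  "no"
else
  "yes"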

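But this is horrible. Your choice of collection is questionable. If you have a Map whose keys are known to be 0 .. <some-upper-bound> with no gaps, then what you really have is an indexed sequence, not a Map, and you will find it easier to work with if you use an IndexedSeq such as Vector (a List also works, though indexing into it takes linear time).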
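As a minimal sketch of that suggestion, the rows below are stored as Vectors rather than Maps, so the last value is simply `last`. The names `LastColumnCheck` and `lastColumnVerdict` are made up for illustration, and the rule is the one stated in the question: "no" only when every row ends in "no".

```scala
object LastColumnCheck {
  // Each row is a Vector instead of a Map keyed by 0 .. n-1.
  val examples: Vector[Vector[String]] = Vector(
    Vector("sunny", "hot", "high", "FALSE", "no"),
    Vector("sunny", "hot", "high", "TRUE", "no"),
    Vector("overcast", "hot", "high", "FALSE", "yes"),
    Vector("rainy", "mild", "high", "FALSE", "yes"),
    Vector("rainy", "cool", "normal", "FALSE", "yes"))

  // Returns "no" only when every row ends in "no", otherwise "yes".
  // lastOption avoids an exception on an empty row: such a row
  // simply does not count as ending in "no".
  // Note: with no rows at all, forall is vacuously true, so the result is "no".
  def lastColumnVerdict(rows: Seq[Seq[String]]): String =
    if (rows.forall(_.lastOption.contains("no"))) "no" else "yes"

  def main(args: Array[String]): Unit =
    println(lastColumnVerdict(examples))
}
```

With the sample data, three rows end in "yes", so the check does not hold for every row and the result is "yes".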

Here is a version that works with an RDD. The comment about the choice of collection still applies.

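import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc = new SparkContext(conf)

val rdd = sc.parallelize(examples, 1)

// Take the last value of each map, then combine the values pairwise:
// the result is "no" only if every value is "no", as the question asks.
val yesno = rdd.map(m => m(m.size - 1))
               .reduce((l, r) => if (l == "no" && r == "no") "no" else "yes")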
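One possible variation on the RDD version, sketched under a few assumptions: Spark is on the classpath, the rows are stored as Vectors as suggested earlier, and the names `RddLastColumnCheck` and `allLastNo` are hypothetical. It maps each row to a Boolean and combines them with `fold`, which, unlike `reduce`, has a zero value and so does not fail on an empty RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddLastColumnCheck {
  // True only if the last value of every row is "no".
  // fold(true) returns true for an empty RDD instead of throwing.
  // Assumes no row is empty, since last throws on an empty Vector.
  def allLastNo(rows: RDD[Vector[String]]): Boolean =
    rows.map(_.last == "no").fold(true)(_ && _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      val rows = sc.parallelize(Seq(
        Vector("sunny", "hot", "high", "FALSE", "no"),
        Vector("sunny", "hot", "high", "TRUE", "no"),
        Vector("overcast", "hot", "high", "FALSE", "yes"),
        Vector("rainy", "mild", "high", "FALSE", "yes"),
        Vector("rainy", "cool", "normal", "FALSE", "yes")), 1)

      // Rule from the question: "no" only when every last value is "no".
      println(if (allLastNo(rows)) "no" else "yes")
    } finally {
      sc.stop() // release the local Spark context
    }
  }
}
```

Working with Booleans keeps the combining function a plain `&&`, and the "yes"/"no" strings are only produced once at the end.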