How do I remove elements?

Asked: 2016-05-19 11:19:23

Tags: scala apache-spark

I want to drop the empty elements entirely.

scala> sc.parallelize(List("abc","def","","ge","","wer")).map(x => if(x!="") x).collect
res0: Array[Any] = Array(abc, def, (), ge, (), wer) 

But as you can see, the "" elements are not dropped; they show up as () instead. These do not work either:

scala> sc.parallelize(List("abc","def","","ge","","wer")).map(x => if(x!="") x else None).collect
res1: Array[java.io.Serializable] = Array(abc, def, None, ge, None, wer)

scala> sc.parallelize(List("abc","def","","ge","","wer")).map(x => if(x!="") x else Nil).collect
res2: Array[java.io.Serializable] = Array(abc, def, List(), ge, List(), wer)

Another approach is flatMap, since it can return zero or more elements for each original element. But:

scala> sc.parallelize(List("abc","def","","ge","","wer")).flatMap(x => if(x!="") x else Nil).collect
res3: Array[Char] = Array(a, b, c, d, e, f, g, e, w, e, r)


scala> sc.parallelize(List("abc","def","","ge","","wer")).flatMap(x => if(x!="") x).collect
<console>:28: error: type mismatch;
 found   : Unit
 required: TraversableOnce[?]
              sc.parallelize(List("abc","def","","ge","","wer")).flatMap(x => if(x!="") x).collect

How can I get Array("abc","def","ge","wer")?

2 Answers:

Answer 0: (score: 3)

One way is to use Option with flatMap:

rdd.flatMap( s => if (s.nonEmpty) Some(s) else None ).collect
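
This works because an Option behaves like a collection of zero or one element: flatMap unwraps each Some and contributes nothing for each None. Here is a minimal sketch of the same idea, assuming a plain Scala List in place of the RDD so that it runs without a SparkContext. The names `DropEmptyWithOption` and `dropEmpty` are made up for illustration.

```scala
object DropEmptyWithOption {
  // A plain List stands in for the RDD here (assumption: no SparkContext at hand).
  // Some(s) yields one element to the result, None yields nothing.
  def dropEmpty(words: List[String]): List[String] =
    words.flatMap(s => if (s.nonEmpty) Some(s) else None)

  def main(args: Array[String]): Unit = {
    val words = List("abc", "def", "", "ge", "", "wer")
    println(dropEmpty(words)) // prints List(abc, def, ge, wer)
  }
}
```

By contrast, returning the bare string from flatMap flattens it into its characters, which is why the attempt in the question produced an Array[Char].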

Answer 1: (score: 1)

sc.parallelize(List("abc","def","","ge","","wer")).filter(!_.isEmpty).collect
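
filter only decides whether to keep each element and never changes it, so the element type stays String instead of widening to Any as in the map attempts. A small sketch of the same predicate, again assuming a plain List rather than an RDD so it runs standalone; `DropEmptyWithFilter` and `dropEmpty` are illustrative names, not part of Spark.

```scala
object DropEmptyWithFilter {
  // A plain List stands in for the RDD here (assumption: no SparkContext at hand).
  // _.nonEmpty is equivalent to !_.isEmpty used in the answer above.
  def dropEmpty(words: List[String]): List[String] =
    words.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val words = List("abc", "def", "", "ge", "", "wer")
    println(dropEmpty(words)) // prints List(abc, def, ge, wer)
  }
}
```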