I want to drop elements entirely
scala> sc.parallelize(List("abc","def","","ge","","wer")).map(x => if(x!="") x).collect
res0: Array[Any] = Array(abc, def, (), ge, (), wer)
But as you can see, the "" elements are not really dropped: a () is left in their place. This does not work either:
scala> sc.parallelize(List("abc","def","","ge","","wer")).map(x => if(x!="") x else None).collect
res1: Array[java.io.Serializable] = Array(abc, def, None, ge, None, wer)
or
scala> sc.parallelize(List("abc","def","","ge","","wer")).map(x => if(x!="") x else Nil).collect
res2: Array[java.io.Serializable] = Array(abc, def, List(), ge, List(), wer)
Another approach is to use flatMap, since it can return zero or more elements for each original element. But
scala> sc.parallelize(List("abc","def","","ge","","wer")).flatMap(x => if(x!="") x else Nil).collect
res3: Array[Char] = Array(a, b, c, d, e, f, g, e, w, e, r)
scala> sc.parallelize(List("abc","def","","ge","","wer")).flatMap(x => if(x!="") x).collect
<console>:28: error: type mismatch;
 found   : Unit
 required: TraversableOnce[?]
       sc.parallelize(List("abc","def","","ge","","wer")).flatMap(x => if(x!="") x).collect
How can I get Array("abc","def","ge","wer")?
Answer 0: (score: 3)
One way is to use Option with flatMap:
rdd.flatMap( s => if (s.nonEmpty) Some(s) else None ).collect
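To see why this works, here is a minimal sketch on a plain Scala List rather than an RDD, on the assumption that flatMap treats an Option the same way in both cases: Some(s) contributes one element and None contributes nothing. The object and method names (DropEmptyWithOption, dropEmpty) are made up for illustration.

```scala
object DropEmptyWithOption {
  // Hypothetical helper: keeps non-empty strings by mapping each one
  // to Some(s) (one element) or None (zero elements) and flattening.
  def dropEmpty(words: Seq[String]): Seq[String] =
    words.flatMap(s => if (s.nonEmpty) Some(s) else None)

  def main(args: Array[String]): Unit = {
    val words = List("abc", "def", "", "ge", "", "wer")
    println(dropEmpty(words)) // prints List(abc, def, ge, wer)
  }
}
```

Unlike the map attempts in the question, no placeholder such as () or None is left behind, because flatMap flattens the empty Option away.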
Answer 1: (score: 1)
sc.parallelize(List("abc","def","","ge","","wer")).filter(!_.isEmpty).collect
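The same idea can be tried without a Spark context. The sketch below uses a plain Scala List as a stand-in for the RDD, assuming filter keeps exactly the elements for which the predicate is true in both cases. The names DropEmptyWithFilter, dropEmpty and dropEmptyAlt are hypothetical.

```scala
object DropEmptyWithFilter {
  // Keep only the elements for which the predicate holds.
  def dropEmpty(words: Seq[String]): Seq[String] =
    words.filter(!_.isEmpty)

  // Equivalent spelling with the predicate stated positively.
  def dropEmptyAlt(words: Seq[String]): Seq[String] =
    words.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val words = List("abc", "def", "", "ge", "", "wer")
    println(dropEmpty(words))    // prints List(abc, def, ge, wer)
    println(dropEmptyAlt(words)) // prints List(abc, def, ge, wer)
  }
}
```

Since the goal is only to drop elements, not to transform them, filter states the intent more directly than map or flatMap.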