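Filtering lines by two words in Spark Streaming

Time: 2016-04-15 14:41:02

Tags: scala apache-spark spark-streaming

Is there a way to filter, with a single expression, the lines that contain the word "word1" or another word "word2"? Something like: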

val res = lines.filter(line => line.contains("word1" or "word2"))

This expression does not work as written.

Thanks in advance.

1 answer:

Answer 0 (score: 4)

If line is a String, a regular expression is probably the best choice:

val pattern = "word1|word2".r

lines.filter(line => pattern.findFirstIn(line).isDefined)

Otherwise (if line is some other sequence type), you can use Seq.exists:

lines.filter(line => Seq("foo", "bar").exists(s => line.contains(s)))

which takes a single function that maps an element to a Boolean (here String ⇒ Boolean) and:

tests whether a predicate holds for at least one element of this iterable collection.
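
To compare the two approaches side by side, here is a minimal sketch that runs them on a plain Scala Seq[String] instead of a stream, so it needs no Spark dependency. The object name FilterByWords and the sample lines are made up for illustration. The assumption is that lines in the question is a DStream[String], whose filter takes the same kind of String ⇒ Boolean predicate, so the lambdas should carry over unchanged.

```scala
object FilterByWords {
  // Hypothetical sample input standing in for the records of a stream of lines.
  val lines: Seq[String] = Seq(
    "this line has word1 in it",
    "this one mentions word2",
    "nothing relevant here"
  )

  // Regex approach: keep a line if either alternative occurs anywhere in it.
  val pattern = "word1|word2".r
  val byRegex: Seq[String] =
    lines.filter(line => pattern.findFirstIn(line).isDefined)

  // exists approach: keep a line if at least one keyword is a substring of it.
  val keywords = Seq("word1", "word2")
  val byExists: Seq[String] =
    lines.filter(line => keywords.exists(k => line.contains(k)))

  def main(args: Array[String]): Unit = {
    byRegex.foreach(println)
    // Both approaches select the same lines for this input.
    println(byRegex == byExists)
  }
}
```

For just two fixed words, the predicate can also be written directly as line => line.contains("word1") || line.contains("word2"). The exists form is mainly useful when the keyword list is longer or only known at runtime.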