Filter transformation works differently with map and flatMap

Time: 2019-07-17 02:44:12

Tags: scala apache-spark

My dataset sample.txt contains 4 lines of data; the words within each line are separated by spaces, as shown below:

abc is this abc and xyz
this is abc
abc
xyz

I am trying to find the total number (count) of lines in the dataset that contain the string "abc", using the Apache Spark Scala API.

Code:

val input=sc.textFile("file:///home/user/sample.txt");
val res0=input.map(_.split("\\n")).filter(_.contains("abc")).count
val res1=input.flatMap(_.split("\\n")).filter(_.contains("abc")).count
val res2=input.map(_.split(" ")).filter(_.contains("abc")).count

The output I get is as follows:

res0= 1
res1= 3
res2= 3

I need clarification on two points:

1. Why did I get res0 as 1 with the map function but res1 as 3 with the flatMap function, when both split on "\\n"?

2. Why did I get res2 as 3 when the map function was applied with splitting on " " (space)?
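The behavior in question can be reproduced without a Spark cluster, since Scala collections have the same `map`/`flatMap`/`filter` shape as RDDs. The sketch below assumes the third line of the file is exactly `abc` (the only reading consistent with res0 = 1); the key point it illustrates is that `map(_.split(...))` yields collections of `Array[String]`, where `.contains("abc")` is an element-equality check, while `flatMap` flattens back to `String`s, where `.contains("abc")` is a substring check.

```scala
object MapVsFlatMap {
  // Hypothetical stand-in for the four lines that sc.textFile would yield.
  val lines = List("abc is this abc and xyz", "this is abc", "abc", "xyz")

  // map(_.split("\\n")) wraps each line in a one-element Array[String],
  // so filter sees arrays and .contains("abc") tests ELEMENT equality:
  // only the line that is exactly "abc" matches.
  val res0 = lines.map(_.split("\\n")).filter(_.contains("abc")).size

  // flatMap flattens the one-element arrays back into plain Strings,
  // so .contains("abc") is String#contains, a SUBSTRING check:
  // every line with "abc" anywhere in it matches.
  val res1 = lines.flatMap(_.split("\\n")).filter(_.contains("abc")).size

  // Splitting on " " produces word arrays; element equality with "abc"
  // matches every line containing the standalone word "abc".
  val res2 = lines.map(_.split(" ")).filter(_.contains("abc")).size

  def main(args: Array[String]): Unit =
    println(s"res0=$res0 res1=$res1 res2=$res2")
}
```

Running this prints the same counts as the Spark job (1, 3, 3), which suggests the difference comes from the element type reaching `filter`, not from Spark itself.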

0 Answers:

There are no answers yet.