My dataset sample.txt contains 4 lines of data, with the words on each line separated by spaces, as shown below:
abc is this abc and xyz
this is abc
abc
xyz
I am trying to use the Apache Spark Scala API to count the number of lines in the dataset that contain the string "abc".
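`sc.textFile` already returns one element per line, so counting the lines that contain the substring `abc` does not need a `split`. In Spark the count would be `input.filter(_.contains("abc")).count()`. The following is a minimal sketch of that logic. It assumes a local `Seq[String]` stands in for the RDD so the code runs without Spark, and it wraps everything in an illustrative object. The third line is taken as just `abc`, but the count is the same either way:

```scala
// Illustrative wrapper object; the name is arbitrary.
object CountLines {
  // Local stand-in for sc.textFile("file:///home/user/sample.txt"):
  // one String per line, without the trailing newline.
  val input: Seq[String] = Seq("abc is this abc and xyz", "this is abc", "abc", "xyz")

  // String.contains is a substring test, so each line is checked directly.
  // On an RDD the equivalent is input.filter(_.contains("abc")).count().
  val linesWithAbc: Int = input.filter(_.contains("abc")).size

  def main(args: Array[String]): Unit =
    println(linesWithAbc) // 3
}
```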
Code:
val input = sc.textFile("file:///home/user/sample.txt")  // RDD[String]: one element per line
val res0 = input.map(_.split("\\n")).filter(_.contains("abc")).count()
val res1 = input.flatMap(_.split("\\n")).filter(_.contains("abc")).count()
val res2 = input.map(_.split(" ")).filter(_.contains("abc")).count()
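The three lines differ in the element type each step produces. `textFile` has already split the file into lines without their newline characters, so `split("\\n")` finds nothing to split and returns a one-element array for each line. `map` keeps those arrays, while `flatMap` flattens them back into strings. The sketch below shows these shapes. It assumes a local `Seq` in place of the RDD, which holds here because `map` and `flatMap` work element by element in both. The wrapper object name is illustrative, and the third line is taken as just `abc`:

```scala
// Illustrative wrapper object; the name is arbitrary.
object Shapes {
  // Local stand-in for the RDD returned by sc.textFile: lines already lack "\n".
  val input: Seq[String] = Seq("abc is this abc and xyz", "this is abc", "abc", "xyz")

  // map keeps one Array per line; no line contains "\n", so each array has one element.
  val mapped: Seq[Array[String]] = input.map(_.split("\\n"))

  // flatMap flattens those one-element arrays back into plain Strings.
  val flattened: Seq[String] = input.flatMap(_.split("\\n"))

  // map with split(" ") keeps one Array of words per line.
  val words: Seq[Array[String]] = input.map(_.split(" "))

  def main(args: Array[String]): Unit = {
    mapped.foreach(a => println(a.mkString("[", ", ", "]")))
    flattened.foreach(println)
    words.foreach(a => println(a.mkString("[", ", ", "]")))
  }
}
```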
The output I got is as follows:
res0 = 1
res1 = 3
res2 = 3
I would like clarification on two points:
1. Why do I get res0 = 1 when applying map, but res1 = 3 when applying flatMap, even though both split on "\\n"?
2. Why do I get res2 = 3 when applying map and splitting on " " (a space)?
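Both questions come down to which `contains` gets called. After `map`, each element is an `Array[String]`, and `Array.contains("abc")` checks whether some element is exactly equal to `"abc"`. After `flatMap`, each element is a `String`, and `String.contains("abc")` is a substring test.

- **res0 = 1.** With the `\\n` split, every array holds one whole line, so only a line consisting of exactly `abc` matches. The reported res0 = 1 suggests the third line is just `abc`.
- **res1 = 3.** The flattened strings give three substring matches.
- **res2 = 3.** Splitting on a space gives arrays of words, and three lines have `abc` as a separate word.

The sketch below reproduces the three counts. It assumes a local `Seq` in place of the RDD, with `.size` playing the role of `count()`, and uses an illustrative wrapper object:

```scala
// Illustrative wrapper object; the name is arbitrary.
object ContainsDemo {
  // Stand-in for sc.textFile(...); the third line is taken to be just "abc",
  // as the reported res0 = 1 suggests.
  val input: Seq[String] = Seq("abc is this abc and xyz", "this is abc", "abc", "xyz")

  // Array.contains: exact element equality. Each array holds one whole line.
  val res0: Int = input.map(_.split("\\n")).filter(_.contains("abc")).size

  // String.contains: substring test on each line.
  val res1: Int = input.flatMap(_.split("\\n")).filter(_.contains("abc")).size

  // Array.contains on the words of each line: is "abc" one of the words?
  val res2: Int = input.map(_.split(" ")).filter(_.contains("abc")).size

  def main(args: Array[String]): Unit =
    println(s"res0=$res0 res1=$res1 res2=$res2") // res0=1 res1=3 res2=3
}
```

A local `Seq` keeps the sketch runnable without a cluster. On Spark, the same chains run on the RDD with `.count()` in place of `.size`.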