Question

Just wondering: does filter convert the data into tuples? For example:

val filesLines = sc.textFile("file.txt")
val split_lines = filesLines.map(_.split(";"))

val filteredData = split_lines.filter(x => x(4)=="Blue")

// If we want to map the data from here on, do we use tuple format, i.e. x._3, OR x(3)?

val blueRecords = filteredData.map(x => (x._1, x._2))

OR

val blueRecords = filteredData.map(x => (x(0), x(1)))

Answer 1

filter does not change the type of the RDD: the filtered data is still an RDD[Array[String]].

Answer 2

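No. All filter does is take a predicate function and apply it to each element: any data point for which the predicate returns false is left out of the result. The element type is unchanged: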

filesLines //RDD[String] (lines of the file)
split_lines //RDD[Array[String]] (lines delimited by semicolon)
filteredData //RDD[Array[String]] (lines delimited by semicolon where the 5th item is Blue)
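To see the type claim concretely, here is a minimal sketch that uses plain Scala collections in place of an RDD (an assumption, so it runs without a SparkContext). RDD[T].filter has the same shape: it takes a T => Boolean and returns an RDD[T]. The sample lines are hypothetical.

```scala
object FilterKeepsType {
  // Hypothetical sample data; a Seq stands in for the RDD[String] from sc.textFile.
  val lines: Seq[String] = Seq(
    "1;a;x;p;Blue",
    "2;b;y;q;Red",
    "3;c;z;r;Blue"
  )

  // Each line becomes an Array[String], like split_lines in the question.
  val splitLines: Seq[Array[String]] = lines.map(_.split(";"))

  // filter only drops elements; the element type is still Array[String].
  // The explicit type ascription would fail to compile if filter produced tuples.
  val filtered: Seq[Array[String]] = splitLines.filter(x => x(4) == "Blue")

  def main(args: Array[String]): Unit =
    filtered.foreach(a => println(a.mkString(",")))
}
```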

So, to work with filteredData, you still access each element as an array, using parentheses with the appropriate index, e.g. x(0).
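A minimal sketch of index access, plus one way to get tuple-style access if it is really wanted: build the tuple yourself with map. Plain Scala collections again stand in for an RDD (assumption); RDD.map accepts the same kind of function. The sample data is hypothetical.

```scala
object AccessByIndex {
  // Hypothetical sample data in the question's semicolon-delimited layout.
  val filtered: Seq[Array[String]] =
    Seq("1;a;x;p;Blue", "2;b;y;q;Red", "3;c;z;r;Blue")
      .map(_.split(";"))
      .filter(x => x(4) == "Blue")

  // Elements are arrays, so fields are read with x(i), not x._i.
  val firstTwo: Seq[(String, String)] = filtered.map(x => (x(0), x(1)))

  // To use x._1 style later, convert explicitly. This pattern only matches
  // arrays of exactly 5 fields; any other length throws a MatchError.
  val asTuples: Seq[(String, String, String, String, String)] = filtered.map {
    case Array(a, b, c, d, e) => (a, b, c, d, e)
  }

  def main(args: Array[String]): Unit = {
    firstTwo.foreach(println)
    asTuples.foreach(t => println(t._5))
  }
}
```

Converting to tuples is a separate map step you opt into; filter itself never does it.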

In Spark, does the filter function convert data into tuples?
