Question

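I am trying to find the line with the maximum number of words using Apache Spark / Scala. I am running the program in spark-shell.

When I use the following code, I get the correct output: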

scala> file1.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)

But when I try to collect the result with the following code, I get an error:

scala> file1.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b).collect()
<console>:30: error: value collect is not a member of Int
              file1.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b).collect()

Why do I get an error when using the collect() action?

Answer 1

reduce is an action that reduces a collection of values of type T to a single value of type T.

reduce(f: (T, T) ⇒ T): T Reduces the elements of this RDD using the specified commutative and associative binary operator.
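To make the signature concrete, here is a minimal sketch meant for spark-shell, where `sc` is predefined. The sample lines and the name `sampleLines` are made up for illustration and stand in for `file1` from the question. It shows that reducing an RDD[Int] yields an Int and reducing an RDD[String] yields a String. If several lines tie for the most words, which one comes back is not guaranteed.

```scala
// Run in spark-shell, where `sc` (the SparkContext) already exists.
// `sampleLines` is a hypothetical stand-in for `file1` in the question.
val sampleLines = sc.parallelize(Seq("a b c", "a b", "a b c d e"))

// RDD[Int] reduced with max (commutative and associative) gives a plain Int.
val maxWordCount: Int =
  sampleLines.map(line => line.split(" ").size).reduce((a, b) => math.max(a, b))

// RDD[String] reduced the same way gives a plain String: the line itself.
val longestLine: String =
  sampleLines.reduce((a, b) => if (a.split(" ").size >= b.split(" ").size) a else b)
```

In both cases the value returned is an ordinary Scala value on the driver, not an RDD.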

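After reduce you already have the final result, so there is nothing left to collect (collect is for RDDs, such as the ones other transformations return).

In your case, assign the result of reduce to a value and check its type. It is Int.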

val result = file1.
  map(line => line.split(" ").size).
  reduce((a, b) => if (a > b) a else b)
// check the type of the value from `reduce`
scala> :type result
Int
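Given that, there are two ways to get rid of the error, sketched below for spark-shell. The sample data and the names `demoLines`, `viaReduce`, `wordCounts` and `viaCollect` are assumptions for illustration, not part of the question. Either drop collect() and use the Int that reduce returns, or call collect() on the mapped RDD and compute the maximum locally.

```scala
// Run in spark-shell, where `sc` (the SparkContext) already exists.
// `demoLines` is a hypothetical stand-in for `file1` in the question.
val demoLines = sc.parallelize(Seq("a b c", "a b", "a b c d e"))

// Option 1: reduce is already an action, so just use its Int result.
val viaReduce: Int =
  demoLines.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)

// Option 2: collect() the RDD[Int] into a local Array[Int], then take the max.
val wordCounts: Array[Int] = demoLines.map(line => line.split(" ").size).collect()
val viaCollect: Int = wordCounts.max
```

Option 2 pulls every element to the driver, so it is only reasonable when the data is small. Option 1 is usually the better fit for this problem.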

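reduce is very similar to collect in that both are actions that give you a value, but collect gives you an Array[T] ...

collect(): Array[T] Returns an array that contains all of the elements in this RDD.

... while reduce gives you just a single value of type T.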

Why does collect fail with "error: value collect is not a member of Int" on the result of reduce?
