Question

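I am trying to use the map function to retrieve the first and third words (index 0 and index 2) of each line as an array.

The following produces an array of the first words: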

File.map(l => l.split(" ")(0)).collect()

I tried the following, with no luck:

File.map(l => l.split(" ")(0)(2)).collect()

File.map(l => l.split(" ")(0,2)).collect()

File.map(l => l.split(" ")(0)+(2)).collect()

Answer 1

Here is what you can do: return a tuple from the map function, as shown below.

File.map(l => (l.split(" ")(0), l.split(" ")(2)))
    .collect()
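
A small variation on the same idea is to split each line only once and reuse the resulting array. The sketch below is illustrative rather than definitive: `WordPairs`, `firstAndThird`, and the sample lines are hypothetical names and data, and a local `Seq` stands in for the RDD `File` so the snippet runs without a Spark installation.

```scala
object WordPairs {
  // Split the line once and reuse the array for both lookups.
  // Assumes the line has at least three space-separated words;
  // shorter lines throw ArrayIndexOutOfBoundsException.
  def firstAndThird(line: String): (String, String) = {
    val words = line.split(" ")
    (words(0), words(2))
  }

  // Hypothetical sample data; a local Seq stands in for the RDD `File`.
  val sample: Seq[String] = Seq("a b c d", "one two three")

  // Same shape as File.map(...), but on a local collection.
  val pairs: Seq[(String, String)] = sample.map(firstAndThird)
}
```

With Spark, the same function should be usable as `File.map(l => WordPairs.firstAndThird(l)).collect()`.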

Hope this helps!

Answer 2

You can do it with pattern matching:

File.
  map {
    _.split(" ").take(3) match {
      case Array(firstWord, _, thirdWord) => (firstWord, thirdWord)
      // Consider handling cases where there are fewer than three words
    }
  }.
  collect()
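
As written, a line with fewer than three words does not match the pattern and throws a `MatchError`. One possible way to handle that case, sketched here under assumptions, is to return an `Option` and use `flatMap` so that short lines are dropped. `SafeWordPairs`, `firstAndThird`, and the sample lines are hypothetical, and a local `Seq` again stands in for the RDD.

```scala
object SafeWordPairs {
  // Some((first, third)) when the line has at least three words,
  // None otherwise. The trailing _* accepts any extra words.
  def firstAndThird(line: String): Option[(String, String)] =
    line.split(" ") match {
      case Array(first, _, third, _*) => Some((first, third))
      case _                          => None
    }

  // Hypothetical sample data, including one line that is too short.
  val sample: Seq[String] = Seq("a b c d", "too short", "one two three")

  // flatMap keeps the Some values and silently drops the None values.
  val pairs: Seq[(String, String)] = sample.flatMap(l => firstAndThird(l))
}
```

With Spark, `File.flatMap(l => SafeWordPairs.firstAndThird(l)).collect()` should behave the same way. Whether dropping short lines is acceptable depends on the data, so treat this as one option rather than the required fix.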

Answer 3

If you are expecting an RDD[Array[String]], you can do the following (calling collect() then returns an Array[Array[String]]):

File.map(line => line.split(" ")).map(words => Array(words(0), words(2))).collect()

How to get the first and third words using the map function in Spark
