Question

I'm trying to create a pair RDD of every word in a text file and the word that follows it.

For example, the text "I'm trying to create a" should give the pairs (I'm, trying), (trying, to), (to, create), (create, a).

It seems I could almost use the zip function here, if I could start the second RDD at an offset of 1.

How can I do this, or is there a better way?

I'm still not used to thinking in functional-programming terms here.

Answer 1

You can shift the index and then join the result with the original pair RDD:

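val rdd = sc.parallelize("I'm trying to create a".split(" "))

// Key each word by (index - 1) so it lines up with the word before it
val el1 = rdd.zipWithIndex().map { case (word, idx) => (idx - 1, word) }
// Key each word by its own index
val el2 = rdd.zipWithIndex().map { case (word, idx) => (idx, word) }

// join does not guarantee ordering, so sort by index before dropping the key
el2.join(el1).sortByKey().map(_._2).collect()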

Which outputs:

Array[(String, String)] = Array((I'm,trying), (trying,to), (to,create), (create,a))
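
To see why shifting the index works, here is a minimal sketch that traces the same join by hand on plain Scala collections, with no Spark involved. The object name IndexShiftSketch and the helper consecutivePairs are made up for illustration and are not part of any Spark API. byPrev plays the role of el1 and byIndex plays the role of el2.

```scala
// Plain Scala collections only (no Spark), to trace the index-shift join by hand.
object IndexShiftSketch {
  // Hypothetical helper; mirrors el1/el2 from the answer on a local Seq.
  def consecutivePairs(words: Seq[String]): Seq[(String, String)] = {
    val indexed = words.zipWithIndex                              // (word, idx)
    val byIndex = indexed.map { case (w, i) => (i, w) }.toMap     // idx -> word
    val byPrev  = indexed.map { case (w, i) => (i - 1, w) }.toMap // idx - 1 -> word
    // Inner join on the keys present in both maps, visited in index order
    byIndex.keys.toSeq.sorted.collect {
      case k if byPrev.contains(k) => (byIndex(k), byPrev(k))
    }
  }

  def main(args: Array[String]): Unit = {
    val words = "I'm trying to create a".split(" ").toSeq
    val pairs = consecutivePairs(words)
    println(pairs)
    // On a local collection, zipping with an offset of 1 gives the same pairs
    assert(pairs == words.zip(words.drop(1)))
  }
}
```

The last word has no match in byPrev (nothing is keyed by its index), so it never appears as the first element of a pair. This is the same reason the inner join in the Spark version returns one pair fewer than there are words.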

