Question

My input file looks like this:

1: 3 5 7

3: 6 9

2: 5

...

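I want to get two lists: the first made up of the numbers before the ":", the second made up of the numbers after the ":". For the example above, the two lists are: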

1 3 2

3 5 7 6 9 5

So far I have only written the following code:

 val rdd = sc.textFile("input.txt");

 val s = rdd.map(_.split(":"));

But I don't know how to implement the rest. Thanks.

Answer 1

I would use flatMap! So:

val rdd = sc.textFile("input.txt")
val s = rdd.map(_.split(": ")) // split on ": " (colon plus a space) so the values have no leading space
val before_colon = s.map(x => x(0))
val after_colon = s.flatMap(x => x(1).split(" "))

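Now you have two RDDs: before_colon holds the items before the colon (one per line), and after_colon holds the items after it, flattened into a single collection.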

If the part before the colon can also contain several numbers, e.g. 1 2 3: 4 5 6, I would write val before_colon = s.flatMap(x => x(0).split(" ")) instead.
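
To check the logic without a Spark cluster, the same map/flatMap steps can be tried on a plain Scala List, whose map and flatMap behave analogously to the RDD versions. This is only a sketch: the sample lines stand in for input.txt, and the names lines, parts, beforeColon and afterColon are made up for illustration. It also assumes that tolerating optional whitespace around the colon and skipping empty tokens is what you want.

```scala
// Sample lines standing in for the contents of input.txt (assumed format "key: v1 v2 ...").
val lines = List("1: 3 5 7", "3: 6 9", "2: 5")

// Split each line once at the colon, tolerating optional whitespace around it.
val parts = lines.map(_.split("\\s*:\\s*", 2))

// Everything before the colon: one entry per line.
val beforeColon = parts.map(p => p(0).trim)

// Everything after the colon: split on whitespace, drop empty tokens,
// and flatten the per-line results into a single list.
val afterColon = parts.flatMap(p => p(1).trim.split("\\s+").filter(_.nonEmpty).toList)

println(beforeColon.mkString(" ")) // prints: 1 3 2
println(afterColon.mkString(" "))  // prints: 3 5 7 6 9 5
```

With an RDD, the same function bodies should carry over to rdd.map and rdd.flatMap; the values stay strings unless you add something like .map(_.toInt).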
