Scala DataFrame: merging different columns into new rows

Time: 2017-08-16 22:28:28

Tags: scala

I am currently using Scala and am wondering whether different columns can be merged into one. For example, if I have:

+------+--------+-------+----------+-----+
| User | family | phone | location | raz |
+------+--------+-------+----------+-----+
| u1   | f1     | p1    | l1       | r1  |
+------+--------+-------+----------+-----+
| u2   | f2     | p2    | l2       | r2  |
+------+--------+-------+----------+-----+
| u3   | f3     | p3    | l3       | r3  |
+------+--------+-------+----------+-----+

How can I merge phone, location, and raz into a single column, with each column's value on its own row?

+------+--------+-------+
| User | family | new   |
+------+--------+-------+
| u1   | f1     | p1    |
+------+--------+-------+
| u1   | f1     | l1    |
+------+--------+-------+
| u1   | f1     | r1    |
+------+--------+-------+
| u2   | f2     | p2    |
+------+--------+-------+
| u2   | f2     | l2    |
+------+--------+-------+
| u2   | f2     | r2    |
+------+--------+-------+
| u3   | f3     | p3    |
+------+--------+-------+
| u3   | f3     | l3    |
+------+--------+-------+
| u3   | f3     | r3    |
+------+--------+-------+

Thanks

1 answer:

Answer 0 (score: 0)

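One approach is to combine these columns into a single array column and then explode that column, which produces one output row per array element: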

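// Assumes a SparkSession named spark is in scope (as in spark-shell)
import org.apache.spark.sql.functions.{array, explode}
import spark.implicits._  // enables toDF and the $"col" syntax

val df = Seq(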
  ("u1", "f1", "p1", "l1", "r1"),
  ("u2", "f2", "p2", "l2", "r2"),
  ("u3", "f3", "p3", "l3", "r3")
).toDF("User", "family", "phone", "location", "raz")

// Pack the three columns into an array, explode it into rows, then keep only the needed columns
val df2 = df.
  withColumn("plr", array($"phone", $"location", $"raz")).
  withColumn("new", explode($"plr")).
  select("User", "family", "new")

df2.show
+----+------+---+
|User|family|new|
+----+------+---+
|  u1|    f1| p1|
|  u1|    f1| l1|
|  u1|    f1| r1|
|  u2|    f2| p2|
|  u2|    f2| l2|
|  u2|    f2| r2|
|  u3|    f3| p3|
|  u3|    f3| l3|
|  u3|    f3| r3|
+----+------+---+
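
To make the array-then-explode step easier to picture, here is a minimal sketch of the same reshaping on plain Scala collections, with no Spark involved. The `Wide` case class, the `UnpivotSketch` object, and the `unpivot` helper are hypothetical names introduced only for this illustration; they are not part of the answer above or of any Spark API. The `flatMap` plays the role that `explode` plays on the DataFrame.

```scala
object UnpivotSketch {
  // Hypothetical row type mirroring the columns of the example table.
  final case class Wide(user: String, family: String,
                        phone: String, location: String, raz: String)

  // For each input row, emit one (user, family, value) tuple per merged column,
  // in the order phone, location, raz.
  def unpivot(rows: Seq[Wide]): Seq[(String, String, String)] =
    rows.flatMap { w =>
      Seq(w.phone, w.location, w.raz).map(v => (w.user, w.family, v))
    }

  def main(args: Array[String]): Unit = {
    val wide = Seq(
      Wide("u1", "f1", "p1", "l1", "r1"),
      Wide("u2", "f2", "p2", "l2", "r2"),
      Wide("u3", "f3", "p3", "l3", "r3")
    )
    // Print one tuple per output row.
    unpivot(wide).foreach(println)
  }
}
```

Each input row yields three output rows, so three input rows become nine, matching the row count of the table shown by `df2.show`.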