How do I convert an array of DataFrames into a single DataFrame?

Time: 2018-05-01 06:52:17

Tags: scala apache-spark dataframe hive

I have an array of DataFrames named "dataFrames" that looks like this:

dataFrames(0)
+----------+--------------------+---------+-------------+
|Periodo   |              frutas|freq     |prods_qty    |
+----------+--------------------+---------+-------------+
|         1|Apple, Watermelon   |        1|            2|
|         1|Banana, StrawBerry  |        2|            2|
+----------+--------------------+---------+-------------+

dataFrames(1)
+----------+--------------------+---------+-------------+
|Periodo   |              frutas|freq     |prods_qty    |
+----------+--------------------+---------+-------------+
|         2|Naranjas, Fresas    |        7|            2|
|         2|Pineapple, Apples   |        9|            2|
+----------+--------------------+---------+-------------+

I need to end up with a single DataFrame like this:

+----------+--------------------+---------+-------------+
|Periodo   |              frutas|freq     |prods_qty    |
+----------+--------------------+---------+-------------+
|         1|Apple, Watermelon   |        1|            2|
|         1|Banana, StrawBerry  |        2|            2|
|         2|Naranjas, Fresas    |        7|            2|
|         2|Pineapple, Apples   |        9|            2|
+----------+--------------------+---------+-------------+

In this example the array holds two DataFrames (indices 0 and 1), but the array can be any size.

Is this possible, or do I need to store the DataFrames in a Hive table?

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

You can use reduce with unionAll to combine a sequence or array of DataFrames into one:

val dfs = Array(df1, df2, df3)

val all = dfs.reduce(_ unionAll _)
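
For a fuller picture, here is a minimal, self-contained sketch of the same idea applied to the data from the question. It makes a few assumptions that are not in the original post: a local SparkSession, Spark 2.x or later (where union is the equivalent of unionAll), and a hypothetical helper named combine that uses reduceOption so an empty array yields None instead of throwing. Note that union matches columns by position, not by name, so every DataFrame in the array should share the same column order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CombineDataFrames {
  // Hypothetical helper: unions all DataFrames in order.
  // reduceOption returns None for an empty collection, where a plain
  // reduce would throw UnsupportedOperationException.
  def combine(dfs: Seq[DataFrame]): Option[DataFrame] =
    dfs.reduceOption(_ union _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for illustration.
    val spark = SparkSession.builder()
      .appName("combine-dataframes")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val cols = Seq("Periodo", "frutas", "freq", "prods_qty")

    // Rebuild the two DataFrames shown in the question.
    val dataFrames: Array[DataFrame] = Array(
      Seq((1, "Apple, Watermelon", 1, 2), (1, "Banana, StrawBerry", 2, 2)).toDF(cols: _*),
      Seq((2, "Naranjas, Fresas", 7, 2), (2, "Pineapple, Apples", 9, 2)).toDF(cols: _*)
    )

    // Print the combined DataFrame if the array was not empty.
    combine(dataFrames.toSeq).foreach(_.show(truncate = false))

    spark.stop()
  }
}
```

No Hive table is needed for this; the union is expressed lazily on the DataFrames themselves. If the column order might differ between DataFrames, unionByName (available from Spark 2.3) is likely the safer choice.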