In Scala

Time: 2015-11-05 21:32:16

Tags: scala apache-spark spark-dataframe

For example, I start with a DataFrame like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

It has rows for the years 2012, 1997, and 2015. We also have another DataFrame like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|BMW  |    3|          No comment|     |
|1997|VW   | GTI |   get              |     |
|2015|MB   | C200|                good| null|
+----+-----+-----+--------------------+-----+

It also has rows for 2012, 1997, and 2015. How can we merge the rows that share the same year? Thanks.

The output should look like this:

+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |  BMW|    3|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |   VW|  GTI|                 get|     |
|2015|Chevy| Volt|                null| null|   MB| C200|                good| null|
+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+

1 answer:

Answer 0 (score: 1)

You can get the table you want with a simple join on the year column. Something like:

val joined = df1.join(df2, df1("year") === df2("year"))

I loaded your input, so I see the following:

scala> df1.show
...
year make  model comment
2012 Tesla S     No comment
1997 Ford  E350  Go get one now
2015 Chevy Volt  null

scala> df2.show
...
year make model comment
2012 BMW  3     No comment
1997 VW   GTI   get
2015 MB   C200  good

When I run the join, I get:

scala> val joined = df1.join(df2, df1("year") === df2("year"))
joined: org.apache.spark.sql.DataFrame = [year: string, make: string, model: string, comment: string, year: string, make: string, model: string, comment: string]

scala> joined.show
...
year make  model comment        year make model comment
2012 Tesla S     No comment     2012 BMW  3     No comment
2015 Chevy Volt  null           2015 MB   C200  good
1997 Ford  E350  Go get one now 1997 VW   GTI   get
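
Note that the result above carries `year` twice, while the desired output in the question shows it only once. One way to get a single `year` column is to pass the join column by name instead of as an expression. The following is a minimal sketch, assuming Spark 2.x or later (it uses `SparkSession`, which did not exist when this was asked); the object and function names (`JoinOnYear`, `sampleFrames`, `joinOnYear`) are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinOnYear {
  // Sample rows copied from the answer above (the "blank" column is left out there too).
  def sampleFrames(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val df1 = Seq(
      ("2012", "Tesla", "S", "No comment"),
      ("1997", "Ford", "E350", "Go get one now"),
      ("2015", "Chevy", "Volt", null: String)
    ).toDF("year", "make", "model", "comment")
    val df2 = Seq(
      ("2012", "BMW", "3", "No comment"),
      ("1997", "VW", "GTI", "get"),
      ("2015", "MB", "C200", "good")
    ).toDF("year", "make", "model", "comment")
    (df1, df2)
  }

  // Inner equi-join on "year". Naming the column in a Seq makes Spark
  // emit "year" once, followed by the remaining left and right columns.
  def joinOnYear(left: DataFrame, right: DataFrame): DataFrame =
    left.join(right, Seq("year"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: not a cluster setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-on-year")
      .getOrCreate()
    val (df1, df2) = sampleFrames(spark)
    joinOnYear(df1, df2).show()
    spark.stop()
  }
}
```

The `make`, `model`, and `comment` columns still appear twice under the same names in this result; only the join key is deduplicated.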

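One thing to note is that your column names may be ambiguous, because they are named the same in both DataFrames (so you may want to rename them to make operations on the resulting DataFrame easier to write).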
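
As a rough illustration of the renaming idea, the sketch below appends a suffix to every non-key column of the second DataFrame before joining, so that each column in the result can be referenced unambiguously. It assumes Spark 2.x or later; the helper `withSuffix`, the object name, and the `_2` suffix are hypothetical choices, not part of the original answer, and the sample rows are trimmed to one row per frame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinWithSuffix {
  // Rename every column except the join key by appending a suffix,
  // e.g. make -> make_2. The key column keeps its name so it can be joined on.
  def withSuffix(df: DataFrame, key: String, suffix: String): DataFrame =
    df.columns.filter(_ != key).foldLeft(df) { (acc, name) =>
      acc.withColumnRenamed(name, name + suffix)
    }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: not a cluster setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-with-suffix")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(("2012", "Tesla", "S")).toDF("year", "make", "model")
    val df2 = Seq(("2012", "BMW", "3")).toDF("year", "make", "model")

    // After renaming, "make" refers to df1 and "make_2" to df2.
    val joined = df1.join(withSuffix(df2, "year", "_2"), Seq("year"))
    joined.select("year", "make", "make_2").show()

    spark.stop()
  }
}
```

With the renamed columns, a later `select` or `filter` on the joined DataFrame no longer has to disambiguate through `df1("make")` and `df2("make")`.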