PySpark left outer join: can't get a blended result

Time: 2019-01-13 21:16:05

Tags: join pyspark

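I want to join two PySpark DataFrames: df_template has all the columns and rows I need in the output, while df_proc has data for some (but not all) of the row/column combinations in df_template. The code I'm using is: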

df_blend = df_template.join(df_proc, ["metro_area"],"left").select(df_template["*"])

But all it returns is the original df_template:

+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501|       null|       null|       null|       null|       null|       null|
| A10000502|       null|       null|       null|       null|       null|       null|
| A10000503|       null|       null|       null|       null|       null|       null|
| A10000504|       null|       null|       null|       null|       null|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+

This seems basic, but I can't figure out how to get the result I want. Any suggestions? This is what I'd like the output to look like:

+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501|       1455|         26|         19|         65|         38|       null|
| A10000502|        654|       1876|       1950|        886|       null|       null|
| A10000503|       null|       null|       null|       null|       null|       null|
| A10000504|        774|        854|       1012|        271|       null|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+

For reference, here are the original DataFrames.

df_template

+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501|       null|       null|       null|       null|       null|       null|
| A10000502|       null|       null|       null|       null|       null|       null|
| A10000503|       null|       null|       null|       null|       null|       null|
| A10000504|       null|       null|       null|       null|       null|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+

df_proc

+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000502|        654|       1876|       1950|        886|       null|       null|
| A10000504|        774|        854|       1012|        271|       null|       null|
| A10000501|       1455|         26|         19|         65|         38|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+

0 answers:

No answers