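I want to join two PySpark DataFrames, where df_template has all the columns and rows I need in the output, and df_proc has data for some (but not all) of the row/column combinations in df_template. The code I'm using is: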
df_blend = df_template.join(df_proc, ["metro_area"],"left").select(df_template["*"])
But all it returns is the original df_template:
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501| null| null| null| null| null| null|
| A10000502| null| null| null| null| null| null|
| A10000503| null| null| null| null| null| null|
| A10000504| null| null| null| null| null| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
This seems basic, but I can't figure out how to get the result I want. Any suggestions? This is what I'd like the output to look like:
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501| 1455| 26| 19| 65| 38| null|
| A10000502| 654| 1876| 1950| 886| null| null|
| A10000503| null| null| null| null| null| null|
| A10000504| 774| 854| 1012| 271| null| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
For reference, here are the original DataFrames.
df_template
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501| null| null| null| null| null| null|
| A10000502| null| null| null| null| null| null|
| A10000503| null| null| null| null| null| null|
| A10000504| null| null| null| null| null| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
df_proc
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000502| 654| 1876| 1950| 886| null| null|
| A10000504| 774| 854| 1012| 271| null| null|
| A10000501| 1455| 26| 19| 65| 38| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+