I have two data frames that look like the ones below, and I want to merge them on the country/region column.
df1:
+----------------+-----------------+-----------+--------+-----------+
| Country/Region | ObservationDate | Confirmed | Deaths | Recovered |
+----------------+-----------------+-----------+--------+-----------+
| Mainland China | 2020-01-22      | 547       | 17     | 28        |
| Indonesia      | 2020-01-22      | 0         | 0      | 0         |
| Japan          | 2020-01-22      | 2         | 0      | 0         |
| Thailand       | 2020-01-22      | 2         | 0      | 0         |
| Mainland China | 2020-01-23      | 639       | 18     | 30        |
+----------------+-----------------+-----------+--------+-----------+
df2:
+----------------+----------------+------------------+
| Country        | Region         | Tropic/Nontropic |
+----------------+----------------+------------------+
| Mainland China | Asia & Pacific | nontropic        |
| Indonesia      | Asia & Pacific | tropic           |
| Japan          | Asia & Pacific | nontropic        |
| Thailand       | Asia & Pacific | tropic           |
+----------------+----------------+------------------+
The output I want would look something like this:
df_new:
+----------------+-----------------+-----------+--------+-----------+----------------+------------------+
| Country/Region | ObservationDate | Confirmed | Deaths | Recovered | Region         | Tropic/Nontropic |
+----------------+-----------------+-----------+--------+-----------+----------------+------------------+
| Mainland China | 2020-01-22      | 547       | 17     | 28        | Asia & Pacific | nontropic        |
| Indonesia      | 2020-01-22      | 0         | 0      | 0         | Asia & Pacific | tropic           |
| Japan          | 2020-01-22      | 2         | 0      | 0         | Asia & Pacific | nontropic        |
| Thailand       | 2020-01-22      | 2         | 0      | 0         | Asia & Pacific | tropic           |
| Mainland China | 2020-01-23      | 639       | 18     | 30        | Asia & Pacific | nontropic        |
+----------------+-----------------+-----------+--------+-----------+----------------+------------------+
I tried:
pd.merge(df_new, df_cat, on=['Country/Region', 'Country'], how='left')
But it raises an error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-86-eff12d512209> in <module>
----> 1 pd.merge(df_new, df_cat, on=['Country/Region', 'Country'], how='left')
~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
84 copy=copy,
85 indicator=indicator,
---> 86 validate=validate,
87 )
88 return op.get_result()
~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
625 self.right_join_keys,
626 self.join_names,
--> 627 ) = self._get_merge_keys()
628
629 # validate the merge keys dtypes. We may need to coerce
~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
981 if not is_rkey(rk):
982 if rk is not None:
--> 983 right_keys.append(right._get_label_or_level_values(rk))
984 else:
985 # work-around for merge_asof(right_index=True)
~\anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
1690 values = self.axes[axis].get_level_values(key)._values
1691 else:
-> 1692 raise KeyError(key)
1693
1694 # Check for duplicates
KeyError: 'Country/Region'
How can I get this result in df_new?
Answer 0 (score: 4)
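The problem is that the columns you want to match on have different names in the two DataFrames, so you can't just write on=['Country/Region', 'Country']. The on argument expects column names that exist in both DataFrames; when the names differ, you have to give the key column for each DataFrame separately with left_on and right_on.
The error message KeyError: 'Country/Region' says that pandas looked up a column that does not exist in one of the tables. The traceback shows it was raised while collecting the right-hand keys (right_keys.append(...)), so it is the right DataFrame that has no 'Country/Region' column.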
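A minimal sketch of that failure, using two cut-down frames with values copied from the question. The helper `missing_keys` is hypothetical and only there for illustration; it lists which of the requested keys each frame lacks before the same kind of `on=` call is attempted.

```python
import pandas as pd

# Cut-down versions of the two frames from the question.
df1 = pd.DataFrame({
    "Country/Region": ["Mainland China", "Indonesia"],
    "ObservationDate": ["2020-01-22", "2020-01-22"],
    "Confirmed": [547, 0],
})
df2 = pd.DataFrame({
    "Country": ["Mainland China", "Indonesia"],
    "Region": ["Asia & Pacific", "Asia & Pacific"],
})

def missing_keys(frame, keys):
    """Hypothetical helper: return the key names that are not columns of `frame`."""
    return [k for k in keys if k not in frame.columns]

keys = ["Country/Region", "Country"]
left_missing = missing_keys(df1, keys)    # keys absent from the left frame
right_missing = missing_keys(df2, keys)   # keys absent from the right frame

# `on=` needs every listed key in both frames, so this call fails.
try:
    pd.merge(df1, df2, on=keys, how="left")
    raised = False
except KeyError:
    raised = True

print(left_missing, right_missing, raised)
```

Each frame is missing one of the two names, which is why a single shared `on=` list cannot describe this join.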
Try:
pd.merge(left=df1, right=df2, left_on='Country/Region', right_on='Country', how='left')
See the pandas.merge documentation for details.
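A minimal sketch of that merge on the sample rows from the question. Because the two key columns keep their own names, the merged frame contains both `Country/Region` and `Country`; dropping the redundant `Country` column afterwards is an extra step assumed here to match the desired output, not part of the answer itself.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Country/Region": ["Mainland China", "Indonesia", "Japan", "Thailand", "Mainland China"],
    "ObservationDate": ["2020-01-22", "2020-01-22", "2020-01-22", "2020-01-22", "2020-01-23"],
    "Confirmed": [547, 0, 2, 2, 639],
    "Deaths": [17, 0, 0, 0, 18],
    "Recovered": [28, 0, 0, 0, 30],
})
df2 = pd.DataFrame({
    "Country": ["Mainland China", "Indonesia", "Japan", "Thailand"],
    "Region": ["Asia & Pacific"] * 4,
    "Tropic/Nontropic": ["nontropic", "tropic", "nontropic", "tropic"],
})

# Left join: keep every row of df1 and attach the matching df2 row by country name.
merged = pd.merge(
    left=df1,
    right=df2,
    left_on="Country/Region",
    right_on="Country",
    how="left",
)

# Assumed clean-up step: 'Country' duplicates 'Country/Region' after the join.
df_new = merged.drop(columns="Country")

print(df_new)
```

With how='left', a country in df1 that has no match in df2 would still appear in the result, with NaN in the Region and Tropic/Nontropic columns.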