Merging two DataFrames with duplicates

Time: 2020-04-30 12:10:54

Tags: python pandas dataframe

I have two DataFrames that look like the ones below, and I want to merge them on the country column.

df1:

+-------------------+-------------------+--------------+----------+----------+
| Country/Region    | ObservationDate   |  Confirmed   |  Deaths  | Recovered|
+-------------------+-------------------+--------------+----------+----------+
|  Mainland China   |  2020-01-22       |    547       |   17     |  28      | 
|  Indonesia        |  2020-01-22       |    0         |   0      |  0       |
|  Japan            |  2020-01-22       |    2         |   0      |  0       |     
|  Thailand         |  2020-01-22       |    2         |   0      |  0       |    
|  Mainland China   |  2020-01-23       |    639       |   18     |  30      |  
+-------------------+-------------------+--------------+----------+----------+

df2:

+-----------------+-------------------+--------------------+
| Country         |  Region           |  Tropic/Nontropic  |
+-----------------+-------------------+--------------------+
|  Mainland China |  Asia & Pacific   | nontropic          |
|  Indonesia      |  Asia & Pacific   | tropic             |
|  Japan          |  Asia & Pacific   | nontropic          |
|  Thailand       |  Asia & Pacific   | tropic             | 
+-----------------+-------------------+--------------------+

The output I want would look something like this:

df_new:

+-------------------+-------------------+--------------+----------+----------+-------------------+--------------------+
| Country/Region    | ObservationDate   |  Confirmed   |  Deaths  | Recovered|  Region           |  Tropic/Nontropic  |
+-------------------+-------------------+--------------+----------+----------+-------------------+--------------------+
|  Mainland China   |  2020-01-22       |    547       |   17     |  28      |  Asia & Pacific   | nontropic          | 
|  Indonesia        |  2020-01-22       |    0         |   0      |  0       |  Asia & Pacific   | tropic             |
|  Japan            |  2020-01-22       |    2         |   0      |  0       |  Asia & Pacific   | nontropic          |     
|  Thailand         |  2020-01-22       |    2         |   0      |  0       |  Asia & Pacific   | tropic             |    
|  Mainland China   |  2020-01-23       |    639       |   18     |  30      |  Asia & Pacific   | nontropic          |  
+-------------------+-------------------+--------------+----------+----------+-------------------+--------------------+

Here is what I tried:

pd.merge(df_new, df_cat, on=['Country/Region', 'Country'], how='left')

But it raised an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-86-eff12d512209> in <module>
----> 1 pd.merge(df_new, df_cat, on=['Country/Region', 'Country'], how='left')

~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     84         copy=copy,
     85         indicator=indicator,
---> 86         validate=validate,
     87     )
     88     return op.get_result()

~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    625             self.right_join_keys,
    626             self.join_names,
--> 627         ) = self._get_merge_keys()
    628 
    629         # validate the merge keys dtypes. We may need to coerce

~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
    981                     if not is_rkey(rk):
    982                         if rk is not None:
--> 983                             right_keys.append(right._get_label_or_level_values(rk))
    984                         else:
    985                             # work-around for merge_asof(right_index=True)

~\anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
   1690             values = self.axes[axis].get_level_values(key)._values
   1691         else:
-> 1692             raise KeyError(key)
   1693 
   1694         # Check for duplicates

KeyError: 'Country/Region'

How can I get the result shown in df_new?

1 Answer:

Answer 0 (score: 4)

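The problem is that the columns you want to join on have different names in the two DataFrames. You therefore can't write on=['Country/Region', 'Country'], because on expects every listed label to exist in both frames. Instead, name the key column of each DataFrame separately with left_on and right_on.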

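The error message KeyError: 'Country/Region' means pandas looked for that column in a table where it does not exist. The right_keys line in the traceback shows it was the right-hand table, which only has Country.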
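As a minimal sketch of why on fails here, the snippet below rebuilds two cut-down frames (a couple of rows and columns taken from the tables in the question; the variable name error is only for illustration) and shows that passing both labels to on raises a KeyError, because neither label exists in both frames.

```python
import pandas as pd

# Cut-down stand-ins for the two frames in the question.
df1 = pd.DataFrame({
    "Country/Region": ["Mainland China", "Indonesia"],
    "Confirmed": [547, 0],
})
df2 = pd.DataFrame({
    "Country": ["Mainland China", "Indonesia"],
    "Tropic/Nontropic": ["nontropic", "tropic"],
})

# `on` requires every listed label to be present in BOTH frames.
# df2 has no "Country/Region" and df1 has no "Country", so this fails.
try:
    pd.merge(df1, df2, on=["Country/Region", "Country"], how="left")
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```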

Try:

pd.merge(left=df1, right=df2, left_on='Country/Region', right_on='Country', how='left')

See the pandas documentation for merge.
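For completeness, here is a self-contained sketch of that merge run on the sample data from the question. Two things in it go beyond the answer above and are assumptions about what the asker wants: validate="many_to_one" makes pandas raise if a country appears more than once in df2, and the drop call removes the Country column, which the merge keeps next to Country/Region but which does not appear in the desired output.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "Country/Region": ["Mainland China", "Indonesia", "Japan", "Thailand", "Mainland China"],
    "ObservationDate": ["2020-01-22", "2020-01-22", "2020-01-22", "2020-01-22", "2020-01-23"],
    "Confirmed": [547, 0, 2, 2, 639],
    "Deaths": [17, 0, 0, 0, 18],
    "Recovered": [28, 0, 0, 0, 30],
})
df2 = pd.DataFrame({
    "Country": ["Mainland China", "Indonesia", "Japan", "Thailand"],
    "Region": ["Asia & Pacific"] * 4,
    "Tropic/Nontropic": ["nontropic", "tropic", "nontropic", "tropic"],
})

# Left join: keep every row of df1 and attach the matching df2 row.
# validate="many_to_one" checks that each country is unique in df2.
df_new = pd.merge(
    left=df1,
    right=df2,
    left_on="Country/Region",
    right_on="Country",
    how="left",
    validate="many_to_one",
)

# Both key columns survive the merge; drop the redundant one.
df_new = df_new.drop(columns="Country")

print(df_new)
```

Because this is a left join, the repeated Mainland China rows in df1 are all kept, and each one picks up the same Region and Tropic/Nontropic values from df2.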