Pandas merge function producing a duplicates error

Date: 2020-06-05 17:25:12

Tags: python pandas dataframe merge duplicates

I'm fairly new to pandas and I'm having trouble merging two particular dataframes.

The right table looks like this:

Right table

The left table looks like this:

Left table

This is the code I'm trying to run:

import pandas as pd

# spadl_h5: path to the .h5 file, defined elsewhere
with pd.HDFStore(spadl_h5) as spadlstore:
    games = spadlstore["games"].merge(spadlstore["competitions"],
                                      left_on='competitionId', right_on='wyId')

This is the error I get:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-93-98ebcfc55d82> in <module>
      1 with pd.HDFStore(spadl_h5) as spadlstore:
----> 2     games = spadlstore["games"].merge(spadlstore["competitions"], 
      3                                       left_on='competitionId', right_on='wyId')

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   7282         from pandas.core.reshape.merge import merge
   7283 
-> 7284         return merge(
   7285             self,
   7286             right,

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     71     validate=None,
     72 ) -> "DataFrame":
---> 73     op = _MergeOperation(
     74         left,
     75         right,

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    625             self.right_join_keys,
    626             self.join_names,
--> 627         ) = self._get_merge_keys()
    628 
    629         # validate the merge keys dtypes. We may need to coerce

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/merge.py in _get_merge_keys(self)
    994                         right_keys.append(rk)
    995                     if lk is not None:
--> 996                         left_keys.append(left._get_label_or_level_values(lk))
    997                         join_names.append(lk)
    998                     else:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in _get_label_or_level_values(self, key, axis)
   1690             values = self.axes[axis].get_level_values(key)._values
   1691         else:
-> 1692             raise KeyError(key)
   1693 
   1694         # Check for duplicates

KeyError: 'competitionId'

Both the 'left_on' and 'right_on' columns are int64. I also tried every possible value of 'how' in the merge (left/right/outer/inner), but I still get the same error.

(Both tables are in h5 format because I'm trying to use the Socceraction package.)
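A minimal reproduction may help show why neither the dtypes nor the 'how' argument matter here. The sketch below uses two small hypothetical frames (`left`, `right` and their column names are made up, not the asker's data) whose key columns are both int64, but with the key names passed to the wrong sides. pandas resolves the key names as columns before any joining happens, so every 'how' fails with a KeyError, and putting each name on the side that actually has it works.

```python
import pandas as pd

# Hypothetical toy frames: both key columns are int64, but
# 'left_key' exists only in `left` and 'right_key' only in `right`.
left = pd.DataFrame({"left_key": [1, 2, 3], "score": [10, 20, 30]})
right = pd.DataFrame({"right_key": [1, 2], "name": ["A", "B"]})


def fails_with_keyerror(how: str) -> bool:
    """Return True if merging with the key names swapped raises KeyError."""
    try:
        # Swapped on purpose: 'right_key' is not a column of `left`,
        # and 'left_key' is not a column of `right`.
        left.merge(right, how=how, left_on="right_key", right_on="left_key")
    except KeyError:
        return True
    return False


hows = ["left", "right", "outer", "inner"]
outcomes = {how: fails_with_keyerror(how) for how in hows}
print(outcomes)  # every `how` raises KeyError

# Naming each key on the side that actually holds that column works.
merged = left.merge(right, how="inner", left_on="left_key", right_on="right_key")
print(merged)
```

In other words, a KeyError from merge points at a column name that is missing from one of the frames, so the first thing to compare is each frame's `.columns` against `left_on` and `right_on`.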

1 answer:

Answer 0 (score: 0)

Aren't left and right flipped, based on what your question shows?

Maybe try:

with pd.HDFStore(spadl_h5) as spadlstore:
    games = spadlstore["games"].merge(spadlstore["competitions"], 
                                      left_on='wyId', right_on='competitionId')
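One way to test the answer's suggestion without guessing is to check which frame actually holds each key column before merging. The sketch below wraps `merge` in a hypothetical helper, `merge_checked` (not part of pandas or socceraction), that verifies `left_on` is a column of the left frame and `right_on` a column of the right frame, lists each side's columns otherwise, and flags names that look swapped. The toy `games` and `competitions` frames are made up and simply assume the layout the answer guesses ('wyId' in games, 'competitionId' in competitions).

```python
import pandas as pd


def merge_checked(left: pd.DataFrame, right: pd.DataFrame,
                  left_on: str, right_on: str, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: check that the key columns exist, then merge.

    Raises a KeyError whose message lists each frame's columns, which makes
    swapped left_on/right_on names easy to spot. Extra keyword arguments
    (e.g. how="left") are passed straight to DataFrame.merge.
    """
    problems = []
    if left_on not in left.columns:
        problems.append(f"left_on={left_on!r} not in left columns {list(left.columns)}")
    if right_on not in right.columns:
        problems.append(f"right_on={right_on!r} not in right columns {list(right.columns)}")
    if problems:
        # Each name exists, just on the other side: most likely swapped.
        if right_on in left.columns and left_on in right.columns:
            problems.append("the key names look swapped; try exchanging left_on and right_on")
        raise KeyError("; ".join(problems))
    return left.merge(right, left_on=left_on, right_on=right_on, **kwargs)


# Hypothetical layout matching the answer's guess:
# 'wyId' in games, 'competitionId' in competitions.
games = pd.DataFrame({"wyId": [7, 8], "label": ["g1", "g2"]})
competitions = pd.DataFrame({"competitionId": [7, 8], "name": ["C7", "C8"]})

try:
    # The question's orientation: fails with a descriptive message.
    merge_checked(games, competitions, left_on="competitionId", right_on="wyId")
except KeyError as err:
    print(err)

# The answer's orientation: succeeds for this toy layout.
games_with_comp = merge_checked(games, competitions,
                                left_on="wyId", right_on="competitionId")
print(games_with_comp)
```

With the real store, the same check amounts to comparing `spadlstore["games"].columns` and `spadlstore["competitions"].columns` against the two key names before calling `merge`.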