Merge two dataframes on a column that contains sets

Time: 2019-10-08 20:38:24

Tags: python pandas

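I have two dataframes, and one column in each of them contains sets. I want to merge the dataframes on that column, matching rows whose sets are equal.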

I tried using df.merge like this:

les=df_report_2.merge(df_report_1,how='inner',on='saltids')

df_report_1 and df_report_2 are the dataframes, and saltids is a column whose values are of type set.

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-c8ca93f2c1e8> in <module>()
----> 1 les=df_report_2.merge(df_report_1,how='inner',on='saltids')

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   6866                      right_on=right_on, left_index=left_index,
   6867                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 6868                      copy=copy, indicator=indicator, validate=validate)
   6869 
   6870     def round(self, decimals=0, *args, **kwargs):

/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     46                          copy=copy, indicator=indicator,
     47                          validate=validate)
---> 48     return op.get_result()
     49 
     50 

/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in get_result(self)
    544                 self.left, self.right)
    545 
--> 546         join_index, left_indexer, right_indexer = self._get_join_info()
    547 
    548         ldata, rdata = self.left._data, self.right._data

/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_info(self)
    754         else:
    755             (left_indexer,
--> 756              right_indexer) = self._get_join_indexers()
    757 
    758             if self.right_index:

/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(self)
    733                                   self.right_join_keys,
    734                                   sort=self.sort,
--> 735                                   how=self.how)
    736 
    737     def _get_join_info(self):

/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
   1128 
   1129     # get left & right join labels and num. of levels at each location
-> 1130     llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
   1131 
   1132     # get flat i8 keys from label lists

TypeError: type object argument after * must be an iterable, not itertools.imap

1 Answer:

Answer 0: (score: 0)

As a workaround, you can join on a hash of the column's values.

Since a set is not hashable, each value first has to be converted to a frozenset, and then its hash can be computed.
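To illustrate the point, here is a minimal sketch using only built-in types. The sample values and variable names are made up for illustration and do not come from the question.

```python
# Hypothetical sample values, for illustration only.
ids = {1, 2, 3}

# A plain set is mutable, so hash() raises TypeError for it.
try:
    hash(ids)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# A frozenset is immutable and hashable. Its hash depends only on its
# elements, not on the order in which they were supplied.
same_hash = hash(frozenset([1, 2, 3])) == hash(frozenset([3, 2, 1]))

print(set_is_hashable)  # False
print(same_hash)        # True
```

Note that on Python 3 the hash of a frozenset of strings can differ between interpreter runs because of hash randomization, so the hashed values should only be compared within a single process.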

The code would look like this:

df_report_1['saltids_hashed'] = df_report_1['saltids'].map(lambda x: hash(frozenset(x)))

df_report_2['saltids_hashed'] = df_report_2['saltids'].map(lambda x: hash(frozenset(x)))

les=df_report_2.merge(df_report_1,how='inner',on='saltids_hashed')

You can then drop the hash column from the final output.
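For completeness, here is a self-contained sketch of the whole workaround, including the cleanup step. The sample data and the columns report_1_value and report_2_value are hypothetical; only the names df_report_1, df_report_2, saltids and saltids_hashed come from the thread. Because two different sets could in principle share a hash, the sketch also keeps only the rows where the sets really are equal. That check is an addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column name `saltids` is from the question.
df_report_1 = pd.DataFrame({
    "saltids": [{1, 2}, {3}, {4, 5, 6}],
    "report_1_value": ["a", "b", "c"],
})
df_report_2 = pd.DataFrame({
    "saltids": [{2, 1}, {4, 5, 6}, {7}],
    "report_2_value": ["x", "y", "z"],
})

# Add an integer key that pandas can join on.
for df in (df_report_1, df_report_2):
    df["saltids_hashed"] = df["saltids"].map(lambda s: hash(frozenset(s)))

les = df_report_2.merge(df_report_1, how="inner", on="saltids_hashed")

# Both frames have a `saltids` column, so the merge yields `saltids_x`
# (from df_report_2) and `saltids_y` (from df_report_1).
# Guard against hash collisions by comparing the actual sets.
same_set = pd.Series(
    [a == b for a, b in zip(les["saltids_x"], les["saltids_y"])],
    index=les.index,
    dtype=bool,
)
les = les[same_set]

# Drop the helper column and the duplicate set column.
les = (
    les.drop(columns=["saltids_hashed", "saltids_y"])
       .rename(columns={"saltids_x": "saltids"})
       .reset_index(drop=True)
)
print(les)
```

With this sample data, the rows for {1, 2} and {4, 5, 6} survive the merge, while {3} and {7} have no partner and are dropped.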