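I have two DataFrames, each with a column that contains sets. I want to merge the DataFrames on that column, matching rows whose sets are equal.
I tried using df.merge like this: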
les=df_report_2.merge(df_report_1,how='inner',on='saltids')
df_report_1 and df_report_2 are DataFrames, and saltids is a column whose values are of type set.
I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-c8ca93f2c1e8> in <module>()
----> 1 les=df_report_2.merge(df_report_1,how='inner',on='saltids')
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
6866 right_on=right_on, left_index=left_index,
6867 right_index=right_index, sort=sort, suffixes=suffixes,
-> 6868 copy=copy, indicator=indicator, validate=validate)
6869
6870 def round(self, decimals=0, *args, **kwargs):
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
46 copy=copy, indicator=indicator,
47 validate=validate)
---> 48 return op.get_result()
49
50
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in get_result(self)
544 self.left, self.right)
545
--> 546 join_index, left_indexer, right_indexer = self._get_join_info()
547
548 ldata, rdata = self.left._data, self.right._data
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_info(self)
754 else:
755 (left_indexer,
--> 756 right_indexer) = self._get_join_indexers()
757
758 if self.right_index:
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(self)
733 self.right_join_keys,
734 sort=self.sort,
--> 735 how=self.how)
736
737 def _get_join_info(self):
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
1128
1129 # get left & right join labels and num. of levels at each location
-> 1130 llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
1131
1132 # get flat i8 keys from label lists
TypeError: type object argument after * must be an iterable, not itertools.imap
Answer 0 (score: 0)
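As a workaround, you can join on a hash of the column's values.
Sets are not hashable, so each one has to be converted to a frozenset before its hash is computed.
The code would look like this: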
df_report_1['saltids_hashed'] = df_report_1['saltids'].map(lambda x: hash(frozenset(x)))
df_report_2['saltids_hashed'] = df_report_2['saltids'].map(lambda x: hash(frozenset(x)))
les=df_report_2.merge(df_report_1,how='inner',on='saltids_hashed')
You can then drop the hash column from the final output.
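For reference, here is a minimal self-contained sketch of the same workaround. The sample data, the value_1 and value_2 columns, and the set_key helper are made up for illustration, since the question does not show the real frames. It also adds one step that is not in the answer above: two different sets could in principle share a hash, so the sketch compares the original sets after the merge and keeps only the rows where they are really equal.

```python
import pandas as pd

# Hypothetical sample data; the real frames are not shown in the question.
df_report_1 = pd.DataFrame({
    "saltids": [{1, 2}, {3, 4}, {5}],
    "value_1": ["a", "b", "c"],
})
df_report_2 = pd.DataFrame({
    "saltids": [{2, 1}, {5}, {6, 7}],
    "value_2": [10, 20, 30],
})


def set_key(s):
    """Hypothetical helper: sets are unhashable, so hash a frozenset copy."""
    return hash(frozenset(s))


# Build an integer join key from each set.
df_report_1["saltids_hashed"] = df_report_1["saltids"].map(set_key)
df_report_2["saltids_hashed"] = df_report_2["saltids"].map(set_key)

# Both frames have a 'saltids' column, so pandas suffixes them with _x and _y.
les = df_report_2.merge(df_report_1, how="inner", on="saltids_hashed")

# Extra safety step (an addition, not part of the original answer):
# keep only rows whose original sets are truly equal, in case of a hash collision.
same = [a == b for a, b in zip(les["saltids_x"], les["saltids_y"])]
les = les[pd.Series(same, index=les.index, dtype=bool)]

# Drop the helper column and the duplicate set column.
les = les.drop(columns=["saltids_hashed", "saltids_y"])
les = les.rename(columns={"saltids_x": "saltids"})

print(les)
```

With this sample data, the rows for {1, 2} and {5} should survive the merge, because {2, 1} and {1, 2} are equal as sets regardless of element order.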