I am working with pandas DataFrames and want to find the rows that two DataFrames, df1 and df2, have in common.
DF1:
+------------+-------+
| features | col2 |
+------------+-------+
| [1.0, 2.0] | 2 |
+------------+-------+
| [1.0, 3.0] | 1 |
+------------+-------+
DF2:
+------------+-------+
| features | col2 |
+------------+-------+
| [1.0, 2.0] | 2 |
+------------+-------+
| [1.0, 4.0] | 5 |
+------------+-------+
In both DataFrames, the 'features' column holds DenseVector objects. I have the following code:
s1 = pandas.merge(df1, df2, how='inner', on=['features'])
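I have checked that both DataFrames contain a DenseVector element with the same value, for example DenseVector([1.0,2.0,3.0]). However, s1 does not capture any match.
If I test whether a DenseVector element of df1 is in df2 with the following code, I get False where I expect True: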
df1.features[0] in df2.features
If I use the following code instead, I get True, because it compares all the elements of the vectors:
df1.features[0].all() in df2.features.all()
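How can I apply an inner join to get the matching vectors?
Answer 0 (score: 0)
I am not sure whether a list-valued field can be used as a merge key, but here is a workaround for your problem: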
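import pandas as pd
# Sample data mirroring the tables in the question (plain lists stand in for DenseVector)
d1 = {'features': [[1.0, 2.0], [1.0, 3.0]],
      'col2': [2, 1]}
d2 = {'features': [[1.0, 2.0], [1.0, 4.0]],
      'col2': [2, 5]}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
# Split each vector into one scalar column per component, row by row.
# (Indexing df['features'][0] would take the first row's whole vector, not the first component of each row.)
for df in (df1, df2):
    df['features1'] = df['features'].apply(lambda v: v[0])
    df['features2'] = df['features'].apply(lambda v: v[1])
# Inner join (the default) on the scalar component columns
mdf = pd.merge(df1, df2, on=['features1', 'features2'], suffixes=('_df1', '_df2'))
print(mdf)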
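The workaround above needs one column per vector component, so it assumes every vector has exactly two elements. A possible variant for vectors of any length is to turn each vector into a tuple and merge on that tuple. This is only a sketch: the helper `add_key` and the column name `features_key` are made up for illustration, plain lists stand in for DenseVector, and it assumes each vector can be iterated (for a DenseVector, `v.toArray()` could be used instead). It also checks membership against the values explicitly, because `x in some_series` tests the Series index labels rather than its values.

```python
import pandas as pd

# Sample data mirroring the question; plain lists stand in for DenseVector.
df1 = pd.DataFrame({"features": [[1.0, 2.0], [1.0, 3.0]], "col2": [2, 1]})
df2 = pd.DataFrame({"features": [[1.0, 2.0], [1.0, 4.0]], "col2": [2, 5]})


def add_key(df):
    """Return a copy of df with a hashable tuple column built from 'features'.

    'features_key' is a hypothetical helper column, not part of the original data.
    """
    out = df.copy()
    out["features_key"] = out["features"].apply(
        lambda v: tuple(float(x) for x in v)
    )
    return out


keyed1 = add_key(df1)
keyed2 = add_key(df2)

# Membership test against the values (not the index labels) of the key column.
first_vector_found = keyed1["features_key"].iloc[0] in set(keyed2["features_key"])

# Inner join on the tuple key; overlapping column names get the given suffixes.
matches = pd.merge(
    keyed1, keyed2, on="features_key", how="inner", suffixes=("_df1", "_df2")
)
print(first_vector_found)
print(matches[["features_key", "col2_df1", "col2_df2"]])
```

The tuple key compares vectors by exact floating-point equality, so vectors that differ only by rounding error will not match; rounding the components before building the tuple is one way to relax that.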