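I have two different DataFrames with no columns in common. I run a calculation on pairs of rows, and I want the related rows from df1 and df2 placed side by side, one pair per row, in a new DataFrame (say df3).
For example, df1.csv: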
Col1,Col2,Col3
1,ABC,"c, perl"
2,DEFn,"python, video"
3,GHI,"web develpoment, java"
and df2.csv:
ColA,ColB,ColC,ColD
1,X,Z,"c, python"
2,Y,Z,"perl, lakes"
I do some work based on column Col3 of df1 and column ColD of df2 and get a value for each pair of rows (let's call it VAL).
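The helpers text_to_vector and get_cosine used in the sample code are not shown in this post. As a rough sketch only, assuming VAL is a bag-of-words cosine similarity between the two tag strings, they might look something like this (the bodies and the WORD pattern are assumptions, not the actual helpers):

```python
import math
import re
from collections import Counter

# Assumption: a "word" is any run of letters, digits or underscores.
WORD = re.compile(r"\w+")


def text_to_vector(text):
    """Turn a tag string such as "c, perl" into a word-count vector."""
    return Counter(WORD.findall(text.lower()))


def get_cosine(vec1, vec2):
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    shared = set(vec1) & set(vec2)
    numerator = sum(vec1[w] * vec2[w] for w in shared)
    sum1 = sum(count * count for count in vec1.values())
    sum2 = sum(count * count for count in vec2.values())
    denominator = math.sqrt(sum1 * sum2)
    return numerator / denominator if denominator else 0.0


val = get_cosine(text_to_vector("c, perl"), text_to_vector("c, python"))
print(val)  # 0.5: one shared word out of two on each side
```

Under this assumption, two tag strings with no word in common give a VAL of 0.0, which is the case that should be left out of df3.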
I now want something like this (for every pair whose VAL is non-zero):
df3:
Col1,Col2,Col3,ColA,ColB,ColC,ColD,VAL
Sample code:
import pandas as pd

# df1 and df2 hold the contents of df1.csv and df2.csv;
# text_to_vector() and get_cosine() are helper functions defined elsewhere.
matches = []
for _, row1 in df1.iterrows():
    tags = str(row1['Col3']).lower()
    v1 = text_to_vector(tags)
    for _, row2 in df2.iterrows():
        skills = str(row2['ColD']).lower()
        v2 = text_to_vector(skills)
        cosine = get_cosine(v1, v2)
        print('tags:', tags, '| skills:', skills, '| cosine:', cosine)
        if cosine != 0.0:
            # Put both rows side by side and attach the similarity as VAL
            matches.append({**row1.to_dict(), **row2.to_dict(), 'VAL': cosine})

# One output row per matching (df1 row, df2 row) pair
df3 = pd.DataFrame(matches)
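The nested loop above pairs every row of df1 with every row of df2. The same pairing can be sketched as a cross join followed by a filter. This is only one possible approach: it assumes pandas 1.2 or newer (for how="cross"), and similarity() is a hypothetical stand-in for whatever actually produces VAL.

```python
import io
import math
import re
from collections import Counter

import pandas as pd

# Sample data copied from df1.csv and df2.csv in this post.
DF1_CSV = '''Col1,Col2,Col3
1,ABC,"c, perl"
2,DEFn,"python, video"
3,GHI,"web develpoment, java"
'''
DF2_CSV = '''ColA,ColB,ColC,ColD
1,X,Z,"c, python"
2,Y,Z,"perl, lakes"
'''


def similarity(a, b):
    """Hypothetical stand-in for VAL: cosine similarity of word counts."""
    v1 = Counter(re.findall(r"\w+", str(a).lower()))
    v2 = Counter(re.findall(r"\w+", str(b).lower()))
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    norm = math.sqrt(sum(c * c for c in v1.values()) * sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0


df1 = pd.read_csv(io.StringIO(DF1_CSV))
df2 = pd.read_csv(io.StringIO(DF2_CSV))

# Pair every row of df1 with every row of df2 (requires pandas >= 1.2).
pairs = df1.merge(df2, how="cross")

# Score each pair, then keep only the pairs with a non-zero VAL.
pairs["VAL"] = [similarity(a, b) for a, b in zip(pairs["Col3"], pairs["ColD"])]
df3 = pairs[pairs["VAL"] != 0].reset_index(drop=True)

print(df3.to_csv(index=False))
```

The cross join grows as len(df1) * len(df2), so for large inputs it may be worth filtering or scoring in chunks rather than materialising every pair at once.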
I would like the output to look like this:
Col1,Col2,Col3,ColA,ColB,ColC,ColD,VAL
1,hacking using c,"c, perl",1,Amit,Kumar,"c, python",0.5
2,hacking using c,"c, perl",2,deepak,Kumar,"perl, lakes",0.5
3,video using python,"python, video",1,Amit,Kumar,"c, python",0.5
Any help is appreciated.