My DataFrame is:
import pandas as pd

data = {
'company_name' : ['auckland suppliers', 'Octagone', 'SodaBottel','Shimla Mirch'],
'year' : [2000, 2001, 2003, 2004],
'desc' : [' auckland has some good reviews','Octagone','we shall update you','we have varities of shimla mirch'],
}
df = pd.DataFrame(data)
I tried this code:
df['CompanyMatch'] = df['company_name'] == df['desc']
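I want to print "Match" if the first word of the company_name column appears in the desc column. I am confused about how to use index [0] for this, so that the result looks like this: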
> company_name        desc                              CompanyMatch
> auckland suppliers  auckland has some good reviews    Match
> Octagone            Octagone                          Match
> SodaBottel          we shall update you               NA
> Shimla Mirch        we have varities of shimla mirch  Match
Answer 0 (score: 5)
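You can use numpy.where together with apply, using in to check whether one column's value is contained in the other, with axis=1 to process the frame row by row: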
import numpy as np
m = df.apply(lambda x: x['company_name'].lower() in x['desc'].lower(), axis=1)
df['CompanyMatch'] = np.where(m, 'Match', np.nan)
print (df)
         company_name                              desc  year  CompanyMatch
0  auckland suppliers    auckland has some good reviews  2000           nan
1            Octagone                          Octagone  2001         Match
2          SodaBottel               we shall update you  2003           nan
3        Shimla Mirch  we have varities of shimla mirch  2004         Match
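Note that the lowercase nan printed above suggests the non-matching rows hold the text 'nan' rather than a real missing value, since np.where returns one array with a single dtype and np.nan gets converted to a string next to 'Match'. If a true NaN is wanted, a minimal sketch of one alternative is to map the boolean mask instead. The Series.map call is my substitution, not part of the code above, and n_missing is only an illustrative name:

```python
import numpy as np
import pandas as pd

data = {
    'company_name': ['auckland suppliers', 'Octagone', 'SodaBottel', 'Shimla Mirch'],
    'year': [2000, 2001, 2003, 2004],
    'desc': [' auckland has some good reviews', 'Octagone',
             'we shall update you', 'we have varities of shimla mirch'],
}
df = pd.DataFrame(data)

# Same row-wise check as above: is the full company name contained in desc?
m = df.apply(lambda x: x['company_name'].lower() in x['desc'].lower(), axis=1)

# Map True -> 'Match' and False -> NaN so missing values stay real NaN
# (assumption: a genuine missing value is preferred over the text 'nan')
df['CompanyMatch'] = m.map({True: 'Match', False: np.nan})

# Count the rows that did not match, using pandas' own missing-value test
n_missing = int(df['CompanyMatch'].isna().sum())
print(df)
print(n_missing)
```

With real missing values, methods such as isna, fillna and dropna work on the new column as expected.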
Edit:
To compare only the first word:
m = df.apply(lambda x: x['company_name'].split()[0].lower() in x['desc'].lower(), axis=1)
df['CompanyMatch'] = np.where(m, 'Match', np.nan)
print (df)
         company_name                              desc  year  CompanyMatch
0  auckland suppliers    auckland has some good reviews  2000         Match
1            Octagone                          Octagone  2001         Match
2          SodaBottel               we shall update you  2003           nan
3        Shimla Mirch  we have varities of shimla mirch  2004         Match
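The in test used above is a substring check, so a first word such as 'soda' would also match a description containing 'sodas', and split()[0] raises an IndexError on an empty company name. If whole-word matching is preferred, one possible sketch uses a regular expression with word boundaries. The helper name first_word_matches is hypothetical, the 'NA' label is taken from the output requested in the question, and the list comprehension over zip is just one way to avoid apply:

```python
import re

import numpy as np
import pandas as pd


def first_word_matches(name, desc):
    """Return True if the first word of name occurs as a whole word in desc."""
    words = name.split()
    if not words:  # empty or whitespace-only company name
        return False
    # re.escape guards against regex metacharacters in the company name
    pattern = r'\b' + re.escape(words[0]) + r'\b'
    return re.search(pattern, desc, flags=re.IGNORECASE) is not None


df = pd.DataFrame({
    'company_name': ['auckland suppliers', 'Octagone', 'SodaBottel', 'Shimla Mirch'],
    'year': [2000, 2001, 2003, 2004],
    'desc': [' auckland has some good reviews', 'Octagone',
             'we shall update you', 'we have varities of shimla mirch'],
})

# Pair each company name with its description and test them row by row
m = [first_word_matches(n, d) for n, d in zip(df['company_name'], df['desc'])]

# Both labels are strings here, so no NaN-to-text conversion takes place
df['CompanyMatch'] = np.where(m, 'Match', 'NA')
print(df)
```

Whether substring or whole-word matching is right depends on the data; for the four rows in the question both give the same result.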