My goal is to find the rows whose general_text matches the value in the City column, but the match must be exact.
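I tried searching with the in operator, but it did not give the expected result, so I tried str.contains, but the way I wrote it raised an error. Any hints on how to do this correctly or efficiently?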
df['matched'] = df.apply(lambda x: x.City in x.general_text, axis=1)
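But with the data below it also matches partial words (for example 'spring' inside 'palm springs'), so every row is flagged as a match: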
import pandas as pd

data = [['palm springs john smith', 'spring'],
        ['palm springs john smith', 'palm springs'],
        ['palm springs john smith', 'smith'],
        ['hamptons amagansett', 'amagansett'],
        ['hamptons amagansett', 'hampton'],
        ['hamptons amagansett', 'gans'],
        ['edward riverwoods lake', 'wood'],
        ['edward riverwoods lake', 'riverwoods']]
df = pd.DataFrame(data, columns=['general_text', 'City'])
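# Fails: inside apply(axis=1), x['general_text'] is a plain Python str, which has
# no .str accessor (the pandas method is also spelled str.contains, not str.contain)
df['match'] = df.apply(lambda x: x['general_text'].str.contain(x['City']), axis=1)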
What I want the code above to return is only these matches:
data = [['palm springs john smith', 'palm springs'],
        ['hamptons amagansett', 'amagansett'],
        ['edward riverwoods lake', 'riverwoods']]
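For context on the two attempts above: Python's in operator on two strings is a plain substring test, which is why partial words such as 'spring' and 'gans' count as matches. pandas' Series.str.contains, by contrast, applies one pattern to every element of a whole column and is not meant to be called on the str value of a single row. A minimal sketch using three rows of the question's data; the fixed pattern r'\bsmith\b' is only an illustrative choice:

```python
import pandas as pd

# Three rows of the question's sample data
df = pd.DataFrame(
    [['palm springs john smith', 'spring'],
     ['palm springs john smith', 'palm springs'],
     ['hamptons amagansett', 'gans']],
    columns=['general_text', 'City'],
)

# `in` between two strings is a substring test, so partial words match too
substring_hits = df.apply(lambda x: x['City'] in x['general_text'], axis=1)
print(substring_hits.tolist())  # [True, True, True]

# Series.str.contains applies ONE pattern to every element of the column;
# \b...\b restricts it to whole-word occurrences of "smith"
whole_word_smith = df['general_text'].str.contains(r'\bsmith\b', regex=True)
print(whole_word_smith.tolist())  # [True, True, False]
```

Because str.contains takes a single pattern, a per-row pattern (a different City on each row) needs a row-wise approach instead.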
Answer (score: 3)
You can put a word boundary \b at each end of the pattern to get an exact whole-word match:
import re
# re.escape makes any regex metacharacters in the city name match literally
f = lambda x: bool(re.search(r'\b{}\b'.format(re.escape(x['City'])), x['general_text']))
Or:
f = lambda x: bool(re.findall(r'\b{}\b'.format(re.escape(x['City'])), x['general_text']))
df['match'] = df.apply(f, axis=1)
print(df)
general_text City match
0 palm springs john smith spring False
1 palm springs john smith palm springs True
2 palm springs john smith smith True
3 hamptons amagansett amagansett True
4 hamptons amagansett hampton False
5 hamptons amagansett gans False
6 edward riverwoods lake wood False
7 edward riverwoods lake riverwoods True
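A self-contained sketch of the same word-boundary approach on the question's data, which also filters down to the matching rows. Two details are assumptions of this sketch rather than part of the answer: re.escape is applied to the city name so characters such as '.' or '(' are matched literally, and a list comprehension over zip replaces DataFrame.apply(axis=1), which is usually faster on large frames because it does not build a Series for every row. whole_word_match is a hypothetical helper name.

```python
import re

import pandas as pd

data = [['palm springs john smith', 'spring'],
        ['palm springs john smith', 'palm springs'],
        ['palm springs john smith', 'smith'],
        ['hamptons amagansett', 'amagansett'],
        ['hamptons amagansett', 'hampton'],
        ['hamptons amagansett', 'gans'],
        ['edward riverwoods lake', 'wood'],
        ['edward riverwoods lake', 'riverwoods']]
df = pd.DataFrame(data, columns=['general_text', 'City'])


def whole_word_match(city: str, text: str) -> bool:
    """Hypothetical helper: True if `city` occurs in `text` as whole word(s)."""
    # re.escape treats regex metacharacters in the city name literally
    return re.search(r'\b{}\b'.format(re.escape(city)), text) is not None


# Iterate over plain Python strings instead of per-row Series
df['match'] = [whole_word_match(c, t) for c, t in zip(df['City'], df['general_text'])]

# Keep only the rows where the city appears as a whole word
matched = df[df['match']]
print(matched)
```

Note that 'smith' is also kept, since it appears as a whole word in 'palm springs john smith'; the word-boundary test cannot tell a city name from any other word.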