Question

I am trying to find the substring chief in a df column. It works fine with split() on text that contains spaces, but it does not work with find().

sum(df['JobTitle'].apply(lambda x :'chief' in x.lower().split() ))
sum(df['JobTitle'].apply(lambda x :  x.lower().find('chief') ==1))

Could you please point out what is wrong with my use of find?

Answer 1

You can try using re:

import re

# if it appears, add 1, else add 0
sum(df['JobTitle'].apply(lambda x : int(bool(re.findall(r'\bchief\b', x.lower())))))

# add the number of times the word appears
sum(df['JobTitle'].apply(lambda x : len(re.findall(r'\bchief\b', x.lower()))))

Edit: if you want to match chief as a whole word and not words that merely contain it, such as mischief, use r'\bchief\b'
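
To illustrate the difference the word boundaries make, here is a minimal sketch. The sample titles are made up for illustration and are not taken from the question's data.

```python
import re

# Hypothetical sample titles, not taken from the question's data.
titles = ['Chief of Police', 'Deputy Chief', 'Mischief Manager', 'Clerk']

# Without boundaries the pattern also matches inside "mischief".
plain = [bool(re.search(r'chief', t.lower())) for t in titles]

# With \b on both sides only the standalone word matches.
bounded = [bool(re.search(r'\bchief\b', t.lower())) for t in titles]

print(plain)
print(bounded)
```

Only the bounded pattern rejects "Mischief Manager", because the c in mischief is preceded by another word character.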

Demo: https://regex101.com/r/jYOfM1/1
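
As for the find() attempt in the question: str.find returns the index at which the substring starts, or -1 when it is absent, so comparing the result with 1 only counts titles where chief begins at position 1. A small sketch of this, assuming a made-up df (the real data is not shown in the question):

```python
import pandas as pd

# Hypothetical sample data; the question's actual df is not shown.
df = pd.DataFrame(
    {'JobTitle': ['Chief of Police', 'Deputy Chief', 'Mischief Manager', 'Clerk']}
)

# str.find gives the start index of the first match, or -1 if there is none.
positions = df['JobTitle'].apply(lambda x: x.lower().find('chief'))

# "== 1" only counts titles where 'chief' starts exactly at index 1.
count_eq_1 = int((positions == 1).sum())

# "!= -1" counts every title that contains the substring anywhere.
count_found = int((positions != -1).sum())

print(positions.tolist(), count_eq_1, count_found)
```

Note that the `!= -1` check is a plain substring test, so it still counts titles like "Mischief Manager"; the regex with word boundaries avoids that.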
