I need to extract an exact match from a DataFrame column, together with the 10 characters before and the 10 characters after it.
Search     | Text                                                             | parts
boy named  | A boy named Alex.He lives in....                                 | A boy named ale
Jenny      | Girl named Jennying as Jenny. This girl is really nice long..... | nnying as Jenny. This gir
I tried the following code:
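part = []
for index, row in df.iterrows():
    # Count whole-word occurrences of the search term (tokens split on whitespace)
    c = row['text'].lower().split().count(row['Search'].lower())
    # Position of the first case-insensitive substring match
    idx = row['text'].lower().find(row['Search'].lower())
    if idx < 10:
        # Fewer than 10 characters before the match: start from the beginning
        substr = row['text'][:idx+len(row['Search'])+10]
    else:
        substr = row['text'][idx-10:idx+len(row['Search'])+10]
    part.append(substr)
df['parts'] = part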
If I use split(), I get the correct result for an exact match on a single word, but for a multi-word phrase such as "boy named" the count is zero.
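The count is zero because split() produces single-word tokens, so a phrase containing a space can never equal one of them. One possible workaround, as a minimal sketch rather than a definitive solution, is a regular expression with word boundaries. The helper name extract_context and its width parameter are hypothetical, and the sketch assumes the search phrase starts and ends with a letter, digit, or underscore, since \b only works at such positions.

```python
import re

def extract_context(text, search, width=10):
    """Return the first whole-phrase match of `search` in `text` with up to
    `width` characters of context on each side, or None if nothing matches.
    (Hypothetical helper, not part of the original code.)"""
    # \b keeps "Jenny" from matching inside "Jennying"; re.escape treats the
    # phrase literally; IGNORECASE mirrors the lower() calls in the loop above.
    pattern = r'\b' + re.escape(search) + r'\b'
    m = re.search(pattern, text, flags=re.IGNORECASE)
    if m is None:
        return None
    # Clamp the start at 0 so a match near the beginning does not wrap around.
    start = max(m.start() - width, 0)
    return text[start:m.end() + width]

print(extract_context("A boy named Alex.He lives in....", "boy named"))
print(extract_context(
    "Girl named Jennying as Jenny. This girl is really nice long.....", "Jenny"))
```

Assuming the columns are named 'text' and 'Search' as in the loop above, this could be applied row by row with df['parts'] = df.apply(lambda r: extract_context(r['text'], r['Search']), axis=1).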