My df has a sentence column (df.sentence) that looks like this:
sentence
His name is Paul. He's in jail.
Her name is Allison. She's a doctor.
He is named Steve. He's an engineer.
etc.
Currently I have a loop set up as follows to extract the names:
for i in range(len(df.sentence)):
    if 'name is' in df['sentence'][i]:
        name = re.findall(r'(?<=name is\s)[a-z]+', str(df['sentence'][i]), re.I)
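However, this does not work. Or maybe I just need help setting up the regular expression correctly.
Update (this does not produce the right output either):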
for i in range(len(df)):
    if '[name is|named]' in df['sentence'][i]:
        name = df.sentence.i.str.extract('[name is|named]\s(.*?)(?=\.|\s)')
    else:
        pass
Answer 0 (score: 1)
Use a lookbehind assertion and apply it to the whole column at once, with no loop:
df['sentence'].str.extract(r'(?<= name is |is named )(\w+)')
Output:
0
0 Paul
1 Allison
2 Steve
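Python's re module only accepts fixed-width lookbehinds. The pattern in this answer works because both alternatives happen to be nine characters long. As a minimal sketch of a variant that does not depend on that coincidence, a non-capturing group can replace the lookbehind. The sample DataFrame below is rebuilt from the sentences in the question, and the names pattern and names are illustrative only:

```python
import pandas as pd

# Sample data rebuilt from the sentences shown in the question.
df = pd.DataFrame({
    "sentence": [
        "His name is Paul. He's in jail.",
        "Her name is Allison. She's a doctor.",
        "He is named Steve. He's an engineer.",
    ]
})

# Non-capturing group for the two phrasings, then capture the next word.
# Unlike a lookbehind, the alternatives may have different lengths.
pattern = r"(?:name is|is named)\s+(\w+)"

# expand=False returns a Series because there is a single capture group.
names = df["sentence"].str.extract(pattern, expand=False)
print(names.tolist())
```

Rows that match neither phrasing come back as NaN, so they can be filtered afterwards with names.dropna().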
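Answer 1 (score: 0)
If every row in this column follows the same format (the word of interest is always the fourth word), just take the fourth word directly, which is index 3 after splitting on whitespace with zero-based indexing.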
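A rough sketch of this positional approach, assuming the name really is the fourth whitespace-separated word in every row. The sample DataFrame is rebuilt from the question, and names_by_position is an illustrative name:

```python
import pandas as pd

# Sample data rebuilt from the sentences shown in the question.
df = pd.DataFrame({
    "sentence": [
        "His name is Paul. He's in jail.",
        "Her name is Allison. She's a doctor.",
        "He is named Steve. He's an engineer.",
    ]
})

# Split on whitespace, take the fourth token (index 3),
# then strip the trailing period left over from the sentence.
names_by_position = df["sentence"].str.split().str[3].str.rstrip(".")
print(names_by_position.tolist())
```

This is simpler than a regular expression but breaks as soon as a sentence deviates from the pattern, for example "My friend's name is Paul."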