Here is my dataset:
Id. Text
1 Dear Mr. Alpha Terra, your food is delivered
2 Dear Mrs. Betta Irina Viruva, your drink is delivered
What I want is to detect the words that come after Mr. or Mrs. and before the comma, so that I can get the name. This is the result I am after:
Id. Text Name
1 Dear Mr. Alpha Terra, your food is delivered Alpha Terra
2 Dear Mrs. Betta Irina Viruva, your drink is delivered Betta Irina Viruva
Answer 0 (score: 2)
Try this:
In [134]: df.Text.str.split('.',expand=True)[1].str.split(',',expand=True)[0]
Out[134]:
0 Alpha Terra
1 Betta Irina Viruva
Name: 0, dtype: object
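To see what each of the two chained splits yields, here is a minimal plain-Python sketch of the same steps on a single row. The variable names are illustrative only. It suggests that the captured piece keeps a leading space, so a final strip (in pandas, `.str.strip()`) may be worth adding.

```python
# Plain-Python equivalent of the two chained splits, applied to one row.
text = "Dear Mr. Alpha Terra, your food is delivered"

# Step 1: split on '.' and take the part after the title
after_period = text.split(".")[1]

# Step 2: split on ',' and take the part before the first comma
before_comma = after_period.split(",")[0]

# The result still carries the space that followed 'Mr.'
name = before_comma.strip()

print(repr(before_comma), repr(name))
```

Note that this approach relies on the first period in the text being the one after the title, which holds for the sample rows but not necessarily for other data.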
Answer 1 (score: 2)
One option is to match with the following pattern:
.*Mrs?\.\s+([^,]+).*
This captures everything after Mr. or Mrs., up to but not including the first comma that follows.
import re

line = "Dear Mrs. Betta Irina Viruva, your drink is delivered"
# Group 1 holds everything after "Mr."/"Mrs." up to the first comma
matches = re.match(r'.*Mrs?\.\s+([^,]+).*', line, re.M|re.I)
if matches:
    print("Name:", matches.group(1))
else:
    print("No match!!")
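The same pattern can be wrapped in a small reusable function. This is only a sketch: `extract_name` and `NAME_PATTERN` are hypothetical names, and it uses `re.search` so the leading and trailing `.*` from the pattern above are not needed.

```python
import re

# Pattern from this answer, minus the surrounding ".*" (re.search scans the string).
NAME_PATTERN = re.compile(r"Mrs?\.\s+([^,]+)", re.IGNORECASE)

def extract_name(line):
    """Return the text after 'Mr.'/'Mrs.' up to the first comma, or None."""
    match = NAME_PATTERN.search(line)
    return match.group(1) if match else None

lines = [
    "Dear Mr. Alpha Terra, your food is delivered",
    "Dear Mrs. Betta Irina Viruva, your drink is delivered",
    "Hello there",  # made-up line with no title, to show the no-match case
]
names = [extract_name(line) for line in lines]
print(names)
```

Returning None for lines without a title leaves it to the caller to decide how to handle missing names.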
Answer 2 (score: 1)
Use extract:
df['Name'] = df['Text'].str.extract(r'Mrs?\.\s+(.*?),', expand=False)
print (df)
Id. Text Name
0 1 Dear Mr. Alpha Terra, your food is delivered Alpha Terra
1 2 Dear Mrs. Betta Irina Viruva, your drink is de... Betta Irina Viruva
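For a self-contained version of the above, here is a sketch that builds the sample frame and applies the same `str.extract` call. The third row is an invented example, added only to show that rows where the pattern does not match end up as NaN in the new column.

```python
import pandas as pd

df = pd.DataFrame({
    "Id.": [1, 2, 3],
    "Text": [
        "Dear Mr. Alpha Terra, your food is delivered",
        "Dear Mrs. Betta Irina Viruva, your drink is delivered",
        "Your order has shipped",  # hypothetical row without Mr./Mrs.
    ],
})

# With a single capture group, expand=False returns a Series
df["Name"] = df["Text"].str.extract(r"Mrs?\.\s+(.*?),", expand=False)

print(df["Name"].tolist())
```

If missing names need a placeholder, `df["Name"].fillna(...)` is one way to handle them afterwards.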
Answer 3 (score: 1)
Since you asked for a regex, try this:
import pandas
data = [{'ID': 1, 'Text': 'Dear Mr. Alpha Terra, your food is delivered'},
{'ID': 2, 'Text': 'Dear Mrs. Betta Irina Viruva, your drink is delivered'}]
df = pandas.DataFrame(data)
df['Name'] = df.Text.str.extract(r'\.\s*(.*?),', expand=False)
print(df)
Here is a repl for you to try it out.