I have two dataframes:
df:
id string_data
1 My name is Jeff
2 Hello, I am John
3 I like Brad he is cool.
and another one, allnames, which contains a list of names like this:
id name
1 Jeff
2 Brad
3 John
4 Emily
5 Ross
I want to replace every word in df that appears in allnames['name'] with "Firstname".
Expected output:
id string_data
1 My name is Firstname
2 Hello, I am Firstname
3 I like Firstname he is cool.
I tried:
nameList = '|'.join(allnames['name'])
df['string_data'].str.replace(nameList, "FirstName", case = False)
but it replaces almost 99% of the words.
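One plausible reason for the over-matching, sketched with the standard re module: a plain alternation matches a name anywhere, including inside longer words. The short names "Al" and "Ed" and the sample sentence are hypothetical, not taken from the question. Wrapping each alternative in \b, as the answer below does, restricts matches to whole words.

```python
import re

# Hypothetical name list: "Al" and "Ed" are short enough to occur inside other words.
names = ["Jeff", "Brad", "John", "Al", "Ed"]
text = "Also, Ed talked to Jeff about the bedroom."

# The question's approach: a bare alternation, so "Al" hits "Also" and "ed" hits "talked".
loose = "|".join(names)
# Word-bounded version: each name must start and end at a word boundary.
bounded = "|".join(r"\b{}\b".format(n) for n in names)

loose_result = re.sub(loose, "FirstName", text, flags=re.IGNORECASE)
bounded_result = re.sub(bounded, "FirstName", text, flags=re.IGNORECASE)

print(loose_result)    # FirstNameso, FirstName tFirstNamekFirstName to FirstName about the bFirstNameroom.
print(bounded_result)  # Also, FirstName talked to FirstName about the bedroom.
```

An empty string in allnames['name'] would make things even worse, because an empty alternative matches at every position of every string.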
Answer 0 (score: 6)
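Your solution should work if you wrap each name in word boundaries (\b) when building the pattern with join, so a name only matches as a whole word and not inside a longer one, and then pass the pattern to Series.str.replace:
nameList = '|'.join(r"\b{}\b".format(x) for x in allnames['name'])
# regex=True is required on pandas >= 2.0, where str.replace defaults to regex=False
df['string_data'] = df['string_data'].str.replace(nameList, "FirstName", case=False, regex=True)
print (df)
id string_data
0 1 My name is FirstName
1 2 Hello, I am FirstName
2 3 I like FirstName he is cool.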
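The same word-boundary approach as a self-contained sketch, with both frames rebuilt from the question. Two details are assumptions added here, not part of the answer: each name goes through re.escape so characters such as "." are matched literally, and missing or empty names are skipped, since an empty name would match at every word boundary.

```python
import re

import pandas as pd

# Sample frames rebuilt from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "string_data": ["My name is Jeff", "Hello, I am John", "I like Brad he is cool."],
})
allnames = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Jeff", "Brad", "John", "Emily", "Ross"],
})

# Assumption: drop NaN and empty names; "\b\b" would match at every word boundary.
names = [n for n in allnames["name"].dropna() if n]

# re.escape makes regex metacharacters literal; \b...\b keeps a name
# from matching inside a longer word.
pattern = "|".join(r"\b{}\b".format(re.escape(n)) for n in names)

# Explicit regex=True: since pandas 2.0 the default is regex=False,
# which would search for the pattern text literally.
df["string_data"] = df["string_data"].str.replace(
    pattern, "Firstname", case=False, regex=True
)
print(df)
```

Note that \b only anchors cleanly when a name starts and ends with a letter, digit or underscore; a name such as "J.R." would need a different boundary rule.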
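Or replace the values through a dictionary with get:
# map every name to the replacement text
d = dict.fromkeys(allnames['name'], 'Firstname')
# split on whitespace, swap tokens that are names, join back with single spaces
f = lambda x: ' '.join(d.get(y, y) for y in x.split())
df['string_data'] = df['string_data'].apply(f)
print (df)
id string_data
0 1 My name is Firstname
1 2 Hello, I am Firstname
2 3 I like Firstname he is cool.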
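Because the dictionary version looks up whole whitespace-separated tokens, it behaves differently from the regex version in a few cases. A stdlib-only sketch of the same lookup on plain strings; the first two sentences come from the question, the last two are hypothetical inputs chosen to show those cases:

```python
# Same lookup as the answer's lambda, applied to plain strings.
d = dict.fromkeys(["Jeff", "Brad", "John", "Emily", "Ross"], "Firstname")


def replace_names(text):
    # split() breaks on any run of whitespace; tokens keep attached punctuation.
    return " ".join(d.get(token, token) for token in text.split())


sentences = [
    "My name is Jeff",          # from the question
    "I like Brad he is cool.",  # from the question
    "Thanks, Jeff.",            # hypothetical: "Jeff." is not a dictionary key
    "Ask  john",                # hypothetical: lowercase name, double space
]
results = [replace_names(s) for s in sentences]
for r in results:
    print(r)
```

A name followed by punctuation ("Jeff.") or written in a different case ("john") is left alone, and runs of whitespace collapse to a single space, because split() and ' '.join() do not preserve them.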
EDIT: You can also convert all values to lowercase with lower:
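A minimal sketch of what the lowercase variant could look like for the dictionary approach, assuming the intent is a case-insensitive match: the keys are lowercased with Series.str.lower, and each token is lowercased only for the lookup, so words that are not names keep their original casing. The mixed-case sentences are hypothetical variants of the question's data.

```python
import pandas as pd

# Hypothetical mixed-case variants of the question's sentences.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "string_data": ["My name is jeff", "Hello, I am JOHN", "I like Brad he is cool."],
})
allnames = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Jeff", "Brad", "John", "Emily", "Ross"],
})

# Lowercase the dictionary keys once.
d = dict.fromkeys(allnames["name"].str.lower(), "Firstname")


def replace_names(text):
    # Lowercase each token only for the lookup; unmatched tokens are returned unchanged.
    return " ".join(d.get(token.lower(), token) for token in text.split())


df["string_data"] = df["string_data"].apply(replace_names)
print(df)
```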