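Hi, I want to find specific terms inside the values of a DataFrame and replace them by matching them against the keys of a dictionary.
DataFrame: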
Search term              Application
safe high school trip    1
spring break trips       2
gap year trips           1
I have the terms to replace in a dictionary: each key is the term to look for, and its value is the replacement.
{'high school': ['high-school'],
'spring break': ['spring-break'],
'gap year': ['gap-year']}
Desired output:
Search term              Application
safe high-school trip    1
spring-break trips       2
gap-year trips           1
I could not find a way to replace part of a string inside DataFrame values, so I read the CSV file in as a single string:
with open('df.csv', 'r', encoding='UTF-8') as f:
    s = f.read() + '\n'
and then replaced the terms one at a time with str.replace, like this. It gets the job done, but it is very inefficient.
s = s.replace('gap year', 'gap-year')
If there were a way to replace the spaces inside specific terms with '-', the dictionary would not be needed at all.
Thanks.
Answer 0 (score: 0)
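You can use df.replace with regex=True, which replaces matching substrings instead of requiring the whole cell value to match.
For example: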
to_replace = {'high school': 'high-school','spring break': 'spring-break','gap year': 'gap-year'}
df["Search term"] = df["Search term"].replace(to_replace, regex=True)
print(df)
Output:
             Search term  Application
0  safe high-school trip            1
1     spring-break trips            2
2         gap-year trips            1
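The question's closing remark, hyphenating the listed terms without keeping a dictionary of replacements, can be sketched with a single compiled pattern and a replacement function. This is only a minimal sketch using the standard re module: the names TERMS, PATTERN and hyphenate_terms are made up for illustration, and the word-boundary anchors (\b) are an assumption about the desired matching that the question does not state.

```python
import re

# Terms whose internal spaces should become hyphens (taken from the question).
TERMS = ["high school", "spring break", "gap year"]

# One alternation pattern for all terms. re.escape keeps any regex
# metacharacters in a term literal; sorting longest-first lets a longer
# term win over a shorter one that is its prefix.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(TERMS, key=len, reverse=True))
    + r")\b"
)


def hyphenate_terms(text):
    """Replace the spaces inside every matched term with '-'."""
    return PATTERN.sub(lambda m: m.group(0).replace(" ", "-"), text)


rows = ["safe high school trip", "spring break trips", "gap year trips"]
result = [hyphenate_terms(row) for row in rows]
print(result)
```

On a DataFrame, the same function could be applied to the column with something like df["Search term"].map(hyphenate_terms), so the text is scanned once per cell rather than once per term.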
Answer 1 (score: 0)
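First change the dictionary so that each value is a plain string instead of a one-element list, then use Series.replace with regex=True to replace the substrings: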
d = {'high school': 'high-school',
'spring break': 'spring-break',
'gap year': 'gap-year'}
df['Search term'] = df['Search term'].replace(d, regex=True)
print(df)
             Search term  Application
0  safe high-school trip            1
1     spring-break trips            2
2         gap-year trips            1
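If the dictionary arrives in the question's original shape, with each value wrapped in a list, the flattening step can be done programmatically rather than by hand. The following is a rough sketch: the names raw and flat are illustrative, taking the first list element assumes every list holds exactly one replacement, and escaping the keys with re.escape is an extra precaution (not part of the answer above) because regex=True treats each key as a pattern.

```python
import re

import pandas as pd

# Dictionary in the shape given in the question: values are one-element lists.
raw = {
    "high school": ["high-school"],
    "spring break": ["spring-break"],
    "gap year": ["gap-year"],
}

# Unwrap each list (assumes exactly one replacement per term) and escape
# the keys so they are matched literally when regex=True is used.
flat = {re.escape(term): repl[0] for term, repl in raw.items()}

df = pd.DataFrame(
    {
        "Search term": [
            "safe high school trip",
            "spring break trips",
            "gap year trips",
        ],
        "Application": [1, 2, 1],
    }
)

# regex=True makes replace() act on substrings instead of whole cell values.
df["Search term"] = df["Search term"].replace(flat, regex=True)
print(df)
```

Escaping makes no difference for these three terms, but it would matter for a term containing characters such as "." or "+".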