Find and replace strings in a DataFrame

Time: 2018-08-28 05:47:15

Tags: python dataframe

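Hi, I want to find specific terms inside the values of a DataFrame and replace them by matching against the keys of a dictionary.

DataFrame: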

Search term              Application
safe high school trip    1
spring break trips       2
gap year trips           1

I have the words to replace in a dictionary: each key is the term to look for, and its value is the replacement.

{'high school': ['high-school'],
'spring break': ['spring-break'],
'gap year': ['gap-year']}

Desired output:

Search term              Application
safe high-school trip    1
spring-break trips       2
gap-year trips           1

I could not find a way to replace part of a string inside the DataFrame values, so I read the CSV file in as one string:

with open('df.csv','r',encoding='UTF-8') as f:
    s = f.read() + '\n'

and then used str.replace to replace the terms one at a time, like this. It gets the job done, but it is very inefficient.

s = str.replace(s, 'gap year', 'gap-year')

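If there is a way to replace the spaces inside specific terms with "-", the dictionary would not be needed at all.

Thanks

2 answers:

Answer 0: (score: 0)

You can use df.replace together with regex=True.

For example: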

to_replace = {'high school': 'high-school','spring break': 'spring-break','gap year': 'gap-year'}
df["Search term"] = df["Search term"].replace(to_replace, regex=True)
print(df)

Output:

             Search term  Application
0  safe high-school trip            1
1     spring-break trips            2
2         gap-year trips            1
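
The snippet above assumes that df already exists. As a rough, self-contained sketch, the sample data can be rebuilt by hand (the column names "Search term" and "Application" are taken from the output shown above; the construction itself is only an assumption about how the frame was loaded):

import pandas as pd

# Sample data rebuilt from the question; in practice this would likely
# come from pd.read_csv('df.csv') instead.
df = pd.DataFrame({
    "Search term": ["safe high school trip", "spring break trips", "gap year trips"],
    "Application": [1, 2, 1],
})

to_replace = {
    "high school": "high-school",
    "spring break": "spring-break",
    "gap year": "gap-year",
}

# With regex=True each key is treated as a pattern and matched anywhere
# inside a cell; without it, only cells equal to the whole key are replaced.
df["Search term"] = df["Search term"].replace(to_replace, regex=True)
print(df)

Note that without regex=True the first row would stay unchanged, because "safe high school trip" is not equal to any key as a whole.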

Answer 1: (score: 0)

First change the dictionary by removing the lists, then use Series.replace with regex=True to replace the substrings:

d = {'high school': 'high-school',
     'spring break': 'spring-break',
     'gap year': 'gap-year'}

df['Search term'] = df['Search term'].replace(d, regex=True)

print(df)
             Search term  Application
0  safe high-school trip            1
1     spring-break trips            2
2         gap-year trips            1
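
The "removing the lists" step, and the question's wish to do without a hand-written dictionary, could be sketched roughly as follows. The names raw, terms, derived and escaped are made up for illustration, and escaping the keys with re.escape is an extra precaution that the answer itself does not mention:

import re
import pandas as pd

df = pd.DataFrame({
    "Search term": ["safe high school trip", "spring break trips", "gap year trips"],
    "Application": [1, 2, 1],
})

# Dictionary as given in the question: every value is a one-element list.
raw = {"high school": ["high-school"],
       "spring break": ["spring-break"],
       "gap year": ["gap-year"]}

# Unwrap the lists so that each key maps to a plain string.
d = {term: values[0] for term, values in raw.items()}

# Alternative without a hand-written dictionary: derive each replacement
# from the term itself by swapping spaces for hyphens.
terms = ["high school", "spring break", "gap year"]
derived = {term: term.replace(" ", "-") for term in terms}

# With regex=True the keys are regular expressions, so escape them in case
# a term ever contains characters such as "." or "+".
escaped = {re.escape(term): new for term, new in derived.items()}

df["Search term"] = df["Search term"].replace(escaped, regex=True)
print(df)

Here d and derived end up identical, so either one can be passed to replace; the derived version only needs the list of terms.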