Using Pandas

Posted: 2017-02-10 14:24:40

Tags: python pandas

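I have a list of different values that I need to replace with a single value (Drive-by). I did my research, but the closest post I could find is the one linked below, and it does not use pandas. What is the most practical way to do this?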

Python replace multiple strings

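import pandas as pd

fourth = pd.read_csv('C:/infocentertracker.csv')
# Remove the trailing space from the phone-number column name
fourth = fourth.rename(columns={'Phone Number: ': 'Phone Number:'})
# '......' is the asker's placeholder for the values listed below
fourth['Source:'] = fourth['Source:'].replace('......', 'Drive-by')

fourth.to_csv(.............)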

Values to replace (replace all of them with Drive-by):

Drive By
Drive-By
Drive-by; Return Visitor
Drive/LTX.com/Internes Srch
Driving By/Lantana Website
Drive by
Driving By/Return Visitor
Drive by/Resident Referral
Driving by
Drive- by
Driving by/LTX Website
Driving By
Driving by/Return Visitor
Drive By/Return Visitor
Drive By/LTX Website
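The `replace` call in the question already points at one option: `Series.replace` accepts a list as `to_replace`, so every exact spelling can be mapped to one value in a single call. A minimal sketch, assuming the column holds exactly the spellings listed above (the extra `'aaa'` row is hypothetical, added to show that unlisted values are left alone). The trade-off is that any spelling not in the list stays untouched.

```python
import pandas as pd

# The exact spellings from the question
drive_by_variants = [
    'Drive By', 'Drive-By', 'Drive-by; Return Visitor',
    'Drive/LTX.com/Internes Srch', 'Driving By/Lantana Website', 'Drive by',
    'Driving By/Return Visitor', 'Drive by/Resident Referral', 'Driving by',
    'Drive- by', 'Driving by/LTX Website', 'Driving By',
    'Driving by/Return Visitor', 'Drive By/Return Visitor', 'Drive By/LTX Website',
]

# Hypothetical frame: the listed values plus one value that should not change
fourth = pd.DataFrame({'Source:': drive_by_variants + ['aaa']})

# A list passed as to_replace matches whole cell values exactly
fourth['Source:'] = fourth['Source:'].replace(drive_by_variants, 'Drive-by')
print(fourth)
```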

3 Answers:

Answer 0 (score: 2)

You can replace every value that starts with Driv by using a boolean mask built with str.startswith; the idea comes from a comment of Marat:

df.loc[df.col.str.startswith('Driv'), 'col'] = 'Drive-by'

Sample:

print (fourth)
                        Source:
0                      Drive By
1                      Drive-By
2      Drive-by; Return Visitor
3   Drive/LTX.com/Internes Srch
4    Driving By/Lantana Website
5                      Drive by
6     Driving By/Return Visitor
7    Drive by/Resident Referral
8                    Driving by
9                     Drive- by
10       Driving by/LTX Website
11                   Driving By
12    Driving by/Return Visitor
13      Drive By/Return Visitor
14         Drive By/LTX Website
15                          aaa
fourth.loc[fourth['Source:'].str.startswith('Driv'), 'Source:'] = 'Drive-by'
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa
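The `.loc` approach above is case-sensitive, and with missing values `str.startswith` returns NaN, which `.loc` refuses to use as a mask. A minimal sketch of a more defensive variant, assuming (hypothetically) that the real data contains lower-case spellings and empty cells: lower-case before matching and pass `na=False` so missing values count as "no match".

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one lower-case spelling and one missing value
fourth = pd.DataFrame({'Source:': ['Drive By', 'driving by/LTX Website', np.nan, 'aaa']})

# Lower-case first so 'driving ...' also matches; na=False turns NaN into False
mask = fourth['Source:'].str.lower().str.startswith('driv', na=False)
fourth.loc[mask, 'Source:'] = 'Drive-by'
print(fourth)
```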

Another solution with Series.mask:

fourth['Source:']=fourth['Source:'].mask(fourth['Source:'].str.startswith('Driv', na=False),
                                       'Drive-by')
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa
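`Series.mask` replaces values where the condition is True and keeps the rest; `Series.where` is its inverse and keeps values where the condition is True. A short sketch on a hypothetical three-row Series showing that both give the same result once the condition is negated for `where`:

```python
import pandas as pd

# Hypothetical sample values
s = pd.Series(['Drive By', 'Driving by/LTX Website', 'aaa'])
starts = s.str.startswith('Driv', na=False)

# mask: replace where the condition holds
masked = s.mask(starts, 'Drive-by')
# where: keep where the condition holds, so negate it to get the same result
kept = s.where(~starts, 'Drive-by')
print(masked.tolist(), kept.tolist())
```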

Answer 1 (score: 1)

Since you asked for a pandas method, here is one option:

fourth.loc[fourth['column name with values'].str.contains('driv', case=False, na=False), 'column name with values'] = 'Drive-by'
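`str.contains` matches the pattern anywhere in the value, whereas `str.startswith` only matches at the beginning, so the two masks can differ. A small sketch with hypothetical values (the second one does not appear in the question) showing a case where they disagree:

```python
import pandas as pd

# Hypothetical values; 'Return Visitor/drive by' is made up to show the difference
s = pd.Series(['Drive By', 'Return Visitor/drive by', 'aaa', None])

by_prefix = s.str.startswith('Driv', na=False)               # case-sensitive, start only
by_substring = s.str.contains('driv', case=False, na=False)  # case-insensitive, anywhere
print(by_prefix.tolist(), by_substring.tolist())
```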

I prefer a regular expression, which doesn't necessarily require pandas:

import re

fourth['column name'] = [re.sub(r'Driv.+', 'Drive-by', i) for i in fourth['column name']]
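The pattern `Driv.+` is not anchored, so a value such as `'Return/Driving by'` becomes `'Return/Drive-by'` instead of `'Drive-by'`, and `re.sub` raises on NaN cells. A sketch of a pandas-native alternative, assuming the intent is to replace the whole value only when it starts with `Driv`: anchor the pattern with `^` and let `Series.str.replace` skip missing values. The sample values are hypothetical.

```python
import re
import pandas as pd

# Hypothetical values, including one where 'Driv' is not at the start and a missing value
s = pd.Series(['Drive By/LTX Website', 'Return/Driving by', 'aaa', None])

# '^' anchors at the start; '.*' consumes the rest of the value
anchored = s.str.replace(r'^Driv.*', 'Drive-by', regex=True)

# For comparison: the unanchored pattern rewrites only the tail of the middle value
unanchored_example = re.sub(r'Driv.+', 'Drive-by', 'Return/Driving by')
print(anchored.tolist(), unanchored_example)
```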

Answer 2 (score: 0)

You can replace multiple values (a list) with a single value in Pandas:

govt_alias = ['govt', 'govern']
df['installer'] = df['installer'].str.replace('|'.join(govt_alias), 'government', regex=True)
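Because `str.replace` rewrites substrings, an alias that is a prefix of the target word also fires inside values that are already correct. A sketch on hypothetical data contrasting substring replacement with `Series.replace`, which with a list only touches cells that equal an alias exactly (`re.escape` is added here as a precaution in case an alias contains regex metacharacters):

```python
import re
import pandas as pd

# Hypothetical installer values
df = pd.DataFrame({'installer': ['govt', 'govern', 'government', 'Private']})
govt_alias = ['govt', 'govern']

# Substring replacement, as in the answer: 'government' contains 'govern'
substr = df['installer'].str.replace('|'.join(map(re.escape, govt_alias)), 'government', regex=True)

# Whole-value replacement: only exact matches change
exact = df['installer'].replace(govt_alias, 'government')
print(substr.tolist(), exact.tolist())
```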

In your specific case the other answers are a better fit, but the approach shown here generalizes to any list of values.