Question

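I have a list of assorted values that I need to replace with a single value (Drive-by). I did my research, but the closest post I could find is the one linked below, and it does not use pandas. What is the most practical way to do this?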

fourth = pd.read_csv('C:/infocentertracker.csv')
fourth = fourth.rename(columns={'Phone Number: ': 'Phone Number:'})
fourth['Source:'] = fourth['Source:'].replace('......', 'Drive-by')

fourth.to_csv(.............)

Replace all of the following with Drive-by:

Drive By
Drive-By
Drive-by; Return Visitor
Drive/LTX.com/Internes Srch
Driving By/Lantana Website
Drive by
Driving By/Return Visitor
Drive by/Resident Referral
Driving by
Drive- by
Driving by/LTX Website
Driving By
Driving by/Return Visitor
Drive By/Return Visitor
Drive By/LTX Website

Answer 1

You can replace every value that starts with Driv by using a boolean mask built with str.startswith (the idea comes from a comment by Marat):

df.loc[df.col.str.startswith('Driv'), 'col'] = 'Drive-by'

Sample:

print (fourth)
                        Source:
0                      Drive By
1                      Drive-By
2      Drive-by; Return Visitor
3   Drive/LTX.com/Internes Srch
4    Driving By/Lantana Website
5                      Drive by
6     Driving By/Return Visitor
7    Drive by/Resident Referral
8                    Driving by
9                     Drive- by
10       Driving by/LTX Website
11                   Driving By
12    Driving by/Return Visitor
13      Drive By/Return Visitor
14         Drive By/LTX Website
15                          aaa

fourth.loc[fourth['Source:'].str.startswith('Driv'), 'Source:'] = 'Drive-by'
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa
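
If the real column has empty cells, str.startswith returns a missing value for those rows, and .loc may reject a mask that contains missing values. A minimal sketch, assuming a small made-up frame (the rows and the None entry are illustrative, not taken from the original CSV), that passes na=False so missing entries count as non-matches:

```python
import pandas as pd

# Hypothetical sample data; None stands in for an empty CSV cell.
fourth = pd.DataFrame(
    {"Source:": ["Drive By", "Driving by/LTX Website", None, "aaa"]}
)

# na=False makes missing values evaluate to False in the mask.
mask = fourth["Source:"].str.startswith("Driv", na=False)
fourth.loc[mask, "Source:"] = "Drive-by"

print(fourth)
```

The missing cell and the unrelated value aaa are left as they were; only the rows matched by the mask are overwritten.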

Another solution with Series.mask:

fourth['Source:']=fourth['Source:'].mask(fourth['Source:'].str.startswith('Driv', na=False),
                                       'Drive-by')
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa

Answer 2

Since you asked for a pandas method, here is one option:

# .ix was removed in pandas 1.0; .loc accepts the same boolean mask
fourth.loc[fourth['column name with values'].str.contains('driv', case=False, na=False), 'column name with values'] = 'Drive-by'
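
Note that str.contains matches the text anywhere in the value, not only at the start. A small sketch, using made-up values, that compares it with a case-insensitive prefix check so you can pick whichever fits your data:

```python
import pandas as pd

# Hypothetical values chosen to show the difference.
s = pd.Series(["Drive by", "driving By", "Test drive event", None])

# Matches "driv" anywhere in the string, ignoring case.
contains = s.str.contains("driv", case=False, na=False)

# Matches only values that begin with "driv", ignoring case.
starts = s.str.lower().str.startswith("driv", na=False)

print(contains.tolist())
print(starts.tolist())
```

Here the value "Test drive event" is selected by the first mask but not by the second.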

I prefer using regular expressions, which do not necessarily require pandas:

import re

# The comprehension builds a new list; assign it back to keep the result
fourth['column name'] = [re.sub('(Driv.+)', 'Drive-by', i) for i in fourth['column name']]
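
The list comprehension above calls re.sub on every element, so it raises a TypeError if the column holds a non-string entry such as NaN, and the unanchored pattern also rewrites a "Driv..." that appears mid-string. A hedged sketch of a vectorized variant with Series.str.replace, using made-up sample values and an anchored, case-insensitive pattern (both are my assumptions, not part of the original answer):

```python
import re
import pandas as pd

# Hypothetical sample values for illustration.
source = pd.Series(["Drive By/LTX Website", "driving by", "Test Drive", "aaa"])

# ^ anchors the match to the start of the string, so only values that
# begin with "driv" (any case) are replaced as a whole.
cleaned = source.str.replace(r"^driv.*", "Drive-by", regex=True, flags=re.IGNORECASE)

print(cleaned.tolist())
```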

Answer 3

You can replace multiple values (a list) with a single value in pandas:

govt_alias = ['govt', 'govern']
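# regex=True is needed so that '|' is treated as alternation (the default is False in pandas 2.0+)
df['installer'] = df['installer'].str.replace('|'.join(govt_alias), 'government', regex=True)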

In your particular case the other answers are a better fit, but the approach shown here generalizes.
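
One caveat: str.replace works on substrings, so with the pattern govt|govern a cell that already reads "government" would become "governmentment". If the aliases are meant to match whole cell values, Series.replace with a list may be a safer fit. A minimal sketch, assuming made-up data in the installer column:

```python
import pandas as pd

# Hypothetical data; the column name follows the answer above.
df = pd.DataFrame({"installer": ["govt", "govern", "government", "private"]})
govt_alias = ["govt", "govern"]

# Series.replace with a list matches whole cell values only,
# so "government" is left untouched.
df["installer"] = df["installer"].replace(govt_alias, "government")

print(df["installer"].tolist())
```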
