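I can't get string replacement in pandas to work correctly. I'm not sure whether I'm limited to pandas, or whether this can even be done with pandas alone.
This is what my dataframe looks like: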
(ID: 10) 247333605 0.0
(ID: 20) 36738870 0.0
(ID: 40) 4668036427 0.0
(ID: 50) 1918647972 0.0
(ID: 60) 4323165902 44125.0
(ID: 80) 145512255 0.0
Assigned (ID: 30) 42050340 0.0
Assigned (ID: 40) 130880371376 0.0
Assigning (ID: 30) 1095844753 0.0
Cancelled (ID: 40) 937280 0.0
Cancelled (ID: 80) 16857720813 0.0
Planned (ID: 20) 9060392597 0.0
Planning (ID: 10) 108484297031 0.0
Processed (ID: 70) 133289880880 0.0
Revoked (ID: 50) 2411903072 0.0
Writing (ID: 50) 146408550024 0.0
Written (ID: 60) 139458227923 1018230.0
Each bare (ID: x) should be matched to the Assigned (ID: x), Cancelled (ID: x), etc. entry that has the same ID.
I'm using a line similar to this one:
input_data['last_status'] = input_data.last_status.str.replace('(ID: 10)', 'Planning (ID: 10)')
My output is:
(Assigned (ID: 40)) 0.0
(Cancelled (ID: 80)) 0.0
(Planned (ID: 20)) 0.0
(Planning (ID: 10)) 0.0
(Writing (ID: 50)) 0.0
(Written (ID: 60)) 44125.0
Assigned (Assigned (ID: 40)) 0.0
Assigned (ID: 30) 0.0
Assigning (ID: 30) 0.0
Cancelled (Assigned (ID: 40)) 0.0
Cancelled (Cancelled (ID: 80)) 0.0
Planned (Planned (ID: 20)) 0.0
Planning (Planning (ID: 10)) 0.0
Processed (ID: 70) 0.0
Revoked (Writing (ID: 50)) 0.0
Writing (Writing (ID: 50)) 0.0
Written (Written (ID: 60)) 1018230.0
As you can see, every (ID: x) was replaced, but the results still don't match the correct terms.
My ideal dataframe would look like this:
Assigned (ID: 30) 42050340 0.0
Assigned (ID: 40) 130880371376 0.0
Assigning (ID: 30) 1095844753 0.0
Cancelled (ID: 40) 937280 0.0
Cancelled (ID: 80) 16857720813 0.0
Planned (ID: 20) 9060392597 0.0
Planning (ID: 10) 108484297031 0.0
Processed (ID: 70) 133289880880 0.0
Revoked (ID: 50) 2411903072 0.0
Writing (ID: 50) 146408550024 0.0
Written (ID: 60) 139458227923 1018230.0
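I definitely want to use pandas because the dataset is large. I have other implementations, but they took days to run. Is there a way to do this in pandas?
I've never asked anything on stackoverflow before. I hope my question is clear.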
Answer 0 (score: 1)
If you want to generalize, you can use str.replace with SOL/EOL (start-of-line/end-of-line) anchors.
df['last_status'].str.replace(r'^(\(ID: \d+\))$', r'Planning: \1', regex=True)
0 Planning: (ID: 10)
1 Planning: (ID: 20)
2 Planning: (ID: 40)
3 Planning: (ID: 50)
4 Planning: (ID: 60)
5 Planning: (ID: 80)
6 Assigned (ID: 30)
7 Assigned (ID: 40)
8 Assigning (ID: 30)
9 Cancelled (ID: 40)
10 Cancelled (ID: 80)
11 Planned (ID: 20)
12 Planning (ID: 10)
13 Processed (ID: 70)
14 Revoked (ID: 50)
15 Writing (ID: 50)
16 Written (ID: 60)
Name: last_status, dtype: object
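The doubled parentheses in the question's output are consistent with the unescaped pattern '(ID: 10)' being read as a regex capture group, so only the inner ID: 10 gets replaced and the original parentheses stay around it. A minimal sketch comparing the two patterns, assuming a made-up two-row Series and passing regex=True explicitly (newer pandas versions default to regex=False):

```python
import pandas as pd

# Illustrative sample only: one bare ID and one already-labelled row.
s = pd.Series(["(ID: 10)", "Planning (ID: 10)"])

# Unescaped parentheses form a regex group, so only "ID: 10" is matched
# and the literal parentheses are left wrapped around the replacement.
unescaped = s.str.replace("(ID: 10)", "Planning (ID: 10)", regex=True)

# Escaped parentheses plus ^...$ anchors match only rows that consist
# of nothing but the bare "(ID: n)" token.
anchored = s.str.replace(r"^(\(ID: \d+\))$", r"Planning: \1", regex=True)

print(unescaped.tolist())
print(anchored.tolist())
```

With the anchors in place, rows that already carry a status are left untouched.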
If you only want to replace a specific ID, change the regex to:
r'^(\(ID: 10\))$'
Or,
r'^(\(ID: {}\))$'.format(number)
where number is a variable that holds the ID value to replace.
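Rather than repeating the formatted pattern once per ID, the same idea might be done in one pass with a callable replacement. This is only a sketch: id_to_status and add_status are hypothetical names, and the status values are placeholders, since the question doesn't say which status each bare ID should receive.

```python
import pandas as pd

# Small illustrative frame, not the asker's real data.
df = pd.DataFrame(
    {"last_status": ["(ID: 10)", "(ID: 20)", "Assigned (ID: 30)"]}
)

# Hypothetical lookup table: which status to put in front of each bare ID.
id_to_status = {10: "Planning", 20: "Planned"}

def add_status(match):
    """Prefix the matched '(ID: n)' with its status, if one is known."""
    number = int(match.group(1))
    status = id_to_status.get(number)
    if status is None:
        return match.group(0)  # unknown ID: leave the text unchanged
    return f"{status} {match.group(0)}"

# Anchored pattern: only rows that are exactly a bare '(ID: n)' match.
df["last_status"] = df["last_status"].str.replace(
    r"^\(ID: (\d+)\)$", add_status, regex=True
)

print(df["last_status"].tolist())
```

The callable receives a regex match object, so the ID can be read from the first group and looked up without building a separate pattern for each number.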