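Replace a string in pandas (Python) only when it matches the entire string

Asked: 2018-01-25 17:11:41

Tags: python pandas

I can't get string replacement in pandas to work the way I need. I'm not sure whether I'm restricted to pandas, and it may not be possible with pandas alone.

This is what my dataframe looks like: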

 (ID: 10)              247333605                      0.0  
  (ID: 20)               36738870                      0.0  
  (ID: 40)             4668036427                      0.0  
  (ID: 50)             1918647972                      0.0  
  (ID: 60)             4323165902                  44125.0  
  (ID: 80)              145512255                      0.0  
 Assigned (ID: 30)       42050340                      0.0  
 Assigned (ID: 40)   130880371376                      0.0  
 Assigning (ID: 30)    1095844753                      0.0  
 Cancelled (ID: 40)        937280                      0.0  
 Cancelled (ID: 80)   16857720813                      0.0  
 Planned (ID: 20)      9060392597                      0.0  
 Planning (ID: 10)   108484297031                      0.0  
 Processed (ID: 70)  133289880880                      0.0  
 Revoked (ID: 50)      2411903072                      0.0  
 Writing (ID: 50)    146408550024                      0.0  
 Written (ID: 60)    139458227923                1018230.0  

Each bare (ID: x) row should be matched with the Assigned (ID: x), Cancelled (ID: x), etc. row that has the correct ID.

Using a line similar to this one:

input_data['last_status'] = input_data.last_status.str.replace('(ID: 10)', 'Planning (ID: 10)')

my output is:

(Assigned (ID: 40))                                0.0  
  (Cancelled (ID: 80))                               0.0  
  (Planned (ID: 20))                                 0.0  
  (Planning (ID: 10))                                0.0  
  (Writing (ID: 50))                                 0.0  
  (Written (ID: 60))                             44125.0  
 Assigned (Assigned (ID: 40))                        0.0  
 Assigned (ID: 30)                                   0.0  
 Assigning (ID: 30)                                  0.0  
 Cancelled (Assigned (ID: 40))                       0.0  
 Cancelled (Cancelled (ID: 80))                      0.0  
 Planned (Planned (ID: 20))                          0.0  
 Planning (Planning (ID: 10))                        0.0  
 Processed (ID: 70)                                  0.0  
 Revoked (Writing (ID: 50))                          0.0  
 Writing (Writing (ID: 50))                          0.0  
 Written (Written (ID: 60))                    1018230.0  

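As you can see, every (ID: x) gets replaced, including those inside rows that already had a label, and the results still don't match the correct terms.

My ideal dataframe would look like this: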

 Assigned (ID: 30)       42050340                      0.0  
 Assigned (ID: 40)   130880371376                      0.0  
 Assigning (ID: 30)    1095844753                      0.0  
 Cancelled (ID: 40)        937280                      0.0  
 Cancelled (ID: 80)   16857720813                      0.0  
 Planned (ID: 20)      9060392597                      0.0  
 Planning (ID: 10)   108484297031                      0.0  
 Processed (ID: 70)  133289880880                      0.0  
 Revoked (ID: 50)      2411903072                      0.0  
 Writing (ID: 50)    146408550024                      0.0  
 Written (ID: 60)    139458227923                1018230.0 

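I'm bound to use pandas because the dataset is large; I have other implementations, but they take days to run. Is there a way to do this in pandas?

I've never asked anything on Stack Overflow before. I hope my question is clear.

1 Answer:

Answer 0: (score: 1)

If you want to generalize, you can use str.replace with start-of-line / end-of-line anchors (^ and $), so the pattern only matches when it is the entire string.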

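# regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching
df['last_status'].str.replace(r'^(\(ID: \d+\))$', r'Planning: \1', regex=True)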

0     Planning: (ID: 10)
1     Planning: (ID: 20)
2     Planning: (ID: 40)
3     Planning: (ID: 50)
4     Planning: (ID: 60)
5     Planning: (ID: 80)
6      Assigned (ID: 30)
7      Assigned (ID: 40)
8     Assigning (ID: 30)
9     Cancelled (ID: 40)
10    Cancelled (ID: 80)
11      Planned (ID: 20)
12     Planning (ID: 10)
13    Processed (ID: 70)
14      Revoked (ID: 50)
15      Writing (ID: 50)
16      Written (ID: 60)
Name: last_status, dtype: object
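
To make the difference concrete, here is a minimal, self-contained sketch that contrasts an unanchored literal replacement with the anchored regex. The four-row Series is just a sample taken from the question's last_status column, and the names s, unanchored and anchored are made up for illustration.

```python
import pandas as pd

# Small sample of the question's last_status values (illustrative only)
s = pd.Series(["(ID: 10)", "(ID: 20)", "Assigned (ID: 30)", "Planning (ID: 10)"])

# Unanchored literal replacement: also rewrites rows that already carry a label
unanchored = s.str.replace("(ID: 10)", "Planning (ID: 10)", regex=False)

# Anchored regex: matches only when the whole string is "(ID: <digits>)"
anchored = s.str.replace(r"^(\(ID: \d+\))$", r"Planning: \1", regex=True)

print(unanchored.tolist())
print(anchored.tolist())
```

In the unanchored result the last row becomes "Planning Planning (ID: 10)", which is the kind of doubled label shown in the question; the anchored version leaves the already-labelled rows alone.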

If you only want to replace a specific ID, change the regex to:

r'^(\(ID: 10\))$'

Or,

r'^(\(ID: {}\))$'.format(number)

where number is a variable holding the ID value for which the replacement should be performed.
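
Putting the single-ID variant together, a rough sketch might look like the following. The label "Planning" for ID 10 is assumed purely for illustration (the question does not say how a bare ID should be mapped to a status), and the variable names are hypothetical. As an alternative to the regex, Series.replace (as opposed to Series.str.replace) compares whole values by default, so it can do the same exact-match substitution without anchors.

```python
import pandas as pd

# Illustrative sample; not the asker's real data
s = pd.Series(["(ID: 10)", "(ID: 20)", "Planning (ID: 10)"])

number = 10          # ID whose bare "(ID: x)" rows should be rewritten
label = "Planning"   # assumed label for that ID (hypothetical mapping)

# Build the anchored pattern for this one ID, as in the answer
pattern = r"^\(ID: {}\)$".format(number)
replacement = "{} (ID: {})".format(label, number)

# Option 1: anchored regex via the .str accessor
via_regex = s.str.replace(pattern, replacement, regex=True)

# Option 2: whole-value replacement; Series.replace matches entire cells by default
via_exact = s.replace({"(ID: {})".format(number): replacement})

print(via_regex.tolist())
print(via_exact.tolist())
```

Both options leave "(ID: 20)" and the already-labelled row untouched. If several IDs need labels, the dictionary passed to Series.replace could hold one entry per ID, assuming each ID maps to a single label.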