Question

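I have a column of 9-digit numbers. I need to apply some operations to every value in the column so that each one ends up 12 characters long. Here is the original data: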

493    123456789
494    123456789
496    115098765
497    123456789
498    987654321
499    987654321

Now I need to modify the numbers as follows:

After the first digit, insert 20
Before the last 5 digits, insert a 0

The desired result is:

493    120234056789
494    120234056789
496    120150098765
497    120234056789
498    920876054321
499    920876054321

How can I do this? Thanks in advance.

Answer 1

Use indexing with str to slice the values:

s = df['col'].astype(str)
df['new'] = s.str[0] + '20' + s.str[1:-5] + '0' + s.str[-5:]
print (df)
           col           new
493  123456789  120234056789
494  123456789  120234056789
496  115098765  120150098765
497  123456789  120234056789
498  987654321  920876054321
499  987654321  920876054321

A similar solution with apply:

df['new'] = df['col'].astype(str).apply(lambda x:x[0] + '20' + x[1:-5] + '0' + x[-5:])
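
To sanity-check the slicing logic outside pandas, here is a minimal sketch on plain Python strings. The helper name `expand_code` is hypothetical and not part of the answer, and it assumes every input is a 9-character string like the sample data.

```python
def expand_code(code: str) -> str:
    """Insert '20' after the first character and '0' before the last five."""
    # code[0]    -> first digit
    # code[1:-5] -> middle digits (between the first digit and the last five)
    # code[-5:]  -> last five digits
    return code[0] + "20" + code[1:-5] + "0" + code[-5:]


# Distinct values taken from the question's sample data.
samples = ["123456789", "115098765", "987654321"]
expanded = [expand_code(s) for s in samples]
print(expanded)
```

The three slices mirror `s.str[0]`, `s.str[1:-5]` and `s.str[-5:]` in the vectorized version, so the same expression works per element inside `apply`.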

Performance comparison (credit to @Mark Wang):

#6k rows   
df = pd.concat([df] * 1000, ignore_index=True)

In [241]: %%timeit
     ...: s = df['col'].astype(str)
     ...: df['new1'] = s.str[0] + '20' + s.str[1:-5] + '0' + s.str[-5:]
     ...: 
19.5 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [242]: %%timeit 
     ...: df['new2'] = df['col'].astype(str).apply(lambda x:x[0] + '20' + x[1:-5] + '0' + x[-5:])
     ...: 
11.4 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

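The second approach is faster here (11.4 ms vs. 19.5 ms on 6k rows) because the pandas text functions are slower than plain Python string operations. One reason is that they handle missing values correctly.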
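
The missing-value point can be illustrated with a small sketch. The data here is hypothetical (one valid value and one missing value), and the example skips `astype(str)` so that the missing value stays missing; it is meant only to show the difference in behavior, not to reproduce the timings.

```python
import pandas as pd

# Hypothetical data: one valid string and one missing value.
s = pd.Series(["123456789", None], dtype=object)

# The .str accessor propagates the missing value instead of failing.
via_str = s.str[0] + "20" + s.str[1:-5] + "0" + s.str[-5:]

# A plain lambda tries to slice the missing value and raises TypeError.
try:
    s.apply(lambda x: x[0] + "20" + x[1:-5] + "0" + x[-5:])
    apply_failed = False
except TypeError:
    apply_failed = True

print(via_str.tolist())
print(apply_failed)
```

If the column can contain missing values, the `apply` version would need an explicit check inside the lambda, which gives back part of its speed advantage.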

Answer 2

A pure regex alternative:

In [1067]: df[1].astype(str).replace(r'^(\d)(\d+)(\d{5})$', r'\g<1>20\g<2>0\g<3>', regex=True)
Out[1067]: 
0    120234056789
1    120234056789
2    120150098765
3    120234056789
4    920876054321
5    920876054321
Name: 1, dtype: object
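
To see how the pattern and the replacement template work, here is a minimal sketch using the standard library `re` module on plain strings rather than a Series. The variable names are illustrative only.

```python
import re

# Same pattern as above: first digit, middle digits, last five digits.
pattern = re.compile(r"^(\d)(\d+)(\d{5})$")

# \g<N> is used instead of \N because a digit follows each group reference:
# for example, "\20" would be read as a reference to group 20, not as
# group 2 followed by the literal "0".
replacement = r"\g<1>20\g<2>0\g<3>"

# Distinct values taken from the question's sample data.
samples = ["123456789", "115098765", "987654321"]
converted = [pattern.sub(replacement, s) for s in samples]
print(converted)
```

Because `\d+` is greedy but must leave exactly five digits for the last group, the middle group captures everything between the first digit and the final five.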
