Question

I have the following dataframe, where col_1 is of integer type:

print(df)

col_1 
100
200
00153
00164

If a value has exactly 3 digits, I want to prepend two zeros.

final_col
00100
00200
00153
00164

I tried:

df.col_1 = df.col_1.astype(int).astype(str)

df["final_col"] = np.where(len(df["col_1"]) == 3, "00" + df.col_1, df.col_1 )

But it does not produce the expected output (the two zeros are not prepended when the condition is met).

How can I fix this?

Answer 1

Use str.zfill:

df['final_col'] = df['col_1'].astype(str).str.zfill(5)

[out]

   final_col
0      00100
1      00200
2      00153
3      00164

Update: if you only want to pad the values whose length is exactly 3, use Series.where. Thanks to @yatu for pointing this out:

df.col_1.astype(str).where(df.col_1.astype(str).str.len().ne(3),
                           df.col_1.astype(str).str.zfill(5))
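
To make the difference between the two variants concrete, here is a minimal sketch. The sample frame is rebuilt from the question, and the extra value "42" is a hypothetical row added only to show where plain zfill and the where-based version diverge.

```python
import pandas as pd

# Sample data from the question; "42" is a hypothetical extra row
df = pd.DataFrame({"col_1": ["100", "200", "00153", "00164", "42"]})
s = df["col_1"].astype(str)

# zfill pads every value on the left with zeros up to width 5
padded_all = s.str.zfill(5)

# where keeps values whose length is not 3 and replaces the others
padded_len3 = s.where(s.str.len().ne(3), s.str.zfill(5))

print(padded_all.tolist())
print(padded_len3.tolist())
```

With zfill alone the two-character value is padded as well, while the where-based version leaves it untouched.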

Answer 2

Another approach, using Series.str.pad():

df.col_1.astype(str).str.pad(5,fillchar='0')

Your solution should be updated to:

(np.where(df["col_1"].astype(str).str.len()==3, 
       "00" + df["col_1"].astype(str),df["col_1"].astype(str)))

However, this will not work when a string is shorter than 5 characters and its length is not 3, so I suggest you don't use it.
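
For context on why the attempt in the question never prepends anything: len() applied to a Series returns the number of rows (one scalar), not the length of each string, so the condition is the same for every row. A small sketch, assuming the column already holds strings as in the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": ["100", "200", "00153", "00164"]})

# len() on a Series is the row count: a single scalar, here 4
n_rows = len(df["col_1"])

# .str.len() returns the length of each string, element by element
lengths = df["col_1"].str.len()

# Scalar condition (4 == 3 is False), so every row takes the "else" branch
original = np.where(n_rows == 3, "00" + df["col_1"], df["col_1"])

# Element-wise condition, so only the 3-character rows get the prefix
fixed = np.where(lengths == 3, "00" + df["col_1"], df["col_1"])

print(list(original))
print(list(fixed))
```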

Answer 3

# After converting the column to str, you can follow up with a list comprehension.

df=pd.DataFrame({'col':['100','200','00153','00164']})
df['col_up']=['00'+x if len(x)==3 else x for x in df.col ]
df

###output

    col    col_up
0   100     00100
1   200     00200
2   00153   00153
3   00164   00164


### based on the responses in comments
%%timeit -n 10000
df.col.str.pad(5,fillchar='0')
142 µs ± 5.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit -n 10000
['00'+x if len(x)==3 else x for x in df.col ]
21.1 µs ± 952 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit -n 10000
df.col.astype(str).str.pad(5,fillchar='0')
243 µs ± 7.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Answer 4

Since the dtype of the col column is str, you can use the .str accessor and then .pad() to pad each string with 0 up to a width of 5.

Check the documentation.

.pad(5, fillchar='0')

In[1]:  df = pd.DataFrame({'col':['100','200','00153','00164']})
        df

In[2]:  df['final_col'] = df.col.astype(str).str.pad(5, fillchar='0')
        df

Out[2]:
     col final_col
0    100     00100
1    200     00200
2  00153     00153
3  00164     00164

Additionally, if the dtype of the column is not string, you can convert it to string with .astype(dtype) and then use .pad() on it.
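
As a rough sketch of that conversion step, assume the column really is integer-typed. An integer column cannot hold leading zeros, so the question's 00153 and 00164 would be stored as 153 and 164; the integer values below are an assumption based on that.

```python
import pandas as pd

# Assumed integer version of the question's data (no leading zeros possible)
df = pd.DataFrame({"col": [100, 200, 153, 164]})

# Convert to str first, then left-pad with "0" up to width 5
df["final_col"] = df["col"].astype(str).str.pad(5, fillchar="0")

print(df)
```

str.pad pads on the left by default, which is the side needed for zero-padding here.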

Zero-padding a pandas column
