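Pandas: update a column based on its current value

Date: 2016-03-17 14:49:43

Tags: python-2.7 numpy pandas

I'm trying to pad zip codes that were supplied in numeric format with leading zeros. I thought the following would work (a similar approach worked for me earlier with .str.startswith()). Any suggestions?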

data['loczipstr'] = data['loczip'].astype(str)
data['loczipstr'] = np.where(len(data['loczipstr']) == 3, "0000" +data['loczipstr'], data['loczipstr'])
data['loczipstr'] = np.where(len(data['loczipstr']) == 4, "000" + data['loczipstr'], data['loczipstr'])
data['loczipstr'] = np.where(len(data['loczipstr']) == 5, "00" + data['loczipstr'], data['loczipstr'])
data['loczipstr'] = np.where(len(data['loczipstr']) == 6, "0" + data['loczipstr'], data['loczipstr'])

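These lines execute, but they don't change data['loczipstr'] at all.

Note: the lengths range from 3 to 6 because a four-digit zip code looks like 1023.0, so its character length is 6.

2 Answers:

Answer 0: (score: 3)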

Convert the column to str, then use the vectorised str.zfill with the maximum length, 7:

In [76]:
df['loczipstr'] = df['loczip'].astype(str).str.zfill(7)
df

Out[76]:
     loczip loczipstr
0       111   0000111
1     11111   0011111
2    111111   0111111
3   1111111   1111111
4  11111111  11111111
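
The question notes that the real values look like 1023.0 rather than 1023. A minimal sketch of how zfill behaves in that case, assuming a float column with no missing values (the sample frame and the zip5 column name are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: zip codes read in as floats, as described in the question.
df = pd.DataFrame({'loczip': [111.0, 1023.0, 11111.0]})

# Pad the string form, trailing ".0" included, to 7 characters.
df['loczipstr'] = df['loczip'].astype(str).str.zfill(7)

# Assumption: there are no NaN values, so the floats can be cast to int first.
# That drops the ".0", and the result can be padded to a 5-character code.
df['zip5'] = df['loczip'].astype(int).astype(str).str.zfill(5)

print(df)
```

If the column can contain missing values, the int cast will raise, so those rows would need to be handled separately first.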

Answer 1: (score: 1)

You can use loc with a mask created by str.len:

print data
     loczip
0       111
1     11111
2    111111
3   1111111
4  11111111

data['loczipstr'] = data['loczip'].astype(str)

data.loc[data['loczipstr'].str.len() == 3, 'loczipstr'] = "0000" + data['loczipstr']
data.loc[data['loczipstr'].str.len() == 4, 'loczipstr'] = "000"  + data['loczipstr']
data.loc[data['loczipstr'].str.len() == 5, 'loczipstr'] = "00"   + data['loczipstr']
data.loc[data['loczipstr'].str.len() == 6, 'loczipstr'] = "0"    + data['loczipstr']
print data
     loczip loczipstr
0       111   0000111
1     11111   0011111
2    111111   0111111
3   1111111   1111111
4  11111111  11111111
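
The difference between len() and .str.len() is likely why the np.where version in the question had no effect. A small sketch using the same sample data as above (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'loczip': [111, 11111, 111111, 1111111, 11111111]})
s = data['loczip'].astype(str)

# len() on a Series counts rows, giving one scalar for the whole column.
n_rows = len(s)

# .str.len() measures each string, giving one value per row.
lengths = s.str.len()

# With a scalar condition, np.where picks the same branch for every row,
# so nothing changes unless the row count happens to equal 3.
unchanged = np.where(len(s) == 3, "0000" + s, s)

# With the per-row mask, only the matching rows are padded.
padded = np.where(lengths == 3, "0000" + s, s)
```

Here unchanged holds the original strings, while padded has "0000" prepended only to the three-character entry.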