Question

I am trying to zero-pad the values in the date_of_birth column of my drugs_tall DataFrame. The column has dtype float64, and it contains some NA values.

This was my first attempt:

drugs_tall.loc[drugs_tall['date_of_birth'].isnull() == False, ['date_of_birth']] = drugs_tall.loc[drugs_tall['date_of_birth'].isnull() == False, ['date_of_birth']].astype('int').astype('str').str.zfill(6)

However, this raises an error:

AttributeError: 'DataFrame' object has no attribute 'str'

I worked around it with the following, which works:

drugs_tall.loc[drugs_tall['date_of_birth'].isnull() == False, ['date_of_birth']] = drugs_tall.loc[drugs_tall['date_of_birth'].isnull() == False, ['date_of_birth']].astype('int').astype('str')

drugs_tall['date_of_birth'] = drugs_tall['date_of_birth'].str.zfill(6)

Note that I cannot go straight to:

drugs_tall['date_of_birth'] = drugs_tall['date_of_birth'].str.zfill(6)

because that raises:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Nor can I change the dtype without the .loc selection:

drugs_tall['date_of_birth'].astype('int').astype('str')

This gives:

ValueError: Cannot convert non-finite values (NA or inf) to integer

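Am I going about this in a strange way, or am I misunderstanding how DataFrames work? I know my two-line solution is fairly short, but I don't understand what makes it behave differently from my first attempt.

Thanks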

Answer 1

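Your column indexer should be the scalar 'dob', not the list ['dob']. That is why your indexing operation returns a DataFrame. This makes sense: a list of columns is interpreted as a DataFrame, while a scalar column gives a Series, and only a Series has the .str accessor.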
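A minimal sketch of that difference, assuming a small stand-in frame (the names df, as_series and as_frame are illustrative, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's float64 column with missing values.
df = pd.DataFrame({'dob': [np.nan, 11585.0, 52590.0]})
mask = df['dob'].notnull()

as_series = df.loc[mask, 'dob']    # scalar column label -> Series
as_frame = df.loc[mask, ['dob']]   # list of column labels -> DataFrame

print(type(as_series).__name__)    # Series
print(type(as_frame).__name__)     # DataFrame

# The string methods are only reachable from the Series.
padded = as_series.astype(int).astype(str).str.zfill(6)
print(padded.tolist())             # ['011585', '052590']
```

Calling .str on as_frame instead would reproduce the AttributeError from the question.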

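For your task, you can use pd.Series.notnull together with pd.DataFrame.loc. If Pandas stores your values as float, converting to integer first is advisable, so that the strings do not end in '.0'.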

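import numpy as np
import pandas as pd

df = pd.DataFrame({'dob': [np.nan, None, 11585, 52590]})

# Cast to object so recent pandas accepts strings in a formerly float64 column.
df['dob'] = df['dob'].astype(object)

mask = df['dob'].notnull()
df.loc[mask, 'dob'] = df.loc[mask, 'dob'].astype(int).astype(str).str.zfill(6)

print(df)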

      dob
0     NaN
1     NaN
2  011585
3  052590
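As an alternative that is not part of the answer above, the mask can probably be avoided altogether with pandas' nullable dtypes. This is a sketch under two assumptions: a pandas version that provides the 'Int64' and 'string' extension dtypes, and a column whose non-missing values are whole numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dob': [np.nan, None, 11585, 52590]})

# 'Int64' (capital I) is the nullable integer dtype, so missing values
# survive the cast instead of raising ValueError as plain int does.
# 'string' is the nullable string dtype; .str methods leave missing values missing.
df['dob'] = df['dob'].astype('Int64').astype('string').str.zfill(6)

print(df['dob'].dropna().tolist())  # ['011585', '052590']
```

The missing entries stay missing (shown as <NA> rather than NaN), and the whole column ends up with a single string dtype instead of a mix of floats and strings.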
