Question

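I am trying to write strings of digits to a CSV file and later read them back into a DataFrame. However, pandas automatically converts my strings from object dtype to int64 when reading.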

import pandas

df = pandas.DataFrame({'col1':['00123','00125']})
print(df['col1'].dtype)
df.to_csv('test.csv',index=False)
new_df = pandas.read_csv('test.csv')
print(new_df['col1'].dtype)

object #value of first print
int64 #value of second print

How can I preserve the dtype when writing, or prevent the conversion when reading?

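Edit: I noticed that if I call astype('|S') on df, new_df comes back with object dtype, even though df.dtype does not change. That seems unintuitive to me. I would appreciate it if someone could explain this.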

import pandas

df = pandas.DataFrame({'col1':['00123','00125']})
df['col1']=df['col1'].astype('|S')
print(df['col1'].dtype)
df.to_csv('test.csv',index=False)
new_df = pandas.read_csv('test.csv')
print(new_df['col1'].dtype)

object #value of first print
object #value of second print

Answer 1

I suggest writing the DataFrame to an Excel file instead:

df.to_excel('test.xlsx',index=False)

Or pass the column dtype when reading the file:

pd.read_csv('test.csv',dtype = {'col1': object})
Out[346]: 
    col1
0  00123
1  00125
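
To see the difference side by side, here is a minimal sketch of the same round trip. It assumes an in-memory buffer (io.StringIO) in place of test.csv, and uses str rather than object as the dtype, which is my substitution and not what the answer above wrote. The names csv_text, inferred and preserved are only illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({'col1': ['00123', '00125']})

# Write to an in-memory buffer (assumption: stands in for test.csv)
# so the sketch leaves no file behind.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Default read: pandas infers a numeric column, so the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Read with an explicit per-column dtype: the values are kept as strings.
preserved = pd.read_csv(io.StringIO(csv_text), dtype={'col1': str})

print(inferred['col1'].tolist())
print(preserved['col1'].tolist())
```

The CSV file itself stores no type information, so the dtype has to be supplied again on every read. Passing dtype=str without a dict should apply the same treatment to all columns.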
