I want to convert a column's type to int using pandas. Here is the source code:
# CustomerID is missing on several rows. Drop these rows and encode customer IDs as Integers.
cleaned_data = retail_data.loc[pd.isnull(retail_data.CustomerID) == False]
cleaned_data['CustomerID'] = cleaned_data.CustomerID.astype(int)
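This raises the following warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
How can I avoid this warning? Is there a better way to convert the type of CustomerID to int? I am on Python 3.5.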
Answer 0 (score: 2)
Do the selection and the assignment in a single loc call:
retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)
Example:
import pandas as pd
import numpy as np
retail_data = pd.DataFrame(np.random.rand(4,1)*10, columns=['CustomerID'])
retail_data.iloc[2,0] = np.nan
print(retail_data)
   CustomerID
0    9.872067
1    5.645863
2         NaN
3    9.008643
retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)
   CustomerID
0         9.0
1         5.0
2         NaN
3         9.0
You will notice that the column's dtype is still float, because np.nan cannot be encoded in an int column.
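To illustrate why the dtype stays float, here is a minimal sketch. The sample values are made up for the illustration. It also shows pandas' nullable "Int64" extension dtype as one possible alternative; that part assumes pandas 0.24 or later and is not something the original answer relied on.

```python
import numpy as np
import pandas as pd

# Made-up sample data: whole-number floats plus one missing value.
retail_data = pd.DataFrame({"CustomerID": [9.0, 5.0, np.nan, 9.0]})

# A NumPy-backed int column cannot represent NaN, so a direct cast fails.
try:
    retail_data["CustomerID"].astype(int)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("astype(int) failed:", exc)

# Assumption: pandas >= 0.24, where the nullable "Int64" dtype
# (capital I) stores integers alongside a missing-value marker.
nullable = retail_data["CustomerID"].astype("Int64")
print(nullable.dtype)
print(nullable)
```

Whether the nullable dtype suits you depends on what consumes the column afterwards; some older code expects plain NumPy integers.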
If you really do want to drop those rows without changing the underlying retail_data, make an actual copy():
cleaned_data = retail_data.loc[~retail_data.CustomerID.isnull()].copy()
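Putting the copy and the cast together, a minimal end-to-end sketch might look like this. The sample IDs are hypothetical and stand in for the question's retail_data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's retail_data.
retail_data = pd.DataFrame({"CustomerID": [17850.0, np.nan, 13047.0, 12583.0]})

# Take an explicit copy of the non-null rows, so the assignment below
# writes to an independent DataFrame rather than to a slice.
cleaned_data = retail_data.loc[~retail_data.CustomerID.isnull()].copy()

# No NaN remains in the copy, so the column can be cast to int.
cleaned_data["CustomerID"] = cleaned_data["CustomerID"].astype(int)

print(cleaned_data.dtypes)
print(len(retail_data), len(cleaned_data))
```

Because cleaned_data is an independent copy, retail_data keeps its NaN row and its float dtype.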