How do I avoid SettingWithCopyWarning in pandas?

Asked: 2016-12-10 13:55:45

Tags: pandas python-3.5

I want to convert a column's type to int using pandas. Here is the source code:

# CustomerID is missing on several rows. Drop these rows and encode customer IDs as Integers.
cleaned_data = retail_data.loc[pd.isnull(retail_data.CustomerID) == False]
cleaned_data['CustomerID'] = cleaned_data.CustomerID.astype(int)

This raises the following warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

How can I avoid this warning? Is there a better way to convert the type of CustomerID to int? I'm on Python 3.5.

1 answer:

Answer 0: (score: 2)

Do the selection and the assignment in a single loc call:

retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)

Example:

import pandas as pd
import numpy as np

retail_data = pd.DataFrame(np.random.rand(4,1)*10, columns=['CustomerID'])
retail_data.iloc[2,0] = np.nan
print(retail_data)

   CustomerID
0    9.872067
1    5.645863
2         NaN
3    9.008643

retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)
print(retail_data)

   CustomerID
0         9.0
1         5.0
2         NaN
3         9.0

You'll notice that the column's dtype is still float, because np.nan cannot be encoded in an int column.
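To make the dtype point concrete, here is a minimal sketch. The sample values are made up, and the second half assumes pandas 0.24 or newer, where the nullable "Int64" extension dtype is available; it is not part of the original answer and may not suit every workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: whole-number IDs stored as floats, one missing.
retail_data = pd.DataFrame({"CustomerID": [9.0, 5.0, np.nan, 9.0]})

# A NumPy-backed int column cannot hold NaN, so the column is float64.
print(retail_data["CustomerID"].dtype)

# Assumption: pandas >= 0.24. The nullable "Int64" dtype (capital I)
# stores integers and keeps the missing entry as a missing value.
nullable_ids = retail_data["CustomerID"].astype("Int64")
print(nullable_ids.dtype)
print(nullable_ids.isna().sum())
```

If a nullable integer column is acceptable, this keeps every row; otherwise the missing rows have to be dropped before converting to a plain int.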

If you really do want to drop those rows without changing the underlying retail_data, make an actual copy():

cleaned_data = retail_data.loc[~retail_data.CustomerID.isnull()].copy()
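Putting the copy() together with the conversion from the question might look like the sketch below. The CustomerID values are invented for illustration, and notnull() is used as an equivalent of ~isnull().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing CustomerID.
retail_data = pd.DataFrame({"CustomerID": [17850.0, np.nan, 13047.0]})

# .copy() makes cleaned_data an independent DataFrame, so the assignment
# below is not a write to a slice of retail_data.
cleaned_data = retail_data.loc[retail_data["CustomerID"].notnull()].copy()
cleaned_data["CustomerID"] = cleaned_data["CustomerID"].astype(int)

print(cleaned_data.dtypes)  # CustomerID is now an integer dtype
print(retail_data.dtypes)   # the original frame is still float64
```

Because no NaN remains in cleaned_data, the cast to int succeeds, and retail_data keeps all of its rows and its float dtype.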