I can't get the result I want from scikit-learn's Imputer on categorical columns.
df_cat = df[["Suburb", "Address", "Type", "Method", "SellerG", "Date", "YearBuilt", "CouncilArea", "Regionname"]]
df_cat['Suburb'] = pd.to_numeric(df_cat['Suburb'], errors='coerce')
df_cat['Address'] = pd.to_numeric(df_cat['Address'], errors='coerce')
df_cat['Type'] = pd.to_numeric(df_cat['Type'], errors='coerce')
df_cat['Method'] = pd.to_numeric(df_cat['Method'], errors='coerce')
df_cat['SellerG'] = pd.to_numeric(df_cat['SellerG'], errors='coerce')
df_cat['YearBuilt'] = pd.to_numeric(df_cat['YearBuilt'], errors='coerce')
df_cat['CouncilArea'] = pd.to_numeric(df_cat['CouncilArea'], errors='coerce')
df_cat['Regionname'] = pd.to_numeric(df_cat['Regionname'], errors='coerce')
df_cat['Date'] = pd.to_numeric(df_cat['Date'], errors='coerce')
from sklearn.preprocessing import Imputer
imputer2 = Imputer(strategy="most_frequent")
imputer2.fit(df_cat)
imputer2.statistics_
Result:
array([ nan, nan, nan, nan, 1.00e+00, nan,
1.97e+03, nan, nan])
The existing values have been replaced with NaN. This is the exact opposite of what I need.
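
A likely cause is the `pd.to_numeric(..., errors='coerce')` calls: with `errors='coerce'`, any value that cannot be parsed as a number becomes NaN, so text columns such as `Suburb` or `Type` end up entirely NaN before the imputer ever sees them. That would also explain why only the `SellerG` and `YearBuilt` positions of `statistics_` hold numbers. Below is a minimal sketch of one possible workaround, assuming the columns are left as strings and filled with each column's most frequent value using pandas alone. The toy data is hypothetical and only stands in for the real DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame.
df_cat = pd.DataFrame({
    "Suburb": ["Abbotsford", None, "Richmond", "Abbotsford"],
    "Type": ["h", "u", None, "h"],
})

# Coercing text to numbers turns every non-numeric string into NaN,
# which is what wipes out the existing values.
coerced = pd.to_numeric(df_cat["Suburb"], errors="coerce")
all_nan = coerced.isna().all()

# Keep the columns as strings and fill the gaps with each column's mode.
# mode() ignores missing values; iloc[0] takes the first mode per column.
modes = df_cat.mode().iloc[0]
filled = df_cat.fillna(modes)

print(all_nan)
print(filled)
```

In scikit-learn 0.20 and later, `sklearn.impute.SimpleImputer(strategy="most_frequent")` also accepts string columns, so it may be an alternative if an sklearn transformer is needed.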