How to resolve: "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')."?

Time: 2018-12-29 11:57:18

Tags: python-2.7

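I am using Anaconda Navigator. My dataset contains empty fields. I tried to remove them, but I still get the error: "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')."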

from sklearn.preprocessing import Imputer
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
from sklearn.preprocessing import LabelEncoder
X = pd.read_csv("f.csv")
y= pd.read_csv("target.csv")
print (X.head())
print(X.columns)
print(X[u'screen_name'])

le=LabelEncoder()

for col in X.columns.values:

    if X[col].values.any()=='nan':
        X[col].values=0;
    if X[col].dtypes=='object':
        # data=X[col]
        #X.shape
        #le.fit(X[col])
        print("current column is ")
        print(col)
        print(X[col])
        X[col]=le.fit_transform(X[col])
        print("after tranformation")
        print(X[col]) 
mean_imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
mean_imputer = mean_imputer.fit(X)
imputed_df = mean_imputer.transform(X)
clf = RandomForestClassifier(n_estimators=10, max_depth=6, n_jobs=1, verbose=2)
model = clf.fit(X, y)

1 answer:

Answer 0 (score: 0)

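The problem may be that your Imputer is trying to replace values equal to the string 'NaN'. The actual NaN values would then remain in the data, causing the model to complain during fitting. Instead, try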

import numpy as np
mean_imputer = Imputer(missing_values=np.nan, strategy='mean', axis=0)
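
A further thing to check, independent of the `missing_values` argument: the code in the question stores the imputer output in `imputed_df` but then calls `clf.fit(X, y)`, so the model still receives the original `X` with its NaNs. The sketch below is one way to put the pieces together, assuming scikit-learn 0.20 or later, where `SimpleImputer` in `sklearn.impute` takes the place of `Imputer`. The small DataFrame and its column names (`followers`, `friends`) are made up for illustration and stand in for `f.csv` and `target.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for f.csv / target.csv: numeric columns with gaps.
X = pd.DataFrame({
    "followers": [10.0, np.nan, 30.0, 40.0],
    "friends": [1.0, 2.0, np.inf, 4.0],
})
y = pd.Series([0, 1, 0, 1])

# Infinite values trigger the same ValueError, so turn them into NaN first.
X = X.replace([np.inf, -np.inf], np.nan)

# Report how many missing values each column holds before imputing.
print(X.isna().sum())

# Replace each NaN with the mean of its column.
mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_df = pd.DataFrame(mean_imputer.fit_transform(X), columns=X.columns)

# Fit on the imputed data, not on the original X.
clf = RandomForestClassifier(n_estimators=10, max_depth=6, random_state=0)
model = clf.fit(imputed_df, y)
```

Text columns would still need to be encoded (for example with `LabelEncoder`, as in the question) before the mean strategy can be applied, since a mean is only defined for numeric data.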