When I run the command:
clf.fit(train_data, train_label)
I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
The array train_data has shape (18000, 20). I have tried this command:
clf.fit(np.float32(train_data), train_label)
or
train_data = np.array([s[0].astype('float32') for s in train_data])
The train_data and train_label datasets are stored in the file train (a Python pickle), available at the following link:
https://www.dropbox.com/s/b3017gi18x6x325/train?dl=0
However, I cannot get all the values in the array "train_data" to be valid for the clf.fit function. Any help?
Answer 0 (score: 1)
I just found a solution to this error. You need to scale the data:
Code:
from sklearn.ensemble import RandomForestClassifier
import pickle
import numpy as np
from sklearn.preprocessing import scale
with open('train', 'rb') as f:
    train_data, train_label = pickle.load(f)
# Some diagnostics to check for NaNs. No NaNs were found!
print(np.isnan(train_data))
print(np.where(np.isnan(train_data)))
print(np.nan_to_num(train_data))
print(np.isnan(train_label))
print(np.where(np.isnan(train_label)))
# No NaNs were found, so scale the data instead
train_data = scale(train_data)
clf = RandomForestClassifier()
clf.fit(train_data, train_label)
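A likely explanation for why scaling helps, given that no NaNs were found: scikit-learn's tree-based estimators convert the input to float32 internally, so a value that is finite in float64 but larger than the float32 maximum (about 3.4e38) is rejected as "too large". The sketch below is only an illustration under that assumption. The helper diagnose_for_float32 and the small array X are hypothetical stand-ins, not part of the original dataset, and the standardization is done by hand with NumPy to mimic what scale does (zero mean, unit variance per column).

```python
import numpy as np

def diagnose_for_float32(X):
    """Count entries that would be rejected once the data is cast to float32."""
    X = np.asarray(X, dtype=np.float64)
    f32_max = np.finfo(np.float32).max  # largest finite float32 value
    finite = np.isfinite(X)
    return {
        "nan": int(np.isnan(X).sum()),
        "inf": int(np.isinf(X).sum()),
        # Finite in float64, but would overflow to inf as float32
        "too_large_for_float32": int((finite & (np.abs(X) > f32_max)).sum()),
    }

# Hypothetical stand-in for train_data: one value exceeds the float32 range
X = np.array([[1.0, 2.0],
              [3.0, 1e40],
              [5.0, 6.0]])

report = diagnose_for_float32(X)
print(report)

# Column-wise standardization in float64 (zero mean, unit variance)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# After scaling, every value fits comfortably in float32
print(np.isfinite(X_scaled.astype(np.float32)).all())
```

If the report shows a non-zero "too_large_for_float32" count on the real train_data, that would confirm the overflow as the cause. Note that scaling changes the feature values, so the same transformation would need to be applied to any test data as well.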