I read my input into a pandas DataFrame and fill the NaN values:
df = df.fillna(0)
After that, I split the data into training and test sets and run a classifier from sklearn.
features = df.drop('class',axis=1)
labels = df['class']
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.3, random_state=42)
clf.fit(features_train, labels_train)
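But I still get the error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
It seems that fillna() does not find the missing data. How can I find where the NaN values are?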
Answer 0 (score: 0)
df.isnull().sum()
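shows how many NaN values each column of the DataFrame contains, so any non-zero count tells you which columns still have missing data.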
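Note that the error message also covers infinities, which fillna(0) leaves untouched and isnull() does not flag. Below is a minimal sketch of counting both kinds of bad values per column. It assumes the columns are numeric, and the toy frame and its column names 'a' and 'b' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's frame
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [np.inf, 5.0, -np.inf],
    'class': [0, 1, 0],
})

# NaN count per column
nan_counts = df.isnull().sum()

# Infinity count per column (np.isinf only accepts numeric data)
numeric = df.select_dtypes(include=[np.number])
inf_counts = np.isinf(numeric).sum()

print(nan_counts)
print(inf_counts)

# Turn infinities into NaN first, then fill everything in one go
cleaned = df.replace([np.inf, -np.inf], np.nan).fillna(0)
```

If inf_counts is non-zero for some column, replacing the infinities before calling fillna, as in the last line, may be enough to make the error go away. Whether 0 is a sensible fill value depends on the data.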
Answer 1 (score: -1)
You ask:
How can I find where the NaN values are?
Would it help to visualize where the problematic data sits in the frame?
You could try matplotlib.pyplot.spy:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# let's make some initial clean data
df = pd.DataFrame(
data={
'alpha': [0, 1, 2],
'beta': [3, 4, 5],
'gamma': [6, 7, 8]
},
index=['one', 'two', 'three'],
dtype=object  # object dtype so the mixed values assigned below need no upcasting
)
# add some problematic points
# `NaN`s, infinities and stuff that is
# just not numeric
df.loc['one', 'beta'] = 'not a number but not NaN'
df.loc['two', 'alpha'] = np.nan
df.loc['three', 'gamma'] = np.inf
fig, axes = plt.subplots(1, 3)
axes[0].spy(df.isnull())
axes[0].set_title('NaN elements')
axes[1].spy(df == np.inf)
axes[1].set_title('infinite elements')
# note: on pandas >= 2.1, DataFrame.map replaces the deprecated applymap
axes[2].spy(~df.applymap(np.isreal))
axes[2].set_title('Non numeric elements')
plt.show()
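If a plot is awkward to read on a large frame, the same three kinds of problem cells can be listed by label instead. The following is only a sketch under assumptions: it rebuilds a small frame like the one above and treats anything that pd.to_numeric cannot parse as a problem value. The names numeric, bad and bad_cells are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN, one infinity and one string
df = pd.DataFrame(
    {
        'alpha': [0, np.nan, 2],
        'beta': ['not a number but not NaN', 4, 5],
        'gamma': [6, 7, np.inf],
    },
    index=['one', 'two', 'three'],
)

# Coerce every column to numbers; anything unparseable becomes NaN
numeric = df.apply(pd.to_numeric, errors='coerce')

# True wherever the value is NaN, infinite, or was not numeric to begin with
bad = ~np.isfinite(numeric.to_numpy(dtype=float))

# Collect the (row label, column label) pairs of the offending cells
rows, cols = np.where(bad)
bad_cells = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]
print(bad_cells)
```

Each tuple can be passed straight to df.loc to inspect the raw value, which helps when deciding whether to fill, drop or fix it.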