Question

I read my input as a pandas DataFrame and fill the NaNs:

df = df.fillna(0)

After that, I split the data into training and test sets and classify it with sklearn:

from sklearn.model_selection import train_test_split

features = df.drop('class', axis=1)
labels = df['class']
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.3, random_state=42)
clf.fit(features_train, labels_train)  # clf: a previously created sklearn classifier

But I still get an error:

"NaN error": ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

It seems fillna() doesn't find the missing data. How can I find where the "NaN" is?
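Note that the error message mentions infinity and oversized values as well as NaN, while fillna() only replaces NaN/None. The sketch below uses a small made-up frame (hypothetical data, not the poster's) to show that infinities survive fillna(0), one way to turn them into NaN before filling, and a check for values too large for float32.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one NaN and two infinities
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.inf, 5.0, -np.inf]})

filled = df.fillna(0)
# fillna only touches NaN/None, so the infinities are still present
print(np.isinf(filled).sum().sum())

# One option: convert +/-inf to NaN first, then fill everything
cleaned = df.replace([np.inf, -np.inf], np.nan).fillna(0)
print(np.isfinite(cleaned).all().all())

# Values beyond the float32 range would also trigger the error
too_big = (cleaned.abs() > np.finfo(np.float32).max).any().any()
print(too_big)
```

Whether replacing infinities with 0 is sensible depends on the data; dropping those rows is another option.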

Answer 1

df.isnull().sum()

shows whether there are any NaNs in the DataFrame, as a count of missing values per column.
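A slightly extended sketch of the same idea, on a hypothetical frame: isnull() does not flag infinities, so it may be worth counting those separately and listing the offending rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN and one infinity
df = pd.DataFrame(
    {"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.inf]},
    index=["r0", "r1", "r2"],
)

# Number of NaN values per column
nan_per_column = df.isnull().sum()
# Number of +/-inf values per column (numeric columns only)
inf_per_column = np.isinf(df.select_dtypes(include="number")).sum()
# Rows containing at least one NaN
rows_with_nan = df[df.isnull().any(axis=1)]

print(nan_per_column)
print(inf_per_column)
print(rows_with_nan)
```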

Answer 2

You asked:

How can I find where the "NaN" is?

Would it help to visualize where the problematic data is located in the frame?

You could try matplotlib.pyplot.spy:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# let's make some initial clean data
# (dtype=object so the frame can later hold a string without a dtype warning)
df = pd.DataFrame(
    data={
        'alpha': [0, 1, 2],
        'beta': [3, 4, 5],
        'gamma': [6, 7, 8]
    },
    index=['one', 'two', 'three'],
    dtype=object
)
# add some problematic points:
# NaNs, infinities and stuff that is
# just not numeric
df.loc['one', 'beta'] = 'not a number but not NaN'
df.loc['two', 'alpha'] = np.nan     # np.NaN was removed in NumPy 2.0
df.loc['three', 'gamma'] = np.inf   # np.infty was removed in NumPy 2.0

fig, axes = plt.subplots(1, 3)
axes[0].spy(df.isnull())
axes[0].set_title('NaN elements')
axes[1].spy(df == np.inf)
axes[1].set_title('infinite elements')
# DataFrame.map replaces applymap from pandas 2.1 (use applymap on older versions)
axes[2].spy(~df.map(np.isreal))
axes[2].set_title('Non numeric elements')
plt.show()
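If a text listing is more convenient than a plot, a minimal sketch along the same lines could report the (row, column) labels of the bad cells directly. The helper name problem_cells is hypothetical; it coerces every column with pd.to_numeric(errors="coerce"), so non-numeric entries are reported together with NaN and infinity rather than separately.

```python
import numpy as np
import pandas as pd


def problem_cells(df):
    """Return (row label, column label) pairs for NaN, +/-inf or non-numeric cells."""
    # Non-numeric entries become NaN after coercion
    numeric = df.apply(pd.to_numeric, errors="coerce")
    bad = numeric.isnull() | np.isinf(numeric)
    rows, cols = np.where(bad.to_numpy())
    return [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]


# Same kind of frame as in the plotting example above
df = pd.DataFrame(
    {'alpha': [0, 1, 2], 'beta': [3, 4, 5], 'gamma': [6, 7, 8]},
    index=['one', 'two', 'three'],
    dtype=object,
)
df.loc['one', 'beta'] = 'not a number but not NaN'
df.loc['two', 'alpha'] = np.nan
df.loc['three', 'gamma'] = np.inf

print(problem_cells(df))
# [('one', 'beta'), ('two', 'alpha'), ('three', 'gamma')]
```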

Debugging the "NaN" sklearn error with pandas DataFrame input
