Question

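I'm new to Python and have been working on this classification dataset to predict fertilizer. I'm getting an "Input contains NaN" error even though I dropped every row that contains a NaN value. I would really appreciate some help with this. Thank you.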

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
    
features = pd.read_csv('Fertilizer Prediction.csv')
features.head(5)
    
features.dropna(how='any').shape
    
y = features['Name']
X = features.drop(columns=['Name'])
    
for col in X.dtypes[X.dtypes == 'object'].index:
    for_dummy = X.pop(col)
    X = pd.concat([X, pd.get_dummies(for_dummy, prefix=col)], axis=1)
    
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    
y_train.values.ravel()
X_train.values.ravel()
    
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier
    
model().fit(X_train, y_train)

[Here are screenshots of the errors][1]

The dataset I'm using is from Kaggle; I'm linking it below: https://www.kaggle.com/gdabhishek/fertilizer-prediction?select=Fertilizer+Prediction.csv

Answer 1

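According to the dropna documentation, the method returns a new DataFrame by default and leaves the original untouched. To actually remove the NaN rows from features, you need inplace=True (or you need to assign the result back to features). So in your code, replace this line:

features.dropna(how='any').shape

with

features.dropna(how='any', inplace=True)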
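
To make the difference concrete, here is a minimal sketch on a made-up three-row frame. The values and the `Temperature` column are placeholders I chose for illustration, not the actual Kaggle data; only the `Name` column is taken from your code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the fertilizer data
df = pd.DataFrame({
    "Temperature": [26.0, np.nan, 34.0],
    "Name": ["Urea", "DAP", None],
})

# dropna returns a new DataFrame; df itself is not modified
print(df.dropna(how="any").shape)  # (1, 2)
print(df.shape)                    # (3, 2)

# Assigning the result back has the same effect as inplace=True
cleaned = df.dropna(how="any")
print(cleaned.isna().sum().sum())  # 0
```

Calling `features.isna().sum()` right before `train_test_split` is a quick way to confirm that no NaN values are left in your own data.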
