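I'm new to Python and have been working on this classification dataset to predict fertilizer. Even though I dropped every row containing a NaN value, I still get an "Input contains NaN" error. I'd really appreciate any help with this. Thank you.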
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
features = pd.read_csv('Fertilizer Prediction.csv')
features.head(5)
features.dropna(how='any').shape
y = features['Name']
X = features.drop(columns=['Name'])
for col in X.dtypes[X.dtypes == 'object'].index:
    for_dummy = X.pop(col)
    X = pd.concat([X, pd.get_dummies(for_dummy, prefix=col)], axis=1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
y_train.values.ravel()
X_train.values.ravel()
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier
model().fit(X_train, y_train)
[Here are screenshots of the errors][1]
The dataset I'm using is from Kaggle; here is the link: https://www.kaggle.com/gdabhishek/fertilizer-prediction?select=Fertilizer+Prediction.csv
Answer 0 (score: 0)
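According to the documentation for dropna, the method returns a new DataFrame by default and leaves the original unchanged, so you need inplace=True for the NaN rows to actually be removed from features. In your code, replace this line:
features.dropna(how='any').shape
with
features.dropna(how='any', inplace=True)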
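To see the difference, here is a minimal sketch on a small made-up DataFrame. The column values are hypothetical stand-ins for the Kaggle CSV, not its real contents. It shows that calling dropna without keeping the result leaves the NaN values in place, and that assigning the result back (equivalent in effect to inplace=True) removes them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for 'Fertilizer Prediction.csv'
features = pd.DataFrame({
    "Temperature": [26.0, np.nan, 34.0, 32.0],
    "Soil": ["Sandy", "Loamy", None, "Red"],
    "Name": ["Urea", "DAP", "Urea", "DAP"],
})

# dropna returns a NEW DataFrame; `features` itself is not modified here
dropped_shape = features.dropna(how="any").shape
nan_left_before = int(features.isna().sum().sum())

# Assigning the result back has the same effect as inplace=True
features = features.dropna(how="any")
nan_left_after = int(features.isna().sum().sum())

print(dropped_shape, nan_left_before, nan_left_after)
```

Checking features.isna().sum() right before train_test_split is a quick way to confirm that no NaN values reach the model.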