Hello, I am trying to do decision tree classification by following the Google Developers video "Hello World - Machine Learning Recipes #1".
Here is my code.
#Import the Pandas library
import pandas as pd
#Load the train and test datasets to create two DataFrames
train_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"
train = pd.read_csv(train_url)
#Print the head of the train and test dataframes
train.head()
test_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/test.csv"
test = pd.read_csv(test_url)
#Print the head of the train and test dataframes
test.head()
#Import the decision tree module from scikit-learn
from sklearn import tree
#find the best feature to predict Survival rate
#define X_features and Y_labels
col_names=['Pclass','Age','SibSp','Parch']
X_features= train[col_names]
#assign the Survived column as the label
Y_labels= train.Survived
#create a decision tree classifier
clf=tree.DecisionTreeClassifier()
#fit (find patterns in Data)
clf=clf.fit(X_features, Y_labels)
clf.predict(test[col_names])
ValueError                                Traceback (most recent call last)
in <module>()
     13 #Y_train_sparse = Y_labels.to_sparse()
     14 #fit (find patterns in Data)
---> 15 clf = clf.fit(X_features, Y_labels)
     16 #clf.predict(test[col_names])

C:\Users\nitinahu\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    152         random_state = check_random_state(self.random_state)
    153         if check_input:
--> 154             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    155             if issparse(X):
    156                 X.sort_indices()

C:\Users\nitinahu\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    396                              % (array.ndim, estimator_name))
    397         if force_all_finite:
--> 398             _assert_all_finite(array)
    399
    400     shape_repr = _shape_repr(array.shape)

C:\Users\nitinahu\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X)
     52             and not np.isfinite(X).all()):
     53         raise ValueError("Input contains NaN, infinity"
---> 54                          " or a value too large for %r." % X.dtype)
     55
     56

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
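Answer 0 (score: 0)
Just check all the values in the data you are passing to fit.
One or two of them are outside the allowed range, which causes an overflow.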
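One way to carry out that check is sketched below. Besides oversized values, the error message also lists NaN and infinity as possible causes, and the Age column of the Titanic training data is known to contain missing entries, so missing values are worth ruling out first. The small DataFrame here is a hypothetical stand-in for train.csv so the snippet runs without a download, and filling Age with the median is only one possible choice (an assumption, not something stated in the question).

```python
import numpy as np
import pandas as pd
from sklearn import tree

# Hypothetical stand-in for the Titanic training data, with one missing Age.
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2],
    "Age": [22.0, 38.0, np.nan, 35.0, 27.0],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 2],
    "Survived": [0, 1, 1, 1, 0],
})
col_names = ["Pclass", "Age", "SibSp", "Parch"]

# Count missing values in each feature column.
nan_counts = train[col_names].isna().sum()
print(nan_counts)

# Check whether any feature value is infinite.
has_inf = np.isinf(train[col_names].to_numpy(dtype=float)).any()
print("Contains infinity:", has_inf)

# Assumed fix: replace missing ages with the median age of the training set.
age_median = train["Age"].median()
X_features = train[col_names].fillna({"Age": age_median})
Y_labels = train["Survived"]

# With no NaN left in the features, fitting no longer trips the finiteness check.
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X_features, Y_labels)
predictions = clf.predict(X_features)
print(predictions)
```

The test DataFrame would need the same treatment before calling predict, for example test[col_names].fillna({"Age": age_median}), so that both sets are filled with the value computed from the training data.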