I am trying to build a DecisionTreeClassifier model. When I run the following command, I get ValueError: Found input variables with inconsistent numbers of samples: [2, 27244]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
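The data structure and shapes are as follows:
The shape of myData is (27244, 8)
Rooms             int64
Price             int64
Distance        float64
Bathroom        float64
Car             float64
Landsize        float64
BuildingArea    float64
YearBuilt       float64
dtype: object
[0 1 2 ... 27241 27242 27243]
X = [27244 rows x 7 columns], ["Rooms", "Distance", "Bathroom", "Car", "Landsize", "BuildingArea", "YearBuilt"])
y = [27244 rows x 1 columns]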
feature_cols_2 = ["Rooms","Distance","Bathroom","Car","Landsize","BuildingArea","YearBuilt"]
df = pd.get_dummies(data[feature_cols_2])
preprocessedData = replace_missing_value(df, feature_cols_2)
X = (preprocessedData, feature_cols_2)
y = data[data.columns[1:2]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
clfr = DecisionTreeClassifier(max_depth = 6) # We limit the depth to 6 levels.
clfr.fit(X_train, y_train)
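The [2, 27244] in the message appears to be the sample counts that train_test_split sees for X and y. The sketch below uses plain Python lists as hypothetical stand-ins for preprocessedData and y (preprocessed_rows, X_tuple and X_fixed are illustrative names, not part of the original code) to show where a count of 2 could come from: a tuple of (data, column names) has length 2 no matter how many rows the data holds.

```python
# Hypothetical stand-ins for the objects in the question (not the real dataset).
n_rows = 27244
feature_cols_2 = ["Rooms", "Distance", "Bathroom", "Car",
                  "Landsize", "BuildingArea", "YearBuilt"]

# One row of 7 feature values per sample, playing the role of preprocessedData.
preprocessed_rows = [[0.0] * len(feature_cols_2) for _ in range(n_rows)]
# One target value per sample, playing the role of y.
y = [0] * n_rows

# Same pattern as `X = (preprocessedData, feature_cols_2)`:
# a 2-element tuple, so its length is 2 rather than the number of rows.
X_tuple = (preprocessed_rows, feature_cols_2)
print([len(X_tuple), len(y)])   # [2, 27244]

# Using the data itself gives the same number of samples as y.
X_fixed = preprocessed_rows
print([len(X_fixed), len(y)])   # [27244, 27244]
```

If this is indeed the cause, assigning X = preprocessedData (or preprocessedData[feature_cols_2]) instead of the tuple should give both inputs 27244 samples.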