Question

I am trying to build a DecisionTreeClassifier model. When I run the following command, I get ValueError: Found input variables with inconsistent numbers of samples: [2, 27244]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

The structure and shape of the data are as follows:

The shape of myData is (27244, 8)
Rooms             int64
Price             int64
Distance        float64
Bathroom        float64
Car             float64
Landsize        float64
BuildingArea    float64
YearBuilt       float64
dtype: object
[0 1 2 ... 27241 27242 27243]

X = ([27244 rows x 7 columns], ["Rooms", "Distance", "Bathroom", "Car", "Landsize", "BuildingArea", "YearBuilt"])
y = [27244 rows x 1 columns]

feature_cols_2 = ["Rooms","Distance","Bathroom","Car","Landsize","BuildingArea","YearBuilt"]

df = pd.get_dummies(data[feature_cols_2])
preprocessedData = replace_missing_value(df, feature_cols_2)

X = (preprocessedData, feature_cols_2)
y = data[data.columns[1:2]]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

clfr = DecisionTreeClassifier(max_depth = 6)  # We limit the depth to 6 levels.
clfr.fit(X_train, y_train)

This raises ValueError: Found input variables with inconsistent numbers of samples: [2, 27244]
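
A note on where the 2 in the message may come from: X is assigned a two-element tuple, (preprocessedData, feature_cols_2), and scikit-learn appears to take the tuple's length as the number of samples, while y has 27244 rows. The sketch below reproduces that mismatch on a small synthetic frame and then passes the DataFrame itself. The data values, the two-column feature list, and the 0/1 "Price" labels are made up for illustration, and it assumes the preprocessed data is a plain pandas DataFrame (replace_missing_value is not shown in the post, so it is left out here).

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (hypothetical values, 8 rows instead of 27244).
feature_cols = ["Rooms", "Distance"]
data = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2, 5, 4, 3],
    "Price": [1, 0, 1, 0, 1, 0, 1, 0],  # made-up class labels
    "Distance": [2.5, 3.0, 7.1, 4.4, 1.2, 9.8, 6.3, 5.0],
})
features = data[feature_cols]
y = data["Price"]

# Packing the frame and the column list into a tuple gives an object of length 2,
# which is compared against the number of rows in y.
X_bad = (features, feature_cols)
try:
    train_test_split(X_bad, y, test_size=0.25)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Passing the DataFrame itself keeps the sample counts aligned.
X = features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = DecisionTreeClassifier(max_depth=6)
clf.fit(X_train, y_train)
print(X_train.shape, X_test.shape)
```

If this is indeed the cause, replacing the tuple with the DataFrame (for example X = preprocessedData[feature_cols_2], assuming those columns exist after preprocessing) should let the split and the fit go through.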

0 answers: