Getting ValueError: Found input variables with inconsistent numbers of samples: [2, 27244]

Asked: 2019-04-03 20:27:24

Tags: python pandas

I am trying to build a DecisionTreeClassifier model. When I run the following command, I get ValueError: Found input variables with inconsistent numbers of samples: [2, 27244]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

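The data structure and shapes are as follows:

The shape of myData is (27244, 8)
Rooms             int64
Price             int64
Distance        float64
Bathroom        float64
Car             float64
Landsize        float64
BuildingArea    float64
YearBuilt       float64
dtype: object
[0 1 2 ... 27241 27242 27243]

X = ([27244 rows x 7 columns], ["Rooms", "Distance", "Bathroom", "Car", "Landsize", "BuildingArea", "YearBuilt"])
y = [27244 rows x 1 columns]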

feature_cols_2 = ["Rooms","Distance","Bathroom","Car","Landsize","BuildingArea","YearBuilt"]

df = pd.get_dummies(data[feature_cols_2])
preprocessedData = replace_missing_value(df, feature_cols_2)

X = (preprocessedData, feature_cols_2)
y = data[data.columns[1:2]]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

clfr = DecisionTreeClassifier(max_depth = 6)  # We limit the depth to 6 levels.
clfr.fit(X_train, y_train)  # X_train (capital X) matches the name returned by train_test_split
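
A likely source of the `2` in the error message: `X = (preprocessedData, feature_cols_2)` builds a Python tuple with two elements, and `train_test_split` compares the number of samples (the first-axis length) of every input it receives. The sketch below uses small made-up lists in place of the real data (the values and the names `rows`, `X_wrong`, and `X_right` are illustrative assumptions, not part of the original code) to show how the lengths end up mismatched.

```python
# Hypothetical stand-ins for the real objects; the numbers are made up.
rows = [[2, 2.5], [3, 4.0], [2, 1.0], [4, 3.5]]  # plays the role of preprocessedData
feature_cols = ["Rooms", "Distance"]              # plays the role of feature_cols_2
y = [100, 200, 150, 300]                          # plays the role of the Price column

# Wrapping the data and the column list in parentheses creates a tuple.
# Its length is always 2, no matter how many rows the data has.
X_wrong = (rows, feature_cols)

# Passing the data itself keeps one entry per sample.
X_right = rows

print(len(X_wrong), len(y))  # 2 4 -> inconsistent, like [2, 27244] in the question
print(len(X_right), len(y))  # 4 4 -> consistent
```

If `replace_missing_value` returns a DataFrame (an assumption, since its definition is not shown), using `X = preprocessedData[feature_cols_2]` instead of the tuple should give `X` and `y` the same 27244 rows.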

0 answers:

No answers