我正在使用sklearn
进行分类任务。我想在“表”中的数据上训练我的模型,并在不同表“ test”中的数据上对其进行测试。这两个表具有相同的确切功能,但行数不同。我有下面的代码,但出现错误:
(<class 'ValueError'>, ValueError('Found input variables with inconsistent numbers of samples: [123, 174]',), <traceback object at 0x0000016476E10C48>).
我在做什么错?
get_train_data = 'select * from train;'
get_test_data = 'select * from test;'
df_train = pd.read_sql_query(get_train_data, con=connection)
df_test = pd.read_sql_query(get_test_data, con=connection)
X = df_train[:, 2:30]
Y = df_test[:, :30]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
model.fit(X_train, Y_train)
predictions = model.predict(X_test)
split_mat=confusion_matrix(Y_test, predictions)
答案 0 :(得分:1)
如果要在数据框df_train
上进行训练并在数据框df_test
上进行测试,为什么要使用df_train
的功能和df_test
的目标列并将其传递到train_test_split
函数?
您可以简单地执行以下操作:
get_train_data = 'select * from train;'
get_test_data = 'select * from test;'
df_train = pd.read_sql_query(get_train_data, con=connection)
df_test = pd.read_sql_query(get_test_data, con=connection)
X_train = df_train[:, 2:30]
y_train = df_train.y # assuming y is the name of your target variable in df_train
X_test = df_test[:, i:j] # change i to j with the number that allow you to take the same columns as X_train
y_test = df_test.y # assuming y is the name of your target variable in df_test
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# Do something with predictions, e.g.
mean(predictions == y_test)