I entered the following lines of code:
# import relevant statistical packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
import pylab as pl
import sklearn.linear_model as skl
import sklearn.metrics as metrics
from sklearn.model_selection import train_test_split
# import data
url = "/<...>/Smarket.csv" # relative url within my computer
Smarket = pd.read_csv(url, index_col = 'SlNo')
X3 = Smarket[['Lag1', 'Lag2', 'Lag3', 'Lag4', 'Lag5', 'Volume']]
Y3 = Smarket['Direction']
X_train, X_test, y_train, y_test = train_test_split(X3, Y3, test_size=0.2016)
data_1 = pd.concat([pd.DataFrame(y_train), X_train], axis = 1)
model_1 = sm.formula.glm(formula = 'y_train~X_train', data = data_1, family= sm.families.Binomial()).fit()
X_new = model_1.predict(X_test)
The last line of code is where I get the following error:
PatsyError: Number of rows mismatch between data argument and X_train (252 versus 998)
    y_train~X_train
            ^^^^^^^
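I cannot understand why this error occurs. I suspect it is because X_test and X_train have different numbers of rows. How should I change the code to get the predicted values?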