Logistic regression: train on past data and predict with current data?

Time: 2017-12-29 21:36:50

Tags: python scikit-learn logistic-regression

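I have trained and tested my logistic regression on existing data, but now I need to output predictions for the future. I want to feed in the 2017 values that I used in my training and test sets to predict probabilities for 2018.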

This is the code I used to train and test my model:

import statsmodels.api as sm

# .ix has been removed from pandas; .loc with a list of labels selects the same columns
Xadj = train.loc[:, ['2016 transaction count', 'critical_CI', 'critical_CN', 'critical_CS',
                     'critical_FI', 'critical_IN', 'critical_OI', 'critical_RA',
                     'create_year_2012', 'create_year_2013', 'create_year_2014',
                     'create_year_2015', 'create_year_2016']]

# "Coded" is the 2017 transaction count transformed into a binary variable
y = train.loc[:, '2017 transaction count coded']

logit_model = sm.Logit(y, Xadj)
result = logit_model.fit()
print(result.summary())

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(Xadj, y, test_size=0.3, random_state=42)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

y_pred = logreg.predict(X_test)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(logreg.score(X_test, y_test)))

# Cross-validation
from sklearn.model_selection import KFold, cross_val_score
# random_state only has an effect when shuffle=True (recent scikit-learn rejects it otherwise)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
modelCV = LogisticRegression()
scoring = 'accuracy'
results = cross_val_score(modelCV, X_train, y_train, cv=kfold, scoring=scoring)
print("10-fold cross validation average accuracy: %.3f" % (results.mean()))

To try to derive predictions for 2018, I did the following:

import numpy as np

# Create an empty 2018 Purchase Probability column
train['2018 Purchase Probability'] = np.nan

yact = train.loc[:, '2018 Purchase Probability']
# Adding in 2017 values
X = train.loc[:, ['2017 transaction count', 'critical_CI', 'critical_CN', 'critical_CS',
                  'critical_FI', 'critical_IN', 'critical_OI', 'critical_RA',
                  'create_year_2012', 'create_year_2013', 'create_year_2014',
                  'create_year_2015', 'create_year_2016', 'create_year_2017']]

from sklearn.preprocessing import scale, StandardScaler
scaler = StandardScaler()
scaler.fit(Xadj)
X = scaler.transform(Xadj)
X_pred = scaler.transform(X)

from sklearn.linear_model import LogisticRegression
from sklearn import metrics
logreg = LogisticRegression()
logreg.fit(Xadj, y)

# Generate 0/1 prediction
prediction = logreg.predict(X=X)

# Generate class probabilities (predict_proba returns probabilities, not odds ratios)
percent_prediction = logreg.predict_proba(X=X)

prediction = pd.DataFrame(prediction)
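
For reference, here is a minimal sketch of what `predict_proba` returns and how it relates to `predict` and to odds. The toy arrays `X_toy` and `y_toy` are made up for illustration and are not part of the data in the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary label.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X_toy, y_toy)

proba = model.predict_proba(X_toy)        # shape (n_samples, n_classes); rows sum to 1
pos_col = list(model.classes_).index(1)   # column that corresponds to label 1
p_buy = proba[:, pos_col]                 # P(y = 1) for each row

odds = p_buy / (1.0 - p_buy)              # odds of class 1 (not an odds ratio)
labels = (p_buy > 0.5).astype(int)        # same labels predict() gives for a binary model
```

So the second column of `predict_proba` is probably what should go into a "purchase probability" column, rather than the 0/1 output of `predict`.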

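I'm not sure whether I did this correctly, and judging from my output (mostly 1s) I don't think I did. I'm new to Python, and I'm struggling to turn my tested model into future predictions that can be used to make decisions.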
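
One possible way to set this up, sketched under assumptions rather than offered as a verified fix: fit on "prior-year count → next-year outcome" (2016 → 2017), then score rows whose prior-year count is the 2017 value. The scoring matrix needs the same columns in the same order as the training matrix, so an extra column such as `create_year_2017` cannot be passed to a model that never saw it. The small random `train` frame and the generic column name `prior_year_count` below are hypothetical stand-ins; wrapping the scaler and the classifier in one pipeline is a design choice that keeps the scaling identical at fit and predict time.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for `train`; only a few of the original columns are mimicked.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "2016 transaction count": rng.poisson(3, n),
    "critical_CI": rng.integers(0, 2, n),
    "create_year_2016": rng.integers(0, 2, n),
})
lam_2017 = 1 + 0.5 * train["2016 transaction count"].to_numpy()
train["2017 transaction count"] = rng.poisson(lam_2017)
train["2017 transaction count coded"] = (train["2017 transaction count"] > 2).astype(int)

static_cols = ["critical_CI", "create_year_2016"]

# Training matrix: the 2016 count explains the 2017 outcome.
X_fit = train[["2016 transaction count"] + static_cols].rename(
    columns={"2016 transaction count": "prior_year_count"})
y_fit = train["2017 transaction count coded"]

# Scoring matrix: same columns, same order, with the count shifted forward one year.
X_2018 = train[["2017 transaction count"] + static_cols].rename(
    columns={"2017 transaction count": "prior_year_count"})

# The pipeline learns the scaling from X_fit and reuses it unchanged on X_2018.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_fit, y_fit)

# Keep the probability of label 1 as the 2018 purchase probability.
pos_col = list(model.classes_).index(1)
train["2018 Purchase Probability"] = model.predict_proba(X_2018)[:, pos_col]
```

This assumes the relationship between one year's count and the next year's outcome stays roughly stable from 2016/2017 to 2017/2018, which the data in the question cannot confirm on its own.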

Thanks in advance for your help!

0 answers:

No answers yet