sklearn TimeSeriesSplit: cross_val_predict only works for partitions

Time: 2017-01-19 23:50:43

Tags: python scikit-learn logistic-regression cross-validation

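I am trying to use the TimeSeriesSplit cross-validation strategy in sklearn version 0.18.1 with a LogisticRegression estimator. I get an error stating:

cross_val_predict only works for partitions

The following snippet shows how to reproduce it: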

from sklearn import linear_model, neighbors
from sklearn.model_selection import train_test_split, cross_val_predict, TimeSeriesSplit, KFold, cross_val_score
import pandas as pd
import numpy as np
from datetime import date, datetime

df = pd.DataFrame(data=np.random.randint(0,10,(100,5)), index=pd.date_range(start=date.today(), periods=100), columns='x1 x2 x3 x4 y'.split())


X, y = df['x1 x2 x3 x4'.split()], df['y']
score = cross_val_score(linear_model.LogisticRegression(fit_intercept=True), X, y, cv=TimeSeriesSplit(n_splits=2))
y_hat = cross_val_predict(linear_model.LogisticRegression(fit_intercept=True), X, y, cv=TimeSeriesSplit(n_splits=2), method='predict_proba')

What am I doing wrong?

1 Answer:

Answer 0 (score: 5)

There are several ways to pass the cv argument to cross_val_score. Here you have to pass the generator for the splits. For example

y = range(14)
cv = TimeSeriesSplit(n_splits=2).split(y)

gives a generator. With it you can generate the CV train and test index arrays. The first pair looks like this:

print(next(cv))
# (array([0, 1, 2, 3, 4, 5, 6, 7]), array([ 8, 9, 10, 11, 12, 13]))

You can also pass a DataFrame as the input to split.
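To see everything the generator yields, the splits can be collected into a list and inspected. The following is a minimal sketch, assuming 100 ordered samples as in the question; the array `X` and the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Assumption: 100 time-ordered samples with a single dummy feature.
X = np.arange(100).reshape(-1, 1)

# split() returns a generator; list() materialises every (train, test) pair.
splits = list(TimeSeriesSplit(n_splits=2).split(X))

for train_idx, test_idx in splits:
    print("train", train_idx[0], "to", train_idx[-1],
          "| test", test_idx[0], "to", test_idx[-1])

# Collect every index that appears in some test fold.
tested = np.concatenate([test_idx for _, test_idx in splits])
never_tested = np.setdiff1d(np.arange(len(X)), tested)
print("samples never used for testing:", len(never_tested))
```

With these settings the earliest block of samples is used for training only and never shows up in a test fold, so the test folds do not partition the data, which is the condition cross_val_predict checks for.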

In your case, this should work:

df = pd.DataFrame(data=np.random.randint(0, 10, (100, 5)),
                  index=pd.date_range(start=date.today(), periods=100),
                  columns='x1 x2 x3 x4 y'.split())

cv = TimeSeriesSplit(n_splits=2).split(df)
print(next(cv))
# (array([ 0,  1,  2, ..., 31, 32, 33]), array([34, 35, 36, ..., 64, 65, 66]))
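For completeness, here is a hedged sketch of handing such a split generator to cross_val_score. It assumes synthetic NumPy data in place of the question's DataFrame, and `max_iter=1000` is an arbitrary choice to reduce convergence warnings; neither comes from the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Assumption: random integer features and target, shaped like the question's data.
rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(100, 4))
y = rng.randint(0, 10, size=100)

# The generator yields (train_indices, test_indices) pairs in time order.
cv = TimeSeriesSplit(n_splits=2).split(X)

# cross_val_score accepts any iterable of such pairs as its cv argument.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores)  # one accuracy value per split
```

A generator is exhausted after one pass, so a fresh one is needed for each call. Passing the TimeSeriesSplit object itself as cv to cross_val_score avoids that.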

For more details, see the documentation for cross_val_score and TimeSeriesSplit.
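If out-of-fold probabilities are what you are after, one possible workaround (not part of the original answer) is to loop over the splits by hand and fill in predictions only for the rows that land in a test fold. This is a sketch under assumptions: synthetic data, a binary target so that every training fold sees both classes, and NaN as a placeholder for rows that are never predicted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit

# Assumption: synthetic data with a binary target (two probability columns).
rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(100, 4))
y = rng.randint(0, 2, size=100)

# Rows that never appear in a test fold keep NaN as a placeholder.
proba = np.full((len(y), 2), np.nan)

for train_idx, test_idx in TimeSeriesSplit(n_splits=2).split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                # fit on the past only
    proba[test_idx] = model.predict_proba(X[test_idx])   # predict the next block

print(np.isnan(proba).any(axis=1).sum(), "rows have no out-of-fold prediction")
```

The rows left as NaN are the earliest samples, which serve only as training data. How to treat them (drop them, or report them separately) depends on what the predictions are used for.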