Stratified KFold on a sparse (CSR) feature matrix

Time: 2015-11-07 22:42:47

Tags: python machine-learning scikit-learn sparse-matrix cross-validation

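I have a large sparse matrix (95000, 12000) containing the features for my model. I want to do stratified K-fold cross-validation using the Sklearn.cross_validation module in Python. However, I have not found a way to index a sparse matrix in Python.

Is there any way I can run StratifiedKFold on a sparse feature matrix?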

1 answer:

Answer 0 (score: 0)

Try this:

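import numpy as np
from sklearn.model_selection import StratifiedKFold

# Convert to CSR first: CSR matrices support row indexing with integer arrays.
# x is your sparse feature matrix and output holds the class labels.
X = x.tocsr()
y = np.asarray(output)

# Per-fold train/test splits, keyed by fold number
X_train, X_test = {}, {}
y_train, y_test = {}, {}

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
for i, (train_index, test_index) in enumerate(skf.split(X, y)):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train[i], X_test[i] = X[train_index], X[test_index]
    y_train[i], y_test[i] = y[train_index], y[test_index]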
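
To check the approach end to end, here is a minimal self-contained sketch on synthetic data. The matrix size, density, and class labels are made-up stand-ins for the real (95000, 12000) matrix, and the `folds` list is just an illustrative container. It assumes a scikit-learn version that provides `sklearn.model_selection`.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption): 100 samples, 20 features, 2 balanced classes
rng = np.random.RandomState(0)
X = sparse.random(100, 20, density=0.1, format="csr", random_state=rng)
y = np.array([0] * 50 + [1] * 50)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
folds = []
for train_index, test_index in skf.split(X, y):
    # Indexing a CSR matrix with an integer array selects rows
    # and returns another sparse matrix (nothing is densified).
    X_tr, X_te = X[train_index], X[test_index]
    y_tr, y_te = y[train_index], y[test_index]
    folds.append((X_tr, X_te, y_tr, y_te))

# Report the shapes and the per-class counts in each test fold
for i, (X_tr, X_te, y_tr, y_te) in enumerate(folds):
    print(i, X_tr.shape, X_te.shape, np.bincount(y_te))
```

Each test fold should keep the class proportions of `y`, which is the point of stratification. If you only need scores rather than the split matrices, passing the sparse matrix straight to an estimator that accepts sparse input is usually enough, since the splitter only needs the number of rows and the labels.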