Error when using a k-fold split with multilabel data in sklearn

Time: 2018-08-23 14:59:13

Tags: python scikit-learn cross-validation multilabel-classification

I want to run K-fold cross-validation. Before switching to K-fold, my code looked like this and ran fine:

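# Imports the snippet relies on (not shown in the original post)
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from sklearn import preprocessing
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

df = pd.read_csv('finalupdatedothers-multilabel.csv')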

X= df[['sentences']]

dfy = df[['ADR','WD','EF','INF','SSI','DI','others']]
df1 = dfy.stack().reset_index()
df1.columns = ['a','b','c']
y_train_text = df1.groupby('a')['b'].apply(list)

lb = preprocessing.MultiLabelBinarizer()
# Run classifier
stop_words = stopwords.words('english')

classifier=make_pipeline(CountVectorizer(),
                  TfidfTransformer(),
                  #SelectKBest(chi2, k=4),
                  OneVsRestClassifier(SGDClassifier()))

#combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])

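random_state = np.random.RandomState(0)
# Split into training and test sets.
# Note: X_train is passed in here as posted, but it is never defined earlier in this snippet.
X_train, X_test, y_train, y_test = train_test_split(X_train, y_train_text, test_size=.2,
                                                    random_state=random_state)
print(y_train)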
# # Binarize the output classes
Y = lb.fit_transform(y_train)
Y_test=lb.transform(y_test)
# Fit once, then reuse the fitted pipeline for the decision scores
classifier.fit(X_train, Y)
y_score = classifier.decision_function(X_test)
print("y_score" + str(y_score))
predicted = classifier.predict(X_test)
all_labels = lb.inverse_transform(predicted)

# Print the evaluation metrics
print("accuracy : " + str(accuracy_score(Y_test, predicted)))

# average='weighted' is the support-weighted average, not the micro average
print("weighted f-measure " + str(f1_score(Y_test, predicted, average='weighted')))

print("precision " + str(precision_score(Y_test, predicted, average='weighted')))

print("recall " + str(recall_score(Y_test, predicted, average='weighted')))

for item, labels in zip(X_test, all_labels):
    print ('%s => %s' % (item, ', '.join(labels)))

When I changed the code to use k-fold cross-validation instead of train_test_split, I got this error:

ValueError: Found input variables with inconsistent numbers of samples: [1, 6008]

Updated to use iloc. My code using k-fold cross-validation is as follows:

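from sklearn.model_selection import KFold

kf = KFold(n_splits=10)
kf.get_n_splits(X)
# KFold(n_splits=2, random_state=None, shuffle=False)  # stray no-op line in the original post
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    # Parentheses keep the two-line assignment syntactically valid
    y_train, y_test = (y_train_text.iloc[train_index],
                       y_train_text.iloc[test_index])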

Can you tell me what I am doing wrong?

My data looks like this:

,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1.0,,,,,,
1,I am detoxing from Lexapro now.,,,,,,,1.0
2,I slowly cut my dosage over several months and took vitamin supplements to help.,,,,,,,1.0
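
One possible reading of the error, offered as a sketch and not a confirmed diagnosis: X = df[['sentences']] is a one-column DataFrame, and iterating a DataFrame yields its column labels, so CountVectorizer would see a single "document" while the label matrix for a 10-fold training split has about 6008 rows. The example below uses invented toy data (the sentences, the two label columns, and n_splits=4 are assumptions for illustration). It passes the sentences as a 1-D Series and, as a simplification, builds the indicator matrix with notna() in place of the stack/groupby/MultiLabelBinarizer step.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Invented toy data shaped like the CSV: a label is 1.0 when present, NaN otherwise.
nan = float("nan")
df = pd.DataFrame({
    "sentences": [
        "weight gain and hair loss",
        "detoxing from the drug now",
        "memory loss after two weeks",
        "cut my dosage over months",
        "severe headache and nausea",
        "took vitamin supplements",
        "constant dizziness and fatigue",
        "talked to my doctor today",
    ],
    "ADR":    [1.0, nan, 1.0, nan, 1.0, nan, 1.0, nan],
    "others": [nan, 1.0, nan, 1.0, nan, 1.0, nan, 1.0],
})

# Iterating a DataFrame yields column labels, so the vectorizer sees one document.
frame_items = list(df[["sentences"]])
n_docs_from_frame = CountVectorizer().fit_transform(df[["sentences"]]).shape[0]

# A 1-D Series iterates over the sentences themselves.
X = df["sentences"]
# Simplification: binary indicator matrix straight from the label columns.
Y = df[["ADR", "others"]].notna().astype(int).to_numpy()

kf = KFold(n_splits=4)
scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]

    # Build a fresh pipeline per fold so no state leaks between folds.
    classifier = make_pipeline(
        CountVectorizer(),
        TfidfTransformer(),
        OneVsRestClassifier(SGDClassifier(random_state=0)),
    )
    classifier.fit(X_train, Y_train)
    predicted = classifier.predict(X_test)
    scores.append(f1_score(Y_test, predicted, average="weighted", zero_division=0))

print("documents seen from the one-column DataFrame:", n_docs_from_frame)
print("per-fold weighted F1:", scores)
```

If this reading applies, the same loop should work on the real data with X = df['sentences'], as long as the label rows stay aligned one-to-one with the sentence rows.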

0 answers:

No answers yet.