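How to resolve "TypeError: only integer scalar arrays can be converted to a scalar index" when splitting data into training and test sets for text classification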

Posted: 2019-04-27 19:13:37

Tags: python-3.x scikit-learn svm

I want to create a program that classifies text data with an SVM. Before that, I have to split the data into training and test sets using StratifiedKFold().

But I end up with this error:

Traceback (most recent call last):
  File "C:\Users\Administrator\PycharmProjects\untitled1\main.py", line 115, in <module>
    y_train, y_test = labels[train_index], labels[test_index]
TypeError: only integer scalar arrays can be converted to a scalar index

How can I fix the error in this code?

This is the code, running on Python 3.7:

labels = []
label_np = np.array(labels)

with open(path, encoding='utf-8') as in_file:
    data = csv.reader(in_file)
    for line in data:
        label_ = np.append(label_np, line)

model = SVC(kernel='linear')
total_svm = []
total_mat_svm = np.zeros((2,2))

kf = StratifiedKFold(n_splits=3)
kf.get_n_splits(result_preprocess, label_)

for train_index, test_index in kf.split(result_preprocess, label_):
    # print('Train : ', test_index, 'Test : ', test_index)
    x_train, x_test = result_preprocess[train_index], result_preprocess[test_index]
    y_train, y_test = label_[train_index], label_[test_index]

vectorizer = TfidfVectorizer(min_df=5,
                             max_df=0.8,
                             sublinear_tf=True,
                             use_idf=True)
train_vector = vectorizer.fit_transform(x_train)
test_vector = vectorizer.transform(x_test)

model.fit(x_train, y_train)
hasil_svm = model.predict(x_test)

total_mat_svm = total_mat_svm + confusion_matrix(y_test, hasil_svm)
total_svm = total_mat_svm + sum(y_test==hasil_svm)

print(total_mat_svm)

I expect the result to be the classification performance and the confusion matrix of the classification.

1 answer:

Answer 0 (score: 0)

Please see the following answer: numpy array TypeError: only integer scalar arrays can be converted to a scalar index
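As a minimal illustration of where this error comes from, the sketch below indexes a plain Python list with a NumPy index array, which is what happens when a list meets the index arrays yielded by StratifiedKFold.split(). The label values and indices are made up for the example and are not taken from the question.

```python
import numpy as np

labels = ["pos", "neg", "pos", "neg"]   # hypothetical labels stored in a plain Python list
train_index = np.array([0, 2])          # an index array, like those produced by kf.split()

try:
    labels[train_index]                 # a list accepts only an integer or a slice as index
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

labels_arr = np.array(labels)           # a NumPy array can be indexed with an index array
selected = labels_arr[train_index]
print(selected)
```

The same reasoning would apply to any other list that is indexed with train_index or test_index.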

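I suspect that not only result_preprocess but also labels is a list in your data pipeline. A plain Python list cannot be indexed with an array of indices such as train_index. In that case, the solution is simply to convert labels to a NumPy array before running the snippet: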

import numpy as np
labels = np.array(labels)
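Once both inputs are NumPy arrays, the rest of the loop may also need attention: in the code as posted, the vectorizer and model steps sit outside the for loop, and model.fit() receives x_train rather than train_vector. The following is only a sketch of how the pieces could fit together, not a drop-in replacement. The texts and labels are hypothetical stand-ins for result_preprocess and the CSV labels, and min_df and max_df are left out because the toy data set is too small for the values used in the question.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Hypothetical stand-in data; in the question these come from preprocessing and a CSV file.
texts = np.array([
    "good movie", "great film", "nice plot", "good acting", "great fun", "nice cast",
    "bad movie", "awful film", "boring plot", "bad acting", "awful mess", "boring cast",
])
labels = np.array(["pos"] * 6 + ["neg"] * 6)

kf = StratifiedKFold(n_splits=3)
total_matrix = np.zeros((2, 2), dtype=int)  # confusion matrix summed over all folds
total_correct = 0                           # correct predictions summed over all folds

for train_index, test_index in kf.split(texts, labels):
    # Both inputs are NumPy arrays, so indexing with index arrays works.
    x_train, x_test = texts[train_index], texts[test_index]
    y_train, y_test = labels[train_index], labels[test_index]

    # Fit the vectorizer on the training fold only, then reuse it for the test fold.
    vectorizer = TfidfVectorizer(sublinear_tf=True, use_idf=True)
    train_vector = vectorizer.fit_transform(x_train)
    test_vector = vectorizer.transform(x_test)

    # Train and predict on the TF-IDF vectors, not on the raw strings.
    model = SVC(kernel="linear")
    model.fit(train_vector, y_train)
    predicted = model.predict(test_vector)

    # Fix the label order so every fold yields a 2x2 matrix.
    total_matrix += confusion_matrix(y_test, predicted, labels=["neg", "pos"])
    total_correct += int(np.sum(y_test == predicted))

accuracy = total_correct / len(labels)
print(total_matrix)
print(accuracy)
```

Creating the vectorizer and the model inside the loop keeps each fold independent, so no vocabulary or fitted weights carry over from one fold to the next.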