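I classify texts into 2 classes: imperative and non-imperative. I have prepared the texts in the format the Naive Bayes classifier needs, but now I also need to use an SVM. How should I do this? (I still need to classify the texts and compute the accuracy.) Thanks for reading and trying to answer my question.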
import nltk

# train: list of (list_of_words, category) pairs, defined elsewhere
all_words_list = [word for (sent, cat) in train for word in sent]
all_words = nltk.FreqDist(all_words_list)
# Use the 1000 most frequent words as the feature vocabulary
word_items = all_words.most_common(1000)
word_features = [word for (word, count) in word_items]

def document_features(document, word_features):
    # One boolean feature per vocabulary word: does it occur in the document?
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features

featuresets = [(document_features(d, word_features), c) for (d, c) in train]
# First 360 examples are the test set, the rest are the training set
train_set, test_set = featuresets[360:], featuresets[:360]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
Answer (score: 1)
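I suggest first splitting your dataset properly into a training set and a test set. Here X contains the feature variables and Y contains the response variable, and we split them 70%/30%: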
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=101, test_size=0.3)
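The split above assumes X and Y already exist. The question's data, however, is a list of (feature dict, label) pairs, while sklearn's SVC needs a numeric matrix. A minimal sketch, assuming scikit-learn's `DictVectorizer` is acceptable for this conversion (it is not part of the original answer), with made-up featuresets in the same shape as the ones `document_features` produces:

```python
from sklearn.feature_extraction import DictVectorizer

# Made-up featuresets in the same shape as the question's:
# a list of (feature_dict, label) pairs as built by document_features().
featuresets = [
    ({"contains(open)": True, "contains(please)": False}, "imperative"),
    ({"contains(open)": False, "contains(please)": True}, "imperative"),
    ({"contains(open)": False, "contains(please)": False}, "non-imperative"),
]

# Separate the feature dicts from the labels.
feature_dicts = [feats for (feats, label) in featuresets]
Y = [label for (feats, label) in featuresets]

# DictVectorizer gives each feature name its own column (sorted by name);
# boolean values become 1.0 / 0.0.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(feature_dicts)

print(vectorizer.feature_names_)  # → ['contains(open)', 'contains(please)']
print(X)
```

With the real data, the question's `featuresets` list would replace the toy one. Because `document_features` returns the same 1000 keys for every document, every row gets the same columns, and the vectorizer never looks at the labels.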
Then:
from sklearn import svm
from sklearn import metrics
# See the scikit-learn documentation for more about the SVM parameters
model = svm.SVC(kernel='rbf', C=10000.0, gamma='auto')
model.fit(X_train, y_train)
# accuracy_score expects (y_true, y_pred)
print('Accuracy is', round(metrics.accuracy_score(y_test, model.predict(X_test)), 2))
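The answer fixes `C=10000.0` and `gamma='auto'` for an RBF kernel without explaining the choice. A common alternative is to choose these values by cross-validation on the training set. A minimal sketch with scikit-learn's `GridSearchCV`, using synthetic data from `make_classification` as a stand-in (an assumption; with the question's data, X and Y would come from the vectorized featuresets). The grid values are illustrative, not recommendations:

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data for a two-class problem.
X, Y = make_classification(n_samples=200, n_features=20, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=101, test_size=0.3)

# Candidate settings to try (gamma is ignored by the linear kernel).
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1.0, 10.0, 100.0],
    "gamma": ["scale", "auto"],
}

# 5-fold cross-validation on the training set only, scored by accuracy.
search = GridSearchCV(svm.SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
# With the default refit=True, the best model is refit on all of X_train,
# and score() evaluates it on the held-out test set.
print("Test accuracy:", round(search.score(X_test, y_test), 2))
```

Including `kernel='linear'` in the grid is deliberate: linear SVMs are a common baseline for high-dimensional bag-of-words features, but only running the search on the actual data shows whether it beats RBF here.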