I have a text file with one sentence per line, for example: "Have you registered your email ID with your bank account?"
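I want to classify each sentence as either a question or a non-question. For context, these sentences come from a bank's website. I have seen this answer, which uses the following NLTK code: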
```python
import nltk

nltk.download('nps_chat')
posts = nltk.corpus.nps_chat.xml_posts()[:10000]

def dialogue_act_features(post):
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains({})'.format(word.lower())] = True
    return features

featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
```
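I have already preprocessed my text file (removing words and so on), turning each sentence into a list of words. With the code above I now have a trained classifier. How do I apply it to my file of sentences, either the raw or the preprocessed version?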
Update: here is a sample of my text file.
Answer 0 (score: 1)
```python
import nltk

nltk.download('nps_chat')
posts = nltk.corpus.nps_chat.xml_posts()[:10000]

def dialogue_act_features(post):
    # One boolean feature per lowercased token in the post
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains({})'.format(word.lower())] = True
    return features

featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]
# Train on train_set only, so the test set is not seen during training
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
```
Output: 0.668
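Then retrain on all of the data and classify each line of your file:

```python
classifier = nltk.NaiveBayesClassifier.train(featuresets)  # retrain on all data
with open('sentences.txt', encoding='utf-8') as f:  # hypothetical file name
    for line in f:
        print(classifier.classify(dialogue_act_features(line)))
```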
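Note that the NPS Chat posts are labelled with dialogue-act classes such as `whQuestion`, `ynQuestion`, `Statement` and `Emotion`, not with a yes/no question flag, so the classifier's output still has to be collapsed into question vs. non-question. A minimal sketch, assuming that only `whQuestion` and `ynQuestion` count as questions; `QUESTION_LABELS`, `is_question` and `classify_lines` are hypothetical names, and `classify` can be any function that maps a sentence to a label:

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: only these two NPS Chat dialogue-act labels count as questions.
QUESTION_LABELS = {"whQuestion", "ynQuestion"}


def is_question(label: str) -> bool:
    """Collapse a dialogue-act label into a yes/no question flag."""
    return label in QUESTION_LABELS


def classify_lines(
    lines: Iterable[str], classify: Callable[[str], str]
) -> List[Tuple[str, bool]]:
    """Return (sentence, is_question) for every non-empty line.

    `classify` maps a sentence to a dialogue-act label, for example
    lambda s: classifier.classify(dialogue_act_features(s)).
    """
    results = []
    for line in lines:
        sentence = line.strip()
        if not sentence:
            continue  # skip blank lines
        results.append((sentence, is_question(classify(sentence))))
    return results
```

With the trained classifier, an open file object can be passed directly, e.g. `classify_lines(f, lambda s: classifier.classify(dialogue_act_features(s)))`. Whether other labels such as `Clarify` should also count as questions depends on your data.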
Answer 1 (score: 0)
Do this for every line in your text file:
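```python
classifier = nltk.NaiveBayesClassifier.train(featuresets)
with open('sentences.txt', encoding='utf-8') as f:  # hypothetical file name
    for line in f:
        print(classifier.classify(dialogue_act_features(line)))
```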
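On the raw-vs-preprocessed question: the classifier was trained on `contains(<word>)` features built from raw NPS Chat posts tokenized with `nltk.word_tokenize`, so new sentences should probably be featurized the same way. Preprocessing such as stop-word removal may drop words like "what" or "is", and stripping punctuation removes "?", both of which are likely useful cues. A minimal sketch, assuming you may have either raw strings or already tokenized lines; `features_from_tokens` and `line_features` are hypothetical helpers, and `str.split` is only a dependency-free default tokenizer, not what the training code used:

```python
from typing import Callable, Dict, List, Union


def features_from_tokens(tokens: List[str]) -> Dict[str, bool]:
    """Build the same contains(<word>) keys that dialogue_act_features produces."""
    return {"contains({})".format(token.lower()): True for token in tokens}


def line_features(
    line: Union[str, List[str]],
    tokenize: Callable[[str], List[str]] = str.split,
) -> Dict[str, bool]:
    """Featurize either a raw sentence or an already tokenized one.

    Pass tokenize=nltk.word_tokenize to match the training features;
    str.split is only a stand-in so this runs without NLTK data.
    """
    tokens = tokenize(line) if isinstance(line, str) else line
    return features_from_tokens(tokens)
```

For raw lines you would call `classifier.classify(line_features(line, tokenize=nltk.word_tokenize))`; for a preprocessed line that is already a list of words, `line_features(tokens)` skips tokenization.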