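ValueError when feeding input to the classifier: each entry in campaign_name is a single string, while each entry in tokenized_sents is a list of several strings. What is going on?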
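import nltk
from sklearn.model_selection import train_test_split as tts

# Lowercase every campaign name via the pandas string accessor
df['campaign_name'] = df.campaign_name.str.lower()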
df['tokenized_sents'] = df.campaign_name.apply(nltk.word_tokenize)
X = df['tokenized_sents']
y = df['campaign_name']
xTrain, xTest, yTrain, yTest = tts(X, y, train_size=0.65, random_state=1)
classifier = nltk.NaiveBayesClassifier.train(xTrain,yTrain)
ValueError: too many values to unpack (expected 2)
Answer 0 (score: 0)
You are passing two values, xTrain and yTrain, in classifier = nltk.NaiveBayesClassifier.train(xTrain,yTrain).
train accepts only a single training set.
Could this be what you are looking for?
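# Combine features and labels into one list of (featureset, label) tuples
train_set = [({tok: True for tok in tokens}, label) for tokens, label in zip(xTrain, yTrain)]
classifier = nltk.NaiveBayesClassifier.train(train_set)
# classify_many takes a list of featuresets; classify takes a single one
y_pred = classifier.classify_many([{tok: True for tok in tokens} for tokens in xTest])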
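To see where the message itself can come from, here is a minimal pure-Python sketch. It assumes that train walks over its first argument and unpacks every element into a (featureset, label) pair; unpack_pairs is a hypothetical stand-in for that loop, not NLTK code, and the token list is made-up data.

```python
def unpack_pairs(labeled_featuresets):
    """Hypothetical stand-in for a loop that expects (featureset, label) pairs."""
    pairs = []
    for featureset, label in labeled_featuresets:
        pairs.append((featureset, label))
    return pairs


# One tokenized campaign name with three tokens (made-up data).
tokenized = [["summer", "sale", "2019"]]

try:
    unpack_pairs(tokenized)
    message = None
except ValueError as err:
    # Three tokens cannot be unpacked into the two names featureset and label.
    message = str(err)

print(message)
```

Under that assumption, any tokenized name with more than two tokens would trigger the error, regardless of whether yTrain is passed as a second argument.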
Answer 1 (score: 0)
If you look at the documentation for NLTK's Naive Bayes classifier:
@classmethod
def train(cls, labeled_featuresets, estimator=ELEProbDist):
    """
    :param labeled_featuresets: A list of classified featuresets,
        i.e., a list of tuples ``(featureset, label)``.
    """
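You need to organize your features and labels slightly differently from how you would in a scikit-learn setup. NLTK expects a list of (featureset, label) tuples, not a feature matrix plus a separate list of labels.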
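A rough sketch of that reorganization, under a few assumptions: each featureset is a bag-of-words dict mapping token to True (one common choice, not the only one), the helper names to_featureset and to_labeled_featuresets are hypothetical, and the toy lists stand in for xTrain and yTrain from the question.

```python
try:
    import nltk  # only needed for the optional training step at the end
except ImportError:
    nltk = None


def to_featureset(tokens):
    """Turn a list of tokens into a bag-of-words dict (token -> True)."""
    return {token: True for token in tokens}


def to_labeled_featuresets(token_lists, labels):
    """Pair each featureset with its label: [(featureset, label), ...]."""
    return [(to_featureset(tokens), label)
            for tokens, label in zip(token_lists, labels)]


# Hypothetical toy data standing in for xTrain / yTrain.
x_train = [["summer", "sale"], ["winter", "sale"], ["brand", "launch"]]
y_train = ["promo", "promo", "awareness"]

train_set = to_labeled_featuresets(x_train, y_train)

if nltk is not None:
    # A single argument: the list of (featureset, label) tuples.
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    # classify takes one featureset; classify_many takes a list of them.
    print(classifier.classify(to_featureset(["spring", "sale"])))
```

With pandas objects, the same helpers should work on the split Series directly, for example to_labeled_featuresets(xTrain, yTrain), since zip pairs the values in order.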