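NLP classifier in Python: too many values to unpack

Date: 2018-07-11 17:38:38

Tags: python classification nltk

I get a ValueError when feeding data to the classifier. Each entry in campaign_name is a single string, and each entry in tokenized_sents is a list of several strings. What is going on?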

df['campaign_name'] = str.lower(df.campaign_name)
df['tokenized_sents'] = df.campaign_name.apply(nltk.word_tokenize)

X = df['tokenized_sents']
y = df['campaign_name']

xTrain, xTest, yTrain, yTest = tts(X, y, train_size=0.65, random_state=1)

classifier = nltk.NaiveBayesClassifier.train(xTrain,yTrain)

ValueError: too many values to unpack (expected 2)

2 answers:

Answer 0: (score: 0)

You are passing two values, xTrain and yTrain, in classifier = nltk.NaiveBayesClassifier.train(xTrain,yTrain). train accepts only one training set.

Could this be what you are looking for?

classifier = nltk.NaiveBayesClassifier.train(xTrain)

y_pred = classifier.classify(xTest)

Answer 1: (score: 0)

If you look at the documentation for NLTK's Naive Bayes classifier:

@classmethod
def train(cls, labeled_featuresets, estimator=ELEProbDist):
    """
    :param labeled_featuresets: A list of classified featuresets,
        i.e., a list of tuples ``(featureset, label)``.
    """

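You need to organize your features and labels slightly differently from how you would in a scikit-learn setup. NLTK expects a single list of (featureset, label) tuples, not a feature matrix plus a separate list of labels.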
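As a rough sketch of that layout: the error most likely arises because train unpacks every element of its first argument into (featureset, label), and a token list with more than two tokens cannot be unpacked into two names. The example below builds the expected structure from toy data. The rows, the labels, the to_featureset helper, and the "contains(...)" feature names are all hypothetical and not from the question, and str.split() stands in for nltk.word_tokenize so that no tokenizer data has to be downloaded.

```python
# Hypothetical toy data: (campaign name, label) pairs. The labels are made up
# for illustration; the question does not show a separate label column.
rows = [
    ("Summer sale shoes", "retail"),
    ("Winter sale boots", "retail"),
    ("Free trial signup", "saas"),
    ("Trial upgrade offer", "saas"),
]


def to_featureset(tokens):
    """Turn a list of tokens into a dict mapping feature names to values."""
    return {f"contains({tok})": True for tok in tokens}


# One list of (featureset, label) tuples, as the train() docstring describes.
# str.split() is a stand-in for nltk.word_tokenize (an assumption for brevity).
labeled_featuresets = [
    (to_featureset(text.lower().split()), label) for text, label in rows
]


def train_and_classify(train_set, text):
    """Train on (featureset, label) tuples and classify one new text."""
    # Imported here so the data-shaping part above runs without NLTK installed.
    import nltk

    # A single positional argument: labels travel inside the tuples.
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    # classify() takes one featureset, built the same way as the training ones.
    return classifier.classify(to_featureset(text.lower().split()))
```

With NLTK installed, calling train_and_classify(labeled_featuresets, "Spring sale sandals") returns one of the toy labels. For a DataFrame, the same list could be built by zipping the token column with whatever column actually holds the labels.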