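Prediction confidence with scikit-learn LinearSVC

Asked: 2013-07-26 14:42:34

Tags: python scikit-learn

I am using LinearSVC to classify text, but I would like each prediction to come with a confidence score.

This is what I have so far: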

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.svm import LinearSVC

    # Each training example is a (text, label) pair.
    train_set = self.read_training_files()
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform([e[0] for e in train_set])
    tfidf_transformer = TfidfTransformer()
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
    clf = LinearSVC(C=1).fit(X_train_tfidf, [e[1] for e in train_set])
    foods = list(self.get_foods())
    lenfoods = len(foods)
    i = 0
    for food in foods:
        fd = self.get_modified_food(food)
        food_desc = fd['fields']['title'].replace(',', '').lower()
        # Reuse the fitted transformers so new text maps to the same features.
        X_new_counts = count_vect.transform([food_desc])
        X_new_tfidf = tfidf_transformer.transform(X_new_counts)
        predicted = clf.predict(X_new_tfidf)

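The variable "predicted" contains the predicted class number, but no confidence level. I have been reading the source code, but I can't find the right attribute for this.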

1 answer:

Answer 0 (score: 5)

I think you are looking in the wrong place :). Have you looked at:

The relevant decision_function
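To make that concrete, here is a minimal sketch of how decision_function could be used alongside predict. The training texts and labels are made up for illustration, and TfidfVectorizer is used as a shorthand for CountVectorizer followed by TfidfTransformer; neither comes from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training data (hypothetical, for illustration only).
train_texts = [
    "apple banana fruit salad",
    "fresh orange fruit juice",
    "grilled beef steak",
    "roast chicken dinner",
    "chocolate cake dessert",
    "vanilla ice cream dessert",
]
train_labels = ["fruit", "fruit", "meat", "meat", "dessert", "dessert"]

# TfidfVectorizer combines CountVectorizer and TfidfTransformer in one step.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = LinearSVC(C=1).fit(X_train, train_labels)

X_new = vectorizer.transform(["banana fruit smoothie", "beef burger"])

predicted = clf.predict(X_new)
# With three or more classes: one row per sample, one column per class,
# ordered as in clf.classes_. With two classes the result is 1-D instead.
scores = clf.decision_function(X_new)

for label, row in zip(predicted, scores):
    # The predicted class is the one with the highest score in the row.
    print(label, dict(zip(clf.classes_, row.round(3))))
```

Note that these scores are signed distances to each class's separating hyperplane, not probabilities: a larger value means the sample lies further on that class's side, but the values are not bounded to [0, 1] and do not sum to 1.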


For me, the sklearn documentation is very helpful; sometimes more so than the code :)