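SVM decision function: visualizing class separation

Date: 2017-05-16 13:14:57

Tags: python matplotlib scikit-learn

To visualize how well a LinearSVC separates two classes, I use a plot (defined in the function below):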

from collections import Counter

import matplotlib.pyplot as plt
import numpy as np


def show_linearSVC_class_separation(linearSVC: 'LinearSVC', X_test, y_test):
    y_decision_score = linearSVC.decision_function(X_test)

    # getting the score of the truly positive individuals
    y_positive_decision_score = y_decision_score[y_test == 1]

    # getting the score of the truly negative individuals
    y_negative_decision_score = y_decision_score[y_test == 0]

    # counting the distribution of each score value in each class
    positive_count = Counter(y_positive_decision_score)
    negative_count = Counter(y_negative_decision_score)

    # sorting the decision scores to draw a good curve 
    y_positive_decision_score = np.sort(list(positive_count.keys()))
    y_positive_distribution = [positive_count[key] for key in y_positive_decision_score]
    y_negative_decision_score = np.sort(list(negative_count.keys()))
    y_negative_distribution = [negative_count[key] for key in y_negative_decision_score]

    # the alpha (transparency) makes the overlapping area between the two classes visible
    plt.fill_between(y_positive_decision_score, 0, y_positive_distribution, color='blue', alpha=0.5, hatch='')
    plt.plot(y_positive_decision_score, y_positive_distribution, color='blue', marker='.', label='True_positives')
    plt.fill_between(y_negative_decision_score, 0, y_negative_distribution, color='red', alpha=0.5, hatch='')
    plt.plot(y_negative_decision_score, y_negative_distribution, color='red', marker='.', label='True_negatives')

    # labels are attached to the two curves above; set_draggable(True) replaces the older draggable()
    plt.legend().set_draggable(True)
    plt.xlabel('SVM decision_function values')
    plt.ylabel('Number of data points')
    plt.show()

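However, the result is... really ugly. See for yourself: (image: example of a visualization using the function above)

I think this is because many decision_function values occur only once, so their count is 1. Maybe a histogram is the way to go. How can I group the decision values into intervals and count the data points that fall into each interval?
I need the intervals to have the same length, for example (length = 1):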

interval  || counting
[-7 ; -6] -> 20
]-6 ; -5] -> 30
....
] 5 ; 6] -> 10
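The counting described above can be sketched with the standard library alone, reusing the Counter idea from the question. This is only an illustration: the helper name count_by_unit_interval and the sample scores are made up, and the intervals here are half-open [k ; k+1[ (keyed by their integer lower bound), which differs slightly from the ]k ; k+1] convention in the table. Using math.ceil instead of math.floor would key each interval by its upper bound and give the table's convention.

```python
import math
from collections import Counter


def count_by_unit_interval(scores):
    """Count scores per interval [k, k+1), keyed by the integer lower bound k."""
    counts = Counter(math.floor(score) for score in scores)
    # fill in empty intervals so the result covers the whole range without gaps
    low, high = min(counts), max(counts)
    return {k: counts.get(k, 0) for k in range(low, high + 1)}


# made-up decision_function values, for illustration only
scores = [-6.4, -6.1, -5.5, -3.2, 0.3, 0.9, 5.0]

for lower, count in count_by_unit_interval(scores).items():
    print(f"[{lower} ; {lower + 1}[ -> {count}")
```

With NumPy, np.histogram with explicit bin edges performs the same kind of counting.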

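Or perhaps there is another way to visualize the separation of two classes.

For this visualization, I took inspiration from this blog post: Roc curve demonstration.

2 answers:

Answer 0 (score: 1)

After some looking around (the matplotlib and numpy documentation), I finally decided to try a histogram to visualize the class separation (keeping in mind that I am working in a high-dimensional vector space, ~200k dimensions).
Here is the function: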

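import matplotlib.pyplot as plt
import numpy as np


def show_linearSVC_class_separation(linearSVC: 'LinearSVC', X_test, y_test):
    '''Plots the class separation as two overlaid histograms of decision scores.

    Args:
        linearSVC: a LinearSVC instance that was previously fitted (.fit())
        X_test: the test samples
        y_test: the true labels as an array (1 = positive, 0 = negative)
    '''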
    y_decision_score = linearSVC.decision_function(X_test)

    # getting the score of the truly positive individuals
    y_positive_decision_score = y_decision_score[y_test == 1]

    # getting the score of the truly negative individuals
    y_negative_decision_score = y_decision_score[y_test == 0]

    # take (min - 1) and (max + 1), truncated to integers, so the unit-width bins of the histogram cover every positive score
    _, min_positive = np.modf(y_positive_decision_score.min() - 1)
    _, max_positive = np.modf(y_positive_decision_score.max() + 1)
    positive_bins = np.arange(min_positive, max_positive + 1)

    # same for the negative scores: (min - 1) and (max + 1), truncated to integers
    _, min_negative = np.modf(y_negative_decision_score.min() - 1)
    _, max_negative = np.modf(y_negative_decision_score.max() + 1)
    negative_bins = np.arange(min_negative, max_negative + 1)

    # plot the two histograms, alpha (the transparency) is for the overlapping areas
    plt.hist(y_positive_decision_score, bins=positive_bins, alpha=0.5, label='True positives', color='b')
    plt.hist(y_negative_decision_score, bins=negative_bins, alpha=0.5, label='True negatives', color='r')

    plt.xlabel('SVM decision_function values')
    plt.ylabel('Number of data points')
    # show the label= values passed to plt.hist above
    plt.legend()
    plt.show()
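The function above computes separate bin edges for each class with np.modf, which splits a float into its fractional and integral parts (the integral part is truncated toward zero). A minimal alternative sketch, assuming it is acceptable for both classes to share the same intervals, builds one set of integer-aligned edges with np.floor and np.ceil. The helper name unit_bin_edges and the sample scores are made up for illustration and are not part of the answer.

```python
import numpy as np


def unit_bin_edges(scores):
    """Return integer-aligned bin edges of width 1 that cover every score."""
    scores = np.asarray(scores, dtype=float)
    low = np.floor(scores.min())
    high = np.ceil(scores.max())
    if high == low:
        # all scores equal the same integer: still produce one bin
        high = low + 1
    return np.arange(low, high + 1)


# made-up decision scores for the two classes, for illustration only
positive_scores = np.array([0.4, 1.2, 1.9, 2.5, 3.1])
negative_scores = np.array([-2.7, -1.5, -1.1, -0.2, 0.6])

# one set of edges for both classes, so the histograms are directly comparable
edges = unit_bin_edges(np.concatenate([positive_scores, negative_scores]))
positive_counts, _ = np.histogram(positive_scores, bins=edges)
negative_counts, _ = np.histogram(negative_scores, bins=edges)

print(edges)
print(positive_counts)
print(negative_counts)
```

The same edges array could be passed as bins= to both plt.hist calls, so that overlapping bars line up exactly.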

Here is the result for the same example as in the question: (image: svm class separation)

Answer 1 (score: 0)

Try the following:

<?php
require_once 'conf/config.php';


if (!empty($_REQUEST['magName'] && $_REQUEST['year'] && $_REQUEST['issue'] 
)) {

$magazineName = $_REQUEST['magName'];
$year = $_REQUEST['year'] ;
$issue = $_REQUEST['issue'] ;



}

This should produce a similar plot.