To visualize how the LinearSVC separates the two classes, I used the plot produced by the function below:
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np


def show_linearSVC_class_separation(linearSVC: 'LinearSVC', X_test, y_test):
    y_decision_score = linearSVC.decision_function(X_test)
    # getting the scores of the truly positive individuals
    y_positive_decision_score = y_decision_score[y_test == 1]
    # getting the scores of the truly negative individuals
    y_negative_decision_score = y_decision_score[y_test == 0]
    # counting how often each score value occurs in each class
    positive_count = Counter(y_positive_decision_score)
    negative_count = Counter(y_negative_decision_score)
    # sorting the decision scores to draw a clean curve
    y_positive_decision_score = np.sort(list(positive_count.keys()))
    y_positive_distribution = [positive_count[key] for key in y_positive_decision_score]
    y_negative_decision_score = np.sort(list(negative_count.keys()))
    y_negative_distribution = [negative_count[key] for key in y_negative_decision_score]
    # the alpha makes the overlapping area between the two classes visible
    plt.fill_between(y_positive_decision_score, 0, y_positive_distribution, color='blue', alpha=0.5)
    plt.plot(y_positive_decision_score, y_positive_distribution, color='blue', marker='.')
    plt.fill_between(y_negative_decision_score, 0, y_negative_distribution, color='red', alpha=0.5)
    plt.plot(y_negative_decision_score, y_negative_distribution, color='red', marker='.')
    # legend.draggable() was removed in newer matplotlib; set_draggable is the replacement
    plt.legend(['True positives', 'True negatives']).set_draggable(True)
    plt.xlabel('SVM decision_function values')
    plt.ylabel('Number of data points')
    plt.show()
I think this is because many decision_values occur exactly once, so the curve is meaningless. Maybe a histogram is the way to go. How can I bin the decision_values into intervals and count the data points that fall into each interval?
I need the intervals to all have the same length, for example (length = 1), like the table below (a rough binning sketch follows it):
interval || count
[-7 ; -6] -> 20
]-6 ; -5] -> 30
....
] 5 ; 6] -> 10
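For reference, here is a minimal sketch of such equal-width binning with np.histogram; the scores array is invented purely for illustration:

import numpy as np

# illustrative decision scores; in practice they come from decision_function
scores = np.array([-6.5, -5.2, -0.3, 0.1, 2.4, 5.7])
# unit-width bin edges covering the whole score range
edges = np.arange(np.floor(scores.min()), np.ceil(scores.max()) + 1)
counts, edges = np.histogram(scores, bins=edges)
# note: np.histogram bins are half-open [left ; right[, except the last one
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f'[{left} ; {right}[ -> {count}')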
Or maybe there is another way to visualize binary class separation. For the visualization, I took my inspiration from this blog post: Roc curve demonstration.
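As a side note, such an ROC curve can be drawn directly from the decision scores with sklearn.metrics.roc_curve; a minimal sketch with invented labels and scores:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc, roc_curve

# illustrative labels and scores; in practice use y_test and
# linearSVC.decision_function(X_test) as in the function above
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([-2.1, -0.4, 0.3, 1.8, 0.6, -0.2])
fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {auc(fpr, tpr):.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='grey')  # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()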
Answer 0 (score: 1)
After some digging around (in the matplotlib and numpy documentation), I finally decided to try a histogram to visualize the class separation (keeping in mind that I am working in a high-dimensional vector space, ~200k dimensions).
Here is the function:
import matplotlib.pyplot as plt
import numpy as np


def show_linearSVC_class_separation(linearSVC: 'LinearSVC', X_test, y_test):
    ''' Plots the separation of the two classes as histograms of decision scores.
    Args:
        linearSVC: a LinearSVC instance that was previously fitted (.fit())
        X_test: the test samples
        y_test: the true binary labels (0 or 1) of the test samples
    '''
    y_decision_score = linearSVC.decision_function(X_test)
    # getting the scores of the truly positive individuals
    y_positive_decision_score = y_decision_score[y_test == 1]
    # getting the scores of the truly negative individuals
    y_negative_decision_score = y_decision_score[y_test == 0]
    # widen the range by 1 on each side and keep only the integral part
    # (np.modf returns (fractional, integral)) to be sure all the scores
    # fall inside the unit-width intervals of the histogram
    _, min_positive = np.modf(y_positive_decision_score.min() - 1)
    _, max_positive = np.modf(y_positive_decision_score.max() + 1)
    positive_bins = np.arange(min_positive, max_positive + 1)
    _, min_negative = np.modf(y_negative_decision_score.min() - 1)
    _, max_negative = np.modf(y_negative_decision_score.max() + 1)
    negative_bins = np.arange(min_negative, max_negative + 1)
    # plot the two histograms; alpha (the transparency) reveals the overlapping areas
    plt.hist(y_positive_decision_score, bins=positive_bins, alpha=0.5, label='True positives', color='b')
    plt.hist(y_negative_decision_score, bins=negative_bins, alpha=0.5, label='True negatives', color='r')
    plt.legend()
    plt.xlabel('SVM decision_function values')
    plt.ylabel('Number of data points')
    plt.show()
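A hypothetical usage example, with a toy dataset standing in for the real ~200k-dimensional vectors (any fitted LinearSVC with binary labels would do):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# toy binary classification data for illustration only
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LinearSVC().fit(X_train, y_train)
show_linearSVC_class_separation(clf, X_test, y_test)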
Here is the result for the same example as in the question:
Answer 1 (score: 0)