我正在研究广告系列的倾向建模问题。我的数据集由历来点击过广告的用户和未点击过的用户组成。
要测量模型的性能,我使用sklearn绘制了累积增益和升力图。下面是相同的代码:
import matplotlib.pyplot as plt
import scikitplot as skplt
Y_test_pred_ = model.predict_proba(X_test_df)[:]
skplt.metrics.plot_cumulative_gain(Y_test, Y_test_pred_)
plt.show()
skplt.metrics.plot_lift_curve(Y_test, Y_test_pred_)
plt.show()
的图形
我只需要针对基线曲线绘制1类曲线。 有办法吗?
答案 0 :(得分:1)
您可以使用 kds 包。
对于累积收益图:
# pip install kds
import kds
kds.metrics.plot_cumulative_gain(y_test, y_prob)
对于提升图:
import kds
kds.metrics.plot_lift(y_test, y_prob)
# REPRODUCABLE EXAMPLE
# Load Dataset and train-test split
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.33,random_state=3)
clf = tree.DecisionTreeClassifier(max_depth=1,random_state=3)
clf = clf.fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)
# CUMMULATIVE GAIN PLOT
import kds
kds.metrics.plot_cumulative_gain(y_test, y_prob[:,1])
# LIFT PLOT
kds.metrics.plot_lift(y_test, y_prob[:,1])
答案 1 :(得分:0)
如果需要,我可以解释代码:
Args:
df:数据框包含一个得分列和一个目标列
分数:包含分数列名称的字符串
target:包含目标列名称的字符串
title:包含将要生成的图形名称的字符串
def get_cum_gains(df, score, target, title):
df1 = df[[score,target]].dropna()
fpr, tpr, thresholds = roc_curve(df1[target], df1[score])
ppr=(tpr*df[target].sum()+fpr*(df[target].count()-
df[target].sum()))/df[target].count()
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.plot(ppr, tpr, label='')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.grid(b=True, which='both', color='0.65',linestyle='-')
plt.xlabel('%Population')
plt.ylabel('%Target')
plt.title(title+'Cumulative Gains Chart')
plt.legend(loc="lower right")
plt.subplot(1,2,2)
plt.plot(ppr, tpr/ppr, label='')
plt.plot([0, 1], [1, 1], 'k--')
plt.grid(b=True, which='both', color='0.65',linestyle='-')
plt.xlabel('%Population')
plt.ylabel('Lift')
plt.title(title+'Lift Curve')
答案 2 :(得分:0)
这有点hacky,但它可以满足您的需求。重点是得到
访问 matplotlib 创建的 ax
变量。然后
操作它以删除不需要的情节。
# Some dummy data to work with
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
X, y = load_breast_cancer(return_X_y=True)
# ploting
import scikitplot as skplt
import matplotlib.pyplot as plt
# classify
clf = LogisticRegression(solver='liblinear', random_state=42).fit(X, y)
# classifier's output probabilities for the two classes
y_preds_probas = clf.predict_proba(X)
# get access to the figure and axes
fig, ax = plt.subplots()
# ax=ax creates the plot on the same ax we just initialized.
skplt.metrics.plot_lift_curve(y, y_preds_probas, ax=ax)
## Now the solution to your problem.
del ax.lines[0] # delete the desired class plot
ax.legend().set_visible(False) # hide the legend
ax.legend().get_texts()[0].set_text("Cancer") # turn the legend back on
plt.show()
您可能需要处理 ax.lines[1]
等
删除你想要的当然。