Question

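First off, I've done a lot of work in Python, but I'm trying to branch into a new area with math/data plotting, so please bear with me. My dataset has 4 columns: person, x and y coordinates, and a binary response for those coordinates. With this data, I'm looking to do a few different things.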

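Return a probability value for each set of x, y coordinates
Create some sort of graph (heatmap/density?) that shows the likelihood of 0/1 across regions of the plot
Evaluate subsets of the data using the "person" column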

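Based on the research I've done, LogisticRegression from sklearn.linear_model seems to be the best way to approach this (I have also tried pyGAM). As my script shows, the furthest I've gotten is running the predict_proba function on the dataset, but either I'm doing something wrong elsewhere or I don't know how to interpret the results, because they seem way off. If anyone can help me, I'd really appreciate it.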

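import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# frame is the DataFrame holding the full dataset
data_df = frame[['person','x_value','y_value','binary_result']]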

#Create a scatter plot of the x,y coordinates with regard to their binary result
fig = plt.figure(figsize=(4,4))
ax = fig.add_subplot(1, 1, 1)

bin_res = [0,1]
bin_col = ['r','g']

for res, col in zip(bin_res, bin_col):
    plot_df = data_df[data_df['binary_result'] == res]
    ax.scatter(plot_df['x_value'], plot_df['y_value'], c=col, marker='.')

plt.show()

#Execute logistic regression on the dataset
x = data_df[['x_value','y_value']]
y = data_df[['binary_result']]

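log_reg = linear_model.LogisticRegression(solver='lbfgs').fit(x, np.ravel(y))

# Predicted class labels (0 or 1) for each row
predictions = log_reg.predict(x)
# One row per sample, one column per class, ordered as in log_reg.classes_
predict_a = log_reg.predict_proba(x)

print(predict_a)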
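
For reference, here is a minimal sketch of the three goals listed above: a probability per point, a heatmap of the fitted probabilities, and a separate fit per person. It is only an illustration under assumptions: the data is synthetic (standing in for `frame`), the column names are copied from the script, and the helper names `fit_model` and `prob_of_one` are hypothetical. Note that a plain logistic regression on x and y produces a probability surface that changes smoothly across a straight line, which may be part of why the results look off if the real pattern is not linear.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view plots interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (assumption: same column names as the script).
rng = np.random.default_rng(0)
n = 300
data_df = pd.DataFrame({
    "person": rng.choice(["a", "b", "c"], size=n),
    "x_value": rng.uniform(-3, 3, size=n),
    "y_value": rng.uniform(-3, 3, size=n),
})
# Make a result of 1 more likely toward the upper right of the plane.
logit = 1.5 * data_df["x_value"] + 1.0 * data_df["y_value"]
data_df["binary_result"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

features = ["x_value", "y_value"]

def fit_model(df):
    """Fit a logistic regression of binary_result on the x, y coordinates."""
    return LogisticRegression(solver="lbfgs").fit(df[features], df["binary_result"])

def prob_of_one(model, X):
    """Return P(result == 1); predict_proba columns follow model.classes_."""
    col = list(model.classes_).index(1)
    return model.predict_proba(X)[:, col]

model = fit_model(data_df)

# Goal 1: one probability per (x, y) row.
data_df["p_one"] = prob_of_one(model, data_df[features])

# Goal 2: evaluate the model on a regular grid and draw the result as a heatmap.
xs = np.linspace(data_df["x_value"].min(), data_df["x_value"].max(), 100)
ys = np.linspace(data_df["y_value"].min(), data_df["y_value"].max(), 100)
xx, yy = np.meshgrid(xs, ys)
grid = pd.DataFrame({"x_value": xx.ravel(), "y_value": yy.ravel()})
zz = prob_of_one(model, grid).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(5, 4))
mesh = ax.pcolormesh(xx, yy, zz, cmap="RdYlGn", vmin=0, vmax=1, shading="auto")
fig.colorbar(mesh, ax=ax, label="P(binary_result = 1)")
ax.scatter(data_df["x_value"], data_df["y_value"], c=data_df["binary_result"],
           cmap="RdYlGn", edgecolors="k", s=10)
# Call plt.show() or fig.savefig(...) here to look at the figure.

# Goal 3: fit one model per person (each subset must contain both 0 and 1).
per_person = {person: fit_model(group) for person, group in data_df.groupby("person")}
```

Each entry in `per_person` can be passed to `prob_of_one` with the same grid to draw one heatmap per person and compare them side by side.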

Evaluating a dataset with logistic regression - Python / sklearn

0 Answers: