Question

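I have a random variable defined as follows:

f(x) = 1 with probability g(x)

f(x) = 0 with probability 1 - g(x)

where 0 < g(x) < 1.

Assume g(x) = x. Let's say I observe this variable without knowing the function g and obtain 200 samples as follows: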

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic

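# Column 0 holds x, column 1 holds the observed 0/1 outcome
list = np.ndarray(shape=(200,2))

g = np.random.rand(200)  # x values, uniform on [0, 1); here g(x) = x
for i in range(len(g)):
    # Draw 1 with probability g[i], 0 otherwise
    list[i] = (g[i], np.random.choice([0, 1], p=[1-g[i], g[i]]))

print(list)
plt.plot(list[:,0], list[:,1], 'o')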

Plot of 0s and 1s

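Now I would like to recover the function g from these points. The best I could come up with is to bin the data and take the mean of each bin: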

bin_means, bin_edges, bin_number = binned_statistic(list[:,0], list[:,1], statistic='mean', bins=10)
plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], lw=2)

Histogram mean statistics

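Instead, I would like a continuous estimate of the generating function.

I guess this is related to kernel density estimation, but I could not find the right pointers.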

Answer 1

The straightforward way, without explicitly fitting an estimator:

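import pandas as pd
import seaborn as sns
df = pd.DataFrame(list, columns=["x", "y"])  # `list` is the sample array from the question
g = sns.lmplot(data=df, x="x", y="y", y_jitter=.02, logistic=True)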

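Pass your exogenous variable as x and, likewise, your dependent variable as y. y_jitter only improves visibility when you have many data points. logistic=True is the main point here: it gives you the logistic regression line for the data (seaborn needs statsmodels installed for this option).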
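If you want the fitted curve as numbers rather than only a plot, here is a minimal sketch of an explicit logistic fit using only NumPy. It is not seaborn's internal code; `fit_logistic`, the seed, and the iteration count are illustrative choices, and the data is regenerated the same way as in the question. Keep in mind that a logistic curve is a parametric model: it assumes g has a sigmoid shape, so for g(x) = x it can only give an S-shaped approximation.

```python
import numpy as np

# Simulated data as in the question: y = 1 with probability g(x) = x
rng = np.random.default_rng(0)
x = rng.random(200)
y = (rng.random(200) < x).astype(float)

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) with Newton's method."""
    X = np.column_stack([np.ones_like(x), x])  # intercept + slope columns
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # current predicted probabilities
        w = p * (1.0 - p)                      # Bernoulli variances
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])          # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

beta = fit_logistic(x, y)

# Evaluate the fitted curve on a grid; this is the continuous estimate of g
grid = np.linspace(0.0, 1.0, 101)
g_hat = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * grid)))
```

`plt.plot(grid, g_hat)` on top of the scatter plot then shows the estimate next to the raw 0/1 samples.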

Seaborn is essentially built on top of matplotlib and works well with pandas, in case you want to move your data into a DataFrame.
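If you would rather not assume a logistic shape, a kernel-weighted mean is the continuous analogue of the binned means in the question (this is usually called a Nadaraya-Watson estimator). The sketch below is only an illustration under assumptions: `kernel_smooth` is a hypothetical helper, the Gaussian kernel and the bandwidth of 0.1 are arbitrary choices you would need to tune, and the data is simulated as in the question.

```python
import numpy as np

def kernel_smooth(x, y, grid, bandwidth=0.1):
    """Gaussian-kernel weighted mean of y, evaluated at each grid point."""
    # weights[i, j] is the influence of sample j on grid point i
    z = (grid[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * z**2)
    return (weights @ y) / weights.sum(axis=1)

# Simulated data as in the question: y = 1 with probability g(x) = x
rng = np.random.default_rng(0)
x = rng.random(200)
y = (rng.random(200) < x).astype(float)

# Continuous estimate of g on a regular grid
grid = np.linspace(0.0, 1.0, 101)
g_hat = kernel_smooth(x, y, grid)
```

A smaller bandwidth follows the data more closely but is noisier, and a larger one is smoother but flattens the curve. Near the edges x = 0 and x = 1 the estimate tends to be pulled toward the interior values, because the kernel only sees samples on one side.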

Locally weighted smoothing for a binary-valued random variable
