Question

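I am trying to generate numbers between two points on an ECDF. Near these bounds, there is very little raw data behind the ECDF.

When I use the fitted ECDF to look up the quantile at a bound and then convert that quantile back to an actual value (using numpy's quantile function), I get a value well below the original bound.

This means that when I generate numbers between the two bounds, some of the generated numbers may fall below the lower bound.

I think this happens because the data is sparse near these bounds, so the quantile selected when computing ecdf(_lower) falls back to the quantile of the previous value on the curve. I'll provide some code below to demonstrate what I am doing.

I have tried to find a different ECDF function that, instead of taking the nearest lower value-quantile pair from the raw data, uses linear interpolation to work out what the quantile should be at the lower bound. I have not succeeded so far.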

import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF

# Compute the ECDF of the sorted values in the dataframe
ecdf = ECDF(df)

# Define the lower and upper bounds
_lower = 5e9
_upper = 10e9

# Evaluate the ECDF (cumulative probability) at the lower and upper bounds
a = ecdf(_lower)
b = ecdf(_upper)

# Generate values between these bounds
generated_values = np.quantile(df, np.random.uniform(a, b, 10))

# Here I find that the generated values are not all within the original bounds

# Check what the quantile at the lower bound looks like when converted back to a value:
lower_bound_value = np.quantile(df, a)

# Output is smaller than _lower

Is there an ECDF function (in a package) that includes linear interpolation?

0 answers: