Question

I want to use np.percentile to get a different quantile for each row.

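For example, given this 2-row array, I want the percentile at q=0.2 for the first row and at q=0.6 for the second. (Note that np.percentile reads q on a 0 to 100 scale, so the calls below compute the 0.2th and 0.6th percentiles; the 20th and 60th percentiles would need q=20 and q=60, or np.quantile with 0.2 and 0.6.)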

dat = np.array([[1, 10, 3], [4, -1, 5]])
dat
# array([[ 1, 10,  3],
#        [ 4, -1,  5]])

Starting with q=0.2:

np.percentile(dat, 0.2, axis=1)
# array([ 1.008, -0.98 ])

Then q=0.6:

np.percentile(dat, 0.6, axis=1)
# array([ 1.024, -0.94 ])
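As a check on these numbers, np.percentile's default method ("linear") interpolates between the sorted values of each row. Writing the sorted row as $x_{(0)} \le \dots \le x_{(n-1)}$ (zero-based), the outputs above follow directly, which also confirms that q=0.2 is read as the 0.2th percentile:

```latex
h = \frac{q}{100}\,(n-1), \qquad
P_q = x_{(\lfloor h \rfloor)} + \bigl(h - \lfloor h \rfloor\bigr)\,\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)

\text{Row 1, sorted } (1, 3, 10),\ n = 3:\quad
q = 0.2:\ h = 0.004,\ P = 1 + 0.004\,(3 - 1) = 1.008; \quad
q = 0.6:\ h = 0.012,\ P = 1 + 0.012\,(3 - 1) = 1.024

\text{Row 2, sorted } (-1, 4, 5),\ n = 3:\quad
q = 0.2:\ P = -1 + 0.004\,(4 - (-1)) = -0.98; \quad
q = 0.6:\ P = -1 + 0.012\,(4 - (-1)) = -0.94
```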

Based on this, the desired result would be [1.008, -0.94].
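Because np.percentile and np.quantile use different scales, it is easy to mix them up. A short sketch showing that q=0.2 in np.percentile equals quantile 0.002, and what the 20th/60th percentiles of the same array would be if those were intended:

```python
import numpy as np

dat = np.array([[1, 10, 3], [4, -1, 5]])

# np.percentile reads q on a 0-100 scale, np.quantile on a 0-1 scale,
# so q=0.2 in np.percentile is the 0.2th percentile (quantile 0.002).
low = np.percentile(dat, 0.2, axis=1)
same_low = np.quantile(dat, 0.002, axis=1)

# The 20th and 60th percentiles, in case those were the intent.
p20 = np.percentile(dat, 20, axis=1)  # equivalent to np.quantile(dat, 0.2, axis=1)
p60 = np.percentile(dat, 60, axis=1)  # equivalent to np.quantile(dat, 0.6, axis=1)

# Row 0 at the 20th percentile, row 1 at the 60th: approximately [1.8, 4.2].
desired_20_60 = np.array([p20[0], p60[1]])
```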

Passing a vector of quantiles instead expands the result into an array of shape (len(q), n_rows), 2 x 2 here:

np.percentile(dat, [0.2, 0.6], axis=1)
# array([[ 1.008, -0.98 ],
#        [ 1.024, -0.94 ]])

The diagonal of this result gives the correct answer:

np.percentile(dat, [0.2, 0.6], axis=1).diagonal()
# array([ 1.008, -0.94 ])

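However, this is prohibitively expensive for large arrays, because it computes every quantile for every row and then discards everything except the diagonal. Is there a way to compute each row's percentile at its own quantile directly?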
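For comparison, the simplest baseline that avoids the len(q) x n_rows intermediate is a plain Python loop over rows. This is a minimal sketch (the name `percentile_rowwise_loop` is hypothetical); it computes only the values that are needed, but still calls np.percentile once per row:

```python
import numpy as np

def percentile_rowwise_loop(dat, q):
    """Return the q[i]-th percentile of row i (q on np.percentile's 0-100 scale).

    Computes n_rows values instead of the len(q) x n_rows values
    produced by np.percentile(dat, q, axis=1).
    """
    dat = np.asarray(dat)
    q = np.asarray(q, dtype=float)
    if q.shape != (dat.shape[0],):
        raise ValueError("q must have exactly one entry per row of dat")
    # One np.percentile call per (row, quantile) pair.
    return np.array([np.percentile(row, qi) for row, qi in zip(dat, q)])

dat = np.array([[1, 10, 3], [4, -1, 5]])
result = percentile_rowwise_loop(dat, [0.2, 0.6])
```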

Answer 1

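If the data types don't conflict, you can attach the quantiles to the data as an extra first column, then use np.apply_along_axis so that each row's function call splits its quantile back off from its data: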

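import numpy as np

def percentile_qarray_np(dat, q):
  # Prepend q as column 0 so each row carries its own quantile,
  # then split it back off inside the per-row function.
  return np.apply_along_axis(
    lambda x: np.percentile(x[1:], x[0]),
    1,
    np.concatenate([np.array(q)[:, np.newaxis], dat], axis=1)
  )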
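To make the concatenation trick concrete, here is the intermediate array for the question's dat and q. This is an illustrative sketch rather than part of the answer's code; it also shows where the data-type caveat comes from: the quantiles and the data have to live in one array, so they are promoted to a common dtype.

```python
import numpy as np

dat = np.array([[1, 10, 3], [4, -1, 5]])
q = [0.2, 0.6]

# The same array percentile_qarray_np builds: q becomes column 0.
# The integer data is promoted to float64 so it can share an array with q.
combined = np.concatenate([np.array(q)[:, np.newaxis], dat], axis=1)

# Inside apply_along_axis each row x is split back apart:
# x[0] is that row's quantile, x[1:] is that row's original data.
row_q = combined[:, 0]
row_data = combined[:, 1:]
```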

For example:

n = 10
percentiles = np.linspace(0, 100, n)
a = np.arange(n**2).reshape(n, n)
print(percentile_qarray_np(a, percentiles))
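The output of this example can be predicted by hand: row i holds 10i, ..., 10i+9 and its percentile is 100i/9, so the interpolation position is (n-1) * (100i/9) / 100 = i and the result is 10i + i = 11i, i.e. the diagonal of a. A quick sanity check, using a plain per-row loop as the reference (an illustrative sketch, independent of percentile_qarray_np):

```python
import numpy as np

n = 10
percentiles = np.linspace(0, 100, n)  # 0, 100/9, 200/9, ..., 100
a = np.arange(n**2).reshape(n, n)     # row i holds 10*i, ..., 10*i + 9

# Reference values from one np.percentile call per row.
expected = np.array([np.percentile(row, p) for row, p in zip(a, percentiles)])

# Row i's percentile lands exactly on its i-th sorted value, 10*i + i,
# so the result matches the diagonal of a: 0, 11, 22, ..., 99.
matches_diagonal = np.allclose(expected, np.diag(a))
```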

This is now available in the synthimpute package.

Answer 2

You can also use a DataFrame: add the desired quantiles as a column, then apply row-wise:

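import numpy as np
import pandas as pd

def percentile_qarray_df(dat, q):
  # dat: numpy array.
  # q: vector with one quantile per row of dat (np.percentile's 0-100 scale).
  df = pd.DataFrame(dat)
  df['q'] = q
  # For each row, drop the 'q' entry and take the percentile of the rest.
  return df.apply(lambda x: np.percentile(x.drop('q'), x.q), axis=1)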

For example:

percentile_qarray_df(dat, [0.2, 0.6])
# 0    1.008
# 1   -0.940
# dtype: float64

This is still slow, because apply calls np.percentile once per row from Python.
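A fully vectorized alternative, not taken from either answer or from synthimpute, is to sort each row once and do the interpolation by hand. This is a sketch under stated assumptions: 2-D numeric input without NaNs, q on np.percentile's 0-100 scale, and np.percentile's default "linear" method; the function name is hypothetical.

```python
import numpy as np

def percentile_qarray_vectorized(dat, q):
    """Percentile q[i] of row i, computed without a Python-level loop.

    Assumes 2-D numeric data without NaNs, q on the 0-100 scale, and
    reproduces only np.percentile's default 'linear' method.
    """
    dat = np.asarray(dat, dtype=float)
    q = np.asarray(q, dtype=float)
    n_rows, n_cols = dat.shape
    # Sort every row once: O(n_rows * n_cols * log(n_cols)).
    s = np.sort(dat, axis=1)
    # Fractional position of each row's percentile within its sorted row.
    h = (n_cols - 1) * q / 100.0
    lo = np.floor(h).astype(int)
    hi = np.minimum(lo + 1, n_cols - 1)  # clamp so q=100 stays in bounds
    frac = h - lo
    rows = np.arange(n_rows)
    # Linear interpolation between the two neighbouring sorted values.
    return s[rows, lo] + frac * (s[rows, hi] - s[rows, lo])
```

Because the interpolation is reimplemented by hand, it should agree with np.percentile only up to floating-point rounding and only for the default method; other values of np.percentile's `method` argument would need different index arithmetic.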

Get a different quantile for each row using numpy percentile
