Map each value of a list to its weighted percentile

Time: 2018-01-15 16:14:30

Tags: python numpy scipy

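I want to compute the percentile of each value in a list (or numpy array), weighted by the corresponding weights in another list. For example, given some function f, I would like: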

x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
f(x, weights)

to return [20, 40, 70, 100].
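To make the target concrete, here is a minimal sketch of where those four numbers come from. It assumes the input is already sorted and has no repeated values, as in this example; the names `cumulative` and `expected` are only illustrative.

```python
import numpy as np

x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]

# Running total of the weights: [2, 4, 7, 10].
cumulative = np.cumsum(weights)

# Share of the total weight (10) at or below each value, as a percentage.
expected = 100 * cumulative / cumulative[-1]
print(expected.tolist())
```

Each entry is the weight carried by that value and everything below it, divided by the total weight.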

I can compute the unweighted percentile of a single item with:

from scipy import stats
stats.percentileofscore(x, 3)
# 75.0

Following "Map each list value to its corresponding percentile", I can also compute it for every item with:

[stats.percentileofscore(x, a, 'rank') for a in x]
# [25.0, 50.0, 75.0, 100.0]

And following "Weighted version of scipy percentileofscore", I can compute the weighted percentile of a single item with:

import numpy as np

def weighted_percentile_of_score(x, weights, score, kind='weak'):
    npx = np.array(x)
    npw = np.array(weights)

    if kind == 'rank':  # Equivalent to 'weak' since we have weights.
        kind = 'weak'

    if kind in ['strict', 'mean']:
        # Share of the total weight on values strictly below the score.
        indx = npx < score
        strict = 100 * sum(npw[indx]) / sum(weights)
    if kind == 'strict':
        return strict

    if kind in ['weak', 'mean']:
        # Share of the total weight on values at or below the score.
        indx = npx <= score
        weak = 100 * sum(npw[indx]) / sum(weights)
    if kind == 'weak':
        return weak

    if kind == 'mean':
        # Midpoint of the strict and weak percentiles.
        return (strict + weak) / 2

Called as:

weighted_percentile_of_score(x, weights, 3)  # 70.0 as desired.

How can I do this (efficiently) for every item in the list?

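2 answers:

Answer 0: (score: 1)

Adapting this answer to "Weighted percentile using numpy", you can sort the array and then divide the cumulative sum (cumsum) of the weights by the total weight: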

import numpy as np

def weighted_percentileofscore(values, weights=None, values_sorted=False):
    """ Similar to scipy.percentileofscore, but supports weights.
    :param values: array-like with data.
    :param weights: array-like of the same length as `values`.
    :param values_sorted: bool, if True, then will avoid sorting of initial array.
    :return: numpy.array with percentiles of sorted array.
    """
    values = np.array(values)
    if weights is None:
        weights = np.ones(len(values))
    weights = np.array(weights)

    if not values_sorted:
        sorter = np.argsort(values)
        values = values[sorter]
        weights = weights[sorter]

    total_weight = weights.sum()
    return 100 * np.cumsum(weights) / total_weight

Verification:

weighted_percentileofscore(x, weights)
# array([20., 40., 70., 100. ])

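If you pass an unsorted array, the returned percentiles follow the sorted order and you have to map them back to the original order yourself, so it is best to sort first.

This should be considerably faster than computing the percentile of each value separately.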
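If the input cannot be sorted up front, one way to map the result back is sketched below. This is not part of the answer above: `weighted_percentile_per_item` is a hypothetical helper that assumes 1-D input and the 'weak' definition (weight at or below each value). It also pools the weight of repeated values, which the plain cumsum does not do.

```python
import numpy as np

def weighted_percentile_per_item(values, weights=None):
    """Hypothetical helper: weak weighted percentile of each value, in input order."""
    values = np.asarray(values)
    if weights is None:
        weights = np.ones(len(values))
    weights = np.asarray(weights, dtype=float)

    # unique_vals is sorted; inverse maps each input position to its distinct value.
    unique_vals, inverse = np.unique(values, return_inverse=True)
    # Total weight carried by each distinct value (ties are pooled).
    weight_per_value = np.bincount(inverse, weights=weights)
    # Weight at or below each distinct value, as a percentage of the total.
    percentiles = 100 * np.cumsum(weight_per_value) / weight_per_value.sum()
    # Scatter back so the output lines up with the original input order.
    return percentiles[inverse]

print(weighted_percentile_per_item([3, 1, 4, 2], [3, 2, 3, 2]))
```

Indexing with `inverse` is what restores the original order, so the caller never has to track the sort permutation.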

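Answer 1: (score: 0)

This is not very efficient, since every call rescans the whole list, but you can combine the methods listed in the question: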

[weighted_percentile_of_score(x, weights, val) for val in x]
# [20.0, 40.0, 70.0, 100.0]
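If the repeated scans become a bottleneck, a possible alternative is to sort once and look every score up with `np.searchsorted`. The sketch below is an assumption about how that could look, not code from either answer; `weighted_percentile_of_scores` is a hypothetical name, and it is meant to mirror the `kind` options of the function in the question.

```python
import numpy as np

def weighted_percentile_of_scores(x, weights, scores, kind="weak"):
    """Hypothetical vectorized counterpart of the per-score function."""
    x = np.asarray(x)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x_sorted = x[order]
    # Prepend 0 so that index k holds the total weight of the k smallest values.
    cum = np.concatenate(([0.0], np.cumsum(w[order])))

    def weight_below(side):
        # side="left" counts values < score, side="right" counts values <= score.
        return cum[np.searchsorted(x_sorted, scores, side=side)]

    if kind == "strict":
        weight = weight_below("left")
    elif kind in ("weak", "rank"):
        weight = weight_below("right")
    elif kind == "mean":
        weight = (weight_below("left") + weight_below("right")) / 2
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return 100 * weight / cum[-1]

x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
print(weighted_percentile_of_scores(x, weights, x))
```

Unlike the list comprehension, this also accepts scores that are not members of `x`, and the cost is one sort plus a binary search per score.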