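I want to compute the percentile of each value in a list (or numpy array), weighted by the weights in another list. For example, given some f, I want:
x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
f(x, weights)
to return [20, 40, 70, 100].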
I can compute the unweighted percentile of a single item with:
from scipy import stats
stats.percentileofscore(x, 3)
# 75.0
Following "Map each list value to its corresponding percentile", I can also compute this for every item with:
[stats.percentileofscore(x, a, 'rank') for a in x]
# [25.0, 50.0, 75.0, 100.0]
Following "Weighted version of scipy percentileofscore", I can compute the weighted percentile of a single item with:
import numpy as np

def weighted_percentile_of_score(x, weights, score, kind='weak'):
    npx = np.array(x)
    npw = np.array(weights)
    if kind == 'rank':  # Equivalent to 'weak' since we have weights.
        kind = 'weak'
    if kind in ['strict', 'mean']:
        # Share of the total weight carried by values strictly below score.
        indx = npx < score
        strict = 100 * npw[indx].sum() / npw.sum()
    if kind == 'strict':
        return strict
    if kind in ['weak', 'mean']:
        # Share of the total weight carried by values at or below score.
        indx = npx <= score
        weak = 100 * npw[indx].sum() / npw.sum()
    if kind == 'weak':
        return weak
    if kind == 'mean':
        return (strict + weak) / 2
Called as:
weighted_percentile_of_score(x, weights, 3)  # 70.0 as desired.
How can I do this (efficiently) for every item in the list?
Answer 0: (score: 1)
Adapting this answer to "Weighted percentile using numpy", you can sort the array and then divide the cumsum of the weights by the total weight:
import numpy as np

def weighted_percentileofscore(values, weights=None, values_sorted=False):
    """ Similar to scipy.percentileofscore, but supports weights.
    :param values: array-like with data.
    :param weights: array-like of the same length as `values`.
    :param values_sorted: bool, if True, then will avoid sorting of initial array.
    :return: numpy.array with percentiles of sorted array.
    """
    values = np.array(values)
    if weights is None:
        weights = np.ones(len(values))
    weights = np.array(weights)
    if not values_sorted:
        # Reorder the values and their weights together.
        sorter = np.argsort(values)
        values = values[sorter]
        weights = weights[sorter]
    total_weight = weights.sum()
    return 100 * np.cumsum(weights) / total_weight
To verify:
weighted_percentileofscore(x, weights)
# array([ 20.,  40.,  70., 100.])
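If you pass an unsorted array, the result comes back in sorted order and has to be mapped back to the original order, so it is best to sort first.
This should be much faster than computing each value separately, since it needs one sort and one cumulative sum instead of a full pass over the data for every item.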
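The mapping back to the original order can be sketched as follows. This is only an illustration under my own assumptions, not part of the answer above: the name weighted_percentile_per_item is hypothetical, the 'weak' definition (weight at or below each value) is assumed, and np.searchsorted is used so that tied values share one percentile.

```python
import numpy as np


def weighted_percentile_per_item(values, weights):
    """Weighted 'weak' percentile of every item, in the input's original order.

    Hypothetical helper for illustration; not part of scipy or numpy.
    """
    values = np.asarray(values)
    weights = np.asarray(weights, dtype=float)

    # Sort once and accumulate the weights in sorted order.
    order = np.argsort(values, kind="stable")
    sorted_values = values[order]
    cum_weights = np.cumsum(weights[order])

    # For each original value, find the last sorted position holding a value
    # <= it, so that tied values all receive the same cumulative weight.
    last = np.searchsorted(sorted_values, values, side="right") - 1
    return 100 * cum_weights[last] / cum_weights[-1]


# Same data as in the question, but shuffled.
x_unsorted = [3, 1, 4, 2]
w_unsorted = [3, 2, 3, 2]
result = weighted_percentile_per_item(x_unsorted, w_unsorted)
# result holds 70, 20, 100, 40: one percentile per input position.
```

Because the lookup indexes into the cumulative weights by value rather than by position, the output lines up with the input without a separate inverse permutation.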
Answer 1: (score: 0)
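This is not very efficient, since it rescans the whole list for every item, but you can combine the approaches listed in the question:
[weighted_percentile_of_score(x, weights, val) for val in x]
# [20.0, 40.0, 70.0, 100.0]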
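As a rough sketch, the same pairwise comparison can be written with numpy broadcasting instead of a Python loop. This assumes the 'weak' kind and is still quadratic in time and memory, so it only moves the loop into numpy rather than removing it; the variable names are my own.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
weights = np.array([2, 2, 3, 3])

# at_or_below[i, j] is True when x[j] <= x[i].
at_or_below = x[None, :] <= x[:, None]

# Each row keeps only the weights of values at or below x[i]; summing the row
# and dividing by the total weight gives the 'weak' weighted percentile of x[i].
percentiles = 100 * (at_or_below * weights).sum(axis=1) / weights.sum()
# percentiles holds 20, 40, 70, 100
```

For large inputs the n-by-n boolean matrix becomes the bottleneck, so a sort-based approach is likely the better choice there.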