I want to compute the weighted median of each row of a pandas DataFrame.
I found this nice function (https://stackoverflow.com/a/29677616/10588967), but I can't seem to pass it a 2D array.
import numpy

def weighted_quantile(values, quantiles, sample_weight=None, values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with many quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, then will avoid sorting of initial array
    :param old_style: if True, will correct output to be consistent with numpy.percentile.
    :return: numpy.array with computed quantiles.
    """
    values = numpy.array(values)
    quantiles = numpy.array(quantiles)
    if sample_weight is None:
        sample_weight = numpy.ones(len(values))
    sample_weight = numpy.array(sample_weight)
    assert numpy.all(quantiles >= 0) and numpy.all(quantiles <= 1), 'quantiles should be in [0, 1]'
    if not values_sorted:
        sorter = numpy.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]
    weighted_quantiles = numpy.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # To be consistent with numpy.percentile
        weighted_quantiles -= weighted_quantiles[0]
        weighted_quantiles /= weighted_quantiles[-1]
    else:
        weighted_quantiles /= numpy.sum(sample_weight)
    return numpy.interp(quantiles, weighted_quantiles, values)
Using the code from the link, the following works:
weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.])
However, this does not work:
values = numpy.random.randn(10,5)
quantiles = [0.0, 0.5, 1.]
sample_weight = numpy.random.randn(10,5)
weighted_quantile(values, quantiles, sample_weight)
I get the following error:
weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight
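ValueError: operands could not be broadcast together with shapes (250,) (10,5,5)
Question: Is it possible to apply this weighted quantile function to a DataFrame in a vectorized way, or can I only achieve this with .apply()?
Many thanks for your time!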
Answer 0 (score: 0)
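np.cumsum(sample_weight)
returns a one-dimensional array, because cumsum flattens its input when no axis is given. So you want to reshape it to (10,5,5) using
np.cumsum(sample_weight).reshape(10,5,5)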
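
Note that reshaping makes the shapes compatible, but the cumulative sum then still runs across the whole flattened array and not within each row. For per-row quantiles, one possible approach is to sort and accumulate along axis=1. The following is only a minimal sketch, not the linked answer's code: the name weighted_quantile_rows is hypothetical, and it assumes a 2D array (for a DataFrame, something like df.to_numpy()), weights of the same shape that are non-negative, and a positive weight sum in every row. Since numpy.interp only accepts 1D inputs, the final interpolation still loops over rows.

```python
import numpy as np

def weighted_quantile_rows(values, quantiles, sample_weight=None):
    """Row-wise weighted quantiles of a 2D array (hypothetical helper).

    Returns an array of shape (n_rows, n_quantiles).
    Assumes non-negative weights with a positive sum in each row.
    """
    values = np.asarray(values, dtype=float)
    quantiles = np.atleast_1d(np.asarray(quantiles, dtype=float))
    if sample_weight is None:
        sample_weight = np.ones_like(values)
    sample_weight = np.asarray(sample_weight, dtype=float)
    if values.ndim != 2 or values.shape != sample_weight.shape:
        raise ValueError("values and sample_weight must be 2D with equal shapes")
    if np.any(quantiles < 0) or np.any(quantiles > 1):
        raise ValueError("quantiles should be in [0, 1]")

    # Sort every row independently and reorder the weights the same way.
    sorter = np.argsort(values, axis=1)
    values = np.take_along_axis(values, sorter, axis=1)
    sample_weight = np.take_along_axis(sample_weight, sorter, axis=1)

    # Cumulative weights within each row (axis=1 keeps the 2D shape).
    positions = np.cumsum(sample_weight, axis=1) - 0.5 * sample_weight
    positions /= sample_weight.sum(axis=1, keepdims=True)

    # numpy.interp is 1D only, so interpolate one row at a time.
    return np.array([np.interp(quantiles, p, v)
                     for p, v in zip(positions, values)])

# Usage: weighted median (quantile 0.5) of each row.
vals = np.array([[3.0, 1.0, 2.0],
                 [10.0, 20.0, 30.0]])
wts = np.array([[1.0, 1.0, 1.0],
                [1.0, 1.0, 2.0]])
medians = weighted_quantile_rows(vals, [0.5], wts)[:, 0]
```

With equal weights each row should give the same result as calling the original weighted_quantile on that row alone. The sorting and cumulative sums are vectorized, so only the interpolation step costs a Python-level loop.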