Question

import numpy as np
import pandas as pd

np.percentile([0,10], [10,50,90])
# array([ 1.,  5.,  9.])

df = pd.DataFrame({'a':[0,10], 'b':[0,30]})

print(df)
#     a   b
# 0   0   0
# 1  10  30


df.apply(np.percentile, axis=0, q=[10,20,30,40,50,75,100])

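Ideally, this would return a DataFrame holding the requested percentiles for each column (e.g., column b = [3, 6, 9, 12, 15, 22.5, 30]), but instead I get: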

ValueError: Shape of passed values is (2, 7), indices imply (2, 2)

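It seems that pandas apply must return either a scalar per column or a vector of the same length as the column. Is there a way to return a vector whose length differs from that of the original data?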

Python 3.4.3; pandas 0.16.1

Answer 1

You can construct a Series from the result:

In [27]:

df.apply(lambda x: pd.Series(np.percentile(x, axis=0, q=[10,20,30,40,50,75,100])))
Out[27]:
      a     b
0   1.0   3.0
1   2.0   6.0
2   3.0   9.0
3   4.0  12.0
4   5.0  15.0
5   7.5  22.5
6  10.0  30.0

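Because each column's result is wrapped in a Series, apply can assemble the results into a DataFrame and no longer complains about an incorrect shape.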
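A small variation on the same idea, offered as a sketch rather than a definitive recipe: passing `index=q` to `pd.Series` labels each row with its percentile instead of 0..6. The names `q` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 10], 'b': [0, 30]})
q = [10, 20, 30, 40, 50, 75, 100]  # percentiles, in percent

# Returning a Series lets apply build a DataFrame whose row count
# differs from the input; index=q labels each row with its percentile.
result = df.apply(lambda col: pd.Series(np.percentile(col, q), index=q))
print(result)
```

The values are the same as in the output above; only the row labels change.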

Answer 2

You should just do this instead. It is simpler (it uses np.percentile under the hood).

In [9]: df.quantile([.10,.20,.30,.40,.50,.75,1])
Out[9]: 
         a     b
0.10   1.0   3.0
0.20   2.0   6.0
0.30   3.0   9.0
0.40   4.0  12.0
0.50   5.0  15.0
0.75   7.5  22.5
1.00  10.0  30.0
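One detail worth spelling out: `DataFrame.quantile` takes fractions between 0 and 1, whereas `np.percentile` takes percentages between 0 and 100. The sketch below converts one to the other and checks that both routes agree on this data, assuming the default (linear) interpolation in each. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 10], 'b': [0, 30]})
percents = [10, 20, 30, 40, 50, 75, 100]

# DataFrame.quantile expects fractions in [0, 1], not percentages.
fractions = [p / 100 for p in percents]
by_quantile = df.quantile(fractions)

# The same numbers computed column by column with np.percentile.
by_percentile = pd.DataFrame(
    {name: np.percentile(df[name], percents) for name in df.columns},
    index=fractions,
)

same = np.allclose(by_quantile.values, by_percentile.values)
print(same)
```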

Answer 3

You can also enable raw:

df.apply(np.percentile, axis=0, q=[10,20,30,40,50,75,100], raw=True)

# a       [1.0, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0]
# b    [3.0, 6.0, 9.0, 12.0, 15.0, 22.5, 30.0]
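If you end up with a Series of lists like the one above and want a DataFrame instead, one possible way to expand it is sketched below. To keep the example self-contained, the Series is built by hand from the values shown rather than produced by the apply call; `as_lists`, `q`, and `expanded` are illustrative names.

```python
import pandas as pd

# Hand-built stand-in for the Series of lists shown above
# (values copied from that output).
as_lists = pd.Series({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0],
    'b': [3.0, 6.0, 9.0, 12.0, 15.0, 22.5, 30.0],
})
q = [10, 20, 30, 40, 50, 75, 100]

# Build one row per original column, then transpose so that the
# percentiles run down the rows and the columns are 'a' and 'b'.
expanded = pd.DataFrame(as_lists.tolist(), index=as_lists.index, columns=q).T
print(expanded)
```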

Answer 4

Another version, equivalent to EdChum's answer, but with the axis specified in apply rather than in np.percentile:

df.apply(lambda x: pd.Series(np.percentile(x, q=[10,20,30,40,50,75,100])), axis=0)

Returning a column vector of a different length in pandas apply()?
