Question

I have a vector with 100 instances, and I need the mean and the minimum of every 10 instances. What is the best way to do this in pandas?

I solved it as follows:

mean = []
min = []
aux = 0
for i in range(10, len(df)+1, 10):
    mean.append(df[aux:i].mean())
    min.append(df[aux:i].min())
    aux = i

Is there a more efficient way to do this in pandas?

Answer 1

For a dataframe that looks like this:

>>> df.head()
          0
0  0.963734
1  0.797373
2  0.623054
3  0.420744
4  0.306232

Your solution returns the following:

>>> mean
[0    0.587664
dtype: float64, 0    0.574274
dtype: float64, 0    0.462168
dtype: float64, 0    0.489871
dtype: float64, 0    0.496362
dtype: float64, 0    0.542037
dtype: float64, 0    0.336029
dtype: float64, 0    0.391856
dtype: float64, 0    0.47899
dtype: float64, 0    0.51505
dtype: float64]

>>> min
[0    0.306232
dtype: float64, 0    0.033548
dtype: float64, 0    0.083291
dtype: float64, 0    0.016033
dtype: float64, 0    0.131066
dtype: float64, 0    0.243215
dtype: float64, 0    0.052778
dtype: float64, 0    0.028525
dtype: float64, 0    0.170831
dtype: float64, 0    0.040911
dtype: float64]

Using just groupby and agg, you can get the same result in a nicer format:

>>> df.assign(count=np.repeat(range(10),10)).groupby('count').agg(['mean','min'])

              0          
           mean       min
count                    
0      0.587664  0.306232
1      0.574274  0.033548
2      0.462168  0.083291
3      0.489871  0.016033
4      0.496362  0.131066
5      0.542037  0.243215
6      0.336029  0.052778
7      0.391856  0.028525
8      0.478990  0.170831
9      0.515050  0.040911

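This assigns each row to a group, with 10 rows per group: np.repeat(range(10), 10) produces the labels 0 through 9, each repeated 10 times. It then groups by that group number and returns the mean and the minimum of each group.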
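
If the length of the frame is not known in advance, the same grouping idea can be written without hard-coding the number of groups. This is only a sketch under some assumptions: the random data is a stand-in for the frame above, and the names rng, chunk_size and group_ids are illustrative, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 random values in a single column named 0,
# standing in for the frame shown above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random(100))

# Integer-dividing each row position by the chunk size yields 0 for
# rows 0-9, 1 for rows 10-19, and so on.
chunk_size = 10
group_ids = np.arange(len(df)) // chunk_size

# Group on the computed labels and aggregate column 0 only, which gives
# flat 'mean' and 'min' columns instead of a two-level column header.
result = df.groupby(group_ids)[0].agg(['mean', 'min'])
print(result)
```

Because the labels are computed from len(df), this also works when the number of rows is not a multiple of 10; the last group is then simply smaller.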
