python, pandas: using groupby to compute the mean in a df with a MultiIndex

Time: 2017-08-05 13:23:00

Tags: python pandas

Let's consider the following DataFrame:

from decimal import Decimal
import pandas as pd
from pandas import Timestamp
dic={'volume': {('E7', Timestamp('2016-11-01 00:00:00')): Decimal('1204'),
  ('E7', Timestamp('2016-08-16 00:00:00')): Decimal('1070'),
  ('G6', Timestamp('2016-08-17 00:00:00')): Decimal('1702'),
  ('G6', Timestamp('2016-08-18 00:00:00')): Decimal('1262'),
  ('G6', Timestamp('2016-08-26 00:00:00')): Decimal('3333'),
  ('VG', Timestamp('2016-08-31 00:00:00')): Decimal('1123'),
  ('VG', Timestamp('2016-09-01 00:00:00')): Decimal('1581'),
  ('VG', Timestamp('2016-09-02 00:00:00')): Decimal('1276'),
  ('VG', Timestamp('2016-09-06 00:00:00')): Decimal('2417'),
           }}
df=pd.DataFrame(dic)


For each symbol (the first level of the index), I would like to compute the mean of the "volume" column.

I tried df.groupby(level=0).mean(), but it didn't work.

1 Answer:

Answer 0 (score: 1)

Don't use Decimal in Pandas. It is not a native NumPy / Pandas dtype, so the column is stored as generic Python objects:

In [32]: df.dtypes
Out[32]:
volume     object   # <---- NOTE
dtype: object
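
To see what is behind that `object` dtype, it can help to look at the Python type of the stored values. A minimal sketch, using a small stand-in Series rather than the full frame from the question (the names `s`, `dtype_name` and `element_types` are only for illustration):

```python
from decimal import Decimal

import pandas as pd

# Stand-in for the "volume" column: two Decimal values.
s = pd.Series([Decimal("1204"), Decimal("1070")], name="volume")

# pandas has no native Decimal dtype, so the values are kept as Python objects.
dtype_name = str(s.dtype)

# Map each element to its Python type to see what the column really holds.
element_types = s.map(type).unique().tolist()

print(dtype_name)
print(element_types)
```

If every element turns out to be a `Decimal`, converting the column to a numeric dtype is the natural next step.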

Convert it to a numeric dtype:

In [29]: df['vol'] = pd.to_numeric(df.volume)

In [30]: df
Out[30]:
              volume     vol
E7 2016-08-16   1070  1070.0
   2016-11-01   1204  1204.0
G6 2016-08-17   1702  1702.0
   2016-08-18   1262  1262.0
   2016-08-26   3333  3333.0
VG 2016-08-31   1123  1123.0
   2016-09-01   1581  1581.0
   2016-09-02   1276  1276.0
   2016-09-06   2417  2417.0

In [31]: df.mean(level=0)
Out[31]:
        vol
E7  1137.00
G6  2099.00
VG  1599.25
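
Note that the `level` argument of `DataFrame.mean` was deprecated in pandas 1.3 and removed in 2.0, so the last call above only runs on older versions. A minimal sketch of the same calculation with `groupby(level=0)`, assuming a recent pandas: it uses `astype(float)` in place of `pd.to_numeric` and selects the numeric column explicitly so the `object` column stays out of the aggregation. The names `data` and `mean_by_symbol` are only for illustration.

```python
from decimal import Decimal

import pandas as pd

# Same data as in the question, keyed by (symbol, date).
data = {
    ("E7", pd.Timestamp("2016-11-01")): Decimal("1204"),
    ("E7", pd.Timestamp("2016-08-16")): Decimal("1070"),
    ("G6", pd.Timestamp("2016-08-17")): Decimal("1702"),
    ("G6", pd.Timestamp("2016-08-18")): Decimal("1262"),
    ("G6", pd.Timestamp("2016-08-26")): Decimal("3333"),
    ("VG", pd.Timestamp("2016-08-31")): Decimal("1123"),
    ("VG", pd.Timestamp("2016-09-01")): Decimal("1581"),
    ("VG", pd.Timestamp("2016-09-02")): Decimal("1276"),
    ("VG", pd.Timestamp("2016-09-06")): Decimal("2417"),
}

# Tuple keys become a two-level MultiIndex (symbol, date).
df = pd.DataFrame({"volume": pd.Series(data)})

# Convert the Decimal objects to float64 so numeric aggregations apply.
df["vol"] = df["volume"].astype(float)

# Group by the first index level (the symbol) and average the numeric column.
mean_by_symbol = df.groupby(level=0)["vol"].mean()
print(mean_by_symbol)
```

The result is a Series indexed by symbol; `df.groupby(level=0)[["vol"]].mean()` gives a one-column DataFrame instead, which is closer to the shape shown above.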