Question

I have the following pandas DataFrame:

In [8]:  dfalph.head()

Out[8]:
       token  year  uses  books
386  xanthos  1830     3      3
387  xanthos  1840     1      1
388  xanthos  1840     2      2
389  xanthos  1868     2      2
390  xanthos  1875     1      1

I aggregate the rows with duplicate tokens like this:

In [63]:  dfalph = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year']).agg([np.sum])
          dfalph.columns = dfalph.columns.droplevel(1)
          dfalph.head()

Out[63]:                 uses  books
          token    year     
          xanthos  1830  3     3
                   1840  3     3
                   1867  2     2
                   1868  2     2
                   1875  1     1

Instead of having the 'token' and 'year' fields in the index, I would like to move them back into columns and have an integer index.
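A minimal sketch that reproduces the situation, assuming only the five rows shown by dfalph.head() (so the 1867 row from the full data does not appear). It passes the string alias "sum" instead of np.sum, because recent pandas versions emit a FutureWarning when a NumPy function is handed to agg; the variable name grouped is illustrative. The result carries token and year in a MultiIndex, which is the problem described above.

```python
import pandas as pd

# The five rows from dfalph.head() in the question (original index labels kept).
dfalph = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    },
    index=range(386, 391),
)

# Same aggregation as In [63]: agg with a list creates a second column level ("sum").
grouped = dfalph.groupby(["token", "year"])[["uses", "books"]].agg(["sum"])
grouped.columns = grouped.columns.droplevel(1)  # flatten to: uses, books

# 'token' and 'year' now live in the (Multi)index, not in the columns.
print(grouped)
print(type(grouped.index).__name__)
```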

Answer 1

Method #1: reset_index()

>>> g
              uses  books
               sum    sum
token   year             
xanthos 1830     3      3
        1840     3      3
        1868     2      2
        1875     1      1

[4 rows x 2 columns]
>>> g = g.reset_index()
>>> g
     token  year  uses  books
                   sum    sum
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
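A self-contained sketch of Method #1, assuming the five sample rows from the question. The columns are kept flat here (no extra "sum" level); if they still carry a second level, as in the session above, reset_index() fills the second level of the moved token and year columns with an empty string.

```python
import pandas as pd

dfalph = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)

# Group and sum: 'token' and 'year' end up in a MultiIndex.
g = dfalph.groupby(["token", "year"])[["uses", "books"]].sum()

# reset_index() moves every index level back into an ordinary column
# and replaces the index with a default integer RangeIndex.
g = g.reset_index()
print(g)
```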

Method #2: don't create the index in the first place, by passing as_index=False

>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
>>> g
     token  year  uses  books
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
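A sketch comparing the two routes on the question's sample rows (an assumption: only the five head() rows). With plain column names as keys, as_index=False and groupby(...).sum().reset_index() produce the same frame; the variable names g1 and g2 are illustrative.

```python
import pandas as pd

dfalph = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)

cols = ["token", "year", "uses", "books"]

# Method #2: as_index=False keeps the group keys as columns from the start.
g2 = dfalph[cols].groupby(["token", "year"], as_index=False).sum()

# Method #1 for comparison: group into an index, then move it back out.
g1 = dfalph[cols].groupby(["token", "year"]).sum().reset_index()

# For plain column keys the two results are identical.
print(g2.equals(g1))
```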

Answer 2

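I differ from the accepted answer. Although there are two ways to do this, they do not necessarily produce the same output, especially when you use a Grouper in groupby:

as_index=False
reset_index()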

Example df:

+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 26-10-2018  |          2 |
| B       | M       | 28-10-2018  |          3 |
| A       | M       | 30-10-2018  |          6 |
| B       | M       | 01-11-2018  |          3 |
| C       | N       | 03-11-2018  |          4 |
+---------+---------+-------------+------------+

The two approaches behave differently:

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ],
    as_index=False
).sum()

The above gives:

+---------+---------+------------+
| column1 | column2 | column_sum |
+---------+---------+------------+
| A       | M       |          8 |
| B       | M       |          3 |
| B       | M       |          3 |
| C       | N       |          4 |
+---------+---------+------------+

whereas

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index()

gives:

+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 31-10-2018  |          8 |
| B       | M       | 31-10-2018  |          3 |
| B       | M       | 30-11-2018  |          3 |
| C       | N       | 30-11-2018  |          4 |
+---------+---------+-------------+------------+
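A runnable sketch of the reset_index() variant, assuming the example df above with column_date parsed as day-first dates (pd.Grouper with a freq needs a datetime64 column). It uses freq="MS" (month start) rather than "M" because pandas 2.2 deprecated the "M" alias in favour of "ME"; the monthly bins are the same, only the labels become the first day of each month instead of the last. The sketch only exercises the reset_index() route, which keeps column_date.

```python
import pandas as pd

# The example df from this answer, with dates parsed explicitly (DD-MM-YYYY).
df = pd.DataFrame(
    {
        "column1": ["A", "B", "A", "B", "C"],
        "column2": ["M", "M", "M", "M", "N"],
        "column_date": pd.to_datetime(
            ["26-10-2018", "28-10-2018", "30-10-2018", "01-11-2018", "03-11-2018"],
            format="%d-%m-%Y",
        ),
        "column_sum": [2, 3, 6, 3, 4],
    }
)

# Group by two columns plus a monthly time bin, sum, then move the keys
# (including the binned date) back into columns.
out = (
    df.groupby(["column1", "column2", pd.Grouper(key="column_date", freq="MS")])
    .sum()
    .reset_index()
)
print(out)
```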

Answer 3

You need to add drop=True:

df.reset_index(drop=True)

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index(drop=True)
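A sketch of what drop=True changes, assuming the question's five sample rows: drop=True discards the index levels instead of inserting them as columns, so the result gets a fresh integer index but the token and year values are no longer present. The names kept and dropped are illustrative.

```python
import pandas as pd

dfalph = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)

grouped = dfalph.groupby(["token", "year"])[["uses", "books"]].sum()

kept = grouped.reset_index()              # index levels become columns
dropped = grouped.reset_index(drop=True)  # index levels are thrown away

print(list(kept.columns))     # ['token', 'year', 'uses', 'books']
print(list(dropped.columns))  # ['uses', 'books']
```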

How to move pandas data from the index to columns after a multi-column groupby
