I have the following pandas DataFrame:
In [8]: dfalph.head()
Out[8]:
       token  year  uses  books
386  xanthos  1830     3      3
387  xanthos  1840     1      1
388  xanthos  1840     2      2
389  xanthos  1868     2      2
390  xanthos  1875     1      1
I aggregate the rows that share the same token and year, like this:
In [63]: dfalph = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year']).agg([np.sum])
         dfalph.columns = dfalph.columns.droplevel(1)
         dfalph.head()
Out[63]:
              uses  books
token   year
xanthos 1830     3      3
        1840     3      3
        1867     2      2
        1868     2      2
        1875     1      1
I would like 'token' and 'year' to become ordinary columns again, with a plain integer index, instead of being part of the index.
Answer 0 (score: 51)
Method #1: reset_index()
>>> g
              uses  books
               sum    sum
token   year
xanthos 1830     3      3
        1840     3      3
        1868     2      2
        1875     1      1

[4 rows x 2 columns]
>>> g = g.reset_index()
>>> g
     token  year  uses  books
                   sum    sum
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
Method #2: don't create the index in the first place, by passing as_index=False
>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
>>> g
     token  year  uses  books
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
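As a quick way to try both methods side by side, here is a minimal sketch. The frame is rebuilt by hand from the five rows shown in the question, and the names `via_reset` and `via_as_index` are mine, not from the original post. It calls `.sum()` directly rather than `.agg([np.sum])`, so the result has single-level columns and no `droplevel` step is needed.

```python
import pandas as pd

# Rows copied from the question's dfalph.head() output.
dfalph = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    },
    index=[386, 387, 388, 389, 390],
)

# Method #1: aggregate first, then move the index levels back into columns.
via_reset = dfalph.groupby(["token", "year"]).sum().reset_index()

# Method #2: never build the group index in the first place.
via_as_index = dfalph.groupby(["token", "year"], as_index=False).sum()

print(via_reset)
print(via_as_index)
```

For plain column keys like these, both variants should end up with 'token' and 'year' as regular columns and a fresh integer index.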
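Answer 1 (score: 2)
I differ slightly from the accepted answer.
Although there are two ways to do this, they will not necessarily produce the same output, especially when you use a Grouper in groupby. Compare as_index=False with reset_index().
Example df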
+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A | M | 26-10-2018 | 2 |
| B | M | 28-10-2018 | 3 |
| A | M | 30-10-2018 | 6 |
| B | M | 01-11-2018 | 3 |
| C | N | 03-11-2018 | 4 |
+---------+---------+-------------+------------+
The two approaches do not work the same way.
df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ],
    as_index=False
).sum()
The above gives
+---------+---------+------------+
| column1 | column2 | column_sum |
+---------+---------+------------+
| A | M | 8 |
| B | M | 3 |
| B | M | 3 |
| C | N | 4 |
+---------+---------+------------+
whereas
df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index()
gives
+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A | M | 31-10-2018 | 8 |
| B | M | 31-10-2018 | 3 |
| B | M | 30-11-2018 | 3 |
| C | N | 30-11-2018 | 4 |
+---------+---------+-------------+------------+
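To try this comparison locally, here is a minimal sketch that rebuilds the example frame. It assumes the dates are day-first strings, and it passes `pd.offsets.MonthEnd()` as the frequency instead of the 'M' alias, since newer pandas releases deprecate that alias. The names `no_index` and `with_reset` are mine. Whether the as_index=False result keeps column_date may depend on your pandas version, so the sketch prints the columns instead of assuming them.

```python
import pandas as pd

# Example frame from this answer; dates are parsed as day-month-year.
df = pd.DataFrame(
    {
        "column1": ["A", "B", "A", "B", "C"],
        "column2": ["M", "M", "M", "M", "N"],
        "column_date": pd.to_datetime(
            ["26-10-2018", "28-10-2018", "30-10-2018", "01-11-2018", "03-11-2018"],
            format="%d-%m-%Y",
        ),
        "column_sum": [2, 3, 6, 3, 4],
    }
)

# Month-end bins; an offset object is used here in place of the 'M' alias.
keys = ["column1", "column2", pd.Grouper(key="column_date", freq=pd.offsets.MonthEnd())]

# Variant 1: ask groupby not to build an index at all.
no_index = df.groupby(by=keys, as_index=False).sum()

# Variant 2: build the index, then turn every level back into a column.
with_reset = df.groupby(by=keys).sum().reset_index()

print(list(no_index.columns))
print(list(with_reset.columns))
print(with_reset)
```

If the two column lists differ on your installation, the reset_index() variant is the one that reliably keeps the month-end dates.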
Answer 2 (score: 0)
You need to add drop=True:
df.reset_index(drop=True)
df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index(drop=True)
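Be aware that drop=True discards the index levels instead of turning them back into columns, so it does give an integer index but loses the group keys. A minimal sketch with a made-up three-row frame (the names `kept` and `dropped` are only illustrative):

```python
import pandas as pd

# Small hypothetical frame in the shape of the question's data.
df = pd.DataFrame(
    {
        "token": ["xanthos", "xanthos", "xanthos"],
        "year": [1830, 1840, 1840],
        "uses": [3, 1, 2],
    }
)

grouped = df.groupby(["token", "year"]).sum()

kept = grouped.reset_index()              # token and year come back as columns
dropped = grouped.reset_index(drop=True)  # token and year are thrown away

print(list(kept.columns))
print(list(dropped.columns))
```

If the goal is to keep 'token' and 'year' as columns, as in the question, plain reset_index() without drop=True is probably the better fit.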