I build a DataFrame by concatenating several DataFrames, using the keys [a, b, c] as the outer index level:
+-------+----------+----------+
| Index | IndexPos | SomeData |
+-------+----------+----------+
| a | 1 | some1 |
| | 2 | some2 |
| | 3 | some3 |
| b | 1 | some1 |
| | 2 | some2 |
| | 3 | some3 |
| c | 1 | some1 |
| | 2 | some2 |
| | 3 | some3 |
+-------+----------+----------+
I then slice it down to the last two elements of each group, for example:
df.groupby(df.index.levels[0].name).tail(2)
After that, I want to renumber IndexPos for the remaining elements:
+-------+----------+----------+
| Index | IndexPos | SomeData |
+-------+----------+----------+
| a | 1 | some2 |
| | 2 | some3 |
| b | 1 | some2 |
| | 2 | some3 |
| c | 1 | some2 |
| | 2 | some3 |
+-------+----------+----------+
Is there a way to do this, or do I have to slice the frames before concatenating them?
Answer 0: (score: 2)
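First use groupby with level=0 and then tail to take the last two rows of each group. Then use groupby + cumcount on the sliced DataFrame to create a sequential counter within each group, and set it as the new level=1 index: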
d = df.groupby(level=0).tail(2)
d = d.droplevel(1).set_index(d.groupby(level=0).cumcount().add(1), append=True)
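To make the steps above reproducible, here is a minimal end-to-end sketch. The input frame is an assumption: it is rebuilt with pd.concat from three identical hypothetical frames (`part`) to mimic the table in the question, and the intermediate name `new_pos` is only for illustration. The counter is renamed to IndexPos so the new level keeps the original name, which the one-liner above does not do.

```python
import pandas as pd

# Hypothetical input standing in for the question's data:
# three identical frames concatenated under the keys a, b, c.
part = pd.DataFrame({"SomeData": ["some1", "some2", "some3"]}, index=[1, 2, 3])
df = pd.concat([part, part, part], keys=["a", "b", "c"], names=["Index", "IndexPos"])

# Keep the last two rows of each outer-level group.
d = df.groupby(level=0).tail(2)

# Build a per-group counter starting at 1 (cumcount itself starts at 0).
new_pos = d.groupby(level=0).cumcount().add(1)

# Drop the stale inner level and append the counter as the new inner level.
d = d.droplevel(1).set_index(new_pos.rename("IndexPos"), append=True)

print(d)
```

Because the counter is computed after slicing, the same pattern should apply to other per-group slices (for example head(n)), not only to tail(2).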
Alternatively, inspired by @anky's solution, use factorize instead of groupby + cumcount:
d = df.groupby(level=0).tail(2)
d = d.droplevel(1).set_index(d.index.get_level_values(1).factorize()[0] + 1, append=True)
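The factorize variant can be sketched the same way. Note that factorize numbers the distinct remaining IndexPos values across the whole frame, not the row positions within each group, so it matches the cumcount result only when every group keeps the same set of IndexPos values, as it does here. The input frame is again a hypothetical reconstruction of the question's data.

```python
import pandas as pd

# Hypothetical input standing in for the question's data.
part = pd.DataFrame({"SomeData": ["some1", "some2", "some3"]}, index=[1, 2, 3])
df = pd.concat([part, part, part], keys=["a", "b", "c"], names=["Index", "IndexPos"])

# Keep the last two rows of each outer-level group.
d = df.groupby(level=0).tail(2)

# factorize assigns a 0-based code to each distinct value in order of
# first appearance; here the remaining IndexPos values are 2 and 3.
codes, uniques = d.index.get_level_values(1).factorize()

# Shift the codes to start at 1 and use them as the new inner level.
d = d.droplevel(1).set_index(codes + 1, append=True)

print(d)
```

If the groups had different lengths before slicing, the surviving IndexPos values would differ between groups and the codes would no longer restart at 1 in each group; in that case the groupby + cumcount version is the safer choice.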
Result:
print(d)
        SomeData
Index
a     1    some2
      2    some3
b     1    some2
      2    some3
c     1    some2
      2    some3