Pandas: drop the last element of each group

Time: 2020-08-31 19:12:19

Tags: python pandas

I have a DataFrame that looks like this:

  col1  col2
0    a     1
1    b     1
2    c     1
3    d     2
4    e     2
5    f     3
6    g     3
7    h     3

Input:

df = pd.DataFrame({'col1': ["a","b","c","d","e", "f","g","h"], 'col2': [1,1,1,2,2,3,3,3]})

I want to drop the last row of each "col2" group, so the result would look like this:

Expected output:

  col1  col2
0    a     1
1    b     1
3    d     2
5    f     3
6    g     3

I wrote df.groupby('col2').tail(1), which gives me the rows I want to delete, but when I try to pass that result to df.drop, I get an axis error. Is there a way to fix this?

2 answers:

Answer 0: (score: 2)

It looks like duplicated would work:

df[df.duplicated('col2', keep='last') | 
   (~df.duplicated('col2', keep=False))  # this is to keep all single-row groups
  ]

Or, sticking with your approach, pass the index of those rows to drop rather than the rows themselves:

# this would also drop all single-row groups
df.drop(df.groupby('col2').tail(1).index)

Output:

  col1  col2
0    a     1
1    b     1
3    d     2
5    f     3
6    g     3
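
To make the difference between the two variants concrete, here is a minimal, self-contained sketch that runs both on the question's data and then on a copy with one extra single-row group. The extra row ("i", 4) and the variable names are my own additions for illustration, not part of the question. The drop variant also assumes the index labels are unique.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": [1, 1, 1, 2, 2, 3, 3, 3],
})

def drop_last_by_mask(frame):
    # Keep rows that are not the last occurrence of their col2 value,
    # plus rows whose col2 value occurs only once (single-row groups).
    not_last = frame.duplicated("col2", keep="last")
    single = ~frame.duplicated("col2", keep=False)
    return frame[not_last | single]

def drop_last_by_index(frame):
    # Find the last row of each group, then drop those rows by index label.
    last_rows = frame.groupby("col2").tail(1)
    return frame.drop(last_rows.index)

kept_by_mask = drop_last_by_mask(df)
kept_by_drop = drop_last_by_index(df)
print(kept_by_mask)

# Hypothetical extra row forming a single-row group (col2 == 4).
extra = pd.concat(
    [df, pd.DataFrame({"col1": ["i"], "col2": [4]})],
    ignore_index=True,
)
extra_by_mask = drop_last_by_mask(extra)   # keeps row "i"
extra_by_drop = drop_last_by_index(extra)  # removes row "i"
```

On the original data both variants return the same rows. They only diverge when a group has a single row, so which one to use depends on whether such groups should survive.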

Answer 1: (score: 1)

Try this:

df.groupby('col2', as_index=False).apply(lambda x: x.iloc[:-1,:]).reset_index(drop=True)
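
Note that the reset_index(drop=True) call above renumbers the rows, so the result is indexed 0 to 4 rather than 0, 1, 3, 5, 6 as in the expected output. If the original index matters, one alternative (my own suggestion, not part of this answer) is to number the rows of each group from the end with cumcount and filter on that. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": [1, 1, 1, 2, 2, 3, 3, 3],
})

# Number the rows of each group counting down to 0,
# so the last row of every group gets 0.
rows_from_end = df.groupby("col2").cumcount(ascending=False)

# Keep everything except each group's last row; the original index is preserved.
result = df[rows_from_end > 0]
print(result)
```

Like the drop-by-index variant, this removes single-row groups entirely, because their only row is also their last row.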