pandas (Python): remove the last row of each group

Date: 2020-06-17 10:55:01

Tags: python pandas pandas-groupby

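I need to drop the last member of each group, because it throws off later calculations. I'm not sure how to explain the problem any better, but please ask if you need clarification.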

My current code:

 sampleDataUser = sampleData.groupby('user').filter(lambda x: x != sampleDataUser.tail(1))

It returns this error:

  ValueError: Can only compare identically-labeled DataFrame objects

Sample data:

df = [{"user": "seth", "var1": "5"}, {"user": "seth", "var1": "8"}, {"user": "chris", "var1": "2"}]

Expected output:

df = [{"user": "seth", "var1": "5"}, {"user": "chris", "var1": "2"}]

1 answer:

Answer 0: (score: 0)

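To remove the last row for each user, but only where that user appears more than once, build a mask from two Series.duplicated calls chained with | (bitwise OR) and filter with boolean indexing. duplicated(keep='last') marks every row of a repeated user except the last one, and ~duplicated(keep=False) marks users that appear only once, so those single rows are kept: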

import pandas as pd

df = pd.DataFrame([{"user": "seth", "var1": "50"},
                   {"user": "seth", "var1": "5"},
                   {"user": "seth", "var1": "8"},
                   {"user": "chris", "var1": "2"}])
print (df)
    user var1
0   seth   50
1   seth    5
2   seth    8
3  chris    2

df = df[df['user'].duplicated(keep='last') | ~df['user'].duplicated(keep=False)]
print (df)
    user var1
0   seth   50
1   seth    5
3  chris    2
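
The same filtering can also be written with groupby, which is closer to what the question attempted. This is only an alternative sketch, not part of the answer's method: it assumes the same sample frame, and the names rows_after, group_size and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"user": "seth", "var1": "50"},
    {"user": "seth", "var1": "5"},
    {"user": "seth", "var1": "8"},
    {"user": "chris", "var1": "2"},
])

g = df.groupby("user")

# Number of rows that still follow this row inside its group
# (0 means the row is the last one of its group).
rows_after = g.cumcount(ascending=False)

# Size of the group each row belongs to, aligned to df's index.
group_size = g["user"].transform("size")

# Keep rows that are not last in their group, plus rows of single-row groups.
result = df[(rows_after > 0) | (group_size == 1)]
print(result)
```

Dropping the group_size == 1 term would instead remove the last row of every group, including groups that have only one row.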

Details:

print (df.assign(m1 = df['user'].duplicated(keep='last'),
                 m2 = ~df['user'].duplicated(keep=False),
                 both = df['user'].duplicated(keep='last') | 
                       ~df['user'].duplicated(keep=False)))
    user var1     m1     m2   both
0   seth   50   True  False   True
1   seth    5   True  False   True
2   seth    8  False  False  False
3  chris    2  False   True   True
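
To check the mask against the data from the question itself, the two conditions shown above can be wrapped in a small function. This is a minimal sketch: drop_last_per_group is a hypothetical helper name, not a pandas function, and the frame is built from the question's sample with its dictionary syntax corrected.

```python
import pandas as pd

def drop_last_per_group(frame, key):
    """Hypothetical helper: drop the last row of each repeated key value."""
    # True for every row that has a later row with the same key.
    has_later = frame[key].duplicated(keep="last")
    # True for rows whose key value occurs exactly once.
    is_unique = ~frame[key].duplicated(keep=False)
    return frame[has_later | is_unique]

sample = pd.DataFrame([
    {"user": "seth", "var1": "5"},
    {"user": "seth", "var1": "8"},
    {"user": "chris", "var1": "2"},
])

out = drop_last_per_group(sample, "user")
print(out.to_dict("records"))
```

On this input the remaining records are seth with var1 "5" and chris with var1 "2", which matches the expected output stated in the question.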