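I need to drop the last member of each group, because it throws off my later calculations. I don't know how to explain the problem any better, but please ask if you need clarification.
My current code: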
sampleDataUser = sampleData.groupby('user').filter(lambda x: x != sampleDataUser.tail(1))
It returns this error:
ValueError: Can only compare identically-labeled DataFrame objects
Sample data:
df = [{"user": "seth", "var1": "5"}, {"user": "seth", "var1": "8"}, {"user": "chris", "var1": "2"}]
Expected output:
df = [{"user": "seth", "var1": "5"}, {"user": "chris", "var1": "2"}]
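Answer 0 (score: 0)
To remove the last row for each user (only when that user appears more than once), build a mask by chaining two Series.duplicated calls with | (bitwise OR), then filter with boolean indexing: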
import pandas as pd

df = pd.DataFrame([{"user": "seth", "var1": "50"},
                   {"user": "seth", "var1": "5"},
                   {"user": "seth", "var1": "8"},
                   {"user": "chris", "var1": "2"}])
print (df)
    user var1
0   seth   50
1   seth    5
2   seth    8
3  chris    2
df = df[df['user'].duplicated(keep='last') | ~df['user'].duplicated(keep=False)]
print (df)
    user var1
0   seth   50
1   seth    5
3  chris    2
Details:
print (df.assign(m1 = df['user'].duplicated(keep='last'),
m2 = ~df['user'].duplicated(keep=False),
both = df['user'].duplicated(keep='last') |
~df['user'].duplicated(keep=False)))
    user var1     m1     m2   both
0   seth   50   True  False   True
1   seth    5   True  False   True
2   seth    8  False  False  False
3  chris    2  False   True   True
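For comparison, the same rule (drop the last row of every group that has more than one row, keep single-row groups untouched) can also be written with groupby. This is only a sketch of an alternative, not the method used above; the helper name drop_last_per_group and its key parameter are made up for illustration.

```python
import pandas as pd


def drop_last_per_group(df, key):
    """Hypothetical helper: drop the last row of each multi-row group."""
    # Position of each row counted from the end of its group (last row -> 0).
    pos_from_end = df.groupby(key).cumcount(ascending=False)
    # Number of rows in the group that each row belongs to.
    size = df.groupby(key)[key].transform("size")
    # Keep every row that is not last in its group, plus single-row groups.
    return df[(pos_from_end > 0) | (size == 1)]


df = pd.DataFrame([{"user": "seth", "var1": "50"},
                   {"user": "seth", "var1": "5"},
                   {"user": "seth", "var1": "8"},
                   {"user": "chris", "var1": "2"}])

result = drop_last_per_group(df, "user")
print(result)
```

On this sample it keeps the rows at index 0, 1 and 3, the same rows selected by the duplicated mask. The mask version is shorter; the groupby version may be easier to adapt if you later need to drop the last n rows instead of one.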