I have a DataFrame like this:
        product_id          dt  products_qty
70063      2964562  2017-11-14         0.000
72719      2964562  2017-11-15         2.000
401533     2964562  2017-11-16         0.000
413201     2964562  2017-11-17         0.000
424227     2964562  2017-11-18         0.000
450345     2964733  2017-11-14         4.000
470446     2964733  2017-11-17         0.000
473233     2964733  2017-11-18         0.000
I have to group the DataFrame by the product_id column and count the duplicated rows at the end of each group. For example, row 70063 is not counted even though it duplicates the last row. So the output should give that count for each product_id.
Answer 0 (score: 0)
Use:
# label each run of consecutive equal products_qty values
a = df['products_qty'].ne(df['products_qty'].shift()).cumsum()
# get the length of each (product_id, run) group
b = df.groupby([df['product_id'], a]).size()
# drop single-row runs and keep the last remaining run per product
df = b[b > 1].groupby(level=0).last().reset_index(name='count')
print(df)
   product_id  count
0     2964562      3
1     2964733      2
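To try this end to end, the sample table from the question can be rebuilt by hand. The following is a minimal sketch, assuming dt is stored as plain strings and the index labels are the ones shown in the question. The names run_id, run_sizes and result are chosen here for readability and correspond to a, b and the final df above.

```python
import pandas as pd

# Rebuild the question's sample data (index labels copied from the table).
df = pd.DataFrame(
    {
        "product_id": [2964562] * 5 + [2964733] * 3,
        "dt": [
            "2017-11-14", "2017-11-15", "2017-11-16", "2017-11-17", "2017-11-18",
            "2017-11-14", "2017-11-17", "2017-11-18",
        ],
        "products_qty": [0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0],
    },
    index=[70063, 72719, 401533, 413201, 424227, 450345, 470446, 473233],
)

# Give every run of consecutive equal products_qty values its own increasing id.
run_id = df["products_qty"].ne(df["products_qty"].shift()).cumsum()

# Size of every (product_id, run) group.
run_sizes = df.groupby([df["product_id"], run_id]).size()

# Keep runs longer than one row, then take the last such run per product.
result = run_sizes[run_sizes > 1].groupby(level=0).last().reset_index(name="count")
print(result)
```

Keeping the result in a separate variable instead of reassigning df leaves the original data available for inspection.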
Details:
print(a)
70063     1
72719     2
401533    3
413201    3
424227    3
450345    4
470446    5
473233    5
Name: products_qty, dtype: int32
print(b)
product_id  products_qty
2964562     1               1
            2               1
            3               3
2964733     4               1
            5               2
dtype: int64
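One edge case to be aware of: because single-row runs are filtered out with b > 1 before last() is taken, a product whose final row does not repeat will report an earlier run instead, and a product with no repeated runs disappears from the result. If the intent is strictly the run at the very end of each product, a variant could look like the sketch below. The helper trailing_run_length and the small sample frame are made up for illustration and are not part of the answer above.

```python
import pandas as pd


def trailing_run_length(qty: pd.Series) -> int:
    """Number of rows at the end of qty that are equal to its last value."""
    # True where a value differs from the last one, reversed to scan from the end.
    differs = qty.ne(qty.iloc[-1]).to_numpy()[::-1]
    # argmax returns the position of the first True; if nothing differs,
    # the whole series is a single run.
    return int(differs.argmax()) if differs.any() else len(qty)


# Hypothetical data: product 1 ends with a value that does not repeat.
df = pd.DataFrame(
    {
        "product_id": [1, 1, 1, 1, 2, 2, 2],
        "products_qty": [0.0, 0.0, 2.0, 5.0, 4.0, 0.0, 0.0],
    }
)

counts = (
    df.groupby("product_id")["products_qty"]
    .apply(trailing_run_length)
    .reset_index(name="count")
)
print(counts)
```

Here product 1 gets a count of 1 (only its last row), whereas the b > 1 filter would report 2 for the earlier pair of zeros. Which behaviour is wanted depends on how "last duplicates" is meant in the question.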