Python / Pandas - count the last run of consecutive duplicate rows

Time: 2017-11-19 14:25:54

Tags: python pandas dataframe data-analysis

I have a DataFrame like this:

        product_id          dt  products_qty
70063      2964562  2017-11-14         0.000
72719      2964562  2017-11-15         2.000
401533     2964562  2017-11-16         0.000
413201     2964562  2017-11-17         0.000
424227     2964562  2017-11-18         0.000
450345     2964733  2017-11-14         4.000
470446     2964733  2017-11-17         0.000
473233     2964733  2017-11-18         0.000

I need to group the DataFrame by the product_id column and count the rows in the last run of consecutive duplicates. For example, row 70063 is not counted, even though its value equals that of the last row, because it is not adjacent to that final run. So the output should look like this:

   product_id  count
0     2964562      3
1     2964733      2

1 answer:

Answer 0 (score: 0)

Use:

# give each run of consecutive equal products_qty values its own id
a = df['products_qty'].ne(df['products_qty'].shift()).cumsum()
# get the length of each run within each product
b = df.groupby([df['product_id'], a]).size()
# drop runs of length 1, then keep the last remaining run per product
df = b[b > 1].groupby(level=0).last().reset_index(name='count')
print(df)
   product_id  count
0     2964562      3
1     2964733      2
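
For anyone who wants to run this end to end, the following is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the same three steps; the names `run_id`, `run_sizes` and `result` are chosen here for readability and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; the index holds the original row labels.
df = pd.DataFrame(
    {
        "product_id": [2964562] * 5 + [2964733] * 3,
        "dt": pd.to_datetime(
            [
                "2017-11-14", "2017-11-15", "2017-11-16", "2017-11-17",
                "2017-11-18", "2017-11-14", "2017-11-17", "2017-11-18",
            ]
        ),
        "products_qty": [0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0],
    },
    index=[70063, 72719, 401533, 413201, 424227, 450345, 470446, 473233],
)

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those "starts" gives every run its own id.
run_id = df["products_qty"].ne(df["products_qty"].shift()).cumsum()

# Number of rows in each (product_id, run) pair.
run_sizes = df.groupby([df["product_id"], run_id]).size()

# Keep runs longer than one row and take the last such run per product.
result = (
    run_sizes[run_sizes > 1]
    .groupby(level=0)
    .last()
    .reset_index(name="count")
)
print(result)
```

Writing the result to a separate variable, rather than overwriting `df`, keeps the source frame available for the inspection steps.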

Details:

print (a)
70063     1
72719     2
401533    3
413201    3
424227    3
450345    4
470446    5
473233    5
Name: products_qty, dtype: int32
print (b)
product_id  products_qty
2964562     1               1
            2               1
            3               3
2964733     4               1
            5               2
dtype: int64
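
One thing to keep in mind: filtering with `b[b > 1]` before calling `last()` returns the last run that is longer than one row, which is not necessarily the run at the very end of a product's rows. If a product ends with a single non-repeated value, an earlier run is reported instead. Should the requirement be strictly the trailing run, something along these lines might work. This is only a sketch: `trailing_run_length` is a hypothetical helper, the `demo` frame is invented for illustration, and it assumes each group is non-empty and has no NaN in `products_qty`.

```python
import pandas as pd


def trailing_run_length(qty: pd.Series) -> int:
    """Return how many rows at the end of `qty` share the final value."""
    # True for every row whose value equals the last one.
    same_as_last = qty.eq(qty.iloc[-1])
    # Walk backwards: the cumulative product stays 1 until the first mismatch.
    return int(same_as_last.iloc[::-1].astype(int).cumprod().sum())


# Invented data: product 1 ends with a single 2.0, product 2 ends with two 0.0 rows.
demo = pd.DataFrame(
    {
        "product_id": [1, 1, 1, 1, 2, 2, 2],
        "products_qty": [0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0],
    }
)

counts = (
    demo.groupby("product_id")["products_qty"]
    .apply(trailing_run_length)
    .reset_index(name="count")
)
print(counts)
```

Here product 1 gets a count of 1, whereas the filter-then-last approach would report 3 for its earlier run of zeros. Products whose trailing run has length 1 can be dropped afterwards with `counts[counts["count"] > 1]` if only real duplicates are wanted.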