Question

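I'm trying to group by sales_orders, sales_orders_item, delivery_date and billing_block. I have a dataset with these four columns for every month. My goal is then to find the earliest delivery_date and the latest delivery_date across all months, to see whether there is a pattern in the months where the delivery dates differ a lot.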

    import pandas as pd

    cars = {'sales_orders': ['A', 'A', 'B', 'B', 'C', 'D'],
            'sales_orders_item': [10, 10, 10, 10, 10, 10],
            'delivery_date': ['2020-01-01', '2020-05-31', '2020-03-31', '2020-04-30', '2020-04-15', '2020-05-05'],
            'billing_block': [1, 2, 1, 3, 4, 5],
            'Amount': [100, 200, 300, 400, 500, 600]
            }

    df = pd.DataFrame(cars, columns=['sales_orders', 'sales_orders_item', 'delivery_date', 'billing_block', 'Amount'])
    # Parse the dates so that min/max compare dates rather than strings
    df['delivery_date'] = pd.to_datetime(df['delivery_date'])
    print(df)

    cols = ['sales_orders', 'sales_orders_item', 'billing_block']
    grouped = df.groupby(cols, as_index=False)
    # Earliest and latest delivery date per group
    df = grouped.aggregate({'delivery_date': ['min', 'max']})

This works well for getting the dates, but the problem is that billing_block can change over time, which produces duplicate rows.
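A minimal sketch of why the duplicates appear, using the sample data from the question. It assumes delivery_date is parsed with `pd.to_datetime`, and the output labels `min_delivery_date`/`max_delivery_date` are hypothetical names chosen for illustration. Because billing_block is one of the grouping keys, every distinct billing_block of an order forms its own group, so orders A and B each come out twice.

```python
import pandas as pd

# Sample data from the question, with delivery_date parsed as datetimes
df = pd.DataFrame({
    'sales_orders': ['A', 'A', 'B', 'B', 'C', 'D'],
    'sales_orders_item': [10, 10, 10, 10, 10, 10],
    'delivery_date': pd.to_datetime(['2020-01-01', '2020-05-31', '2020-03-31',
                                     '2020-04-30', '2020-04-15', '2020-05-05']),
    'billing_block': [1, 2, 1, 3, 4, 5],
    'Amount': [100, 200, 300, 400, 500, 600],
})

# billing_block is part of the key, so (A, 10, 1) and (A, 10, 2) are separate groups
cols = ['sales_orders', 'sales_orders_item', 'billing_block']
dup = df.groupby(cols, as_index=False).agg(
    min_delivery_date=('delivery_date', 'min'),  # hypothetical output label
    max_delivery_date=('delivery_date', 'max'),  # hypothetical output label
)
print(dup)

# Count how many rows each sales order ends up with after grouping
counts = dup['sales_orders'].value_counts()
print(counts)
```

In this sample every row is its own group, so the "range" per group collapses to a single date and the min/max comparison across billing blocks is lost.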

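How can I take the delivery_date from the row with the highest billing_block, so that there are no duplicate rows? In the example above, sales order 'A' should end up with billing_block 2 and sales order 'B' with billing_block 3.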
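One possible approach, sketched under the assumption that "highest billing_block" means the largest numeric value per (sales_orders, sales_orders_item) pair: drop billing_block from the grouping keys, compute the date range per order item, and pick the row with the maximum billing_block via `idxmax`. The column names `min_delivery_date`, `max_delivery_date` and `delivery_date_at_max_block` are hypothetical labels, not part of the original data.

```python
import pandas as pd

# Sample data from the question, with delivery_date parsed as datetimes
df = pd.DataFrame({
    'sales_orders': ['A', 'A', 'B', 'B', 'C', 'D'],
    'sales_orders_item': [10, 10, 10, 10, 10, 10],
    'delivery_date': pd.to_datetime(['2020-01-01', '2020-05-31', '2020-03-31',
                                     '2020-04-30', '2020-04-15', '2020-05-05']),
    'billing_block': [1, 2, 1, 3, 4, 5],
    'Amount': [100, 200, 300, 400, 500, 600],
})

# Group only by the order identifiers, not by billing_block
keys = ['sales_orders', 'sales_orders_item']

# 1) Earliest and latest delivery date per order item, across all billing blocks
dates = df.groupby(keys, as_index=False).agg(
    min_delivery_date=('delivery_date', 'min'),
    max_delivery_date=('delivery_date', 'max'),
)

# 2) Row with the highest billing_block per order item
#    (idxmax returns the index label of the first maximum in each group)
top = df.loc[df.groupby(keys)['billing_block'].idxmax(),
             keys + ['billing_block', 'delivery_date']]
top = top.rename(columns={'delivery_date': 'delivery_date_at_max_block'})

# 3) Combine: exactly one row per order item
result = dates.merge(top, on=keys, how='left')
print(result)
```

An alternative that selects the same rows is `df.sort_values('billing_block').drop_duplicates(subset=keys, keep='last')`. Plain `groupby(...).last()` also exists, but it returns the last non-null value per column independently, so with missing values it can mix fields from different rows.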

Pandas groupby: get the last value
