Question

I have a pandas DataFrame of orders:

OrderID OrderDate   Value   CustomerID
1       2017-11-01  12.56   23
2       2017-11-06  1.56    23
3       2017-11-08  2.67    23
4       2017-11-12  5.67    99
5       2017-11-13  7.88    23
6       2017-11-19  3.78    99

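Let's look at the customer with ID 23. His first order ever was on 2017-11-01. That date is the start of his first week, so all of his orders between 2017-11-01 and 2017-11-07 are assigned to his week 1 (this is not a calendar week such as Monday to Sunday). For the customer with ID 99, the first week starts on 2017-11-12, because that is the date of his first order (OrderID 4).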
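The week rule above is plain day arithmetic: the number of whole days since the customer's first order, integer-divided by 7, plus 1. A minimal standard-library sketch follows. The function name `customer_week` is hypothetical and not part of the question. Subtract 1 from the result to get the matching `Periods` index.

```python
from datetime import date

def customer_week(first_order: date, order_date: date) -> int:
    """1-based week number counted from the customer's own first order date."""
    # Days 0-6 after the first order are week 1, days 7-13 are week 2, and so on.
    return (order_date - first_order).days // 7 + 1

first_23 = date(2017, 11, 1)   # customer 23's first order (OrderID 1)
first_99 = date(2017, 11, 12)  # customer 99's first order (OrderID 4)

print(customer_week(first_23, date(2017, 11, 6)))   # OrderID 2 -> 1
print(customer_week(first_23, date(2017, 11, 8)))   # OrderID 3 -> 2
print(customer_week(first_99, date(2017, 11, 19)))  # OrderID 6 -> 2
```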

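I need to assign each order in the table to the corresponding index of a shared Periods table. Periods[0] will contain the orders placed in each customer's week 1, Periods[1] the orders from each customer's week 2, and so on. OrderID 1 and OrderID 4 will be at the same index of Periods, because both orders were placed in their customer's first week.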

The Periods table of order IDs must look like this: Periods = [[1, 2, 4], [3, 5, 6]]
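To make the target structure concrete, here is a sketch that builds Periods without pandas. This plain-Python grouping is a swapped-in approach, not the answer's method. It assumes Periods[k] holds every order placed in its customer's week k+1, keeps order IDs in table order, and turns weeks in which nobody ordered into empty lists. `build_periods` is a hypothetical helper name.

```python
from datetime import date

# Sample orders from the question: (OrderID, OrderDate, CustomerID)
orders = [
    (1, date(2017, 11, 1), 23),
    (2, date(2017, 11, 6), 23),
    (3, date(2017, 11, 8), 23),
    (4, date(2017, 11, 12), 99),
    (5, date(2017, 11, 13), 23),
    (6, date(2017, 11, 19), 99),
]

def build_periods(orders):
    # The earliest order date per customer starts that customer's week 1.
    first = {}
    for _, d, cust in orders:
        if cust not in first or d < first[cust]:
            first[cust] = d
    # 0-based week index for every order.
    weeks = [(oid, (d - first[cust]).days // 7) for oid, d, cust in orders]
    # One bucket per week index, including weeks with no orders at all.
    periods = [[] for _ in range(max(w for _, w in weeks) + 1)]
    for oid, w in weeks:
        periods[w].append(oid)
    return periods

Periods = build_periods(orders)
print(Periods)  # [[1, 2, 4], [3, 5, 6]]
```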

Answer 1

Is this what you want?

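import pandas as pd

df['OrderDate'] = pd.to_datetime(df['OrderDate'])  # .dt.days requires datetime values
# 0-based week index counted from each customer's earliest order date
df['New'] = df.groupby('CustomerID').OrderDate.transform(lambda x: (x - x.min()).dt.days // 7)
df.groupby('New').OrderID.apply(list)
Out:
New
0    [1, 2, 4]
1    [3, 5, 6]
Name: OrderID, dtype: object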

To get your Periods table:

df.groupby('New').OrderID.apply(list).tolist()
Out: [[1, 2, 4], [3, 5, 6]]
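One subtlety with `tolist()`: `groupby('New')` only creates groups for week indices that actually occur. If no customer ordered in some week, every later week shifts one position left, and Periods[k] no longer means week k+1. A minimal sketch, assuming empty weeks should appear as empty lists. OrderID 7 is a hypothetical extra order added only to create such a gap.

```python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [1, 2, 3, 4, 5, 6, 7],  # OrderID 7 is hypothetical
    'OrderDate': pd.to_datetime(['2017-11-01', '2017-11-06', '2017-11-08', '2017-11-12',
                                 '2017-11-13', '2017-11-19', '2017-11-30']),
    'CustomerID': [23, 23, 23, 99, 23, 99, 23],
})
df['New'] = df.groupby('CustomerID').OrderDate.transform(lambda x: (x - x.min()).dt.days // 7)

by_week = df.groupby('New').OrderID.apply(list)  # only weeks 0, 1 and 4 exist here
# Walk every week index up to the last one, using [] where no group exists.
Periods = [by_week.get(k, []) for k in range(by_week.index.max() + 1)]
print(Periods)  # [[1, 2, 4], [3, 5, 6], [], [], [7]]
```

Without the fill step, the same data would give [[1, 2, 4], [3, 5, 6], [7]], so OrderID 7 would land in Periods[2] even though it was placed in customer 23's fifth week.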

For reference, the DataFrame now looks like this:

df
Out:
   OrderID  OrderDate  Value  CustomerID  New
0        1 2017-11-01  12.56          23    0
1        2 2017-11-06   1.56          23    0
2        3 2017-11-08   2.67          23    1
3        4 2017-11-12   5.67          99    0
4        5 2017-11-13   7.88          23    1
5        6 2017-11-19   3.78          99    1

The week numbers are counted from the date of each customer's first order.
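The same week index can also be computed without a per-group lambda. Use `transform('min')` to broadcast each customer's earliest OrderDate back onto every row, then subtract. This lets pandas use its built-in min rather than calling a Python function for each customer. It is a sketch of an alternative to the answer's code, and the sample data is rebuilt from the question so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [1, 2, 3, 4, 5, 6],
    'OrderDate': pd.to_datetime(['2017-11-01', '2017-11-06', '2017-11-08',
                                 '2017-11-12', '2017-11-13', '2017-11-19']),
    'Value': [12.56, 1.56, 2.67, 5.67, 7.88, 3.78],
    'CustomerID': [23, 23, 23, 99, 23, 99],
})

# Each row gets its customer's first order date, aligned on the original index.
first = df.groupby('CustomerID')['OrderDate'].transform('min')
df['New'] = (df['OrderDate'] - first).dt.days // 7

Periods = df.groupby('New')['OrderID'].apply(list).tolist()
print(df['New'].tolist())  # [0, 0, 1, 0, 1, 1]
print(Periods)             # [[1, 2, 4], [3, 5, 6]]
```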
