Question

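I am trying to create a new column that holds the week number counted from each user's first order date. The data covers the previous 30 days, so the weeks range from w1 to w4.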

Input:

user_id  order_date
393      15/03/19
393      16/03/19
393      23/03/19
393      24/03/19
393      25/03/19
393      28/03/19
393      29/03/19
393      30/03/19
393      31/03/19
393      05/04/19
1014     08/12/18
1014     09/12/18
1014     18/12/18
1014     20/12/18
1014     22/12/18
1014     23/12/18
1014     30/12/18

Desired output:

user_id  order_date  week
393      15/03/19    w1
393      16/03/19    w1
393      23/03/19    w2
393      24/03/19    w2
393      25/03/19    w2
393      28/03/19    w2
393      29/03/19    w3
393      30/03/19    w3
393      31/03/19    w3
393      05/04/19    w4
1014     08/12/18    w1
1014     09/12/18    w1
1014     18/12/18    w2
1014     20/12/18    w2
1014     22/12/18    w3
1014     23/12/18    w3
1014     30/12/18    w4

Answer 1

First, make sure order_date has a datetime dtype:

df['order_date'] = pd.to_datetime(df['order_date'], dayfirst=True)
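
As a side note on this parsing step, a value such as 05/04/19 is ambiguous, which is why dayfirst=True matters here. A minimal sketch, assuming every value follows the dd/mm/yy pattern seen in the sample, is to pass an explicit format string instead, so that a value that does not match raises an error rather than being reinterpreted:

```python
import pandas as pd

# Sample values in dd/mm/yy form, taken from the question's data.
raw = pd.Series(['15/03/19', '05/04/19', '08/12/18'])

# An explicit format removes any day/month guessing.
parsed = pd.to_datetime(raw, format='%d/%m/%y')

# For this sample it agrees with letting pandas infer using dayfirst=True.
inferred = pd.to_datetime(raw, dayfirst=True)
```

Either call should work for the data shown; the explicit format is simply stricter.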

Then you can use:

df['week'] = ((df.order_date - df.groupby('user_id')['order_date'].transform('first')).dt.days // 7) + 1

[Output]

    user_id order_date  week
0       393 2019-03-15     1
1       393 2019-03-16     1
2       393 2019-03-23     2
3       393 2019-03-24     2
4       393 2019-03-25     2
5       393 2019-03-28     2
6       393 2019-03-29     3
7       393 2019-03-30     3
8       393 2019-03-31     3
9       393 2019-04-05     4
10     1014 2018-12-08     1
11     1014 2018-12-09     1
12     1014 2018-12-18     2
13     1014 2018-12-20     2
14     1014 2018-12-22     3
15     1014 2018-12-23     3
16     1014 2018-12-30     4
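
Note that transform('first') returns the first row of each group in its current order, so the expression above relies on each user's rows already being sorted by date. A hedged variant, assuming the rows may arrive in any order, swaps in transform('min') to pick the earliest date regardless of row order. The shuffled sample frame below is made up for illustration:

```python
import pandas as pd

# Illustrative, deliberately unsorted rows for one user from the question.
df = pd.DataFrame({
    'user_id': [393, 393, 393],
    'order_date': ['29/03/19', '15/03/19', '23/03/19'],
})
df['order_date'] = pd.to_datetime(df['order_date'], format='%d/%m/%y')

# Earliest order per user, broadcast back to every row of that user.
first_order = df.groupby('user_id')['order_date'].transform('min')

# Whole weeks elapsed since the first order, counted from 1.
df['week'] = (df['order_date'] - first_order).dt.days // 7 + 1
```

With sorted input, as in the question, both variants should give the same weeks.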

If the format you specified matters, use:

df['week'] = 'w' + df['week'].astype(str)

[Output]

    user_id order_date week
0       393 2019-03-15   w1
1       393 2019-03-16   w1
2       393 2019-03-23   w2
3       393 2019-03-24   w2
4       393 2019-03-25   w2
5       393 2019-03-28   w2
6       393 2019-03-29   w3
7       393 2019-03-30   w3
8       393 2019-03-31   w3
9       393 2019-04-05   w4
10     1014 2018-12-08   w1
11     1014 2018-12-09   w1
12     1014 2018-12-18   w2
13     1014 2018-12-20   w2
14     1014 2018-12-22   w3
15     1014 2018-12-23   w3
16     1014 2018-12-30   w4
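
Putting the steps together, the following sketch wraps them in a small function and checks the labels for user 1014 from the question. The name add_week_label is hypothetical (it is not part of pandas), and the sketch uses transform('min') rather than transform('first') so that it does not depend on row order:

```python
import pandas as pd

def add_week_label(df):
    """Return a copy of df with a 'week' column such as 'w1', 'w2', ..."""
    out = df.copy()
    out['order_date'] = pd.to_datetime(out['order_date'], format='%d/%m/%y')
    # Earliest order date per user, aligned with the original rows.
    first = out.groupby('user_id')['order_date'].transform('min')
    weeks = (out['order_date'] - first).dt.days // 7 + 1
    out['week'] = 'w' + weeks.astype(str)
    return out

sample = pd.DataFrame({
    'user_id': [1014] * 7,
    'order_date': ['08/12/18', '09/12/18', '18/12/18', '20/12/18',
                   '22/12/18', '23/12/18', '30/12/18'],
})
labelled = add_week_label(sample)
```

Because the label is a string, sorting by it is lexicographic; keeping the integer column around may be preferable if more than nine weeks are ever involved.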

Answer 2

Here is one possible solution.

import datetime

import pandas as pd

# your data
data = {
    'user_id': [393, 393, 393, 393, 393, 393, 393, 393, 393, 393, 1014, 1014, 1014, 1014, 1014, 1014, 1014],
    'order_date': ['15/03/19', '16/03/19', '23/03/19', '24/03/19', '25/03/19', '28/03/19', '29/03/19', '30/03/19', '31/03/19', '05/04/19', '08/12/18', '09/12/18', '18/12/18', '20/12/18', '22/12/18', '23/12/18', '30/12/18']
}
df = pd.DataFrame(data)

# helper function to convert a 'dd/mm/yy' string to a date object
def convert_to_datetime(in_string):
    year, month, day = [int(v) for v in in_string.split('/')][::-1]
    # two-digit years are assumed to belong to the 2000s
    return datetime.date(year + 2000, month, day)

# convert the strings to dates, then to datetime64 so the .dt accessor works on the difference below
df['order_date'] = pd.to_datetime(df['order_date'].apply(convert_to_datetime))

# group by user id and find the earliest order_date
df_min = df.groupby('user_id').agg('min').reset_index().rename(columns={'order_date': 'date_of_first_order'})
# merge with the original dataframe
df_with_min = pd.merge(df, df_min, on='user_id')
# get the number of weeks since the first order, counted from 1
df_with_min['weeks'] = (df_with_min.order_date - df_with_min.date_of_first_order).dt.days // 7 + 1

The result of print(df_with_min) is:

    user_id  order_date date_of_first_order  weeks
0       393  2019-03-15          2019-03-15      1
1       393  2019-03-16          2019-03-15      1
2       393  2019-03-23          2019-03-15      2
3       393  2019-03-24          2019-03-15      2
4       393  2019-03-25          2019-03-15      2
5       393  2019-03-28          2019-03-15      2
6       393  2019-03-29          2019-03-15      3
7       393  2019-03-30          2019-03-15      3
8       393  2019-03-31          2019-03-15      3
9       393  2019-04-05          2019-03-15      4
10     1014  2018-12-08          2018-12-08      1
11     1014  2018-12-09          2018-12-08      1
12     1014  2018-12-18          2018-12-08      2
13     1014  2018-12-20          2018-12-08      2
14     1014  2018-12-22          2018-12-08      3
15     1014  2018-12-23          2018-12-08      3
16     1014  2018-12-30          2018-12-08      4
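
For readers who want to see the arithmetic without pandas, here is a small standard-library sketch. It assumes the same dd/mm/yy strings and uses datetime.strptime in place of the hand-written parser above; the function name week_number is made up for this example:

```python
from datetime import datetime

def week_number(first_order, order, fmt='%d/%m/%y'):
    """1-based week index of `order`, counted from `first_order`."""
    first = datetime.strptime(first_order, fmt).date()
    current = datetime.strptime(order, fmt).date()
    # Day offsets 0 to 6 fall in week 1, 7 to 13 in week 2, and so on.
    return (current - first).days // 7 + 1

# A few of user 393's order dates, measured from the first order on 15/03/19.
weeks = [week_number('15/03/19', d)
         for d in ('15/03/19', '28/03/19', '29/03/19', '05/04/19')]
```

The floor division is what turns a day offset into a week bucket, and the + 1 makes the numbering start at 1 instead of 0.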
