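I am trying to create a new column that holds the week number counted from each user's first order date. The data covers the first 30 days, so the weeks range from w1 to w4.
Input: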
user_id order_date
393 15/03/19
393 16/03/19
393 23/03/19
393 24/03/19
393 25/03/19
393 28/03/19
393 29/03/19
393 30/03/19
393 31/03/19
393 05/04/19
1014 08/12/18
1014 09/12/18
1014 18/12/18
1014 20/12/18
1014 22/12/18
1014 23/12/18
1014 30/12/18
Desired output:
user_id order_date week
393 15/03/19 w1
393 16/03/19 w1
393 23/03/19 w2
393 24/03/19 w2
393 25/03/19 w2
393 28/03/19 w2
393 29/03/19 w3
393 30/03/19 w3
393 31/03/19 w3
393 05/04/19 w4
1014 08/12/18 w1
1014 09/12/18 w1
1014 18/12/18 w2
1014 20/12/18 w2
1014 22/12/18 w3
1014 23/12/18 w3
1014 30/12/18 w4
Answer 0 (score: 2)
First, make sure order_date is of type datetime:
df['order_date'] = pd.to_datetime(df['order_date'], dayfirst=True)
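Because the dates in the question follow a fixed day/month/two-digit-year layout, an explicit format string is an alternative to dayfirst=True. A minimal sketch, assuming a small sample frame that stands in for df (the three rows are taken from the question's input):

```python
import pandas as pd

# Small sample standing in for the question's data (same dd/mm/yy layout).
df = pd.DataFrame({
    'user_id': [393, 393, 1014],
    'order_date': ['15/03/19', '05/04/19', '08/12/18'],
})

# %d/%m/%y = day/month/two-digit year, so '05/04/19' is read as 5 April 2019.
df['order_date'] = pd.to_datetime(df['order_date'], format='%d/%m/%y')
print(df.dtypes)
```

With an explicit format, a row that does not match the layout raises an error instead of being parsed by guesswork.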
Then you can use:
df['week'] = ((df.order_date - df.groupby('user_id')['order_date'].transform('first')).dt.days // 7) + 1
[Output]
user_id order_date week
0 393 2019-03-15 1
1 393 2019-03-16 1
2 393 2019-03-23 2
3 393 2019-03-24 2
4 393 2019-03-25 2
5 393 2019-03-28 2
6 393 2019-03-29 3
7 393 2019-03-30 3
8 393 2019-03-31 3
9 393 2019-04-05 4
10 1014 2018-12-08 1
11 1014 2018-12-09 1
12 1014 2018-12-18 2
13 1014 2018-12-20 2
14 1014 2018-12-22 3
15 1014 2018-12-23 3
16 1014 2018-12-30 4
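Note that transform('first') treats each user's first row as the first order, so it relies on the rows already being in date order within each user. A sketch of the same calculation with transform('min'), which should not depend on row order; the sample rows are a deliberately shuffled subset of the question's data:

```python
import pandas as pd

# Shuffled subset of the question's rows, to show row order does not matter.
df = pd.DataFrame({
    'user_id': [393, 393, 393, 1014, 1014, 1014],
    'order_date': ['29/03/19', '15/03/19', '28/03/19',
                   '30/12/18', '08/12/18', '18/12/18'],
})
df['order_date'] = pd.to_datetime(df['order_date'], format='%d/%m/%y')

# Earliest order per user, broadcast back to every row of that user.
first_order = df.groupby('user_id')['order_date'].transform('min')

# Whole days since the first order: 0-6 -> week 1, 7-13 -> week 2, and so on.
df['week'] = (df['order_date'] - first_order).dt.days // 7 + 1
print(df)
```

For user 393, 28/03/19 is 13 days after the first order and lands in week 2, while 29/03/19 is 14 days after and starts week 3, which matches the desired output.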
If the format you specified matters, use:
df['week'] = 'w' + df['week'].astype(str)
[Output]
user_id order_date week
0 393 2019-03-15 w1
1 393 2019-03-16 w1
2 393 2019-03-23 w2
3 393 2019-03-24 w2
4 393 2019-03-25 w2
5 393 2019-03-28 w2
6 393 2019-03-29 w3
7 393 2019-03-30 w3
8 393 2019-03-31 w3
9 393 2019-04-05 w4
10 1014 2018-12-08 w1
11 1014 2018-12-09 w1
12 1014 2018-12-18 w2
13 1014 2018-12-20 w2
14 1014 2018-12-22 w3
15 1014 2018-12-23 w3
16 1014 2018-12-30 w4
Answer 1 (score: 0)
Here is one possible solution.
import datetime
import pandas as pd
# your data
data = {
    'user_id': [393, 393, 393, 393, 393, 393, 393, 393, 393, 393, 1014, 1014, 1014, 1014, 1014, 1014, 1014],
    'order_date': ['15/03/19', '16/03/19', '23/03/19', '24/03/19', '25/03/19', '28/03/19', '29/03/19', '30/03/19', '31/03/19', '05/04/19', '08/12/18', '09/12/18', '18/12/18', '20/12/18', '22/12/18', '23/12/18', '30/12/18']
}
df = pd.DataFrame(data)
# helper function to convert a 'dd/mm/yy' string to a date object
def convert_to_datetime(in_string):
    year, month, day = [int(v) for v in in_string.split('/')][::-1]
    return datetime.date(year + 2000, month, day)
# convert the strings, then cast to datetime64 so the .dt accessor works below
df['order_date'] = pd.to_datetime(df['order_date'].apply(convert_to_datetime))
# group by user id and find the min order_date
df_min = df.groupby('user_id').agg('min').reset_index().rename(columns={'order_date': 'date_of_first_order'})
# merge with the original DataFrame
df_with_min = pd.merge(df, df_min, on='user_id')
# get the number of weeks
df_with_min['weeks'] = (df_with_min.order_date - df_with_min.date_of_first_order).dt.days // 7 + 1
The result of print(df_with_min) is:
user_id order_date date_of_first_order weeks
0 393 2019-03-15 2019-03-15 1
1 393 2019-03-16 2019-03-15 1
2 393 2019-03-23 2019-03-15 2
3 393 2019-03-24 2019-03-15 2
4 393 2019-03-25 2019-03-15 2
5 393 2019-03-28 2019-03-15 2
6 393 2019-03-29 2019-03-15 3
7 393 2019-03-30 2019-03-15 3
8 393 2019-03-31 2019-03-15 3
9 393 2019-04-05 2019-03-15 4
10 1014 2018-12-08 2018-12-08 1
11 1014 2018-12-09 2018-12-08 1
12 1014 2018-12-18 2018-12-08 2
13 1014 2018-12-20 2018-12-08 2
14 1014 2018-12-22 2018-12-08 3
15 1014 2018-12-23 2018-12-08 3
16 1014 2018-12-30 2018-12-08 4