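I have a column of dates in a DataFrame, stored as strings. I want to create a column that assigns each date to an interval. My data looks like this: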
import pandas as pd
df1 = pd.DataFrame({'x1':[1,5,2,6,3,7,4,7,9,10,5,3,2,7,3,8,4,3,7,2,5,5,2,2],'date':['2014-01-01','2014-01-01','2014-01-01','2014-01-01','2014-01-02','2014-01-02','2014-01-03','2014-01-04','2014-01-05','2014-01-05','2014-01-05','2014-01-05','2014-01-06','2014-01-07','2014-01-07','2014-01-08','2014-01-09','2014-01-10','2014-01-10','2014-01-10','2014-01-10','2014-01-11','2014-01-12','2014-01-12']})
date x1
0 2014-01-01 1
1 2014-01-01 5
2 2014-01-01 2
3 2014-01-01 6
4 2014-01-02 3
5 2014-01-02 7
6 2014-01-03 4
7 2014-01-04 7
8 2014-01-05 9
9 2014-01-05 10
10 2014-01-05 5
11 2014-01-05 3
12 2014-01-06 2
13 2014-01-07 7
14 2014-01-07 3
15 2014-01-08 8
16 2014-01-09 4
17 2014-01-10 3
18 2014-01-10 7
19 2014-01-10 2
20 2014-01-10 5
21 2014-01-11 5
22 2014-01-12 2
23 2014-01-12 2
But I want it to look like this:
df1['level']=['a','a','a','a','a','a','a','a','b','b','b','b','b','b','b','b','c','c','c','c','c','c','c','c']
date x1 level
0 2014-01-01 1 a
1 2014-01-01 5 a
2 2014-01-01 2 a
3 2014-01-01 6 a
4 2014-01-02 3 a
5 2014-01-02 7 a
6 2014-01-03 4 a
7 2014-01-04 7 a
8 2014-01-05 9 b
9 2014-01-05 10 b
10 2014-01-05 5 b
11 2014-01-05 3 b
12 2014-01-06 2 b
13 2014-01-07 7 b
14 2014-01-07 3 b
15 2014-01-08 8 b
16 2014-01-09 4 c
17 2014-01-10 3 c
18 2014-01-10 7 c
19 2014-01-10 2 c
20 2014-01-10 5 c
21 2014-01-11 5 c
22 2014-01-12 2 c
23 2014-01-12 2 c
where a represents the interval ['2014-01-01', '2014-01-04'], b represents ['2014-01-05', '2014-01-08'], and c represents ['2014-01-09', '2014-01-12'], with both endpoints inclusive.
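Because the three intervals in this example are consecutive and each is exactly four days long, the level could also be computed arithmetically instead of being listed by hand. The following is a minimal sketch under that assumption; `start` and `window_days` are illustrative names not taken from the question, and the approach would need adjusting if the intervals had different lengths or gaps between them.

```python
import pandas as pd

# Sample data from the question (dates stored as strings)
df1 = pd.DataFrame({
    'x1': [1, 5, 2, 6, 3, 7, 4, 7, 9, 10, 5, 3, 2, 7, 3, 8, 4, 3, 7, 2, 5, 5, 2, 2],
    'date': ['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-02',
             '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-05', '2014-01-05', '2014-01-05',
             '2014-01-06', '2014-01-07', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10',
             '2014-01-10', '2014-01-10', '2014-01-10', '2014-01-11', '2014-01-12', '2014-01-12'],
})

# Assumption: intervals are consecutive 4-day windows starting at 2014-01-01
start = pd.Timestamp('2014-01-01')
window_days = 4

dates = pd.to_datetime(df1['date'])
# Whole days since the start, integer-divided by the window length -> 0, 1, 2
window_index = (dates - start).dt.days // window_days
# Map window 0 -> 'a', 1 -> 'b', 2 -> 'c'
df1['level'] = window_index.map(lambda i: chr(ord('a') + i))
print(df1)
```

For example, 2014-01-08 is 7 days after the start, and 7 // 4 = 1, so it lands in level b.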
Answer 0
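One way is to define a boolean mask for each level and use it to set the values of the level column. I converted the 'date' column to datetime dtype to make the comparisons easier: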
In [61]:
df1['date'] = pd.to_datetime(df1['date'])
a_mask = (df1['date']>='2014-01-01') & (df1['date']<='2014-01-04')
b_mask = (df1['date']>='2014-01-05') & (df1['date']<='2014-01-08')
c_mask = (df1['date']>='2014-01-09') & (df1['date']<='2014-01-12')
df1.loc[a_mask, 'level'] = 'a'
df1.loc[b_mask, 'level'] = 'b'
df1.loc[c_mask, 'level'] = 'c'
df1
Out[61]:
date x1 level
0 2014-01-01 1 a
1 2014-01-01 5 a
2 2014-01-01 2 a
3 2014-01-01 6 a
4 2014-01-02 3 a
5 2014-01-02 7 a
6 2014-01-03 4 a
7 2014-01-04 7 a
8 2014-01-05 9 b
9 2014-01-05 10 b
10 2014-01-05 5 b
11 2014-01-05 3 b
12 2014-01-06 2 b
13 2014-01-07 7 b
14 2014-01-07 3 b
15 2014-01-08 8 b
16 2014-01-09 4 c
17 2014-01-10 3 c
18 2014-01-10 7 c
19 2014-01-10 2 c
20 2014-01-10 5 c
21 2014-01-11 5 c
22 2014-01-12 2 c
23 2014-01-12 2 c
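If there are many levels, the separate `.loc` assignments can be collapsed into a single call. Below is a sketch using `numpy.select` with the same inclusive ranges, written here with `Series.between` (inclusive on both ends by default); this is an alternative to the answer's code, not part of it, and rows that fall outside every range get `None`.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    'x1': [1, 5, 2, 6, 3, 7, 4, 7, 9, 10, 5, 3, 2, 7, 3, 8, 4, 3, 7, 2, 5, 5, 2, 2],
    'date': ['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-02',
             '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-05', '2014-01-05', '2014-01-05',
             '2014-01-06', '2014-01-07', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10',
             '2014-01-10', '2014-01-10', '2014-01-10', '2014-01-11', '2014-01-12', '2014-01-12'],
})
df1['date'] = pd.to_datetime(df1['date'])

# One inclusive date range per level, same bounds as the masks above
conditions = [
    df1['date'].between('2014-01-01', '2014-01-04'),
    df1['date'].between('2014-01-05', '2014-01-08'),
    df1['date'].between('2014-01-09', '2014-01-12'),
]
labels = ['a', 'b', 'c']

# For each row, np.select takes the label of the first True condition;
# rows matching no condition receive the default value
df1['level'] = np.select(conditions, labels, default=None)
print(df1)
```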
Answer 1
You can use the bisect module:
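import bisect
# Inclusive upper bound of each interval; ISO-format date strings sort chronologically,
# so plain string comparison works (df1.date must still hold strings, not datetimes)
dates = ['2014-01-04', '2014-01-08', '2014-01-12']
# bisect_left returns 0, 1 or 2 for each date, which is mapped to 'a', 'b' or 'c'
df1['level'] = df1.date.apply(lambda d: chr(ord('a') + bisect.bisect_left(dates, d)))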
>>> df1
date x1 level
0 2014-01-01 1 a
1 2014-01-01 5 a
2 2014-01-01 2 a
3 2014-01-01 6 a
4 2014-01-02 3 a
5 2014-01-02 7 a
6 2014-01-03 4 a
7 2014-01-04 7 a
8 2014-01-05 9 b
9 2014-01-05 10 b
10 2014-01-05 5 b
11 2014-01-05 3 b
12 2014-01-06 2 b
13 2014-01-07 7 b
14 2014-01-07 3 b
15 2014-01-08 8 b
16 2014-01-09 4 c
17 2014-01-10 3 c
18 2014-01-10 7 c
19 2014-01-10 2 c
20 2014-01-10 5 c
21 2014-01-11 5 c
22 2014-01-12 2 c
23 2014-01-12 2 c
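For large frames, the row-by-row `apply` can be replaced by `numpy.searchsorted`, which with `side='left'` performs the same search as `bisect.bisect_left` but over the whole column at once. This is a sketch, assuming the dates are converted to datetimes first and that every date falls within the three intervals (a date after the last bound would yield index 3, which is out of range for `labels`).

```python
import numpy as np
import pandas as pd

# Sample data from the question (dates stored as strings)
df1 = pd.DataFrame({
    'x1': [1, 5, 2, 6, 3, 7, 4, 7, 9, 10, 5, 3, 2, 7, 3, 8, 4, 3, 7, 2, 5, 5, 2, 2],
    'date': ['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-02',
             '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-05', '2014-01-05', '2014-01-05',
             '2014-01-06', '2014-01-07', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10',
             '2014-01-10', '2014-01-10', '2014-01-10', '2014-01-11', '2014-01-12', '2014-01-12'],
})

# Inclusive upper bound of each interval, as in the bisect answer
upper_bounds = pd.to_datetime(['2014-01-04', '2014-01-08', '2014-01-12'])
dates = pd.to_datetime(df1['date'])

# side='left' gives, for every date at once, the index bisect_left would return
idx = np.searchsorted(upper_bounds.values, dates.values, side='left')
labels = np.array(['a', 'b', 'c'])
df1['level'] = labels[idx]
print(df1)
```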