Date intervals in a pandas DataFrame

Time: 2015-07-14 20:49:25

Tags: python pandas dataframe

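I have a column of dates in a DataFrame, stored as strings. I want to create a column that groups these dates into intervals. My data looks like this: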

import pandas as pd
df1 = pd.DataFrame({'date': ['2014-01-01','2014-01-01','2014-01-01','2014-01-01','2014-01-02','2014-01-02','2014-01-03','2014-01-04','2014-01-05','2014-01-05','2014-01-05','2014-01-05','2014-01-06','2014-01-07','2014-01-07','2014-01-08','2014-01-09','2014-01-10','2014-01-10','2014-01-10','2014-01-10','2014-01-11','2014-01-12','2014-01-12'], 'x1': [1,5,2,6,3,7,4,7,9,10,5,3,2,7,3,8,4,3,7,2,5,5,2,2]})


          date  x1
0   2014-01-01   1
1   2014-01-01   5
2   2014-01-01   2
3   2014-01-01   6
4   2014-01-02   3
5   2014-01-02   7
6   2014-01-03   4
7   2014-01-04   7
8   2014-01-05   9
9   2014-01-05  10
10  2014-01-05   5
11  2014-01-05   3
12  2014-01-06   2
13  2014-01-07   7
14  2014-01-07   3
15  2014-01-08   8
16  2014-01-09   4
17  2014-01-10   3
18  2014-01-10   7
19  2014-01-10   2
20  2014-01-10   5
21  2014-01-11   5
22  2014-01-12   2
23  2014-01-12   2

But I want it to look like this:

df1['level'] = ['a','a','a','a','a','a','a','a','b','b','b','b','b','b','b','b','c','c','c','c','c','c','c','c']

          date  x1 level
 0   2014-01-01   1     a
 1   2014-01-01   5     a
 2   2014-01-01   2     a
 3   2014-01-01   6     a
 4   2014-01-02   3     a
 5   2014-01-02   7     a
 6   2014-01-03   4     a
 7   2014-01-04   7     a
 8   2014-01-05   9     b
 9   2014-01-05  10     b
 10  2014-01-05   5     b
 11  2014-01-05   3     b
 12  2014-01-06   2     b
 13  2014-01-07   7     b
 14  2014-01-07   3     b
 15  2014-01-08   8     b
 16  2014-01-09   4     c
 17  2014-01-10   3     c
 18  2014-01-10   7     c
 19  2014-01-10   2     c
 20  2014-01-10   5     c
 21  2014-01-11   5     c
 22  2014-01-12   2     c
 23  2014-01-12   2     c

where a stands for the interval ['2014-01-01', '2014-01-04'], b for ['2014-01-05', '2014-01-08'], and c for ['2014-01-09', '2014-01-12'], all inclusive at both ends.
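Because the three intervals are contiguous and closed, this is a binning problem. A minimal sketch using pd.cut (not something either answer below uses): pd.cut bins are right-closed, (left, right], by default, so the first edge here, 2013-12-31, is an assumption chosen just so that 2014-01-01 falls into 'a'. Dates outside all bins would come out as NaN.

```python
import pandas as pd

# Sample data from the question (dates stored as strings)
df1 = pd.DataFrame({
    'date': ['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-02',
             '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-05', '2014-01-05', '2014-01-05',
             '2014-01-06', '2014-01-07', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10',
             '2014-01-10', '2014-01-10', '2014-01-10', '2014-01-11', '2014-01-12', '2014-01-12'],
    'x1': [1, 5, 2, 6, 3, 7, 4, 7, 9, 10, 5, 3, 2, 7, 3, 8, 4, 3, 7, 2, 5, 5, 2, 2],
})

# Parse the strings so they can be compared against datetime bin edges
dates = pd.to_datetime(df1['date'])

# Bins are (left, right]; the first edge (assumed) lies one day before 2014-01-01
edges = pd.to_datetime(['2013-12-31', '2014-01-04', '2014-01-08', '2014-01-12'])

# One label per bin; the result is a categorical column with categories a, b, c
df1['level'] = pd.cut(dates, bins=edges, labels=['a', 'b', 'c'])
print(df1)
```

The result is categorical rather than plain strings; call .astype(str) on it if an object column is needed.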

2 Answers

Answer 0 (score: 1)

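One way is to define a boolean mask for each level and use it to set the values of the level column. I converted the 'date' column to datetime dtype to make the comparisons easier: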

In [61]:
df1['date'] = pd.to_datetime(df1['date'])
a_mask = (df1['date']>='2014-01-01') & (df1['date']<='2014-01-04')
b_mask = (df1['date']>='2014-01-05') & (df1['date']<='2014-01-08')
c_mask = (df1['date']>='2014-01-09') & (df1['date']<='2014-01-12')
df1.loc[a_mask, 'level'] = 'a'
df1.loc[b_mask, 'level'] = 'b'
df1.loc[c_mask, 'level'] = 'c'
df1

Out[61]:
         date  x1 level
0  2014-01-01   1     a
1  2014-01-01   5     a
2  2014-01-01   2     a
3  2014-01-01   6     a
4  2014-01-02   3     a
5  2014-01-02   7     a
6  2014-01-03   4     a
7  2014-01-04   7     a
8  2014-01-05   9     b
9  2014-01-05  10     b
10 2014-01-05   5     b
11 2014-01-05   3     b
12 2014-01-06   2     b
13 2014-01-07   7     b
14 2014-01-07   3     b
15 2014-01-08   8     b
16 2014-01-09   4     c
17 2014-01-10   3     c
18 2014-01-10   7     c
19 2014-01-10   2     c
20 2014-01-10   5     c
21 2014-01-11   5     c
22 2014-01-12   2     c
23 2014-01-12   2     c
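If there are more than three intervals, writing one mask per level gets repetitive. A sketch of the same masking idea driven by a lookup table, assuming a hypothetical dict named intervals that maps each label to its inclusive (start, end) pair; Series.between is inclusive at both ends by default, so it matches the >= / <= masks above.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    'date': ['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-02',
             '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-05', '2014-01-05', '2014-01-05',
             '2014-01-06', '2014-01-07', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10',
             '2014-01-10', '2014-01-10', '2014-01-10', '2014-01-11', '2014-01-12', '2014-01-12'],
    'x1': [1, 5, 2, 6, 3, 7, 4, 7, 9, 10, 5, 3, 2, 7, 3, 8, 4, 3, 7, 2, 5, 5, 2, 2],
})
df1['date'] = pd.to_datetime(df1['date'])

# Hypothetical lookup table: label -> (first day, last day), both inclusive
intervals = {
    'a': ('2014-01-01', '2014-01-04'),
    'b': ('2014-01-05', '2014-01-08'),
    'c': ('2014-01-09', '2014-01-12'),
}

# Object column initialised to None; dates outside every interval stay None
df1['level'] = None
for label, (start, end) in intervals.items():
    # Same as (df1['date'] >= start) & (df1['date'] <= end)
    mask = df1['date'].between(start, end)
    df1.loc[mask, 'level'] = label
print(df1)
```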

Answer 1 (score: 0)

You can use the bisect module to do the mapping:
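import bisect

# Inclusive upper bound of each interval; ISO-format date strings sort correctly as plain strings
dates = ['2014-01-04', '2014-01-08', '2014-01-12']
# bisect_left returns 0, 1 or 2 for each date string, which chr/ord turn into 'a', 'b' or 'c'
df1['level'] = df1.date.apply(lambda d: chr(ord('a') + bisect.bisect_left(dates, d)))
>>> df1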
          date  x1 level
0   2014-01-01   1     a
1   2014-01-01   5     a
2   2014-01-01   2     a
3   2014-01-01   6     a
4   2014-01-02   3     a
5   2014-01-02   7     a
6   2014-01-03   4     a
7   2014-01-04   7     a
8   2014-01-05   9     b
9   2014-01-05  10     b
10  2014-01-05   5     b
11  2014-01-05   3     b
12  2014-01-06   2     b
13  2014-01-07   7     b
14  2014-01-07   3     b
15  2014-01-08   8     b
16  2014-01-09   4     c
17  2014-01-10   3     c
18  2014-01-10   7     c
19  2014-01-10   2     c
20  2014-01-10   5     c
21  2014-01-11   5     c
22  2014-01-12   2     c
23  2014-01-12   2     c
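The same binary search can be done for the whole column at once with numpy.searchsorted, whose side='left' behaves like bisect.bisect_left. A sketch, assuming the dates are parsed to datetime64 first (rather than compared as strings) and using illustrative names upper and labels; a date after the last bound would give index 3 and raise IndexError.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    'date': ['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-02',
             '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-05', '2014-01-05', '2014-01-05',
             '2014-01-06', '2014-01-07', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10',
             '2014-01-10', '2014-01-10', '2014-01-10', '2014-01-11', '2014-01-12', '2014-01-12'],
    'x1': [1, 5, 2, 6, 3, 7, 4, 7, 9, 10, 5, 3, 2, 7, 3, 8, 4, 3, 7, 2, 5, 5, 2, 2],
})

# Inclusive upper bound of each interval, as in the bisect answer
upper = pd.to_datetime(['2014-01-04', '2014-01-08', '2014-01-12']).values
labels = np.array(['a', 'b', 'c'])

dates = pd.to_datetime(df1['date']).values
# side='left': a date equal to a bound gets that bound's index,
# so each upper bound stays inside its own interval
idx = np.searchsorted(upper, dates, side='left')
df1['level'] = labels[idx]
print(df1)
```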