Split dates into time ranges with pandas

Time: 2018-11-14 05:33:04

Tags: python-3.x pandas numpy datetime

14  [2018-03-14, 2018-03-13, 2017-03-06, 2017-02-13]
15  [2017-07-26, 2017-06-09, 2017-02-24]
16  [2018-09-06, 2018-07-06, 2018-07-04, 2017-10-20]
17  [2018-10-03, 2018-09-13, 2018-09-12, 2018-08-3]
18  [2017-02-08]

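This is my data. Each ID has its own dates, ranging from 2017-02-05 to 2018-06-30. I need to split the dates into 5 time ranges of 4 months each, so that for the first 4 months (2017-02-05 to 2017-06-05) each ID keeps only the dates that fall inside that range, like this: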

14  [2017-03-06, 2017-02-13]
15  [2017-02-24]
16  [null] # or delete empty rows, it doesn't matter
17  [null]
18  [2017-02-08]

Then the same for 2017-06-05 to 2017-10-05, and so on, every 4 months. I can't use nested for loops because the data is too large. Here is what I have tried so far:

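months_4 = individual_dates.copy()

# Note: `_` is only the loop variable; reassigning it does not write anything back into months_4
for _ in months_4['Date']:
    _ = np.where(pd.to_datetime(_) <= pd.to_datetime('2017-9-02'), _, np.datetime64('NaT'))

months_8 = individual_dates.copy()
range_8 = pd.date_range(start='2017-9-02', end='2017-11-02')

# Same here: the filtered array is bound to `_` and then discarded
for _ in months_8['Date']:
    _ = _[np.isin(_, range_8)]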

It has absolutely no effect: the data stays unchanged whatever I do.
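A likely reason nothing changes: the loop variable _ is just a local name, so _ = ... rebinds it to a new array and leaves the lists stored in the DataFrame untouched. The sketch below demonstrates this on a small hypothetical sample built from the first rows of the data above, then shows one way to actually keep the result, by building a new column with Series.apply and assigning it back. The half-open window [2017-02-05, 2017-06-05) and the column name Window1 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data: one list of date strings per ID
individual_dates = pd.DataFrame({
    'ID': [14, 15, 18],
    'Date': [
        ['2018-03-14', '2018-03-13', '2017-03-06', '2017-02-13'],
        ['2017-07-26', '2017-06-09', '2017-02-24'],
        ['2017-02-08'],
    ],
})

# Rebinding the loop variable only changes the local name `_`;
# the lists inside the DataFrame are never modified.
for _ in individual_dates['Date']:
    _ = []
print(individual_dates['Date'].str.len().tolist())  # [4, 3, 1]

# Assumed half-open window: start <= date < end
start, end = pd.Timestamp('2017-02-05'), pd.Timestamp('2017-06-05')

# Building a new column and assigning it back does change the data.
# This still iterates row by row in Python (one loop, not nested).
individual_dates['Window1'] = individual_dates['Date'].apply(
    lambda dates: [d for d in dates if start <= pd.Timestamp(d) < end]
)
print(individual_dates[['ID', 'Window1']])
```

Because apply still runs a Python function per row, the flattening approach in the answer below is usually the faster option on large data.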

UPDATE: I did what you suggested:

from itertools import chain

individual_dates['Date'] = individual_dates['Date'].str.strip('[]').str.split(', ')

df = pd.DataFrame({
    'Date': list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID': individual_dates['ClientId'].repeat(individual_dates['Date'].str.len())
})

df

This is the result:

Date    ID
0   '2018-06-30T00:00:00.000000000' '2018-06-29T00...   14
1   '2017-03-28T00:00:00.000000000' '2017-03-27T00...   15
2   '2018-03-14T00:00:00.000000000' '2018-03-13T00...   16
3   '2017-12-14T00:00:00.000000000' '2017-03-28T00...   17
4   '2017-05-30T00:00:00.000000000' '2017-05-22T00...   18
5   '2017-03-28T00:00:00.000000000' '2017-03-27T00...   19
6   '2017-03-27T00:00:00.000000000' '2017-03-26T00...   20
7   '2017-12-15T00:00:00.000000000' '2017-11-20T00...   21
8   '2017-07-05T00:00:00.000000000' '2017-07-04T00...   22
9   '2017-12-12T00:00:00.000000000' '2017-04-06T00...   23
10  '2017-05-21T00:00:00.000000000' '2017-05-07T00...   24
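The output suggests the split never happened: each Date cell still holds several timestamps in one string, and they look like the text form of a NumPy datetime64 array, where elements are separated by a space rather than by ', '. Assuming the cells really are strings of that shape (the sample strings below are reconstructed from the truncated output above, not taken from the real data), split(', ') leaves each string whole, so every list has length 1. Extracting the YYYY-MM-DD parts with a regular expression via Series.str.findall is one way to get real lists:

```python
import pandas as pd

# Assumption: each cell is one string shaped like the repr of a NumPy
# datetime64 array, with elements separated by a space, not by ', '.
raw = pd.Series([
    "['2018-06-30T00:00:00.000000000' '2018-06-29T00:00:00.000000000']",
    "['2017-03-28T00:00:00.000000000']",
])

# The original approach: split(', ') finds no separator, so each list has one element
lengths = raw.str.strip('[]').str.split(', ').str.len().tolist()
print(lengths)  # [1, 1]

# Pull out every YYYY-MM-DD substring instead of relying on a separator
parsed = raw.str.findall(r'\d{4}-\d{2}-\d{2}')
print(parsed.tolist())  # [['2018-06-30', '2018-06-29'], ['2017-03-28']]
```

If possible, it is probably simpler to keep the column as real lists (or datetime values) before it is ever converted to text.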

1 answer:

Answer 0 (score: 0)

For better performance, I suggest converting the lists into a single column (flattening them) and then filtering with isin and boolean indexing:

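import pandas as pd
from itertools import chain

# Flatten the lists: one row per date, with each ID repeated once per date
df = pd.DataFrame({
    'Date': list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID': individual_dates['ID'].repeat(individual_dates['Date'].str.len())
})

# Every day of the first 4-month window (both ends included)
range_8 = pd.date_range(start='2017-02-05', end='2017-06-05')

df['Date'] = pd.to_datetime(df['Date'])

# Keep only the rows whose date falls inside the window
df = df[df['Date'].isin(range_8)]
print(df)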
        Date  ID
0 2017-03-06  14
0 2017-02-13  14
1 2017-02-24  15
4 2017-02-08  18
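The answer filters only the first window. Below is a sketch of extending it to all five 4-month windows without a Python loop over the dates. It assumes pandas >= 0.25 (for DataFrame.explode) and swaps isin for pd.cut with half-open windows: the edges are generated with pd.DateOffset(months=4) starting at 2017-02-05, each date gets a window number from 0 to 4, and the dates are then regrouped into one list per window and ID. The sample data and the Window column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample built from the first rows of the question's data
individual_dates = pd.DataFrame({
    'ID': [14, 15, 18],
    'Date': [
        ['2018-03-14', '2018-03-13', '2017-03-06', '2017-02-13'],
        ['2017-07-26', '2017-06-09', '2017-02-24'],
        ['2017-02-08'],
    ],
})

# One row per (ID, date); explode() requires pandas >= 0.25
df = individual_dates.explode('Date').reset_index(drop=True)
df['Date'] = pd.to_datetime(df['Date'])

# Six edges give five 4-month windows:
# 2017-02-05, 2017-06-05, 2017-10-05, 2018-02-05, 2018-06-05, 2018-10-05
edges = pd.date_range('2017-02-05', periods=6, freq=pd.DateOffset(months=4))

# right=False makes each window [start, next_start), so a date on a boundary
# such as 2017-06-05 lands in exactly one window; dates outside all windows get NaN
df['Window'] = pd.cut(df['Date'], bins=edges, right=False, labels=False)

# Collect the dates back into one list per (window, ID)
windows = (df.dropna(subset=['Window'])
             .groupby(['Window', 'ID'])['Date']
             .agg(list))
print(windows)
```

windows.loc[0] then gives the first window (2017-02-05 to 2017-06-05) as one list per ID. IDs with no dates in a window simply do not appear, which matches the "delete empty rows" option in the question.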