pandas date_range on a specific day of the month

Time: 2018-08-15 09:09:52

Tags: pandas

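I want a date range in which every date falls on the same day of the month as the start date. For example, if the start date is 2018-05-16, I want to get ['2018-09-15', '2018-10-15', ...].

I have the following code in Python 3: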

(pd.date_range(start=date, periods=12, freq='M') \
+ pd.DateOffset(days=datetime.strptime(date, '%Y-%m-%d').day)).strftime('%d-%m-%Y')

It works fine when the day of the month is less than 29, but not for later days. For example, with date = '2018-08-31' the output is:

 array(['01-10-2018', '31-10-2018', '01-12-2018',
'31-12-2018', '31-01-2019', '03-03-2019', 
'31-03-2019', '01-05-2019', '31-05-2019', 
'01-07-2019', '31-07-2019', '31-08-2019'], dtype='|S10')

However, I want the output to be:

array(['30-09-2018', '31-10-2018', '30-11-2018', 
'31-12-2018', '31-01-2019', '28-02-2019', 
'31-03-2019', '30-04-2019', '31-05-2019', 
'30-06-2019', '31-07-2019', '31-08-2019'], dtype='|S10')

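2 answers:

Answer 0 (score: 0)

Updated answer:

This function should work for a date range at monthly frequency that falls on the day of the month given in the start date, or on the last valid day of the month when that day does not exist (taking into account the different month lengths and leap years):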

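import pandas as pd

def month_range_day(start=None, periods=None):
    start_date = pd.Timestamp(start).date()
    # Month-end dates, starting with the month that contains start_date
    month_range = pd.date_range(start=start_date, periods=periods, freq='M')
    # Copy the day numbers so the index's own data is not modified in place
    month_day = month_range.day.to_numpy(copy=True)
    # Cap each day at the start day; shorter months keep their last day
    month_day[start_date.day < month_day] = start_date.day
    # Reassemble YYYYMMDD integers and parse them back into dates
    return pd.to_datetime(month_range.year*10000+month_range.month*100+month_day, format='%Y%m%d')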

Example 1:

start_date = '2020-01-31'
month_range_day(start=start_date, periods=12)

Output:

DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
               '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
              dtype='datetime64[ns]', freq=None) 

Example 2:

start_date = '2019-01-29'
month_range_day(start=start_date, periods=12)

Output:

DatetimeIndex(['2019-01-29', '2019-02-28', '2019-03-29', '2019-04-29',
               '2019-05-29', '2019-06-29', '2019-07-29', '2019-08-29',
               '2019-09-29', '2019-10-29', '2019-11-29', '2019-12-29'],
              dtype='datetime64[ns]', freq=None)
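
As a possible alternative to rebuilding the dates from integers, the same result can be sketched with pd.DateOffset(months=i), which clips to the last day of a shorter month. The helper name month_range_by_offset is hypothetical and not part of pandas; this is a minimal sketch, not a tuned implementation.

```python
import pandas as pd

def month_range_by_offset(start, periods):
    """Hypothetical helper: monthly dates on the start date's day of month."""
    start_ts = pd.Timestamp(start).normalize()
    # Always offset from the original start so the day does not drift
    # after passing through a short month (e.g. 31 -> 29 -> 29 ...).
    return pd.DatetimeIndex(
        [start_ts + pd.DateOffset(months=i) for i in range(periods)]
    )

dates = month_range_by_offset("2020-01-31", periods=4)
print(dates.strftime("%Y-%m-%d").tolist())
# ['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30']
```

Building the list in Python is slower than a vectorized approach for long ranges, but for a handful of monthly dates it keeps the intent easy to read.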

Previous answer:

Assuming you only need month-end frequency, there is no need to use pd.DateOffset:

import pandas as pd
start_date = '2018-09-01'
pd.date_range(start=start_date, periods=12, freq='M').strftime('%d-%m-%Y')

Output:

Index(['30-09-2018', '31-10-2018', '30-11-2018', '31-12-2018', '31-01-2019',
       '28-02-2019', '31-03-2019', '30-04-2019', '31-05-2019', '30-06-2019',
       '31-07-2019', '31-08-2019'],
      dtype='object')

For more details, see the offset aliases in the pandas documentation. Changing the date format or data type from here, if needed, should be straightforward.
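
To illustrate what the offset aliases stand for, here is a small sketch that passes the offset objects directly instead of alias strings: 'M' corresponds to pd.offsets.MonthEnd and 'MS' to pd.offsets.MonthBegin. As far as I know, recent pandas releases (2.2 and later) rename the month-end alias to 'ME', so using the objects avoids depending on the alias spelling. The variable names are only for illustration.

```python
import pandas as pd

start = "2018-09-01"

# Month-end frequency (what the alias 'M' stood for in older pandas)
month_ends = pd.date_range(start=start, periods=3, freq=pd.offsets.MonthEnd())

# Month-start frequency (alias 'MS')
month_starts = pd.date_range(start=start, periods=3, freq=pd.offsets.MonthBegin())

print(month_ends.strftime("%d-%m-%Y").tolist())
# ['30-09-2018', '31-10-2018', '30-11-2018']
print(month_starts.strftime("%d-%m-%Y").tolist())
# ['01-09-2018', '01-10-2018', '01-11-2018']
```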

Answer 1 (score: 0)

Why not simply drop the 0th element?

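import pandas as pd

date = '2018-08-31'
# Request one extra period, then drop the first date (the start month's own month end)
(pd.date_range(
    start=date,
    periods=12 + 1,
    freq='M')
).strftime('%d-%m-%Y')[1:]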

Output:

Index(['30-09-2018', '31-10-2018', '30-11-2018', '31-12-2018', '31-01-2019',
       '28-02-2019', '31-03-2019', '30-04-2019', '31-05-2019', '30-06-2019',
       '31-07-2019', '31-08-2019'],
      dtype='object')
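
A related variant, sketched here under the assumption that the start date is itself a month end (as in the question's '2018-08-31'), is to move the start forward by one month end instead of slicing the result. The names first and result are only for illustration.

```python
import pandas as pd

date = "2018-08-31"

# Adding MonthEnd(1) gives the first month end strictly after `date`
first = pd.Timestamp(date) + pd.offsets.MonthEnd(1)

result = pd.date_range(
    start=first, periods=12, freq=pd.offsets.MonthEnd()
).strftime("%d-%m-%Y")

print(result[0], result[-1])
# 30-09-2018 31-08-2019
```

For a mid-month start such as '2018-08-15' the two approaches differ: this sketch begins at 31-08-2018, whereas dropping the 0th element begins at 30-09-2018.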