Python: how to find the number of days in each month from December 2019 onward between two date columns

Time: 2019-12-10 20:29:47

Tags: python pandas numpy dataframe

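I have two date columns, 'StartDate' and 'EndDate'. I want to find the number of days in each month between the two dates, starting from December 2019 and going forward, and ignore any earlier months in the calculation. The StartDate and EndDate of a row can span 2 years, the months of different rows can overlap, and the date columns can also be empty.

Sample data: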

import pandas as pd

df = {'Id': ['1', '2', '3', '4', '5', '6', '7', '8'],
      'Item': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
      'StartDate': ['2019-12-10', '2019-12-01', '2019-10-01', '2020-01-01', '2019-03-01', '2019-03-01', '2019-10-01', ''],
      'EndDate': ['2020-02-21', '2020-01-01', '2020-08-31', '2020-01-30', '2019-12-31', '2019-12-31', '2020-08-31', '']}
df = pd.DataFrame(df, columns=['Id', 'Item', 'StartDate', 'EndDate'])

Expected output:

[image not preserved]

The following solution partially works.

df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])

def days_of_month(x):
    s = pd.date_range(*x, freq='D').to_series()
    return s.resample('M').count().rename(lambda x: x.month)

df1 = df[['StartDate', 'EndDate']].apply(days_of_month, axis=1).fillna(0)

df_final = df[['StartDate', 'EndDate']].join([df['StartDate'].dt.year.rename('Year'), df1])

4 answers:

Answer 0 (score: 2)

Try this:

df.join(df.dropna(axis=0,how='any')
         .apply(lambda x: pd.date_range(x['StartDate'],x['EndDate'], freq='D')
         .to_frame().resample('M').count().loc['2019-12-01':].unstack(), axis=1)[0].fillna(0))

Output:

 Id Item  StartDate    EndDate  2019-12-31 00:00:00  2020-01-31 00:00:00  2020-02-29 00:00:00  2020-03-31 00:00:00  2020-04-30 00:00:00  2020-05-31 00:00:00  2020-06-30 00:00:00  2020-07-31 00:00:00  2020-08-31 00:00:00
0  1    A 2019-12-10 2020-02-21                 22.0                 31.0                 21.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0
1  2    B 2019-12-01 2020-01-01                 31.0                  1.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0
2  3    C 2019-10-01 2020-08-31                 31.0                 31.0                 29.0                 31.0                 30.0                 31.0                 30.0                 31.0                 31.0
3  4    D 2020-01-01 2020-01-30                  0.0                 30.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0
4  5    E 2019-03-01 2019-12-31                 31.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0
5  6    F 2019-03-01 2019-12-31                 31.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0                  0.0
6  7    G 2019-10-01 2020-08-31                 31.0                 31.0                 29.0                 31.0                 30.0                 31.0                 30.0                 31.0                 31.0
7  8    H        NaT        NaT                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN
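
To see what the lambda above computes for one row, here is a minimal sketch for Id 1 only. It groups the daily range by monthly period rather than calling resample('M'); that swap is my own assumption, made to avoid depending on the resample frequency alias, and is not part of the answer.

```python
import pandas as pd

# One row from the sample data (Id 1): 2019-12-10 through 2020-02-21.
days = pd.date_range('2019-12-10', '2020-02-21', freq='D')

# Count how many of those days fall in each calendar month.
per_month = days.to_series().groupby(days.to_period('M')).size()

# Keep only the months from December 2019 onward.
cutoff = pd.Period('2019-12', freq='M')
per_month = per_month[per_month.index >= cutoff]

print(per_month)
```

The three counts match the first row of the output above (22, 31, 21).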

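Answer 1 (score: 2)

We'll create two large DataFrames, one holding the start of every month and the other holding the end of every month. We then clip them against StartDate and EndDate respectively, which reduces the problem to a simple subtraction. Since you want the end date to be included, we need to add 1 day, and then we clip any negative durations to 0.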

import pandas as pd

df_s = pd.DataFrame([pd.date_range('2019-12-01', '2020-12-01', freq='MS').to_numpy()],
                     index=df.index)
df_e = df_s + pd.offsets.MonthEnd(1)

df_s = df_s.clip(lower=pd.to_datetime(df.StartDate), axis=0)
df_e = df_e.clip(upper=pd.to_datetime(df.EndDate), axis=0)

res = ((df_e - df_s) + pd.to_timedelta(1, 'd')).clip(lower=pd.to_timedelta(0, 'd'))
res.columns = pd.period_range(start='2019-12', end='2020-12', freq='M')

# Convert the timedeltas to plain day counts (int, or float where NaN is present)
for col in res.columns:
    res[col] = res[col].dt.days

df = pd.concat([df, res], axis=1)

  Id Item   StartDate     EndDate  2019-12  2020-01  2020-02  2020-03  2020-04  2020-05  2020-06  2020-07  2020-08  2020-09  2020-10  2020-11  2020-12
0  1    A  2019-12-10  2020-02-21     22.0     31.0     21.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
1  2    B  2019-12-01  2020-01-31     31.0     31.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
2  3    C  2019-10-01  2020-08-31     31.0     31.0     29.0     31.0     30.0     31.0     30.0     31.0     31.0      0.0      0.0      0.0      0.0
3  4    D  2020-01-01  2020-01-30      0.0     30.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
4  5    E  2019-03-01  2019-12-31     31.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
5  6    F  2019-03-01  2019-12-31     31.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
6  7    G  2019-10-01  2020-08-31     31.0     31.0     29.0     31.0     30.0     31.0     30.0     31.0     31.0      0.0      0.0      0.0      0.0
7  8    H                              NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN
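
The clip-and-subtract idea is easier to check on a single interval. The sketch below is only an illustration of the arithmetic; overlap_days is a hypothetical helper name, not something from pandas or from the answer.

```python
import pandas as pd

def overlap_days(start, end, month):
    """Inclusive number of days of [start, end] that fall inside `month`, e.g. '2020-02'."""
    period = pd.Period(month, freq='M')
    # Clip the interval to the month: the later of the two starts, the earlier of the two ends.
    lo = max(pd.Timestamp(start), period.start_time)
    hi = min(pd.Timestamp(end), period.end_time.normalize())
    # +1 because the end date itself counts; negative results mean no overlap.
    return max((hi - lo).days + 1, 0)

print(overlap_days('2019-12-10', '2020-02-21', '2019-12'))
print(overlap_days('2019-12-10', '2020-02-21', '2020-02'))
print(overlap_days('2020-01-01', '2020-01-30', '2019-12'))
```

The first two calls reproduce the 22 and 21 shown for Id 1, and the last one shows the negative difference being clipped to 0.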

Answer 2 (score: 2)

Here is another approach: create a list of all the days and compute the overlap with broadcasting:

dates = pd.date_range('2019-12-01', '2020-12-31', freq='D').values

(pd.DataFrame((df.StartDate.values <= dates[:,None]) 
              & (df.EndDate.values >= dates[:,None]),
             index=dates)
   .resample('M')
   .sum()
   .T
)

Output:

      2019-12-31 00:00:00    2020-01-31 00:00:00    2020-02-29 00:00:00    2020-03-31 00:00:00    2020-04-30 00:00:00    2020-05-31 00:00:00    2020-06-30 00:00:00    2020-07-31 00:00:00    2020-08-31 00:00:00    2020-09-30 00:00:00    2020-10-31 00:00:00    2020-11-30 00:00:00    2020-12-31 00:00:00
--  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------
 0                     22                     31                     21                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0
 1                     31                      1                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0
 2                     31                     31                     29                     31                     30                     31                     30                     31                     31                      0                      0                      0                      0
 3                      0                     30                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0
 4                     31                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0
 5                     31                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0
 6                     31                     31                     29                     31                     30                     31                     30                     31                     31                      0                      0                      0                      0
 7                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0                      0
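
A smaller version of the same broadcasting trick may make the shapes easier to follow. This is a sketch under a few assumptions of mine: the date columns are already datetime64 (missing values as NaT), the day range is shortened to three months, and the months are grouped with to_period('M') instead of resample('M').

```python
import pandas as pd

# Three rows; the last one has missing dates, which become NaT.
starts = pd.to_datetime(['2019-12-10', '2020-01-01', None]).to_numpy()
ends = pd.to_datetime(['2020-02-21', '2020-01-30', None]).to_numpy()

days = pd.date_range('2019-12-01', '2020-02-29', freq='D')
d = days.to_numpy()[:, None]            # shape (n_days, 1)

# Broadcasting yields shape (n_days, n_rows): True where that day lies inside the row's range.
# Comparisons against NaT are False, so rows with missing dates count 0 days.
covered = ((starts <= d) & (ends >= d)).astype(int)

counts = (pd.DataFrame(covered, index=days)
            .groupby(days.to_period('M'))
            .sum()
            .T)
print(counts)
```

Each row of counts corresponds to one input row and each column to one month, the same layout as the output above.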

Answer 3 (score: 1)

Using the same code, add errors='coerce' to to_datetime, add dropna, and change the rename part:

df['StartDate'] = pd.to_datetime(df['StartDate'], errors='coerce')
df['EndDate'] = pd.to_datetime(df['EndDate'], errors='coerce')

def days_of_month(x):
    s = pd.date_range(*x, freq='D').to_series()
    return s.resample('M').count().rename(lambda x: x.to_period(freq='M'))

df1 = (df[['StartDate', 'EndDate']].dropna().apply(days_of_month, axis=1)
                                   .reindex(df.index).fillna(0))

df_final = df.join(df1)

Out[1205]:
  Id Item  StartDate    EndDate  2019-03  2019-04  2019-05  2019-06  2019-07  \
0  1    A 2019-12-10 2020-02-21      0.0      0.0      0.0      0.0      0.0
1  2    B 2019-12-01 2020-01-01      0.0      0.0      0.0      0.0      0.0
2  3    C 2019-10-01 2020-08-31      0.0      0.0      0.0      0.0      0.0
3  4    D 2020-01-01 2020-01-30      0.0      0.0      0.0      0.0      0.0
4  5    E 2019-03-01 2019-12-31     31.0     30.0     31.0     30.0     31.0
5  6    F 2019-03-01 2019-12-31     31.0     30.0     31.0     30.0     31.0
6  7    G 2019-10-01 2020-08-31      0.0      0.0      0.0      0.0      0.0
7  8    H        NaT        NaT      0.0      0.0      0.0      0.0      0.0

   2019-08  2019-09  2019-10  2019-11  2019-12  2020-01  2020-02  2020-03  \
0      0.0      0.0      0.0      0.0     22.0     31.0     21.0      0.0
1      0.0      0.0      0.0      0.0     31.0      1.0      0.0      0.0
2      0.0      0.0     31.0     30.0     31.0     31.0     29.0     31.0
3      0.0      0.0      0.0      0.0      0.0     30.0      0.0      0.0
4     31.0     30.0     31.0     30.0     31.0      0.0      0.0      0.0
5     31.0     30.0     31.0     30.0     31.0      0.0      0.0      0.0
6      0.0      0.0     31.0     30.0     31.0     31.0     29.0     31.0
7      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0

   2020-04  2020-05  2020-06  2020-07  2020-08
0      0.0      0.0      0.0      0.0      0.0
1      0.0      0.0      0.0      0.0      0.0
2     30.0     31.0     30.0     31.0     31.0
3      0.0      0.0      0.0      0.0      0.0
4      0.0      0.0      0.0      0.0      0.0
5      0.0      0.0      0.0      0.0      0.0
6     30.0     31.0     30.0     31.0     31.0
7      0.0      0.0      0.0      0.0      0.0
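
The output above still includes the months before December 2019, which the question wants ignored. Since the month columns are monthly Period labels, one possible way to trim them is a boolean mask on the columns. The small df1 below is a hypothetical stand-in with made-up shape, not the frame built above.

```python
import pandas as pd

# Hypothetical stand-in for df1: per-month day counts with monthly Period column labels.
months = pd.period_range('2019-10', '2020-02', freq='M')
df1 = pd.DataFrame([[31, 30, 31, 31, 29],
                    [0, 0, 22, 31, 21]], columns=months)

# Keep only the columns from December 2019 onward.
cutoff = pd.Period('2019-12', freq='M')
df1_trimmed = df1.loc[:, df1.columns >= cutoff]
print(df1_trimmed)
```

Applying the mask to df1 before df.join(df1) avoids mixing the string column names of df with Period labels in the comparison.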