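Aggregating daily data into a weekly DataFrame

Time: 2020-02-11 13:26:16

Tags: python pandas

I am working with a weekly DataFrame and want to summarize daily data into it.

I have two DataFrames:

  • df: contains start_date, the date on which each week starts, and woy, the week of the year. (weekly data)
  • df_school_vac: contains the dates of French school holidays. (daily data)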

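The output I want was shown in an image in the original post (not reproduced here); it should be the same, but without the double quotes around FALSE and TRUE.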

df:

{'start_date': {0: Timestamp('2018-11-05 00:00:00'),
  1: Timestamp('2018-11-12 00:00:00'),
  2: Timestamp('2018-11-19 00:00:00'),
  3: Timestamp('2018-11-26 00:00:00'),
  4: Timestamp('2018-12-03 00:00:00'),
  5: Timestamp('2018-12-10 00:00:00'),
  6: Timestamp('2018-12-17 00:00:00'),
  7: Timestamp('2018-12-24 00:00:00'),
  8: Timestamp('2018-12-31 00:00:00'),
  9: Timestamp('2019-01-07 00:00:00'),
  10: Timestamp('2019-01-14 00:00:00'),
  11: Timestamp('2019-01-21 00:00:00'),
  12: Timestamp('2019-01-28 00:00:00')},
 'woy': {0: 45,
  1: 46,
  2: 47,
  3: 48,
  4: 49,
  5: 50,
  6: 51,
  7: 52,
  8: 1,
  9: 2,
  10: 3,
  11: 4,
  12: 5}}

df_school_vac:

{'timestamp_area_A': {0: Timestamp('2018-12-22 00:00:00'),
  1: Timestamp('2018-12-23 00:00:00'),
  2: Timestamp('2018-12-24 00:00:00'),
  3: Timestamp('2018-12-25 00:00:00'),
  4: Timestamp('2018-12-26 00:00:00'),
  5: Timestamp('2018-12-27 00:00:00'),
  6: Timestamp('2018-12-28 00:00:00'),
  7: Timestamp('2018-12-29 00:00:00'),
  8: Timestamp('2018-12-30 00:00:00'),
  9: Timestamp('2018-12-31 00:00:00'),
  10: Timestamp('2019-01-01 00:00:00'),
  11: Timestamp('2019-01-02 00:00:00'),
  12: Timestamp('2019-01-03 00:00:00'),
  13: Timestamp('2019-01-04 00:00:00'),
  14: Timestamp('2019-01-05 00:00:00'),
  15: Timestamp('2019-01-06 00:00:00')},
 'vacation_name': {0: 'Vacances de Noël',
  1: 'Vacances de Noël',
  2: 'Vacances de Noël',
  3: 'Vacances de Noël',
  4: 'Vacances de Noël',
  5: 'Vacances de Noël',
  6: 'Vacances de Noël',
  7: 'Vacances de Noël',
  8: 'Vacances de Noël',
  9: 'Vacances de Noël',
  10: 'Vacances de Noël',
  11: 'Vacances de Noël',
  12: 'Vacances de Noël',
  13: 'Vacances de Noël',
  14: 'Vacances de Noël',
  15: 'Vacances de Noël'},
 'woy': {0: 51,
  1: 51,
  2: 52,
  3: 52,
  4: 52,
  5: 52,
  6: 52,
  7: 52,
  8: 52,
  9: 1,
  10: 1,
  11: 1,
  12: 1,
  13: 1,
  14: 1,
  15: 1}}
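
For anyone who wants to reproduce this, the two frames above can be rebuilt roughly as follows. This is only a sketch: it assumes that woy is the ISO week number (which matches the values listed above) and it needs pandas 1.1 or later for `dt.isocalendar()`.

```python
import pandas as pd

# Weekly frame: 13 consecutive Mondays starting 2018-11-05.
df = pd.DataFrame({'start_date': pd.date_range('2018-11-05', periods=13, freq='W-MON')})
# Assumption: woy is the ISO week number of the week's start date.
df['woy'] = df['start_date'].dt.isocalendar().week.astype(int)

# Daily frame: every day from 2018-12-22 through 2019-01-06.
df_school_vac = pd.DataFrame({
    'timestamp_area_A': pd.date_range('2018-12-22', '2019-01-06', freq='D'),
})
df_school_vac['vacation_name'] = 'Vacances de Noël'
df_school_vac['woy'] = df_school_vac['timestamp_area_A'].dt.isocalendar().week.astype(int)
```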

1 Answer:

Answer 0: (score: 1)

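Consider aggregating df_school_vac with pd.Grouper() to count holiday days per weekly bin anchored on Monday (freq='W-MON'; by default each bin ends on a Monday and is labeled with that Monday), then left-merging the result onto the week-level df: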

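import pandas as pd

# Count holiday days per vacation name and per weekly bin anchored on Monday.
agg_df = (df_school_vac.groupby(['vacation_name', 
                                 pd.Grouper(key='timestamp_area_A', freq='W-MON')])
                       .count()
                       .reset_index()
                       # `inplace` was removed from set_axis in pandas 2.0; a new frame is returned.
                       .set_axis(['holiday_school_name', 'start_date', 'holiday_school_count'], 
                                 axis='columns')
         )

# Left-join onto the weekly frame; weeks without a match get holiday_school = False.
final_df = (pd.merge(df, agg_df, how='left', on=['start_date'])
              .assign(holiday_school = lambda x: x['holiday_school_name'].notna())
           )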

print(final_df)

#    start_date  woy holiday_school_name  holiday_school_count  holiday_school
# 0  2018-11-05   45                 NaN                   NaN           False
# 1  2018-11-12   46                 NaN                   NaN           False
# 2  2018-11-19   47                 NaN                   NaN           False
# 3  2018-11-26   48                 NaN                   NaN           False
# 4  2018-12-03   49                 NaN                   NaN           False
# 5  2018-12-10   50                 NaN                   NaN           False
# 6  2018-12-17   51                 NaN                   NaN           False
# 7  2018-12-24   52    Vacances de Noël                   3.0            True
# 8  2018-12-31    1    Vacances de Noël                   7.0            True
# 9  2019-01-07    2    Vacances de Noël                   6.0            True
# 10 2019-01-14    3                 NaN                   NaN           False
# 11 2019-01-21    4                 NaN                   NaN           False
# 12 2019-01-28    5                 NaN                   NaN           False
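
One caveat about the counts above: the `W-MON` bins are closed and labeled on their right edge by default, so each row counts the seven days up to and including that Monday. The woy column in the question instead follows Monday-to-Sunday weeks. If the counts should be keyed by the Monday that opens each week, one possible variant is sketched below. It is not part of the original answer; it rebuilds the sample data inline and derives the week start by subtracting each date's weekday, which is an assumption about what the asker wants.

```python
import pandas as pd

# Sample data rebuilt from the question (daily holidays, weekly Mondays).
df_school_vac = pd.DataFrame({
    'timestamp_area_A': pd.date_range('2018-12-22', '2019-01-06', freq='D'),
})
df_school_vac['vacation_name'] = 'Vacances de Noël'
df = pd.DataFrame({'start_date': pd.date_range('2018-11-05', periods=13, freq='W-MON')})

# Map every day to the Monday that opens its week (Monday has weekday 0).
days = df_school_vac['timestamp_area_A']
week_start = days - pd.to_timedelta(days.dt.weekday, unit='D')

# Count holiday days per week start and vacation name.
agg_df = (df_school_vac.assign(start_date=week_start)
                       .groupby(['start_date', 'vacation_name'])
                       .size()
                       .reset_index(name='holiday_school_count')
                       .rename(columns={'vacation_name': 'holiday_school_name'}))

# Left-join onto the weekly frame and flag the weeks that contain holidays.
final_df = df.merge(agg_df, how='left', on='start_date')
final_df['holiday_school'] = final_df['holiday_school_name'].notna()

print(final_df)
```

With the sample data, the 16 holiday days then fall into the weeks starting 2018-12-17 (2 days), 2018-12-24 (7 days) and 2018-12-31 (7 days).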