Create a dataframe of date ranges from date column values

Time: 2021-01-08 16:49:42

Tags: python pandas

Given the following dataframe:

      2018-01  2018-02  2018-03  2018-04  2018-05  2018-06  2018-07
p1             y        y                 y        y        y
p2    y        y                 y        y

I would like to return the start date and end date of each run of consecutive "y" values, as follows:

      Start date  End date
p1    20180201    20180331
p1    20180501    20180731
p2    20180101    20180228
p2    20180401    20180531

1 answer:

Answer 0: (score: 1)

Assuming you are loading the data from Excel:

import pandas as pd

# Input data prep
data = pd.read_excel('data.xlsx')
data = data.T
data.reset_index(inplace=True)

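# Promote the first row to the header; after the transpose, the "person"
# column holds the month labels and every other column is one person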
new_header = data.iloc[0]
data = data[1:]
data.columns = new_header

# Easy to work with 1 and 0 for consecutives with cumsum
data = data.fillna(0)
data = data.replace("y", 1)


df_result = pd.DataFrame() # Store your desired table

for column in data.columns[1:]: # per person iteration
    # Work on a copy so the column assignment below does not trigger SettingWithCopyWarning
    df_temp = data[["person", column]].copy()

    df_temp['consecutive'] = (df_temp[column].diff(1) != 0).cumsum()
    df_temp = df_temp[df_temp[column] > 0]

    df_temp = pd.DataFrame({
        'person': column,
        'start_date': df_temp.groupby('consecutive')["person"].first(),
        'end_date': df_temp.groupby('consecutive')["person"].last()
    }).reset_index(drop=True)

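    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df_result = pd.concat([df_result, df_temp], ignore_index=True)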

# First and last day of month
df_result['start_date'] = df_result['start_date'].values.astype('datetime64[M]')
df_result['end_date'] = pd.to_datetime(df_result['end_date']) + pd.offsets.MonthEnd(1)
print(df_result)
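
For anyone who wants to try the idea without an Excel file, here is a minimal self-contained sketch of the same diff/cumsum run-labelling trick. It is not the answer's exact code: the in-memory `flags` frame is rebuilt by hand from the table in the question, the helper name `find_runs` is made up for illustration, and it assumes every column label is the first day of a month. It also formats the dates as `YYYYMMDD` strings, as the question asks.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption): True marks a "y" cell.
months = pd.date_range("2018-01-01", periods=7, freq="MS")
flags = pd.DataFrame(
    [
        [False, True, True, False, True, True, True],   # p1
        [True, True, False, True, True, False, False],  # p2
    ],
    index=["p1", "p2"],
    columns=months,
)


def find_runs(row):
    """Return one row per run of consecutive True values in `row` (hypothetical helper)."""
    # The run id increases by one every time the flag changes value.
    run_id = row.astype(int).diff().ne(0).cumsum()
    frame = pd.DataFrame(
        {
            "month": row.index,
            "flag": row.to_numpy(dtype=bool),
            "run": run_id.to_numpy(),
        }
    )
    frame = frame[frame["flag"]]  # keep only the "y" months
    runs = frame.groupby("run")["month"].agg(["first", "last"])
    # MonthEnd(0) rolls a first-of-month date forward to the end of that month.
    end_dates = runs["last"] + pd.offsets.MonthEnd(0)
    return pd.DataFrame(
        {
            "person": row.name,
            "start_date": runs["first"].dt.strftime("%Y%m%d").to_numpy(),
            "end_date": end_dates.dt.strftime("%Y%m%d").to_numpy(),
        }
    )


result = pd.concat(
    [find_runs(row) for _, row in flags.iterrows()], ignore_index=True
)
print(result)
```

Collecting the per-person frames in a list and calling `pd.concat` once avoids growing a dataframe inside the loop, which matters only for larger inputs but keeps the sketch compatible with pandas 2.x.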