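I have a DataFrame with a start date and a finish date for several people. I'd like to create a date range for each person (I'd like a pd.Series that contains the date ranges), so that the result looks like this:
# output df
import pandas as pd
df_output = pd.DataFrame([
    ["John", "2018-08-03", "2018-08-05", "['2018-08-03', '2018-08-04', '2018-08-05']"],
    ["Jack", "2018-08-20", "2018-08-21", "['2018-08-20', '2018-08-21']"]
])
df_output.columns = ["name", "start_day", "finish_day", "date_range"]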
I have no idea how to create the range.
Any ideas?
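Answer 0 (score: 1)
A challenging and interesting one! I think the following snippet comes very close to what you asked for, although the shape differs slightly from the exact output you requested. The reshaped output does, however, contain the date range, the name, and the end date.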
import pandas as pd
df_input = pd.DataFrame([["John", "2018-08-03", "2018-08-05"],["Jack", "2018-08-20", "2018-08-21"]], columns=['Name','Start_Date','End_Date'])
df_input['Start_Date'] = pd.to_datetime(df_input['Start_Date'], format='%Y-%m-%d')
df_input['End_Date'] = pd.to_datetime(df_input['End_Date'], format='%Y-%m-%d')
df_input.set_index('Start_Date', inplace=True)
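def reindex_by_date(group):
    # One row per day from the group's start date (the index) to its end date
    dates = pd.date_range(group.index.min(), group['End_Date'].min())
    # Rows added by reindex are NaN, so forward-fill Name and End_Date
    return group.reindex(dates).ffill()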
finaldf = df_input.groupby('Name').apply(reindex_by_date)
finaldf
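If you need the exact shape from the question (one row per person, with the whole range in a single column), here is a minimal sketch. It is an alternative to the reindex approach and not part of the snippet above. It assumes the column names from the question (`name`, `start_day`, `finish_day`) and stores each range as a Python list of strings rather than as the string representation of a list.

```python
import pandas as pd

df = pd.DataFrame(
    [["John", "2018-08-03", "2018-08-05"], ["Jack", "2018-08-20", "2018-08-21"]],
    columns=["name", "start_day", "finish_day"],
)

# pd.date_range includes both endpoints and defaults to daily frequency.
# strftime turns each timestamp back into a 'YYYY-MM-DD' string.
df["date_range"] = [
    pd.date_range(start, end).strftime("%Y-%m-%d").tolist()
    for start, end in zip(df["start_day"], df["finish_day"])
]

print(df)
```

`df["date_range"]` is then a pd.Series whose elements are lists, which is what the question describes. If you would rather keep real timestamps, drop the `strftime` call and keep only `.tolist()`.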