I have a df that looks like this:
President Start Date End Date
B Clinton 1992-01-01 1999-12-31
G Bush 2000-01-01 2007-12-31
B Obama 2008-01-01 2015-12-31
D Trump 2016-01-01 2019-12-31 # not too far away!!
I want to create another df, something like this:
timestamp President
1992-01-01 B Clinton
1992-01-02 B Clinton
...
2000-01-01 G Bush
...
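Basically, I want to create a dataframe whose index is a timestamp, and then fill in its contents based on a condition over two columns of the other df (the timestamp falls between Start Date and End Date).
I feel like pandas has a built-in way to do this, but I'm not sure how. I tried np.piecewise, but building the conditions turned out to be hard for me. What should I do?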
Answer 0: (score: 4)
This is another unnesting problem
df['New'] = [pd.date_range(x, y).tolist() for x, y in zip(df['Start Date'], df['End Date'])]
unnesting(df,['New'])
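FYI, I've pasted the function here
import numpy as np
import pandas as pd

def unnesting(df, explode):
    # repeat each original index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # flatten every list column into one long column
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')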
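On pandas 0.25 or later, the built-in DataFrame.explode can take the place of the hand-written helper. The following is only a sketch under that assumption; the sample frame is copied from the question, and the names exploded and result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "President": ["B Clinton", "G Bush", "B Obama", "D Trump"],
    "Start Date": ["1992-01-01", "2000-01-01", "2008-01-01", "2016-01-01"],
    "End Date": ["1999-12-31", "2007-12-31", "2015-12-31", "2019-12-31"],
})

# One list of daily timestamps per row ...
df["timestamp"] = [
    pd.date_range(start, end).tolist()
    for start, end in zip(df["Start Date"], df["End Date"])
]

# ... then one output row per list element.
exploded = df.explode("timestamp")

# explode leaves an object-dtype column, so convert it back to datetimes.
exploded["timestamp"] = pd.to_datetime(exploded["timestamp"])
result = exploded.set_index("timestamp")[["President"]]
```

Here result has one row per day, indexed by timestamp, with the President column as its only content.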
Answer 1: (score: 2)
You can use pd.date_range to create a date range from the start and end values. Make sure the start and end dates are in datetime format.
s = df.set_index('President').apply(lambda x: pd.Series(pd.date_range(x['Start Date'], x['End Date'])), axis = 1).stack().reset_index(1, drop = True)
new_df = pd.DataFrame(s.index.values, index=s, columns = ['President'] )
President
1992-01-01 B Clinton
1992-01-02 B Clinton
1992-01-03 B Clinton
1992-01-04 B Clinton
1992-01-05 B Clinton
1992-01-06 B Clinton
1992-01-07 B Clinton
1992-01-08 B Clinton
1992-01-09 B Clinton
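If the date columns arrive as strings, pd.to_datetime converts them first. The sketch below also swaps in a different technique from the apply/stack line above: it forward-fills from each start date. That shortcut relies on an assumption that holds for the sample data but not in general, namely that the terms are sorted, non-overlapping, and leave no gaps between them. The names terms, days, and new_df are illustrative.

```python
import pandas as pd

# Sample data copied from the question, with dates as plain strings.
df = pd.DataFrame({
    "President": ["B Clinton", "G Bush", "B Obama", "D Trump"],
    "Start Date": ["1992-01-01", "2000-01-01", "2008-01-01", "2016-01-01"],
    "End Date": ["1999-12-31", "2007-12-31", "2015-12-31", "2019-12-31"],
})

# Convert the string columns to real datetimes.
for col in ["Start Date", "End Date"]:
    df[col] = pd.to_datetime(df[col])

# ASSUMPTION: terms are sorted, contiguous and non-overlapping, so every day
# belongs to the most recent start date on or before it.
terms = df.set_index("Start Date")["President"]
days = pd.date_range(df["Start Date"].min(), df["End Date"].max(), freq="D")

# Forward-fill the president from each start date across the daily index.
new_df = terms.reindex(days, method="ffill").to_frame().rename_axis("timestamp")
```

If the terms could have gaps or overlaps, this forward fill would give wrong answers, and the explicit per-row date ranges are the safer choice.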
Answer 2: (score: 0)
Perhaps you could use PeriodIndex instead of DatetimeIndex, since you are dealing with regularly spaced time intervals, i.e. years.
# create a list of PeriodIndex objects with annual frequency
p_idxs = [pd.period_range(start, end, freq='A') for idx, (start, end) in df[['Start Date', 'End Date']].iterrows()]
# for each PeriodIndex create a DataFrame where
# the number of president instances matches the length of the PeriodIndex object
df_list = []
for pres, p_idx in zip(df['President'].tolist(), p_idxs):
    df_ = pd.DataFrame(data=len(p_idx)*[pres], index=p_idx)
    df_list.append(df_)
# concatenate everything to get the desired output
df_desired = pd.concat(df_list, axis=0)
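For reference, here is a self-contained sketch of the same PeriodIndex idea run on the sample data from the question. Note that it yields one row per year rather than one per day, so it only fits if yearly resolution is enough. It uses the 'Y' alias for annual periods, since recent pandas releases deprecate 'A'; the names frames and yearly are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "President": ["B Clinton", "G Bush", "B Obama", "D Trump"],
    "Start Date": ["1992-01-01", "2000-01-01", "2008-01-01", "2016-01-01"],
    "End Date": ["1999-12-31", "2007-12-31", "2015-12-31", "2019-12-31"],
})

# One small frame per president, indexed by the annual periods of the term.
frames = [
    pd.DataFrame(
        {"President": row["President"]},
        index=pd.period_range(row["Start Date"], row["End Date"], freq="Y"),
    )
    for _, row in df.iterrows()
]

# Stack the per-president frames into one yearly table.
yearly = pd.concat(frames)
```

A single year can then be looked up with yearly.loc[pd.Period("2000", freq="Y")].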