Efficiently joining time-series DataFrames in Python

Asked: 2016-09-27 21:46:39

Tags: python pandas dataframe time-series

I have the following two DataFrames:

day
        date     val
11740 2016-01-04  1.3970
11741 2016-01-05  1.3991
11742 2016-01-06  1.4084
11743 2016-01-07  1.4061

df
        Adj_Close         Close        Date          High           Low
182  12927.200195  12927.200195  2016-01-04  12928.900391      12748.50   
181  12920.099609  12920.099609  2016-01-05  12954.900391  12839.799805   
180  12726.799805  12726.799805  2016-01-06  12854.599609  12701.700195   
179  12448.200195  12448.200195  2016-01-07  12661.200195  12439.099609 

I have a clunky loop that walks through the dates and "joins" the rows on their common dates to build a new DataFrame (new_df).

import pandas as pd
from datetime import datetime

new_df = pd.DataFrame(columns=['date', 'close', 'fx', 'usd'])

for indexFx, rowFx in day.iterrows():
    for indexSt, rowSt in df.iterrows():  # nested loop: this is not efficient
        fxDate = str(rowFx.date)[:10]  # keep only the date component, not the time
        if str(rowSt['Date']) == fxDate:
            dateObj = datetime.strptime(rowSt.Date, '%Y-%m-%d')
            row = [dateObj, rowSt.Close, rowFx.val, float(rowSt.Close) * float(rowFx.val)]
            new_df.loc[len(new_df)] = row

I know there must be an efficient, more Pythonic way to replace this loop. Can anyone help?

Thanks

1 answer:

Answer 0: (score: 1)

pd.concat([day.set_index('date'), df.set_index('Date')], axis=1)

>>>

               val     Adj_Close         Close          High  \
2016-01-04  1.3970  12927.200195  12927.200195  12928.900391
2016-01-05  1.3991  12920.099609  12920.099609  12954.900391
2016-01-06  1.4084  12726.799805  12726.799805  12854.599609
2016-01-07  1.4061  12448.200195  12448.200195  12661.200195

                     Low
2016-01-04  12748.500000
2016-01-05  12839.799805
2016-01-06  12701.700195
2016-01-07  12439.099609

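Depending on whether you need an inner or an outer join, you can specify it with join='inner' or join='outer'. An inner join keeps only the dates present in both frames; an outer join keeps every date and fills the missing values with NaN.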
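
To tie this back to the new_df from the question, here is a minimal sketch of how the loop could be replaced. It assumes that df['Date'] holds strings (the question parses it with strptime) and that day['date'] is a datetime column, so both are converted with pd.to_datetime before aligning. The sample values are copied from the question; the names joined, inner and outer are only illustrative.

```python
import pandas as pd

# Stand-ins for the two frames in the question (values copied from it).
day = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07"]),
    "val": [1.3970, 1.3991, 1.4084, 1.4061],
})
df = pd.DataFrame({
    "Date": ["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07"],
    "Close": [12927.200195, 12920.099609, 12726.799805, 12448.200195],
})

# Assumption: the two date columns may differ in dtype, so normalise both
# to midnight datetimes; otherwise the indexes would not align.
day["date"] = pd.to_datetime(day["date"]).dt.normalize()
df["Date"] = pd.to_datetime(df["Date"]).dt.normalize()

# Align on the shared dates; join='inner' keeps only dates found in both frames.
joined = pd.concat(
    [day.set_index("date"), df.set_index("Date")[["Close"]]],
    axis=1,
    join="inner",
)

# Vectorised replacement for the per-row multiplication in the loop.
new_df = (
    joined.rename(columns={"Close": "close", "val": "fx"})
    .assign(usd=lambda d: d["close"] * d["fx"])
    .rename_axis("date")
    .reset_index()[["date", "close", "fx", "usd"]]
)
print(new_df)

# Inner vs outer: drop the last row of df to see the difference.
partial = df.iloc[:3].set_index("Date")[["Close"]]
inner = pd.concat([day.set_index("date"), partial], axis=1, join="inner")
outer = pd.concat([day.set_index("date"), partial], axis=1, join="outer")
print(len(inner), len(outer))  # the outer join keeps the unmatched date, with NaN in Close
```

Because the alignment happens on the index, no Python-level loop is needed, and the usd column is computed for all rows at once.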