使每日pandas DataFrame接收相同的Weekly(重新采样)DataFrame值

时间:2018-04-30 02:37:39

标签: python pandas dataframe resampling stocks

鉴于以下每日价格DataFrame:

             open   high    low  close  volume
date                                          
2017-11-01  44.66  44.75  43.56  43.56    1000
2017-11-03  43.56  43.74  42.19  42.93    2500
2017-11-06  43.15  43.43  42.45  42.66    2000
2017-11-07  42.40  42.70  41.19  42.25    1500
2017-11-08  42.50  43.50  41.77  43.26     200
2017-11-09  43.46  43.46  41.94  43.00    5000
2017-11-10  43.75  43.75  40.60  41.02     500
2017-11-13  41.60  42.01  40.03  41.90     125
2017-11-14  42.05  43.21  41.67  41.90    1000
2017-11-16  41.98  42.48  41.63  41.96    1200
2017-11-17  41.87  42.69  41.71  42.36    1250
2017-11-21  42.70  43.10  42.15  42.30     800
2017-11-22  42.30  42.38  40.92  41.19     300
2017-11-23  41.11  41.69  40.96  41.21       0
2017-11-24  41.26  41.40  40.35  40.37    2000
2017-11-27  40.28  40.36  39.10  39.80    3000
2017-11-28  40.23  40.40  39.50  40.04     500

我重新采样到Weekly DataFrame(使用本文末尾提供的功能):

             open   high    low  close  volume
date                                          
2017-10-30  44.66  44.75  42.19  42.93    3500
2017-11-06  43.15  43.75  40.60  41.02    9200
2017-11-13  41.60  43.21  40.03  42.36    3575
2017-11-20  42.70  43.10  40.35  40.37    3100
2017-11-27  40.28  40.40  39.10  40.04    3500

我希望我能够"重新采样"每日DataFrame使用来自Weekly的数据。它应如下所示:

             open   high    low  close  volume
date                                          
2017-11-01  44.66  44.75  42.19  42.93    3500
2017-11-03  44.66  44.75  42.19  42.93    3500
2017-11-06  43.15  43.75  40.60  41.02    9200
2017-11-07  43.15  43.75  40.60  41.02    9200
2017-11-08  43.15  43.75  40.60  41.02    9200
2017-11-09  43.15  43.75  40.60  41.02    9200
2017-11-10  43.15  43.75  40.60  41.02    9200
2017-11-13  41.60  43.21  40.03  42.36    3575
2017-11-14  41.60  43.21  40.03  42.36    3575
2017-11-16  41.60  43.21  40.03  42.36    3575
2017-11-17  41.60  43.21  40.03  42.36    3575
2017-11-21  42.70  43.10  40.35  40.37    3100
2017-11-22  42.70  43.10  40.35  40.37    3100
2017-11-23  42.70  43.10  40.35  40.37    3100
2017-11-24  42.70  43.10  40.35  40.37    3100
2017-11-27  40.28  40.40  39.10  40.04    3500
2017-11-28  40.28  40.40  39.10  40.04    3500

如果它有帮助,这是我用来制作每周(第二)数据帧的函数:

def sampleWeekly(dfDaily):
    weeklySampler = dfDaily.resample("W", label='left', loffset=pd.DateOffset(days=1))
    dfWeekly = weeklySampler.agg({"open":"first", "high":"max", "low":"min", "close":"last", "volume":"sum"})
    dfWeekly = dfWeekly.loc[:, ("open","high","low","close","volume")]
    return dfWeekly

如果有人能帮助我找到一种聪明/有效的方法来创建第三个数据帧,我真的很感激。谢谢!

3 个答案:

答案 0 :(得分:3)

这应该像使用groupbypd.Grouper}和transform一样简单,如下所示:

df.groupby(pd.Grouper(level=0, freq='W')). \
   transform({"open":"first", \
              "high":"max", \
              "low":"min", \
              "close":"last", \
              "volume":"sum"})

... as,根据transform documentation,它应该可以传递dict of column names -> functions (or list of functions),就像你在函数中使用agg方法一样。目前这导致TypeError,根据this open issue,此问题尚未解决。

与此同时,一个解决方案就是使用resampleagg,然后使用pd.merge_asof一个空的数据框(拥有你的原始指数)以达到目标结果。

pd.merge_asof(pd.DataFrame(index=df.index), \
              df.resample('W'). \
                 agg({"open":"first", \
                      "high":"max", \
                      "low":"min", \
                      "close":"last", \
                      "volume":"sum"}), \
              left_index=True, right_index=True, direction="forward")

#              high  close   open    low  volume
# date                                          
# 2017-11-01  44.75  42.93  44.66  42.19    3500
# 2017-11-03  44.75  42.93  44.66  42.19    3500
# 2017-11-06  43.75  41.02  43.15  40.60    9200
# 2017-11-07  43.75  41.02  43.15  40.60    9200
# 2017-11-08  43.75  41.02  43.15  40.60    9200
# 2017-11-09  43.75  41.02  43.15  40.60    9200
# 2017-11-10  43.75  41.02  43.15  40.60    9200
# 2017-11-13  43.21  42.36  41.60  40.03    3575
# 2017-11-14  43.21  42.36  41.60  40.03    3575
# 2017-11-16  43.21  42.36  41.60  40.03    3575
# 2017-11-17  43.21  42.36  41.60  40.03    3575
# 2017-11-21  43.10  40.37  42.70  40.35    3100
# 2017-11-22  43.10  40.37  42.70  40.35    3100
# 2017-11-23  43.10  40.37  42.70  40.35    3100
# 2017-11-24  43.10  40.37  42.70  40.35    3100
# 2017-11-27  40.40  40.04  40.28  39.10    3500
# 2017-11-28  40.40  40.04  40.28  39.10    3500

答案 1 :(得分:3)

您可以使用combine_firstwhereffill

dfweekly.combine_first(dfdaily)\
        .where(dfweekly.notnull())\
        .ffill()

Ouptut:

             open   high    low  close  volume
date                                          
2017-10-30  44.66  44.75  42.19  42.93  3500.0
2017-11-01  44.66  44.75  42.19  42.93  3500.0
2017-11-03  44.66  44.75  42.19  42.93  3500.0
2017-11-06  43.15  43.75  40.60  41.02  9200.0
2017-11-07  43.15  43.75  40.60  41.02  9200.0
2017-11-08  43.15  43.75  40.60  41.02  9200.0
2017-11-09  43.15  43.75  40.60  41.02  9200.0
2017-11-10  43.15  43.75  40.60  41.02  9200.0
2017-11-13  41.60  43.21  40.03  42.36  3575.0
2017-11-14  41.60  43.21  40.03  42.36  3575.0
2017-11-16  41.60  43.21  40.03  42.36  3575.0
2017-11-17  41.60  43.21  40.03  42.36  3575.0
2017-11-20  42.70  43.10  40.35  40.37  3100.0
2017-11-21  42.70  43.10  40.35  40.37  3100.0
2017-11-22  42.70  43.10  40.35  40.37  3100.0
2017-11-23  42.70  43.10  40.35  40.37  3100.0
2017-11-24  42.70  43.10  40.35  40.37  3100.0
2017-11-27  40.28  40.40  39.10  40.04  3500.0
2017-11-28  40.28  40.40  39.10  40.04  3500.0

更新

dfweekly.combine_first(dfdaily)\
        .where(dfweekly.notnull())\
        .ffill().reindex(dfdaily.index)

答案 2 :(得分:2)

还有pandas.merge_asof()

import pandas as pd

pd.merge_asof(dfDaily.reset_index()[['date']], dfWeekly.reset_index(),
    on='date', direction='forward').set_index('date')

             open   high    low  close  volume
date                                          
2017-11-01  44.66  44.75  42.19  42.93    3500
2017-11-03  44.66  44.75  42.19  42.93    3500
2017-11-06  43.15  43.75  40.60  41.02    9200
2017-11-07  43.15  43.75  40.60  41.02    9200
2017-11-08  43.15  43.75  40.60  41.02    9200
2017-11-09  43.15  43.75  40.60  41.02    9200
2017-11-10  43.15  43.75  40.60  41.02    9200
2017-11-13  41.60  43.21  40.03  42.36    3575
2017-11-14  41.60  43.21  40.03  42.36    3575
2017-11-16  41.60  43.21  40.03  42.36    3575
2017-11-17  41.60  43.21  40.03  42.36    3575
2017-11-21  42.70  43.10  40.35  40.37    3100
2017-11-22  42.70  43.10  40.35  40.37    3100
2017-11-23  42.70  43.10  40.35  40.37    3100
2017-11-24  42.70  43.10  40.35  40.37    3100
2017-11-27  40.28  40.40  39.10  40.04    3500
2017-11-28  40.28  40.40  39.10  40.04    3500