Question

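I'm looking for a vectorized way to compute a moving average with a date offset. I have an irregularly spaced time series of costs for a product, and for each value I want the mean of the previous three values with a date offset of 45 days: only values whose date is at least 45 days before the current row's date count. For example, if this is my input DataFrame: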

    In [1]: df
    Out[1]:
            ActCost      OrDate
        0         8  2015-01-01
        1         5  2015-02-04
        2        10  2015-02-11
        3         1  2015-02-11
        4        10  2015-03-11
        5        18  2015-03-15
        6        20  2015-05-18
        7        25  2015-05-23
        8         8  2015-06-11
        9         5  2015-10-09
        10       15  2015-11-02
        12       18  2015-12-20

The output should be:

    In [2]: df
    Out[2]:
            ActCost      OrDate  EstCost
        0         8  2015-01-01      NaN
        1         5  2015-02-04      NaN
        2        10  2015-02-11      NaN
        3         1  2015-02-11      NaN
        4        10  2015-03-11      NaN
        5        18  2015-03-15      NaN
        6        20  2015-05-18     9.67  # mean(index 3:5)
        7        25  2015-05-23     9.67  # mean(index 3:5)
        8         8  2015-06-11     9.67  # mean(index 3:5)
        9         5  2015-10-09    17.67  # mean(index 6:8)
        10       15  2015-11-02    17.67  # mean(index 6:8)
        12       18  2015-12-20     9.33  # mean(index 8:10)

My current solution is as follows:

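    import numpy as np
    import pandas as pd
    from datetime import timedelta

    df['OrDate'] = pd.to_datetime(df['OrDate'])  # rows assumed sorted by OrDate

    for index, row in df.iterrows():
        orDate = row['OrDate']
        # only orders at least 45 days older than this one count
        costsLanded = orDate - timedelta(days=45)
        landed = df.loc[df.OrDate <= costsLanded, 'ActCost']
        if len(landed) < 3:
            # fewer than three qualifying costs: no estimate for this row
            df.loc[index, 'EstCost'] = np.nan
            continue
        # mean of the three most recent qualifying costs
        df.loc[index, 'EstCost'] = landed.tail(3).mean()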

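My code works, but it is slow, and I have millions of these time series. I'm hoping someone can give me some advice on how to speed this up. I suspect the best approach is to vectorize the operation, but I'm not sure how to implement it. Thanks a lot for your help!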
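One way to vectorize exactly this rule is with NumPy. This is a sketch under stated assumptions, not taken from either answer below: it assumes the rows are sorted by OrDate, that OrDate is already datetime64, and that "previous three values" means the three most recent rows at least 45 days older, as in the expected output. `est_cost` is a hypothetical helper name. `np.searchsorted` finds, for every row at once, how many rows fall on or before its cutoff date, and a cumulative sum turns "mean of the last three of those rows" into a difference of two prefix sums.

```python
import numpy as np
import pandas as pd


def est_cost(df, offset_days=45, n=3):
    """Hypothetical helper: mean of the last `n` ActCost values whose OrDate is
    at least `offset_days` days before each row's OrDate (NaN if fewer than n).

    Assumes df is sorted by OrDate and OrDate has dtype datetime64.
    """
    dates = df['OrDate'].to_numpy()
    costs = df['ActCost'].to_numpy(dtype=float)
    cutoffs = dates - np.timedelta64(offset_days, 'D')
    # k[i] = number of rows with OrDate <= cutoffs[i] (same test as the loop)
    k = np.searchsorted(dates, cutoffs, side='right')
    # Prefix sums with a leading 0, so sum(costs[a:b]) == csum[b] - csum[a]
    csum = np.concatenate(([0.0], np.cumsum(costs)))
    est = np.full(len(df), np.nan)
    ok = k >= n
    # Mean of costs[k - n : k], i.e. the n most recent qualifying rows
    est[ok] = (csum[k[ok]] - csum[k[ok] - n]) / n
    return pd.Series(est, index=df.index, name='EstCost')


df = pd.DataFrame({
    'ActCost': [8, 5, 10, 1, 10, 18, 20, 25, 8, 5, 15, 18],
    'OrDate': pd.to_datetime([
        '2015-01-01', '2015-02-04', '2015-02-11', '2015-02-11',
        '2015-03-11', '2015-03-15', '2015-05-18', '2015-05-23',
        '2015-06-11', '2015-10-09', '2015-11-02', '2015-12-20',
    ]),
})
df['EstCost'] = est_cost(df)
print(df['EstCost'].round(2).tolist())
```

On this data the last row comes out as 9.33 (the mean of 8, 5 and 15), because 2015-11-02 is itself more than 45 days before 2015-12-20. For many series, one option is to sort by a product identifier and OrDate and apply the function per group with `groupby`; each group then costs O(n log n) instead of the O(n²) filtering done by the loop. Prefix-sum differences can pick up small floating-point rounding error on very long series.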

Answer 1

Try something like this:

    import pandas as pd

    # Set up a DatetimeIndex (easier to just load the data with OrDate as the index)
    df = df.set_index('OrDate', drop=True)
    df.index = pd.DatetimeIndex(df.index)
    df.index.name = 'OrDate'

    # Save the original timestamps for later
    idx = df.index

    # Make a time series with a regular daily interval
    # (first() keeps only the first row when a date repeats)
    df = df.resample('d').first()

    # Take the moving mean with a window size of 45 days
    df = df.rolling(window=45, min_periods=0).mean()

    # Grab the values for the original timestamps and put the index back
    # (.ix has been removed from pandas; .loc does the label lookup)
    df = df.loc[idx].reset_index()
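The resample-then-roll steps above can likely be collapsed into a single offset-based rolling window, which pandas supports directly on a sorted DatetimeIndex. This is an alternative sketch, not part of the original answer; `Rolling45D` is a hypothetical column name. Like the answer, it computes a trailing 45-day mean that includes the current row, which is not the same as the question's "three values at least 45 days earlier". Because it never resamples, a date with several rows (2015-02-11 here) keeps all of its values instead of only the first.

```python
import pandas as pd

df = pd.DataFrame({
    'ActCost': [8, 5, 10, 1, 10, 18, 20, 25, 8, 5, 15, 18],
    'OrDate': pd.to_datetime([
        '2015-01-01', '2015-02-04', '2015-02-11', '2015-02-11',
        '2015-03-11', '2015-03-15', '2015-05-18', '2015-05-23',
        '2015-06-11', '2015-10-09', '2015-11-02', '2015-12-20',
    ]),
})

# Offset-based windows need a sorted DatetimeIndex; repeated dates are allowed
s = df.set_index('OrDate')['ActCost']

# For each row at time t, average every row with OrDate in (t - 45 days, t]
rolled = s.rolling('45D').mean()

# Attach positionally, since the rolled index (OrDate) has a duplicate label
df['Rolling45D'] = rolled.to_numpy()
print(df)
```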

Answer 2

If I understand correctly, I think all you want is

    # df needs a DatetimeIndex, e.g. df = df.set_index('OrDate')
    df.resample('45D').agg('mean')
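Keep in mind that `resample('45D')` groups the rows into consecutive, non-overlapping 45-day bins and returns one mean per bin, so it downsamples the series rather than giving a moving value for every original row. A minimal sketch on the question's data, assuming OrDate is moved into the index first:

```python
import pandas as pd

df = pd.DataFrame({
    'ActCost': [8, 5, 10, 1, 10, 18, 20, 25, 8, 5, 15, 18],
    'OrDate': pd.to_datetime([
        '2015-01-01', '2015-02-04', '2015-02-11', '2015-02-11',
        '2015-03-11', '2015-03-15', '2015-05-18', '2015-05-23',
        '2015-06-11', '2015-10-09', '2015-11-02', '2015-12-20',
    ]),
})

# resample needs a DatetimeIndex, so move OrDate into the index first
binned = df.set_index('OrDate')['ActCost'].resample('45D').agg('mean')
print(binned)

# One row per 45-day bin (empty bins give NaN), not one value per original row
print(len(df), len(binned))
```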

Moving average with a date offset in pandas
