Grouped and rolling calculations on a Series

Time: 2019-05-27 22:42:06

Tags: python series rolling-computation

I have a series; here is an excerpt:

Dates
1988-01-01        NaN
1988-01-04     257.40
1988-01-05     259.80
1988-01-06     258.60
1988-01-07     262.85
1988-01-08     240.75
1988-01-11     247.70
1988-01-12     246.35
1988-01-13     246.25
1988-01-14     247.45
1988-01-15     251.50
...  
2019-03-01    2805.00
2019-03-04    2791.50
2019-03-05    2791.50
2019-03-06    2771.50
2019-03-07    2750.00
2019-03-08    2747.00
2019-03-11    2789.00
2019-03-12    2797.25
2019-03-13    2819.50
2019-03-14    2812.25
2019-03-15    2829.75
Length: 8141, dtype: float64

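I need to compute a 40-week moving average of this series by weekday (i.e. Mondays separately, Tuesdays separately, and so on).

I tried several approaches, but only one of them worked.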

import pandas as pd

werTarget = werTarget.ffill()
parts = []
for i in range(5):  # Monday (0) to Friday (4), each weekday separately
    tmpTarget = werTarget[werTarget.index.weekday == i]
    # ratio of each value to the rolling mean of the previous 40 same-weekday values
    parts.append(tmpTarget / tmpTarget.rolling(window=40).mean())
# recombine the five per-weekday pieces into one Series in date order
IntmdInd = pd.concat(parts).sort_index()

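It took more than two hours to finish, and when I plotted the result, every data point was its own line.

As a result I need a single series, and it has to be much faster: some of the series are longer than this one, and there are actually thousands of them.

I tried something more concise: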

werTarget = werTarget.fillna(method='ffill')
IntmdInd = werTarget.groupby('weekday').rolling(window=40).mean()

But this raises an error:

Traceback (most recent call last):

  File "<ipython-input-16-1d4ba482ec32>", line 1, in <module>
    runfile('C:/MyFile.py', wdir='C:/MyDir')

  File "C:\Users\Admin\Anaconda2\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\Admin\Anaconda2\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/MyFile.py", line 62, in <module>
    werGraph(sp,werOne)

  File "C:/MyFile.py", line 44, in werGraph
    IntmdInd = werIntmdInd(werRat)

  File "C:/MyFile.py", line 34, in werIntmdInd
    IntmdInd = werTarget.groupby('weekday').rolling(window=75).mean()

  File "C:\Users\Admin\Anaconda2\lib\site-packages\pandas\core\generic.py", line 7632, in groupby
    observed=observed, **kwargs)

  File "C:\Users\Admin\Anaconda2\lib\site-packages\pandas\core\groupby\groupby.py", line 2110, in groupby
    return klass(obj, by, **kwds)

  File "C:\Users\Admin\Anaconda2\lib\site-packages\pandas\core\groupby\groupby.py", line 360, in __init__
    mutated=self.mutated)

  File "C:\Users\Admin\Anaconda2\lib\site-packages\pandas\core\groupby\grouper.py", line 578, in _get_grouper
    raise KeyError(gpr)

KeyError: 'weekday'

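Does anyone know a solution?

1 answer:

Answer 0: (score: 0)

I'm not sure where the error comes from, since I used almost exactly the code from the question. I'll demonstrate with some data from pandas_datareader.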

Then I convert the index to datetime, get the weekday, and compute the rolling mean on the grouped data:

>>> import pandas_datareader as pdr
>>> import pandas as pd  # version 0.24.2
>>>
>>> start = pd.to_datetime('2017-01-01')#datetime(2015, 2, 9)
>>> end = pd.to_datetime('2019-01-01')
>>> f = pdr.data.DataReader('F', 'iex', start, end)
>>> f.head()

               open     high      low    close    volume
date
2017-01-03  10.4286  10.7705  10.3687  10.7619  40510821
2017-01-04  10.9158  11.3432  10.8902  11.2577  77638075
2017-01-05  11.2919  11.3005  10.7961  10.9158  75628443
2017-01-06  10.9414  10.9756  10.8047  10.9072  40315887
2017-01-09  10.9329  10.9927  10.7961  10.7961  39438393
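
The grouping step itself does not appear in the excerpt above. The following is a minimal sketch of one way to do it, not necessarily what the answer originally showed. It uses synthetic business-day data in place of the real series, and the names `prices`, `window`, `rolling_mean` and `ratio` are illustrative assumptions. It groups by the weekday of the index directly. Passing the string `'weekday'` to `groupby` on a Series only works if the index has a level with that name, which is one plausible cause of the `KeyError` in the question.

```python
import numpy as np
import pandas as pd

# Synthetic business-day prices standing in for the real series (assumption).
idx = pd.bdate_range("2017-01-02", periods=500)
prices = pd.Series(np.linspace(100.0, 200.0, len(idx)), index=idx)
prices.iloc[0] = np.nan  # mimic the leading NaN in the question's data
prices = prices.ffill()  # a leading NaN has nothing before it, so it stays NaN

window = 40  # 40 observations per weekday, i.e. roughly 40 weeks

# Group by the weekday of the index itself (0 = Monday ... 4 = Friday).
# transform() returns values aligned to the original index, so the result
# is a single Series in date order rather than one piece per weekday.
weekday = prices.index.weekday
rolling_mean = prices.groupby(weekday).transform(
    lambda s: s.rolling(window=window).mean()
)

# Ratio of each value to its same-weekday rolling mean, as in the question.
ratio = prices / rolling_mean
```

Because `transform` keeps the original index, `ratio` can be plotted directly as one line. Calling `prices.groupby(weekday).rolling(window).mean()` instead should give the same numbers, but with the weekday added as an extra index level that would then have to be dropped and the result re-sorted by date.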