How to calculate a moving average for each day over the last 3 years using pandas

Time: 2019-12-09 16:44:08

Tags: python pandas numpy

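I have a large dataset and need to calculate the 3-year rolling return for each date. I am new to pandas and can't figure out how to do this with it. Below is my sample DataFrame.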

       nav_date     price
1989  2019-11-29    25.02
2338  2019-11-28    25.22
1991  2019-11-27    25.11
1988  2019-11-26    24.98
1990  2019-11-25    25.06
1978  2019-11-22    24.73
1984  2019-11-21    24.84
1985  2019-11-20    24.90
1980  2019-11-19    24.78
1971  2019-11-18    24.67
1975  2019-11-15    24.69
1970  2019-11-14    24.64
1962  2019-11-13    24.58
1977  2019-11-11    24.73
1976  2019-11-08    24.72
1987  2019-11-07    24.93
1983  2019-11-06    24.84
1979  2019-11-05    24.74
1981  2019-11-04    24.79
1974  2019-11-01    24.68
2337  2019-10-31    24.66
1966  2019-10-30    24.59
1957  2019-10-29    24.47
1924  2019-10-25    24.06
2336  2019-10-24    24.06
1929  2019-10-23    24.10
1923  2019-10-22    24.05
1940  2019-10-18    24.20
1921  2019-10-17    24.05
1890  2019-10-16    23.77
1882  2019-10-15    23.70
1868  2019-10-14    23.52
1860  2019-10-11    23.45
1846  2019-10-10    23.30
1862  2019-10-09    23.46
2335  2019-10-07    23.08
1837  2019-10-04    23.18
1863  2019-10-03    23.47
1873  2019-10-01    23.57
1894  2019-09-30    23.80
1901  2019-09-27    23.88
1916  2019-09-26    24.00
1885  2019-09-25    23.73
1919  2019-09-24    24.04
1925  2019-09-23    24.06
1856  2019-09-20    23.39
1724  2019-09-19    22.22
1773  2019-09-18    22.50
1763  2019-09-17    22.45
1811  2019-09-16    22.83
1825  2019-09-13    22.98
1806  2019-09-12    22.79
1817  2019-09-11    22.90
1812  2019-09-09    22.84
1797  2019-09-06    22.72
1777  2019-09-05    22.52
1776  2019-09-04    22.51
2334  2019-09-03    22.42
1815  2019-08-30    22.88
1798  2019-08-29    22.73
1820  2019-08-28    22.93
1830  2019-08-27    23.05
1822  2019-08-26    22.95
1770  2019-08-23    22.48
1737  2019-08-22    22.30
1794  2019-08-21    22.66
2333  2019-08-20    22.86
1821  2019-08-19    22.93
1819  2019-08-16    22.92
1814  2019-08-14    22.88
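
Before any date arithmetic, the nav_date strings need to become real timestamps and the rows need to be in chronological order. A minimal sketch, using only the first three rows of the sample above; the variable name `prices` is an assumption introduced here for illustration.

```python
import pandas as pd

# First three rows of the sample above; the index holds the original row labels.
df = pd.DataFrame(
    {
        "nav_date": ["2019-11-29", "2019-11-28", "2019-11-27"],
        "price": [25.02, 25.22, 25.11],
    },
    index=[1989, 2338, 1991],
)

# Convert the date strings to timestamps so pandas can do date arithmetic.
df["nav_date"] = pd.to_datetime(df["nav_date"])

# A price Series indexed by date, oldest first.
prices = df.set_index("nav_date")["price"].sort_index()
print(prices)
```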

I can do this in plain Python, but it takes too long to run. In Python I do it like this:

from datetime import date
from dateutil.relativedelta import relativedelta

# price_dict maps a datetime.date to the price on that date (built elsewhere)
start_date = date(2019, 10, 31)
end_date = date(2016, 10, 31)  # 3 years earlier
years = 3

# For every date between end_date and start_date, compute the 3-year CAGR
# against the price 3 years before it, then average the results.
total_returns = 0
n_days = (start_date - end_date).days
for n in range(n_days):
    sd = start_date - relativedelta(days=n)
    ed = sd - relativedelta(years=years)
    returns = ((price_dict[sd] / price_dict[ed]) ** (1 / years) - 1) * 100
    total_returns += returns
roll_return = total_returns / n_days
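
For reference, the per-date quantity computed inside the loop above is the compound annual growth rate (CAGR), expressed in percent. A small sketch of that formula as a standalone function; the name `cagr_pct` and the prices 20.0 and 25.0 are hypothetical and not taken from the dataset.

```python
def cagr_pct(start_price: float, end_price: float, years: float) -> float:
    """Annualized growth rate in percent between two prices `years` apart."""
    return ((end_price / start_price) ** (1 / years) - 1) * 100


# Hypothetical prices: 20.0 three years ago, 25.0 today.
r = cagr_pct(20.0, 25.0, 3)
print(round(r, 2))

# Compounding r for 3 years from the start price recovers the end price.
print(20.0 * (1 + r / 100) ** 3)
```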

I'm sure pandas can produce the same output without so much iteration, since this approach is too slow and takes too long to run. Thanks in advance.

1 Answer:

Answer 0: (score: 0)

You didn't show the expected result... Anyway, this is just an example; I think you'll get the idea behind my approach.

import pandas as pd

df = pd.DataFrame({
    'nav_date': (
        '2019-11-29',
        '2018-11-29',
        '2017-11-29',
        '2016-11-29',
        '2019-11-28',
        '2018-11-28',
        '2017-11-28',
        '2016-11-28',
    ),
    'price': (
        25.02,  # <- example of your price(2019-11-29)
        25.11,
        25.06,
        26.50,  # <- example of your price(2016-11-29)
        30.51,
        30.41,
        30.31,
        30.21,
    ),
})


# parse year from date string
df['year'] = df['nav_date'].str[:4]
# parse date without year (MM-DD)
df['nav_date'] = df['nav_date'].str[5:]
# years to columns, prices to rows
df = df.pivot(index='nav_date', columns='year', values='price')
df = pd.DataFrame(df.to_records())
# 3-year annualized return (CAGR) in percent, calculated by columns...
df['2019'] = ((df['2019'] / df['2016']) ** (1 / 3) - 1) * 100
# df['2018'] = blablabla...
print(df)

Result:

  nav_date   2016   2017   2018      2019
0    11-28  30.21  30.31  30.41  0.329927
1    11-29  26.50  25.06  25.11 -1.897409  # <- your expected value
So you now have a DataFrame with a calculated value for each day, and you can easily do whatever you need with it (mean() / max() / min() / any other operation).
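
As an alternative to pivoting by year, the per-day 3-year return can also be computed without a Python loop by looking up, for every date, the price 3 years earlier. This is a sketch under assumptions, not the method used above: the prices are synthetic (generated with `pd.bdate_range` and `np.linspace`), and when the date 3 years back is not a trading day, the most recent earlier price is used (`method="ffill"`).

```python
import numpy as np
import pandas as pd

years = 3

# Synthetic data (assumption): one price per business day, standing in for
# the real nav_date/price columns.
dates = pd.bdate_range("2013-10-01", "2019-11-29")
prices = pd.Series(np.linspace(10.0, 25.0, len(dates)), index=dates, name="price")

# For each date, the calendar date 3 years earlier.
past_dates = prices.index - pd.DateOffset(years=years)

# Price on or before that earlier date (assumption: forward-fill across
# weekends/holidays; dates before the first observation give NaN).
past_prices = prices.reindex(past_dates, method="ffill").to_numpy()

# Annualized (CAGR) 3-year return in percent for every date at once.
cagr = ((prices / past_prices) ** (1 / years) - 1) * 100

# Aggregate over the window used in the question.
window = cagr.loc["2016-10-31":"2019-10-31"]
roll_return = window.mean()
print(roll_return, window.max(), window.min())
```

On real data, `prices` would be the price column indexed by the parsed nav_date values and sorted in ascending order.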

Hope this helps.