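I have a large dataset and need to calculate the 3-year rolling return for each date. I'm new to pandas and can't figure out how to do this with it. Below is my sample dataframe.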
nav_date price
1989 2019-11-29 25.02
2338 2019-11-28 25.22
1991 2019-11-27 25.11
1988 2019-11-26 24.98
1990 2019-11-25 25.06
1978 2019-11-22 24.73
1984 2019-11-21 24.84
1985 2019-11-20 24.90
1980 2019-11-19 24.78
1971 2019-11-18 24.67
1975 2019-11-15 24.69
1970 2019-11-14 24.64
1962 2019-11-13 24.58
1977 2019-11-11 24.73
1976 2019-11-08 24.72
1987 2019-11-07 24.93
1983 2019-11-06 24.84
1979 2019-11-05 24.74
1981 2019-11-04 24.79
1974 2019-11-01 24.68
2337 2019-10-31 24.66
1966 2019-10-30 24.59
1957 2019-10-29 24.47
1924 2019-10-25 24.06
2336 2019-10-24 24.06
1929 2019-10-23 24.10
1923 2019-10-22 24.05
1940 2019-10-18 24.20
1921 2019-10-17 24.05
1890 2019-10-16 23.77
1882 2019-10-15 23.70
1868 2019-10-14 23.52
1860 2019-10-11 23.45
1846 2019-10-10 23.30
1862 2019-10-09 23.46
2335 2019-10-07 23.08
1837 2019-10-04 23.18
1863 2019-10-03 23.47
1873 2019-10-01 23.57
1894 2019-09-30 23.80
1901 2019-09-27 23.88
1916 2019-09-26 24.00
1885 2019-09-25 23.73
1919 2019-09-24 24.04
1925 2019-09-23 24.06
1856 2019-09-20 23.39
1724 2019-09-19 22.22
1773 2019-09-18 22.50
1763 2019-09-17 22.45
1811 2019-09-16 22.83
1825 2019-09-13 22.98
1806 2019-09-12 22.79
1817 2019-09-11 22.90
1812 2019-09-09 22.84
1797 2019-09-06 22.72
1777 2019-09-05 22.52
1776 2019-09-04 22.51
2334 2019-09-03 22.42
1815 2019-08-30 22.88
1798 2019-08-29 22.73
1820 2019-08-28 22.93
1830 2019-08-27 23.05
1822 2019-08-26 22.95
1770 2019-08-23 22.48
1737 2019-08-22 22.30
1794 2019-08-21 22.66
2333 2019-08-20 22.86
1821 2019-08-19 22.93
1819 2019-08-16 22.92
1814 2019-08-14 22.88
I can do this in plain Python, but the execution time is too long. In Python, I do it like this:
from datetime import date
from dateutil.relativedelta import relativedelta
# price_dict is assumed to map datetime.date -> price for every day used below
start_date = date(2019, 10, 31)
end_date = date(2016, 10, 31)  # 3 years before start_date
years = 3
# For each day between end_date and start_date, compute the 3-year CAGR, then average them.
total_returns = 0
n_days = (start_date - end_date).days
for n in range(n_days):
    sd = start_date - relativedelta(days=n)
    ed = sd - relativedelta(years=years)
    returns = ((price_dict[sd] / price_dict[ed]) ** (1 / years) - 1) * 100
    total_returns += returns
roll_return = total_returns / n_days
I'm sure pandas can produce the same output without so many iterations. The loop has become too slow and takes too long to run. Thanks in advance.
Answer 0: (score: 0)
You didn't show your expected result... Anyway, this is just an example; I think you'll get the idea behind my approach.
import pandas as pd

df = pd.DataFrame({
'nav_date': (
'2019-11-29',
'2018-11-29',
'2017-11-29',
'2016-11-29',
'2019-11-28',
'2018-11-28',
'2017-11-28',
'2016-11-28',
),
'price': (
25.02, # <- example of your price(2019-11-29)
25.11,
25.06,
26.50, # <- example of your price(2016-11-29)
30.51,
30.41,
30.31,
30.21,
),
})
# parse year from date string
df['year'] = df['nav_date'].str[:4]
# keep only the month-day part of the date
df['nav_date'] = df['nav_date'].str[5:]
# years to columns, prices to rows
df = df.pivot(index='nav_date', columns='year', values='price')
df = pd.DataFrame(df.to_records())
# value calculation by columns...
# 3-year CAGR in percent: (end / start) ** (1 / years) - 1, rounded to 2 decimals for display
df['2019'] = (((df['2019'] / df['2016']) ** (1 / 3) - 1) * 100).round(2)
# df['2018'] = blablabla...
print(df)
Result:
nav_date 2016 2017 2018 2019
0 11-28 30.21 30.31 30.41 0.33
1 11-29 26.50 25.06 25.11 -1.90 # <- your expected value
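So you now have a dataframe with the calculated value for each day, and you can easily do whatever you need with it (mean() / max() / min() / any other operation).
Hope this helps.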
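For the full dataset, the same idea can probably be done without pivoting by year. Below is a minimal sketch, not a definitive implementation: the helper name `rolling_cagr` and the variable `prices` are made up for illustration, and the sample prices are the example values from above. It assumes that when the date exactly 3 years earlier is not a trading day, the last available earlier price should be used, which may not match what you want.

```python
import pandas as pd


def rolling_cagr(prices: pd.Series, years: int = 3) -> pd.Series:
    """Annualised return (%) over the trailing `years` for each date.

    `prices` must have a unique DatetimeIndex. Dates without a price
    at least `years` earlier come back as NaN.
    """
    prices = prices.sort_index()
    # The date `years` before each observation.
    past_dates = prices.index - pd.DateOffset(years=years)
    # Assumption: fall back to the last known price on or before that date.
    past_prices = prices.reindex(past_dates, method="ffill")
    # Re-attach the original dates so the division aligns row by row.
    past = pd.Series(past_prices.to_numpy(), index=prices.index)
    return ((prices / past) ** (1 / years) - 1) * 100


df = pd.DataFrame({
    "nav_date": ["2019-11-29", "2016-11-29", "2019-11-28", "2016-11-28"],
    "price": [25.02, 26.50, 30.51, 30.21],
})
prices = (
    df.assign(nav_date=pd.to_datetime(df["nav_date"]))
      .set_index("nav_date")["price"]
)

cagr = rolling_cagr(prices)       # one 3-year CAGR per date, NaN where no history
roll_return = cagr.dropna().mean()  # average of the available 3-year CAGRs
print(cagr.dropna())
print(roll_return)
```

Because everything is computed on whole columns, there is no Python-level loop over dates, which is usually what makes the plain-Python version slow. To average over a specific range only, slice first, for example `cagr.loc["2016-10-31":"2019-10-31"].mean()`.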