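I want to perform multi-column operations (e.g. correlate below) and operations that use the results of earlier calculations (e.g. the diff calculation below), without a for loop and using native pandas functionality such as groupby and agg. Is this possible?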
import pandas as pd
import datetime
import numpy as np
np.random.seed(0)
df = pd.DataFrame({'date': [datetime.datetime(2010,1,1)+datetime.timedelta(days=i*15)
                            for i in range(0,100)],
                   'invested': np.random.random(100)*1e6,
                   'return': np.random.random(100),
                   'side': np.random.choice([-1, 1], 100)})
df['year'] = df['date'].apply(lambda x: x.year)
# want to get rid of the for loop below
ret_year = []
for year in df['year'].unique():
    df_this_year = df[df['year'] == year]
    min_short = df_this_year[df_this_year['side'] == -1]['return'].max()
    min_long = df_this_year[df_this_year['side'] == -1]['return'].min()
    min_diff = min_long - min_short
    avg_inv = df_this_year['invested'].mean()
    corr = np.correlate(df_this_year['invested'], df_this_year['return'])[0]
    ret_year.append({'year': year, 'min_short': min_short, 'min_long': min_long,
                     'min_diff': min_diff, 'avg_inv': avg_inv, 'corr': corr})
print(pd.DataFrame(ret_year))
Result:
         avg_inv          corr  min_diff  min_long  min_short  year
0  590766.254452  8.821215e+06 -0.664752  0.297437   0.962189  2010
1  490224.532564  6.122306e+06 -0.900289  0.019193   0.919483  2011
2  438330.806563  4.768964e+06 -0.929680  0.069167   0.998847  2012
3  373038.880789  4.677380e+06 -0.779678  0.164694   0.944372  2013
4  416817.752705  5.014249e+04  0.000000  0.434417   0.434417  2014
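For reference, the large values in the corr column follow from how np.correlate behaves: with two equal-length inputs and the default mode='valid', it returns a single value equal to the sum of the element-wise products, not a coefficient between -1 and 1. A minimal sketch with made-up arrays (a and b are hypothetical, not taken from the DataFrame above) shows the difference from np.corrcoef:

```python
import numpy as np

# Hypothetical toy data, not taken from the DataFrame above
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 7.0])

# Default mode='valid' with equal-length inputs gives a one-element array
raw = np.correlate(a, b)[0]

# The same number computed as a plain dot product
dot = np.dot(a, b)

# Pearson correlation coefficient, always within [-1, 1]
pearson = np.corrcoef(a, b)[0, 1]

print(raw, dot, pearson)
```

If a normalized correlation is what is actually wanted, np.corrcoef or Series.corr may be the better fit; the code in this thread keeps np.correlate as written.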
There are some similar questions, but none of them are quite the same.
Answer 0 (score: 2)
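Rather than iterating with a for loop, use pandas groupby + apply. Put the date column into the index and group by year with pd.TimeGrouper('A'); 'A' is the pandas date offset alias for annual frequency. Note that pd.TimeGrouper was removed in pandas 1.0, where pd.Grouper(freq='A') is the equivalent.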
def calculate(x):
    # Both filters use side == -1, matching the loop in the question
    min_short = x.loc[x['side'] == -1, 'return'].max()
    min_long = x.loc[x['side'] == -1, 'return'].min()
    min_diff = min_long - min_short
    avg_inv = x['invested'].mean()
    corr = np.correlate(x['invested'], x['return'])[0]
    return pd.Series([avg_inv, corr, min_diff, min_long, min_short],
                     index=['avg_inv','corr','min_diff','min_long','min_short'])
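# pd.Grouper(freq='A') replaces pd.TimeGrouper('A'), which was removed in pandas 1.0
df.set_index('date').groupby(pd.Grouper(freq='A')).apply(calculate).to_period('A')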
            avg_inv          corr  min_diff  min_long  min_short
date
2010  590766.254452  8.821215e+06 -0.664752  0.297437   0.962189
2011  490224.532564  6.122306e+06 -0.900289  0.019193   0.919483
2012  438330.806563  4.768964e+06 -0.929680  0.069167   0.998847
2013  373038.880789  4.677380e+06 -0.779678  0.164694   0.944372
2014  416817.752705  5.014249e+04  0.000000  0.434417   0.434417
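Since the question also mentions agg, here is one possible way to get the same columns without apply. It is only a sketch: it assumes pandas 0.25 or later (for named aggregation), the names short, stats and out are illustrative, and the column definitions simply mirror calculate above. The single-column statistics come from agg, the multi-column value is built as a product column summed per year, and min_diff is derived from the already aggregated columns.

```python
import datetime
import numpy as np
import pandas as pd

# Rebuild the sample data from the question
np.random.seed(0)
df = pd.DataFrame({
    'date': [datetime.datetime(2010, 1, 1) + datetime.timedelta(days=i * 15)
             for i in range(100)],
    'invested': np.random.random(100) * 1e6,
    'return': np.random.random(100),
    'side': np.random.choice([-1, 1], 100),
})
df['year'] = df['date'].dt.year

# Single-column statistics on the side == -1 rows, via named aggregation
short = df[df['side'] == -1]
stats = short.groupby('year')['return'].agg(min_short='max', min_long='min')

# Mean of 'invested' per year, joined with the statistics above
out = df.groupby('year').agg(avg_inv=('invested', 'mean')).join(stats)

# Multi-column operation: the per-year sum of products, which is what
# np.correlate(...)[0] returns for two equal-length inputs
out['corr'] = (df['invested'] * df['return']).groupby(df['year']).sum()

# Operation that uses the results of earlier aggregations
out['min_diff'] = out['min_long'] - out['min_short']

print(out)
```

The trade-off is that each multi-column quantity has to be expressed as a column computed before grouping, whereas apply can run arbitrary code per group at the cost of speed.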