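I have two time series, columns A and B.
I am computing rolling moving averages of column A over several window lengths, for example 5, 10, 15 and 20.
I want to assign a weight to each moving-average column so that the weighted sum of those columns has the maximum correlation with column B. In other words, how can I implement an Excel-like optimization in Python?
Please take a look at the sample code and suggest a way forward.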
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=100)
df = pd.DataFrame(np.random.randn(100, 2), index=dates, columns=list('AB'))
df['sma_5']=df['A'].rolling(5).mean()
df['sma_10']=df['A'].rolling(10).mean()
df['sma_15']=df['A'].rolling(15).mean()
df['sma_20']=df['A'].rolling(20).mean()
w=[0.25,0.25,0.25,0.25]
df['B_friend'] = w[0]*df['sma_5']+w[1]*df['sma_10']+w[2]*df['sma_15']+w[3]*df['sma_20']
The weights w need to be optimized to maximize this correlation:
df['B'].corr(df['B_friend'])
Thanks.
Answer 0 (score: 1)
The scipy.optimize.minimize function looks like what you need: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize
The code would look something like this:
import pandas as pd
import numpy as np
import scipy.optimize as opt
dates = pd.date_range('20130101', periods=100)
df = pd.DataFrame(np.random.randn(100, 2), index=dates, columns=list('AB'))
df['sma_5']=df['A'].rolling(5).mean()
df['sma_10']=df['A'].rolling(10).mean()
df['sma_15']=df['A'].rolling(15).mean()
df['sma_20']=df['A'].rolling(20).mean()
def fun(x):
    w = x
    B_friend = w[0]*df['sma_5']+w[1]*df['sma_10']+w[2]*df['sma_15']+w[3]*df['sma_20']
    # Return -np.abs(corr) rather than corr itself, in order
    # to turn the maximization problem into a
    # minimization problem
    return -np.abs(df['B'].corr(B_friend))

w = [0.25, 0.25, 0.25, 0.25]  # initial guess for the weights
result = opt.minimize(fun, w)
print(result.x)     # optimized weights
print(-result.fun)  # absolute correlation achieved
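Two things are worth checking once the optimizer returns. Correlation does not change when all weights are multiplied by the same positive number, so the weights are only determined up to scale. And because the objective uses the absolute value, the optimizer may land on a negatively correlated solution. The sketch below is one possible way to handle both, not the only one. It uses a seeded random generator so the run is repeatable, and the helper name corr_for and the normalization by the sum of absolute weights are my own choices for illustration. As a cross-check it also fits ordinary least squares with an intercept via np.linalg.lstsq: the slope coefficients of that fit give the largest correlation any linear combination of the columns can reach, so the optimizer's result should come close to it.

```python
import numpy as np
import pandas as pd
import scipy.optimize as opt

# Seeded generator so the run is repeatable (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
dates = pd.date_range('20130101', periods=100)
df = pd.DataFrame(rng.standard_normal((100, 2)), index=dates, columns=list('AB'))

windows = [5, 10, 15, 20]
for n in windows:
    df[f'sma_{n}'] = df['A'].rolling(n).mean()

# Keep only rows where every moving average is defined (drops the first 19 rows)
clean = df.dropna()
X = clean[[f'sma_{n}' for n in windows]].to_numpy()
y = clean['B'].to_numpy()

def corr_for(w):
    """Pearson correlation between the weighted sum of averages and B."""
    return np.corrcoef(X @ np.asarray(w), y)[0, 1]

w0 = np.full(len(windows), 0.25)
res = opt.minimize(lambda w: -abs(corr_for(w)), w0)

# Correlation ignores positive rescaling, so normalize for readability and
# flip the sign if the optimizer settled on the negatively correlated solution
w_opt = res.x / np.abs(res.x).sum()
if corr_for(w_opt) < 0:
    w_opt = -w_opt

# Cross-check: least squares with an intercept gives the best achievable value
design = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
w_ols = beta[1:]

print('start     :', corr_for(w0))
print('optimized :', corr_for(w_opt), w_opt)
print('OLS bound :', corr_for(w_ols), w_ols)
```

If the optimized correlation is noticeably below the least-squares value, the optimizer probably stopped early, and a different starting point or method may help. With purely random A and B, as in this example, even the best achievable correlation is expected to be small.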