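I am trying to store the coefficients (loadings), scores, eigenvalues and residuals (raw data minus reconstruction) of the first three PCs for the last day of each 20-day rolling window, on a dataset of daily returns for 20 different CDS indices over 108 trading days. I am writing the whole algorithm myself, without the help of scikit-learn, for teaching purposes. Unfortunately, I cannot seem to store the PC coefficients in the DataFrames coeff1roll, coeff2roll and coeff3roll that I initialised beforehand. Everything else seems to work.
Please help, and feel free to point out any other errors in my program.
Here is my code: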
import numpy as np
import pandas as pd

d = data  # DataFrame of daily returns (rows = trading days, columns = CDS indices)
n, m = d.shape  # number of days, number of indices
t = 20  # rolling window length in days
scores1roll = np.zeros((n-t)) # scores
scores2roll = np.zeros((n-t))
scores3roll = np.zeros((n-t))
coeff1roll = pd.DataFrame(data = np.zeros((n-t, m))) # loadings
coeff2roll = pd.DataFrame(data = np.zeros((n-t, m)))
coeff3roll = pd.DataFrame(data = np.zeros((n-t, m)))
eigvalroll1 = np.zeros((n-t)) # eigenvalues
eigvalroll2 = np.zeros((n-t))
eigvalroll3 = np.zeros((n-t))
res_roll = pd.DataFrame(data = np.zeros((n-t, m))) # Residuals
# Loop with PCA step by step
for i in range(n-t):
    droll = d.iloc[i:i+t-1,:] # window of data
    r,c = droll.shape
    meanroll = np.mean(droll, axis=0) # mean by column
    stddevroll = np.sqrt(np.var(droll, axis=0)) # standard deviation by col
    dstdrollss = (droll - meanroll)/stddevroll
    # mean should be 0 and std dev 1
    cor_mat = (dstdrollss.T.dot(dstdrollss))/(r - 1) # correlation matrix
    eigvalues, eigvectors = np.linalg.eig(cor_mat)
    eig_pairs = [(np.abs(eigval[i]), eigvec[:, i]) for i in
                 range(len(eigval))] # associating each eigvec to the correspondent eigval
    eig_pairs.sort(reverse=True) # sorting according to abs val of eigenval
    loadingsroll = np.hstack((eig_pairs[0][1].reshape(20, 1),
                              eig_pairs[1][1].reshape(20, 1),
                              eig_pairs[2][1].reshape(20, 1))) # selecting first 3
    scoresroll = dstdrollss.dot(loadingsroll) # projections on new directions
    coeff1roll.iloc[i, :] = loadingsroll[:, 0] # renamed coeff
    coeff2roll.iloc[i, :] = loadingsroll[:, 1]
    coeff3roll.iloc[i, :] = loadingsroll[:, 2]
    scores1roll[i] = scoresroll.iloc[0,0]
    scores2roll[i] = scoresroll.iloc[0,1]
    scores3roll[i] = scoresroll.iloc[0,2]
    eigvalroll1[i] = eigvalues[0]
    eigvalroll2[i] = eigvalues[1]
    eigvalroll3[i] = eigvalues[2]
    d_hatroll = scoresroll.dot(loadingsroll.T) # back to raw normalized data
    d_rawroll = d_hatroll * stddevroll.values + meanroll.values # raw data
    c = dict(zip(d_rawroll.columns, droll.columns))
    resrolling = droll.subtract(d_rawroll.rename(columns=c),
                                axis='column') # calculate residuals
    res_roll.iloc[i, :] = resrolling.iloc[0, :].values
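
For comparison, the per-window steps of the loop above can be sketched as one function applied to a single window. This is only an illustrative sketch, not the code from the question: the helper name `window_pca`, the random test data and the use of `np.linalg.eigh` in place of `np.linalg.eig` are assumptions introduced here.

```python
import numpy as np
import pandas as pd


def window_pca(window: pd.DataFrame, k: int = 3):
    """PCA of one window via its correlation matrix (hypothetical helper)."""
    mean = window.mean(axis=0)
    std = window.std(axis=0, ddof=1)           # ddof=1 matches the (r - 1) divisor below
    z = (window - mean) / std                  # standardised data
    r = len(window)
    cor_mat = z.T.dot(z).values / (r - 1)      # correlation matrix, shape (m, m)
    # eigh is intended for symmetric matrices: real output, ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cor_mat)
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    loadings = eigvecs[:, order]               # shape (m, k)
    scores = z.values @ loadings               # shape (r, k)
    recon = (scores @ loadings.T) * std.values + mean.values  # back to raw units
    resid = window.values - recon              # raw data minus reconstruction
    return eigvals[order], loadings, scores, resid


# Synthetic stand-in for one 20-day window of 20 series (assumed sizes)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(20, 20)))
eigvals, loadings, scores, resid = window_pca(demo)
```

Because the correlation matrix is symmetric, `eigh` returns real eigenvalues in ascending order, which is why the sketch reverses the order before keeping the first three.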
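I expected every row of coeff1roll, coeff2roll and coeff3roll to hold different values, but the loop fills every row with identical values, which happen to be the coefficients for the last day of the most recent 20-day window: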
coeff1roll.head()
          0         1         2  ...        17        18       19
0  0.223961  0.223526  0.223531  ...  0.223876  0.223852  0.22368
1  0.223961  0.223526  0.223531  ...  0.223876  0.223852  0.22368
2  0.223961  0.223526  0.223531  ...  0.223876  0.223852  0.22368
3  0.223961  0.223526  0.223531  ...  0.223876  0.223852  0.22368
4  0.223961  0.223526  0.223531  ...  0.223876  0.223852  0.22368
[5 rows x 20 columns]
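
A likely cause, offered as a reading of the code rather than a confirmed diagnosis: inside the loop the decomposition is stored in `eigvalues, eigvectors`, but `eig_pairs` is built from `eigval` and `eigvec`, which the loop never assigns. If those names survive from an earlier full-sample PCA in the same session, every iteration would reuse the same eigenvectors, which would explain the identical rows. Other points that look off: `d.iloc[i:i+t-1, :]` selects 19 rows rather than 20, `.iloc[0, ...]` picks the first day of the window rather than the last, `eigvalues[0]` is read before any sorting, and the documented axis name for `subtract` is `'columns'`. The sketch below covers the first component only, on synthetic data; the random data, the sizes and the name `n_win` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n, m, t = 108, 20, 20                       # assumed sizes: days, indices, window length
d = pd.DataFrame(rng.normal(size=(n, m)))   # synthetic stand-in for the returns

n_win = n - t + 1                           # number of full t-day windows
coeff1roll = pd.DataFrame(np.zeros((n_win, m)), columns=d.columns)
eigvalroll1 = np.zeros(n_win)
scores1roll = np.zeros(n_win)

for i in range(n_win):
    droll = d.iloc[i:i + t, :]              # iloc excludes the end point, so t rows
    z = (droll - droll.mean()) / droll.std(ddof=1)
    cor_mat = z.T.dot(z).values / (t - 1)   # correlation matrix of this window
    eigvalues, eigvectors = np.linalg.eigh(cor_mat)   # ascending order
    order = np.argsort(eigvalues)[::-1]     # re-order to descending
    eigvalues, eigvectors = eigvalues[order], eigvectors[:, order]
    # Use only the variables computed in THIS iteration
    coeff1roll.iloc[i, :] = eigvectors[:, 0]
    eigvalroll1[i] = eigvalues[0]
    scores1roll[i] = z.values[-1] @ eigvectors[:, 0]  # score of the window's last day
```

Eigenvectors are only defined up to sign, so the loadings can flip sign from one window to the next; a sign convention may be needed before comparing rows across windows.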