Multiple regression with pykalman?

Date: 2019-03-18 19:17:39

Tags: python regression kalman-filter pykalman

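I am looking for a way to generalise regression with pykalman from 1 to N regressors. We will not bother with online regression initially; I just want a toy example that sets up the Kalman filter for 2 regressors instead of 1, i.e. Y = c1 * x1 + c2 * x2 + const.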

For the single-regressor case, the following code works. My question is how to change the filter setup so that it works for two regressors:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pykalman import KalmanFilter

    if __name__ == "__main__":
        file_name = r'<path>\KalmanExample.txt'  # raw string keeps the backslash literal
        df = pd.read_csv(file_name, index_col = 0)
        prices = df[['ETF', 'ASSET_1']] #, 'ASSET_2']]

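        # Random-walk state noise: a larger delta lets slope and intercept drift faster.
        delta = 1e-5
        trans_cov = delta / (1 - delta) * np.eye(2)
        # One 1x2 observation matrix per time step: [ETF price, 1].
        obs_mat = np.vstack( [prices['ETF'],
                            np.ones(prices['ETF'].shape)]).T[:, np.newaxis]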

        kf = KalmanFilter(
            n_dim_obs=1,
            n_dim_state=2,
            initial_state_mean=np.zeros(2),
            initial_state_covariance=np.ones((2, 2)),
            transition_matrices=np.eye(2),
            observation_matrices=obs_mat,
            observation_covariance=1.0,
            transition_covariance=trans_cov
        )

        state_means, state_covs = kf.filter(prices['ASSET_1'].values)

        # Draw slope and intercept...
        pd.DataFrame(
            dict(
                slope=state_means[:, 0],
                intercept=state_means[:, 1]
            ), index=prices.index
        ).plot(subplots=True)
        plt.show()

The example file KalmanExample.txt contains the following data:

Date,ETF,ASSET_1,ASSET_2
2007-01-02,176.5,136.5,141.0
2007-01-03,169.5,115.5,143.25
2007-01-04,160.5,111.75,143.5
2007-01-05,160.5,112.25,143.25
2007-01-08,161.0,112.0,142.5
2007-01-09,155.5,110.5,141.25
2007-01-10,156.5,112.75,141.25
2007-01-11,162.0,118.5,142.75
2007-01-12,161.5,117.0,142.5
2007-01-15,160.0,118.75,146.75
2007-01-16,156.5,119.5,146.75
2007-01-17,155.0,120.5,145.75
2007-01-18,154.5,124.5,144.0
2007-01-19,155.5,126.0,142.75
2007-01-22,157.5,124.5,142.5
2007-01-23,161.5,124.25,141.75
2007-01-24,164.5,125.25,142.75
2007-01-25,164.0,126.5,143.0
2007-01-26,161.5,128.5,143.0
2007-01-29,161.5,128.5,140.0
2007-01-30,161.5,129.75,139.25
2007-01-31,161.5,131.5,137.5
2007-02-01,164.0,130.0,137.0
2007-02-02,156.5,132.0,128.75
2007-02-05,156.0,131.5,132.0
2007-02-06,159.0,131.25,130.25
2007-02-07,159.5,136.25,131.5
2007-02-08,153.5,136.0,129.5
2007-02-09,154.5,138.75,128.5
2007-02-12,151.0,136.75,126.0
2007-02-13,151.5,139.5,126.75
2007-02-14,155.0,169.0,129.75
2007-02-15,153.0,169.5,129.75
2007-02-16,149.75,166.5,128.0
2007-02-19,150.0,168.5,130.0

The single-regressor case produces the output below, whereas for the two-regressor case I need a second "slope" plot representing c2.

[Figure: slope and intercept estimates for the single-regressor case]

1 answer:

Answer 0 (score: 1)

This answer has been edited to reflect my revised understanding of the question.

If I understand correctly, you want to model the observable output variable Y = ETF as a linear combination of two observables: ASSET_1 and ASSET_2.

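The coefficients of this regression should be treated as the system state, i.e. ETF = x1*ASSET_1 + x2*ASSET_2 + x3, where x1 and x2 are the coefficients of assets 1 and 2 respectively, and x3 is the intercept. These coefficients are assumed to evolve slowly.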
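
Written out as a state-space model, the description above could look as follows. The symbols w_t, v_t, Q and R are notation introduced here for illustration and do not appear in the original answer; with the settings from the question's code they would correspond to Q = delta / (1 - delta) * I and R = 1.

```latex
\begin{aligned}
\mathbf{x}_t &= \mathbf{x}_{t-1} + \mathbf{w}_t,
  & \mathbf{w}_t &\sim \mathcal{N}(\mathbf{0},\, Q), \\
\mathrm{ETF}_t &=
  \begin{bmatrix} \mathrm{ASSET\_1}_t & \mathrm{ASSET\_2}_t & 1 \end{bmatrix}
  \mathbf{x}_t + v_t,
  & v_t &\sim \mathcal{N}(0,\, R),
\end{aligned}
\qquad
\mathbf{x}_t = \begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{bmatrix}
```

The identity transition is what encodes "slowly evolving": each coefficient is a random walk whose step size is governed by Q.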

The code that implements this is given below; note that it simply extends the existing example to use more regressors.

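Also note that you can get quite different results by playing with the delta parameter. If it is set to a large value (far from zero), the coefficients change faster and the reconstruction of the regressand is close to perfect. If it is set to a very small value (very close to zero), the coefficients evolve more slowly and the reconstruction of the regressand is less accurate. You may want to look into the Expectation-Maximization algorithm, which is supported by pykalman.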
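
To make the effect of delta concrete, here is a rough sketch that does not use pykalman at all: it is a hand-written Kalman recursion for the random-walk-coefficient model, run on synthetic data. The function names, the synthetic series and the two delta values are all assumptions chosen for illustration; the prior (zero mean, all-ones covariance) and unit observation variance mirror the question's code.

```python
import numpy as np


def filter_regression(y, X, delta, obs_var=1.0):
    """Kalman-filter y[t] = X[t] @ beta[t] + noise, with random-walk beta."""
    n, k = X.shape
    Q = delta / (1.0 - delta) * np.eye(k)   # state noise, as in the question
    beta = np.zeros(k)                      # prior mean
    P = np.ones((k, k))                     # prior covariance
    betas = np.empty((n, k))
    for t in range(n):
        if t > 0:
            P = P + Q                       # predict step (identity transition)
        h = X[t]
        S = h @ P @ h + obs_var             # innovation variance
        K = P @ h / S                       # Kalman gain
        beta = beta + K * (y[t] - h @ beta) # update the state mean
        P = P - np.outer(K, h @ P)          # update the state covariance
        betas[t] = beta
    return betas


def reconstruction_rmse(y, X, betas):
    """RMSE between y and the fit rebuilt from the filtered coefficients."""
    fitted = np.sum(X * betas, axis=1)
    return float(np.sqrt(np.mean((y - fitted) ** 2)))


# Synthetic stand-in for the price data (NOT the data from the question).
rng = np.random.default_rng(0)
n = 500
x1 = 150.0 + np.cumsum(rng.normal(size=n))
x2 = 130.0 + np.cumsum(rng.normal(size=n))
X = np.column_stack([x1, x2, np.ones(n)])
true_coefs = np.column_stack([
    np.linspace(0.5, 0.7, n),   # slowly drifting coefficient on x1
    np.linspace(0.3, 0.2, n),   # slowly drifting coefficient on x2
    np.full(n, 10.0),           # constant intercept
])
y = np.sum(X * true_coefs, axis=1) + rng.normal(size=n)

betas_small = filter_regression(y, X, delta=1e-5)
betas_large = filter_regression(y, X, delta=0.5)
rmse_small = reconstruction_rmse(y, X, betas_small)
rmse_large = reconstruction_rmse(y, X, betas_large)
print("RMSE with delta=1e-5:", rmse_small)
print("RMSE with delta=0.5: ", rmse_large)
```

With the large delta the filter follows each new observation almost exactly, so the reconstruction error is far smaller. That is not necessarily a better model, since the coefficients then absorb the observation noise.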

Code:
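
The answer's own listing is not available here. What follows is only a minimal sketch of what the description suggests, obtained by widening the question's filter from two states to three. The helper names `build_observation_matrices` and `fit_dynamic_regression` are hypothetical; the `KalmanFilter` arguments are the same ones used in the question's code.

```python
import numpy as np


def build_observation_matrices(regressors):
    """Return an array of shape (n_timesteps, 1, n_regressors + 1).

    Each time step gets its own 1 x n_states observation matrix
    [regressor values..., 1], the trailing 1 multiplying the intercept.
    """
    regressors = np.asarray(regressors, dtype=float)
    n_timesteps = regressors.shape[0]
    design = np.column_stack([regressors, np.ones(n_timesteps)])
    return design[:, np.newaxis, :]


def fit_dynamic_regression(y, regressors, delta=1e-5):
    """Filter time-varying regression coefficients with pykalman."""
    # Imported here so the helper above stays usable without pykalman.
    from pykalman import KalmanFilter

    obs_mat = build_observation_matrices(regressors)
    n_states = obs_mat.shape[2]
    kf = KalmanFilter(
        n_dim_obs=1,
        n_dim_state=n_states,
        initial_state_mean=np.zeros(n_states),
        initial_state_covariance=np.ones((n_states, n_states)),
        transition_matrices=np.eye(n_states),
        observation_matrices=obs_mat,
        observation_covariance=1.0,
        transition_covariance=delta / (1 - delta) * np.eye(n_states),
    )
    # state_means[:, :-1] are the slopes, state_means[:, -1] the intercept.
    return kf.filter(np.asarray(y, dtype=float))
```

With the DataFrame from the question, a call might look like `state_means, state_covs = fit_dynamic_regression(df['ETF'].values, df[['ASSET_1', 'ASSET_2']].values)`. Columns 0 and 1 of `state_means` would then hold x1 and x2, and column 2 the intercept x3.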

Plots:

[Figure: States evolving]

[Figure: Reconstruction of regressand]