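I have been working on implementing a Kalman filter to search for anomalies in a two-dimensional data set, very similar to the excellent post I found here. As a next step, I would like to predict a confidence interval for where the next value will fall (for example, a 95% confidence for the floor and ceiling values). So in addition to the smoothed line that the code below plots, I'd like to be able to generate two additional lines that represent a 95% confidence that the next value will be above the floor or below the ceiling.
I assume I want to use the uncertainty covariance matrix (P) that is returned with each prediction generated by the Kalman filter, but I am not sure whether that is right. Any guidance or reference on how to do this would be much appreciated!
The code from the post above generates a set of measurements over time and uses a Kalman filter to smooth the results.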
import numpy as np
import matplotlib.pyplot as plt
def kalman_xy(x, P, measurement, R,
              motion = np.matrix('0. 0. 0. 0.').T,
              Q = np.matrix(np.eye(4))):
    """
    Parameters:
    x: initial state 4-tuple of location and velocity: (x0, x1, x0_dot, x1_dot)
    P: initial uncertainty covariance matrix
    measurement: observed position
    R: measurement noise
    motion: external motion added to state vector x
    Q: motion noise (same shape as P)
    """
    return kalman(x, P, measurement, R, motion, Q,
                  F = np.matrix('''
                      1. 0. 1. 0.;
                      0. 1. 0. 1.;
                      0. 0. 1. 0.;
                      0. 0. 0. 1.
                      '''),
                  H = np.matrix('''
                      1. 0. 0. 0.;
                      0. 1. 0. 0.'''))
def kalman(x, P, measurement, R, motion, Q, F, H):
    '''
    Parameters:
    x: initial state
    P: initial uncertainty covariance matrix
    measurement: observed position (same shape as H*x)
    R: measurement noise (same shape as H)
    motion: external motion added to state vector x
    Q: motion noise (same shape as P)
    F: next state function: x_prime = F*x
    H: measurement function: position = H*x

    Return: the updated and predicted new values for (x, P)

    See also http://en.wikipedia.org/wiki/Kalman_filter

    This version of kalman can be applied to many different situations by
    appropriately defining F and H
    '''
    # UPDATE x, P based on measurement m
    # distance between measured and current position-belief
    y = np.matrix(measurement).T - H * x
    S = H * P * H.T + R  # residual covariance
    K = P * H.T * S.I    # Kalman gain
    x = x + K*y
    I = np.matrix(np.eye(F.shape[0]))  # identity matrix
    P = (I - K*H)*P

    # PREDICT x, P based on motion
    x = F*x + motion
    P = F*P*F.T + Q

    return x, P
def demo_kalman_xy():
    x = np.matrix('0. 0. 0. 0.').T
    P = np.matrix(np.eye(4))*1000  # initial uncertainty

    N = 20
    true_x = np.linspace(0.0, 10.0, N)
    true_y = true_x**2
    observed_x = true_x + 0.05*np.random.random(N)*true_x
    observed_y = true_y + 0.05*np.random.random(N)*true_y
    plt.plot(observed_x, observed_y, 'ro')
    result = []
    R = 0.01**2
    for meas in zip(observed_x, observed_y):
        x, P = kalman_xy(x, P, meas, R)
        result.append((x[:2]).tolist())
    kalman_x, kalman_y = zip(*result)
    plt.plot(kalman_x, kalman_y, 'g-')
    plt.show()

demo_kalman_xy()
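Answer 0 (score: 4)
The 2D generalization of the 1-sigma interval is the confidence ellipse, which is characterized by the equation `(x-mx).T P^{-1} (x-mx) == 1`, where `x` is the 2D parameter vector, `mx` is the 2D mean (the ellipse center) and `P^{-1}` is the inverse covariance matrix. See this answer for how to draw one. As with sigma intervals, the ellipse area corresponds to a fixed probability that the true value lies inside it. Scaling by a factor `n` (which scales the interval length or the ellipse radii, so the right-hand side of the equation becomes `n^2`) gives a higher confidence. Note that the same factor `n` corresponds to different probabilities in one and two dimensions: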
| `n` | 1D interval | 2D ellipse |
|-----|-------------|------------|
| 1   | 68.27%      | 39.35%     |
| 2   | 95.45%      | 86.47%     |
| 3   | 99.73%      | 98.89%     |
Calculating these values in 2D is a bit involved, and unfortunately I do not have a public reference for it.
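For what it is worth, the table values can be reproduced with a short sketch, assuming the errors are Gaussian. In 1D the coverage of an n-sigma interval is `erf(n / sqrt(2))`; in 2D the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, whose CDF has the closed form `1 - exp(-n^2 / 2)`. The function names below are made up for illustration.

```python
import math

def coverage_1d(n):
    """Probability that a 1D Gaussian sample lies within n standard deviations."""
    return math.erf(n / math.sqrt(2.0))

def coverage_2d(n):
    """Probability that a 2D Gaussian sample lies inside the n-sigma ellipse
    (chi-square CDF with 2 degrees of freedom, evaluated at n**2)."""
    return 1.0 - math.exp(-n * n / 2.0)

def scale_for_2d(p):
    """Scale factor n whose 2D ellipse contains probability p (inverse of coverage_2d)."""
    return math.sqrt(-2.0 * math.log(1.0 - p))

for n in (1, 2, 3):
    print(n, round(100 * coverage_1d(n), 2), round(100 * coverage_2d(n), 2))
```

Inverting the 2D formula, `scale_for_2d(0.95)` gives a factor of roughly 2.45, so a 95% ellipse needs a larger factor than the familiar 1.96 of the 1D case.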
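Answer 1 (score: 1)
If you want a 95% interval in which the next value is predicted to fall, then you want a prediction interval rather than a confidence interval (http://en.wikipedia.org/wiki/Prediction_interval).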
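Applied to the Kalman filter in the question, one possible sketch of such an interval is below. It assumes Gaussian noise and that Q and R are reasonably well chosen, neither of which the question confirms. Since `kalman()` returns the already predicted `x` and `P`, the next measurement is expected at `H*x` with covariance `S = H*P*H.T + R`, and a per-axis 95% band is the center plus or minus 1.96 standard deviations. The helper name `prediction_band` is hypothetical, and a scalar `R` is treated as `R` times the identity, which gives the same diagonal of `S` as the scalar addition in the posted code.

```python
import numpy as np

def prediction_band(x, P, H, R, z=1.96):
    """Per-axis band for the next measurement (hypothetical helper, not from the post).

    x, P: predicted state and covariance, as returned by kalman()
    H:    measurement matrix
    R:    measurement noise, scalar or (m, m) matrix
    z:    normal quantile, 1.96 for a two-sided 95% band
    """
    x = np.asarray(x, dtype=float).reshape(-1)
    P = np.asarray(P, dtype=float)
    H = np.asarray(H, dtype=float)
    m = H.shape[0]
    if np.ndim(R) == 0:
        R = float(R) * np.eye(m)      # assumption: scalar R means R * identity
    else:
        R = np.asarray(R, dtype=float)
    center = H @ x                    # predicted measurement
    S = H @ P @ H.T + R               # predicted measurement covariance
    half_width = z * np.sqrt(np.diag(S))
    return center - half_width, center + half_width
```

Inside the demo loop, calling this right after `kalman_xy` with the same `H` and `R` and collecting the lower and upper values would give the two extra lines to plot. Note that these are marginal bands per axis, not a joint 95% region in 2D.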
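For 2-D (3-D) data, the semi-axes of the ellipse (ellipsoid) can be found by computing the eigenvalues of the covariance matrix of the data and scaling the semi-axes to the desired prediction probability.
See Prediction ellipse and prediction ellipsoid for Python code that computes a 95% prediction ellipse or ellipsoid. It may help you compute the prediction ellipse for your data.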
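As a rough illustration of the eigenvalue approach (not the code from the linked page), the sketch below computes the semi-axes of a 2D ellipse for a given probability. It assumes Gaussian data and a covariance matrix that is known rather than estimated from a small sample, so the chi-square quantile with 2 degrees of freedom, `-2*ln(1-p)`, can be used as the scale. The name `prediction_ellipse` is made up for this example.

```python
import numpy as np

def prediction_ellipse(mean, cov, p=0.95):
    """Center, semi-axis lengths and axis directions of the 2D ellipse that
    contains probability p (hypothetical helper; assumes a known covariance)."""
    mean = np.asarray(mean, dtype=float).reshape(-1)
    cov = np.asarray(cov, dtype=float)
    # eigh is for symmetric matrices: eigenvalues ascending, eigenvectors as columns
    eigvals, eigvecs = np.linalg.eigh(cov)
    scale2 = -2.0 * np.log(1.0 - p)   # chi-square quantile, 2 degrees of freedom
    semi_axes = np.sqrt(scale2 * eigvals)
    return mean, semi_axes, eigvecs

center, semi_axes, directions = prediction_ellipse([0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
print(semi_axes)
```

For the Kalman filter in the question, a natural choice for `cov` would be the 2x2 predicted measurement covariance `H*P*H.T + R`, with `H*x` as the center. When the covariance is estimated from only a few samples, a larger scale factor is needed than the one used here.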
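Answer 2 (score: 0)
Because your statistics are of course derived from a sample, the probability that the population statistic is larger than the sample's 2-sigma standard deviation is 0.5. So if you have not applied an upper confidence factor to the 2x standard deviation, I would question whether you really have a good prediction of the value that you expect the next measurement to stay below with probability 0.95. The size of that factor depends on the sample size used to derive the 0.5 population probability. The smaller the sample used to derive your covariance matrix, the larger the factor needed to reach a probability of 0.95 that the population's 0.95 statistic is below the factored sample statistic.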