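I am new to Kalman filters and am trying to use one to predict missing values and to obtain smoothed observations from GPS data (latitude and longitude).
I am using pykalman, and my code looks like this: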
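import numpy as np
from pykalman import KalmanFilter

data = data[['Lat', 'Lon']]
measurements = np.asarray(data, dtype='float')
# Mask NaN/inf entries so the filter treats them as missing
measurements_masked = np.ma.masked_invalid(measurements)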
# initial state of the form [x0, x0_dot, x1, x1_dot]
initial_state_mean = [
    measurements[0, 0],
    0,
    measurements[0, 1],
    0
]
initial_state_covariance = [[10, 0, 0, 0],
                            [ 0, 1, 0, 0],
                            [ 0, 0, 1, 0],
                            [ 0, 0, 0, 1]]
# transition matrix to estimate new position given old position
transition_matrix = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1]
]
observation_matrix = [
    [1, 0, 0, 0],
    [0, 0, 1, 0]
]
kf = KalmanFilter(
    transition_matrices=transition_matrix,
    observation_matrices=observation_matrix,
    initial_state_mean=initial_state_mean,
)
filtered_state_means = np.zeros((len(measurements), 4))
filtered_state_covariances = np.zeros((len(measurements), 4, 4))
for i in range(len(measurements)):
    if i == 0:
        filtered_state_means[i] = initial_state_mean
        filtered_state_covariances[i] = initial_state_covariance
    else:
        filtered_state_means[i], filtered_state_covariances[i] = (
            kf.filter_update(
                filtered_state_means[i-1],
                filtered_state_covariances[i-1],
                observation=measurements_masked[i])
        )
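where data is a pandas DataFrame from which the latitude and longitude are extracted.
Is this logic correct? In addition, what I would like to do is use the observations closest to the missing ones to predict the missing values. For example, if the 5th, 6th, and 7th observations are missing in an array of 10 samples, it would make more sense to predict the 5th from the 4th sample, the 7th from the 8th sample, and the 6th as the average of the 5th and 7th.
Does this approach make sense? If so, how can I do it with pykalman? If not, what can be done to predict missing values more accurately when many consecutive values are missing from the array?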
Answer 0 (score: 1)
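I think a Kalman filter fits what you want very well. Below is an example with some dummy data in which I have masked (hidden) some of the samples/measurements from the filter. As you can see, the KF reconstructs the 3 missing points in the middle quite well. Through the estimated dynamics, the KF takes into account the fact that the observations closest to a particular timestamp are the most relevant for estimating the state at that timestamp.
This is a bit optimistic, because the input data matches the assumption made in the KF exactly (the object moves at constant velocity). Note that the KF should also work well when the velocity actually changes. I have previously posted a longer answer about the pykalman library at https://stackoverflow.com/a/43568887/4988601, which may help in understanding how the KF works.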
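For reference, the constant-velocity assumption mentioned above can be written as the following state-space model. This is only a sketch of the model behind the matrices used in the code: the symbols A, C, w_k, v_k, Q and R are notation I am assuming here, and the time step is taken as 1, as in the transition matrix.

```latex
\begin{aligned}
x_k &= \begin{bmatrix} \mathrm{lat}_k & \dot{\mathrm{lat}}_k & \mathrm{lon}_k & \dot{\mathrm{lon}}_k \end{bmatrix}^{\top} \\
x_{k+1} &= A\,x_k + w_k, \qquad w_k \sim \mathcal{N}(0, Q), \qquad
A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\
z_k &= C\,x_k + v_k, \qquad v_k \sim \mathcal{N}(0, R), \qquad
C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\end{aligned}
```

Here Q plays the role of transition_covariance and R the role of observation_covariance. A larger Q lets the estimated velocity drift over time, which is why the filter can still follow a track whose speed changes.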
import numpy as np
import matplotlib.pyplot as plt
from pykalman import KalmanFilter
# Some dummy values: assume we're heading in a straight line
# at constant speed
lat_ideal = np.array(range(10))
lon_ideal = np.array(lat_ideal*3.5 + 10)
lat = lat_ideal + np.random.uniform(-0.5, 0.5, 10)
lon = lon_ideal + np.random.uniform(-0.5, 0.5, 10)
# Mark some indexes as missing
measurementMissingIdx = [False, False, False, False, True, True, True, False, False, False]
# Create the measurement matrix and mark some of the time steps
# (rows) as missing (masked)
measurements = np.ma.asarray([lat, lon]).transpose()
measurements[measurementMissingIdx] = np.ma.masked
# Kalman filter settings:
# state vector is [lat, lat_dot, lon, lon_dot]
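Transition_Matrix = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
Observation_Matrix = [[1, 0, 0, 0], [0, 0, 1, 0]]
# Starting guess: first measured position, zero velocity
initial_state_mean = [measurements[0, 0], 0,
                      measurements[0, 1], 0]
kf = KalmanFilter(transition_matrices=Transition_Matrix,
                  observation_matrices=Observation_Matrix,
                  initial_state_mean=initial_state_mean,
                  em_vars=['initial_state_covariance', 'initial_state_mean',
                           'transition_covariance', 'observation_covariance'])
# Estimate the parameters listed in em_vars from the data (a few EM iterations)
kf.em(measurements, n_iter=5)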
# Increase observation co-variance
kf.observation_covariance = kf.observation_covariance*10
(smoothed_state_means, smoothed_state_covariances) = kf.smooth(measurements)
plt.plot(lat_ideal,lon_ideal,'sb', label='ideal values', markerfacecolor='none')
plt.plot(measurements[:,0],measurements[:,1],'og',label='input measurements', markerfacecolor='none')
plt.plot(smoothed_state_means[:,0],smoothed_state_means[:,2],'xr',label='kalman output')
plt.xlabel("Latitude")
plt.ylabel("Longitude")
legend = plt.legend(loc=2)
plt.title("Constant Velocity Kalman Filter")
plt.show()
Running this produces a plot comparing the ideal values, the input measurements, and the Kalman output.
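To make the handling of missing samples more concrete, here is a minimal NumPy-only sketch of the idea. It is not pykalman's actual implementation: the function name kalman_smooth, the NaN-for-missing convention, the 1-D toy track, and the noise values Q and R are all assumptions chosen for illustration. For a missing time step the forward pass only predicts (no measurement update), and the backward (RTS) smoothing pass then pulls in information from the later samples.

```python
import numpy as np


def kalman_smooth(z, F, H, Q, R, x0, P0):
    """Kalman filter plus RTS smoother; rows of z containing NaN count as missing.

    Hypothetical helper for illustration only (not part of pykalman).
    """
    n, d = len(z), len(x0)
    x_pred = np.zeros((n, d))
    P_pred = np.zeros((n, d, d))
    x_filt = np.zeros((n, d))
    P_filt = np.zeros((n, d, d))
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    for k in range(n):
        if k > 0:  # predict from the previous time step
            x, P = F @ x, F @ P @ F.T + Q
        x_pred[k], P_pred[k] = x, P
        if not np.any(np.isnan(z[k])):  # update only when a measurement exists
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ (z[k] - H @ x)
            P = (np.eye(d) - K @ H) @ P
        x_filt[k], P_filt[k] = x, P
    # Backward (RTS) pass: refine each estimate using the later time steps
    x_smooth, P_smooth = x_filt.copy(), P_filt.copy()
    for k in range(n - 2, -1, -1):
        G = P_filt[k] @ F.T @ np.linalg.inv(P_pred[k + 1])
        x_smooth[k] = x_filt[k] + G @ (x_smooth[k + 1] - x_pred[k + 1])
        P_smooth[k] = P_filt[k] + G @ (P_smooth[k + 1] - P_pred[k + 1]) @ G.T
    return x_filt, P_filt, x_smooth, P_smooth


# 1-D toy track at constant speed, noise-free so the result is easy to check
truth = 2.0 * np.arange(10)
z = truth.reshape(-1, 1).copy()
z[4:7] = np.nan                          # 5th, 6th and 7th samples are missing

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state is [position, velocity]
H = np.array([[1.0, 0.0]])               # only the position is observed
Q = 1e-4 * np.eye(2)                     # assumed process noise
R = np.array([[0.25]])                   # assumed measurement noise
x0 = [z[0, 0], 0.0]
P0 = np.diag([1.0, 100.0])               # velocity is initially very uncertain

x_filt, P_filt, x_smooth, P_smooth = kalman_smooth(z, F, H, Q, R, x0, P0)
print("smoothed positions in the gap:", x_smooth[4:7, 0])
print("position variance at index 5 (filtered, smoothed):",
      P_filt[5, 0, 0], P_smooth[5, 0, 0])
```

In the gap, the filtered state is just the previous state pushed through the transition matrix, while the smoothed position variance is lower than the filtered one because the samples after the gap are also used. This is roughly the "use the samples on both sides of the gap" behaviour asked about in the question, obtained from the model rather than from hand-written averaging.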