Question

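I have a matrix in which one column holds the data (one sample per second) and another column holds the timestamps, in seconds. For some seconds the data does not change from the previous sample, so those seconds are not stored in the vector. I want to apply a function, such as a simple mean, over time intervals (for example, 30 seconds), but to do that I have to account for the missing seconds. What is the best way to do this?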

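First build a matrix that contains the repeated elements (I would also like it to include the correct timestamps for the missing seconds, which is the hardest part) and then compute the mean;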

or

Use a loop (which I think is the worst way) to compute the mean while inserting the missing samples;

Thanks in advance!

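P.S.: Or is it possible to apply a function to the data that identifies the missing samples and fills them in automatically (by repetition)?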

Answer 1

You can use a combination of diff and sum to include the "missing" entries through a weighted mean:

% time step
step = 1;

% Example data (with repeated elements)
A = [...
     1 80.6
     2 79.8
     3 40.3
     4 40.3
     5 81.9
     6 83.6
     7 83.7
     8 95.4
     9 14.8
    10 14.8
    11 14.8
    12 14.8
    13 14.8
    14 44.3];

% Example data, with the repeated elements removed
B = [...
     1 80.6
     2 79.8
     3 40.3     
     5 81.9
     6 83.6
     7 83.7
     8 95.4
     9 14.8    
    14 44.3];

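% Reference value: plain mean of the full data
M1 = mean(A(:,2))

% Weighted mean of the pruned data: each sample is weighted by the number
% of time steps until the next sample; the last sample counts once
D = diff(B(:,1));
W = [D/step; 1];
M2 = sum(W .* B(:,2))/sum(W)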

Result:

M1 =
    5.027857142857141e+001
M2 =
    5.027857142857143e+001
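Written out, the pruned-data computation above is a weighted mean. Assuming timestamps t_1 < ... < t_n, values x_1, ..., x_n and a sampling step Delta (the step variable), the weights and the mean are as follows, with the numbers for the example matrix B worked through underneath:

```latex
w_i = \frac{t_{i+1} - t_i}{\Delta} \quad (i = 1,\dots,n-1), \qquad w_n = 1,
\qquad
M = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}

% Example B: w = (1, 1, 2, 1, 1, 1, 1, 5, 1)
\sum_i w_i = 14, \qquad \sum_i w_i x_i = 703.9, \qquad M = \frac{703.9}{14} \approx 50.2786
```

This matches the plain mean of the full matrix A, because each stored value is counted once for every second it stands for.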

Alternatively, you can recreate the full matrix A from the abbreviated B by run-length decoding. You can do this efficiently like this:

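W = [diff(B(:,1))/step; 1];           % time steps covered by each sample
idx = false(sum(W)+1, 1);             % marks where each run of repeats starts
idx(cumsum([true; W(W>0)])) = true;

% cumsum(idx) maps every second to the row of B that holds its value
A_new = [ (B(1,1):step:B(end,1)).'  B(cumsum(idx(1:find(idx,1,'last')-1)),2) ];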
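On MATLAB R2015a or newer, repelem can do the run-length decoding directly, because it repeats each element of a vector a given number of times. A minimal sketch, assuming integer timestamps spaced by step and reusing the pruned B from the first example (the names t_full and v_full are only illustrative):

```matlab
% Sketch: rebuild the full series from the pruned matrix B with repelem
% (assumes MATLAB R2015a or newer and timestamps that are multiples of step)
step = 1;
B = [ 1 80.6;  2 79.8;  3 40.3;  5 81.9;  6 83.6; ...
      7 83.7;  8 95.4;  9 14.8; 14 44.3];

% Number of time steps each sample covers; the last sample covers only itself
W = [diff(B(:,1))/step; 1];

% Full time axis, and each value repeated W(i) times (run-length decoding)
t_full = (B(1,1):step:B(end,1)).';
v_full = repelem(B(:,2), W);

A_new = [t_full v_full];
```

The result has one row per second, so any function (not just the mean) can then be applied to it directly.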

Answer 2

You can give each sample a weight that reflects how many samples it actually represents. Such weights can be computed with diff:

data = [1 1; 0 2; 3 5; 4 7]; % Example data. Second column is timestamp

% Weight = time until the next sample; the last sample is assumed to
% represent only itself
weights = diff([data(:,2); data(end,2)+1]);
result = sum(data(:,1).*weights)/sum(weights);   % weighted mean
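Neither answer shows the per-interval (e.g. 30-second) means the question asks for. One way to get them, sketched here under stated assumptions, is to expand the series to one value per second and then group by window with accumarray. This swaps in expansion plus grouping instead of the diff weighting above. Assumptions: column 1 holds integer timestamps in seconds, column 2 holds the data, step = 1, and windows are aligned to the first timestamp; winLen, win and winMeans are hypothetical names, and the data is made up for illustration.

```matlab
% Sketch: mean over fixed windows after filling in the missing seconds
step   = 1;
winLen = 30;                          % window length in seconds

% Example pruned data: [timestamp value]
B = [ 1 10;  2 12;  5 20; 31 5; 40 7; 61 1];

% Expand to one row per second (run-length decoding, needs R2015a+)
W      = [diff(B(:,1))/step; 1];
t_full = (B(1,1):step:B(end,1)).';
v_full = repelem(B(:,2), W);

% Window index of every second: 1 for the first winLen seconds, 2 for the next, ...
win = floor((t_full - t_full(1)) / winLen) + 1;

% Mean of each window; the last window may be shorter than winLen
winMeans = accumarray(win, v_full, [], @mean);
```

Expanding first means a run of repeated values that crosses a window boundary is split correctly between the two windows; with diff weights alone such runs would have to be split by hand. The cost is one row of memory per second.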

MATLAB: Repeating elements of a vector with increasing timestamps
