Filling missing timestamp rows with NaN in MATLAB

Date: 2016-12-28 14:50:05

Tags: arrays matlab timestamp time-series

I have a dataset in which some timestamps are missing. So far I have written this code:

x = table2dataset(Testing_data);
T1 = x(:,1);              
C1 =dataset2cell(T1);
formatIn = 'yyyy-mm-dd HH:MM:SS';
t1= datenum(C1,formatIn);

% Creating 10 minutes of time interval;
avg = 10/60/24;        
tnew = [t1(1):avg:t1(end)]';
indx = round((t1-t1(1))/avg) + 1;
ynew = NaN(length(tnew),1);
ynew(indx)=t1;

% replacing missing time with NaN
t = datetime(ynew,'ConvertFrom','datenum');                 
formatIn = 'yyyy-mm-dd HH:MM:SS';
DateVector = datevec(ynew,formatIn);
dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS');
ds = string(dt);

The test data has three columns:

     Time                       x          y
2009-04-10 02:00:00.000         1         0.1
2009-04-10 02:10:00.000         2         0.2
2009-04-10 02:30:00.000         3         0.3
2009-04-10 02:50:00.000         4         0.4

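As you can see, for a 10-minute interval some timestamps are missing (2:20 and 2:40), so I want to add those timestamps, with the x and y values set to NaN. My output should look like this: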

       Time                     x          y
2009-04-10 02:00:00.000         1         0.1
2009-04-10 02:10:00.000         2         0.2
2009-04-10 02:20:00.000         NaN       NaN
2009-04-10 02:30:00.000         3         0.3    
2009-04-10 02:40:00.000         NaN       NaN
2009-04-10 02:50:00.000         4         0.4

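As you can see from my code, I am only able to add the timestamps with NaN; what I still need is to carry over the corresponding x and y values as shown above.

Please note that I have more than 3000 data rows in the above format, and I want to do the same for all of them.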

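2 Answers:

Answer 0 (score: 0)

Your question seems contradictory: you say you are able to insert NaN in place of the missing time strings, but in the example of the expected output you have written the time strings.

You also mention the missing timestamp (2:20) but, if the time step is 10 minutes, there is another missing timestamp in your example data (2:40).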
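A quick way to see which slots are missing is to map every timestamp to an integer slot number and compare against the full range. This is only a minimal sketch on the four sample rows; the names time_str, step_min, slots and missing_slots are mine and do not appear in your code.

```matlab
% The four sample timestamps from the question (milliseconds dropped)
time_str = {'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
            '2009-04-10 02:30:00'; '2009-04-10 02:50:00'};
t1 = datenum(time_str, 'yyyy-mm-dd HH:MM:SS');

step_min = 10;                                  % assumed nominal step, in minutes
slots = round((t1 - t1(1)) * 24*60 / step_min); % integer slot of each row: 0,1,3,5

% Slots in the full range that no row maps to
missing_slots = setdiff((0:slots(end))', slots);   % [2; 4]

% Back to readable timestamps (2:20 and 2:40 for this sample)
missing_str = datestr(t1(1) + missing_slots*step_min/1440, 'yyyy-mm-dd HH:MM:SS');
```

Working with integer slot numbers avoids comparing floating-point date numbers directly.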

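Assuming:

  • you actually want to insert the missing time strings
  • you want to handle all the missing timestamps

you can modify your code as follows:

  • the ynew time is not needed
  • the tnew time should be used in place of ynew
  • to insert the NaN values in the x and y columns, you need to:
    • extract them from the dataset
    • create two new arrays, initializing them to NaN
    • insert the original x and y data at the positions identified by indx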
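The steps above can be sketched on the four sample rows without any table or dataset involved. This is an illustration, not your actual data pipeline: time_str, step and n are names I made up, and I build the time grid from an integer step count rather than with the colon operator, which is my own choice so that rounding cannot drop the last point.

```matlab
% Toy data mirroring the four sample rows from the question
time_str = {'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
            '2009-04-10 02:30:00'; '2009-04-10 02:50:00'};
x_data = [1; 2; 3; 4];
y_data = [0.1; 0.2; 0.3; 0.4];

t1   = datenum(time_str, 'yyyy-mm-dd HH:MM:SS');
step = 10/60/24;                          % 10 minutes expressed in days

% Regular grid from the first to the last timestamp
n    = round((t1(end) - t1(1))/step) + 1; % number of rows after filling
tnew = t1(1) + (0:n-1)'*step;

% Row of the new grid that each original row belongs to
indx = round((t1 - t1(1))/step) + 1;      % [1; 2; 4; 6]

% Start from all-NaN columns, then copy the original values in place
x_data_new = NaN(n,1);  x_data_new(indx) = x_data;
y_data_new = NaN(n,1);  y_data_new(indx) = y_data;
```

Rows 3 and 6 of the original data land in rows 4 and 6 of the new arrays, and rows 3 and 5 stay NaN.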

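Below you can find an updated version of your code.

  • the x and y data are stored in the x_data and y_data arrays
  • the new x and y data are stored in the x_data_new and y_data_new arrays

At the end of the script, two tables are generated: in the first the time is stored as a string (char array), in the second as a cellarray.

The comments in the code should identify the modifications.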

x = table2dataset(Testing_data);
T1 = x(:,1);
% Get X data from the table
x_data=x(:,2)
% Get Y data from the table
y_data=x(:,3)

C1 =dataset2cell(T1);

formatIn = 'yyyy-mm-dd HH:MM:SS';
t1= datenum(C1(2:end),formatIn)

avg = 10/60/24;        % Creating 10 minutes of time interval;
tnew = [t1(1):avg:t1(end)]'
indx = round((t1-t1(1))/avg) + 1
%
% Not Needed
%
% ynew = NaN(length(tnew),1);
% ynew(indx)=t1;
%
% Create the new X and Y data: start from all-NaN columns, then copy
% the original values into the rows identified by indx
%
x_data_new=nan(length(tnew),1)
x_data_new(indx)=x_data
y_data_new=nan(length(tnew),1)
y_data_new(indx)=y_data

% t = datetime(ynew,'ConvertFrom','datenum')  % replacing missing time with NAN
%
% Use tnew instead of ynew
%
t = datetime(tnew,'ConvertFrom','datenum')  % replacing missing time with NAN
formatIn = 'yyyy-mm-dd HH:MM:SS'
% DateVector = datevec(y_data_new,formatIn)
% dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS')
%
% Use tnew instead of ynew
%
dt = datestr(tnew,'yyyy-mm-dd HH:MM:SS')
% ds = char(dt)

new_table=table(dt,x_data_new,y_data_new)
new_table_1=table(cellstr(dt),x_data_new,y_data_new)

Output:

new_table = 

        dt         x_data_new    y_data_new
    ___________    __________    __________

    [1x19 char]      1           0.1       
    [1x19 char]      2           0.2       
    [1x19 char]    NaN           NaN       
    [1x19 char]      3           0.3       
    [1x19 char]    NaN           NaN       
    [1x19 char]      4           0.4       


new_table_1 = 

            Var1             x_data_new    y_data_new
    _____________________    __________    __________

    '2009-04-10 02:00:00'      1           0.1       
    '2009-04-10 02:10:00'      2           0.2       
    '2009-04-10 02:20:00'    NaN           NaN       
    '2009-04-10 02:30:00'      3           0.3       
    '2009-04-10 02:40:00'    NaN           NaN       
    '2009-04-10 02:50:00'      4           0.4   

Hope this helps.

Qapla'

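Answer 1 (score: 0)

This example is not too different from the accepted answer, but IMHO it is a bit easier on the eyes. It also supports gaps larger than 1 step and is a bit more generic, because it makes fewer assumptions.

It works on plain cell arrays instead of the original table data, so the conversion is up to you (I'm on R2010a, which has no tables, so I can't test that part).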
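For the conversion, something along these lines should work on a release that has tables (R2013b or later), though as said I can't run it here, so treat it as an untested sketch. Testing_data below is a hypothetical two-row stand-in for the asker's table, and I am assuming the time column holds strings rather than datetime values.

```matlab
% Hypothetical stand-in for the asker's table (time stored as strings)
Testing_data = table({'2009-04-10 02:00:00.000'; '2009-04-10 02:10:00.000'}, ...
                     [1; 2], [0.1; 0.2], ...
                     'VariableNames', {'Time', 'x', 'y'});

% Table -> plain cell array, one cell per table entry
old_data = table2cell(Testing_data);

% ... gap-filling code operating on old_data goes here ...

% Cell array -> table again, reusing the original column names
new_table = cell2table(old_data, ...
    'VariableNames', Testing_data.Properties.VariableNames);
```

After the gap filling, pass the filled cell array to cell2table instead of old_data.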

% Example data with intentional gaps of varying size
old_data = {'2009-04-10 02:00:00.000'  1   0.1
            '2009-04-10 02:10:00.000'  2   0.2
            '2009-04-10 02:30:00.000'  3   0.3
            '2009-04-10 02:50:00.000'  4   0.4
            '2009-04-10 03:10:00.000'  5   0.5
            '2009-04-10 03:20:00.000'  6   0.6
            '2009-04-10 03:50:00.000'  7   0.7}


% Convert textual dates to numbers we can work with more easily
old_dates = datenum(old_data(:,1));

% Nominal step size is the minimum of all differences
deltas = diff(old_dates);
nominal_step = min(deltas);

% Generate new date numbers with constant step
new_dates = old_dates(1) : nominal_step : old_dates(end);

% Determine where the gaps in the data are, and how big they are,
% taking into account rounding error
step_gaps = abs(deltas - nominal_step) > 10*eps;
gap_sizes = round( deltas(step_gaps) / nominal_step - 1);

% Create new data structure with constant-step time stamps, 
% initially with the data of interest all-NAN
new_size = size(old_data,1) + sum(gap_sizes);
new_data = [cellstr( datestr(new_dates, 'yyyy-mm-dd HH:MM:SS') ),...
            repmat({NaN}, new_size, 2)];

% Compute proper locations of the old data in the new data structure, 
% again, taking into account rounding error
day = 86400; % (seconds in a day)
new_datapoint = ismember(round(new_dates * day), ...
                         round(old_dates * day));

% Insert the old data at the right locations
new_data(new_datapoint, 2:3) = old_data(:, 2:3)

The output is:

old_data = 
    '2009-04-10 02:00:00.000'    [1]    [0.100000000000000]
    '2009-04-10 02:10:00.000'    [2]    [0.200000000000000]
    '2009-04-10 02:30:00.000'    [3]    [0.300000000000000]
    '2009-04-10 02:50:00.000'    [4]    [0.400000000000000]
    '2009-04-10 03:10:00.000'    [5]    [0.500000000000000]
    '2009-04-10 03:20:00.000'    [6]    [0.600000000000000]
    '2009-04-10 03:50:00.000'    [7]    [0.700000000000000]

new_data = 
    '2009-04-10 02:00:00'    [  1]    [0.100000000000000]
    '2009-04-10 02:10:00'    [  2]    [0.200000000000000]
    '2009-04-10 02:20:00'    [NaN]    [              NaN]
    '2009-04-10 02:30:00'    [  3]    [0.300000000000000]
    '2009-04-10 02:40:00'    [NaN]    [              NaN]
    '2009-04-10 02:50:00'    [  4]    [0.400000000000000]
    '2009-04-10 03:00:00'    [NaN]    [              NaN]
    '2009-04-10 03:10:00'    [  5]    [0.500000000000000]
    '2009-04-10 03:20:00'    [  6]    [0.600000000000000]
    '2009-04-10 03:30:00'    [NaN]    [              NaN]
    '2009-04-10 03:40:00'    [NaN]    [              NaN]
    '2009-04-10 03:50:00'    [  7]    [0.700000000000000]
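
A note on the rounding before ismember in the code above: date numbers are doubles measured in days, so a grid point computed by repeated steps need not be bit-identical to the same instant parsed from a string. Comparing whole seconds sidesteps that. A minimal sketch, with made-up names (grid_dates, stamp) and a six-point grid chosen just for illustration:

```matlab
day  = 86400;                      % seconds in a day
step = 10/60/24;                   % 10 minutes expressed in days

% Six grid points starting at 02:00, built arithmetically
t0 = datenum('2009-04-10 02:00:00', 'yyyy-mm-dd HH:MM:SS');
grid_dates = t0 + (0:5)*step;

% The same instant as the last grid point, but parsed from text
stamp = datenum('2009-04-10 02:50:00', 'yyyy-mm-dd HH:MM:SS');

% Direct comparison: may or may not match, depending on rounding error
exact_hit = ismember(stamp, grid_dates);

% Comparison in whole seconds: matches regardless of tiny differences
rounded_hit = ismember(round(stamp*day), round(grid_dates*day));
```

This only works as long as the timestamps are at least a second apart; for sub-second data a finer unit would be needed.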