I have a dataset in which some timestamps are missing. This is the code I have written so far:
x = table2dataset(Testing_data);
T1 = x(:,1);
C1 =dataset2cell(T1);
formatIn = 'yyyy-mm-dd HH:MM:SS';
t1= datenum(C1,formatIn);
% Creating 10 minutes of time interval;
avg = 10/60/24;
tnew = [t1(1):avg:t1(end)]';
indx = round((t1-t1(1))/avg) + 1;
ynew = NaN(length(tnew),1);
ynew(indx)=t1;
% replacing missing time with NaN
t = datetime(ynew,'ConvertFrom','datenum');
formatIn = 'yyyy-mm-dd HH:MM:SS';
DateVector = datevec(ynew,formatIn);
dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS');
ds = string(dt);
The test data has three columns:
Time x y
2009-04-10 02:00:00.000 1 0.1
2009-04-10 02:10:00.000 2 0.2
2009-04-10 02:30:00.000 3 0.3
2009-04-10 02:50:00.000 4 0.4
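As you can see, for a 10-minute interval the timestamps 2:20 and 2:40 are missing, so I want to add them, with the x and y values in those rows set to NaN. My output should look like this: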
Time x y
2009-04-10 02:00:00.000 1 0.1
2009-04-10 02:10:00.000 2 0.2
2009-04-10 02:20:00.000 NaN NaN
2009-04-10 02:30:00.000 3 0.3
2009-04-10 02:40:00.000 NaN NaN
2009-04-10 02:50:00.000 4 0.4
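As you can see from my code, I am only able to put NaN where a timestamp is missing; what I want is the full time axis together with the corresponding x and y values.
Note that my data has more than 3000 rows in the format above, and I want to do the same for all of them.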
Answer 0 (score: 0)
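Your question seems contradictory: you say you can insert NaN in place of the missing time strings, but in your example of the expected output you wrote the time strings themselves.
You also mention a missing timestamp (2:20), but if the time step is 10 minutes, there is another missing timestamp in your sample data (2:40).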
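Assuming that you actually want to insert the missing time strings, and that you want to handle every missing timestamp, you can modify your code as follows:
- the ynew time array is not needed
- the tnew time array should be used in place of ynew
- to insert the NaN values in the x and y columns, you need to:
  - extract the x and y data from the dataset
  - create two new arrays initialized to NaN
  - insert the original x and y data at the locations identified by indx
Below you can find an updated version of your code:
- the original x and y data are stored in the x_data and y_data arrays
- the new x and y data are stored in the x_data_new and y_data_new arrays
At the end of the script two tables are generated: the first is built with the time as a string (char array), the second with the time as a cell array.
The comments in the code should identify the modifications.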
x = table2dataset(Testing_data);
T1 = x(:,1);
% Get X data from the table
x_data=x(:,2)
% Get Y data from the table
y_data=x(:,3)
C1 =dataset2cell(T1);
formatIn = 'yyyy-mm-dd HH:MM:SS';
t1= datenum(C1(2:end),formatIn)
avg = 10/60/24; % Creating 10 minutes of time interval;
tnew = [t1(1):avg:t1(end)]'
indx = round((t1-t1(1))/avg) + 1
%
% Not Needed
%
% ynew = NaN(length(tnew),1);
% ynew(indx)=t1;
%
% Create the new X and Y data: start from all-NaN columns,
% then copy the original values into the rows given by indx
%
x_data_new=nan(length(tnew),1)
x_data_new(indx)=x_data
y_data_new=nan(length(tnew),1)
y_data_new(indx)=y_data
% t = datetime(ynew,'ConvertFrom','datenum') % replacing missing time with NAN
%
% Use tnew instead of ynew
%
t = datetime(tnew,'ConvertFrom','datenum') % full 10-minute time axis as datetime
formatIn = 'yyyy-mm-dd HH:MM:SS'
% DateVector = datevec(y_data_new,formatIn)
% dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS')
%
% Use tnew instead of ynew
%
dt = datestr(tnew,'yyyy-mm-dd HH:MM:SS')
% ds = char(dt)
new_table=table(dt,x_data_new,y_data_new)
new_table_1=table(cellstr(dt),x_data_new,y_data_new)
Output:
new_table =
dt x_data_new y_data_new
___________ __________ __________
[1x19 char] 1 0.1
[1x19 char] 2 0.2
[1x19 char] NaN NaN
[1x19 char] 3 0.3
[1x19 char] NaN NaN
[1x19 char] 4 0.4
new_table_1 =
Var1 x_data_new y_data_new
_____________________ __________ __________
'2009-04-10 02:00:00' 1 0.1
'2009-04-10 02:10:00' 2 0.2
'2009-04-10 02:20:00' NaN NaN
'2009-04-10 02:30:00' 3 0.3
'2009-04-10 02:40:00' NaN NaN
'2009-04-10 02:50:00' 4 0.4
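The index mapping used above can also be tried in isolation. The following is only a minimal sketch on the four sample rows from the question, assuming the times are available as text without the millisecond part and that x and y are plain numeric columns; the names `time_str`, `step`, `x_new`, `y_new` and `time_new` are made up for this illustration.

```matlab
% Sample rows from the question (assumption: times as text, x and y numeric)
time_str = {'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
            '2009-04-10 02:30:00'; '2009-04-10 02:50:00'};
x_data = [1; 2; 3; 4];
y_data = [0.1; 0.2; 0.3; 0.4];

t1   = datenum(time_str, 'yyyy-mm-dd HH:MM:SS');  % serial date numbers (days)
step = 10/60/24;                                  % 10 minutes, expressed in days

% Row of each original sample on the regular grid:
% number of 10-minute steps since the first sample, plus one
indx = round((t1 - t1(1)) / step) + 1;

% Regular time axis built from integer step counts
tnew = t1(1) + (0:indx(end)-1)' * step;

% Start from all-NaN columns and copy the known samples into their rows
x_new = NaN(numel(tnew), 1);
y_new = NaN(numel(tnew), 1);
x_new(indx) = x_data;
y_new(indx) = y_data;

% Time axis back as text, one cell per row
time_new = cellstr(datestr(tnew, 'yyyy-mm-dd HH:MM:SS'));
```

Because `indx` is computed from rounded step counts, a gap of several steps simply leaves several consecutive rows at NaN.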
Hope this helps.
Qapla'
Answer 1 (score: 0)
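This example is not too different from the accepted answer, but IMHO it is a bit easier on the eyes. It also supports gaps larger than one step, and it is more generic because it makes fewer assumptions.
It works with a plain cell array instead of the original table data, so the conversion is up to you (I am on R2010a, so I cannot test that part).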
% Example data with intentional gaps of varying size
old_data = {'2009-04-10 02:00:00.000' 1 0.1
'2009-04-10 02:10:00.000' 2 0.2
'2009-04-10 02:30:00.000' 3 0.3
'2009-04-10 02:50:00.000' 4 0.4
'2009-04-10 03:10:00.000' 5 0.5
'2009-04-10 03:20:00.000' 6 0.6
'2009-04-10 03:50:00.000' 7 0.7}
% Convert textual dates to numbers we can work with more easily
old_dates = datenum(old_data(:,1));
% Nominal step size is the minimum of all differences
deltas = diff(old_dates);
nominal_step = min(deltas);
% Generate new date numbers with constant step
new_dates = old_dates(1) : nominal_step : old_dates(end);
% Determine where the gaps in the data are, and how big they are,
% taking into account rounding error
step_gaps = abs(deltas - nominal_step) > 10*eps;
gap_sizes = round( deltas(step_gaps) / nominal_step - 1);
% Create new data structure with constant-step time stamps,
% initially with the data of interest all-NAN
new_size = size(old_data,1) + sum(gap_sizes);
new_data = [cellstr( datestr(new_dates, 'yyyy-mm-dd HH:MM:SS') ),...
repmat({NaN}, new_size, 2)];
% Compute proper locations of the old data in the new data structure,
% again, taking into account rounding error
day = 86400; % (seconds in a day)
new_datapoint = ismember(round(new_dates * day), ...
round(old_dates * day));
% Insert the old data at the right locations
new_data(new_datapoint, 2:3) = old_data(:, 2:3)
The output is:
old_data =
'2009-04-10 02:00:00.000' [1] [0.100000000000000]
'2009-04-10 02:10:00.000' [2] [0.200000000000000]
'2009-04-10 02:30:00.000' [3] [0.300000000000000]
'2009-04-10 02:50:00.000' [4] [0.400000000000000]
'2009-04-10 03:10:00.000' [5] [0.500000000000000]
'2009-04-10 03:20:00.000' [6] [0.600000000000000]
'2009-04-10 03:50:00.000' [7] [0.700000000000000]
new_data =
'2009-04-10 02:00:00' [ 1] [0.100000000000000]
'2009-04-10 02:10:00' [ 2] [0.200000000000000]
'2009-04-10 02:20:00' [NaN] [ NaN]
'2009-04-10 02:30:00' [ 3] [0.300000000000000]
'2009-04-10 02:40:00' [NaN] [ NaN]
'2009-04-10 02:50:00' [ 4] [0.400000000000000]
'2009-04-10 03:00:00' [NaN] [ NaN]
'2009-04-10 03:10:00' [ 5] [0.500000000000000]
'2009-04-10 03:20:00' [ 6] [0.600000000000000]
'2009-04-10 03:30:00' [NaN] [ NaN]
'2009-04-10 03:40:00' [NaN] [ NaN]
'2009-04-10 03:50:00' [ 7] [0.700000000000000]
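Since this answer leaves the conversion between the table and the cell array open, here is one possible sketch of it, on a MATLAB release that has tables. It assumes the Time column of the table holds text; the `Testing_data` built here is only a hypothetical stand-in for the question's table, and `round_trip` is a made-up name.

```matlab
% Hypothetical stand-in for the question's Testing_data table
% (assumption: the Time column is stored as text)
Time = {'2009-04-10 02:00:00.000'; '2009-04-10 02:10:00.000'; ...
        '2009-04-10 02:30:00.000'; '2009-04-10 02:50:00.000'};
x = [1; 2; 3; 4];
y = [0.1; 0.2; 0.3; 0.4];
Testing_data = table(Time, x, y);

% Table -> plain cell array, one row per sample: {time string, x, y}
old_data = table2cell(Testing_data);

% ... the gap-filling code from this answer would run here on old_data ...

% Cell array -> table again, reusing the original column names
% (shown on old_data so that this snippet runs on its own)
round_trip = cell2table(old_data, ...
    'VariableNames', Testing_data.Properties.VariableNames);
```

If the Time column is stored as datetime rather than text, it would probably need to be converted to text first, for example with datestr, before datenum is applied to the cell array.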