MATLAB: dates don't match between two data sets. Help!
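Please forgive the simplicity of the question, but this is my first day.
I am working with two time series: 1) S&P 500 prices since 1977 (daily close and date) and 2) bond yields since 1977 (daily close and date).
The problem is that after a few months the dates no longer line up with each other (perhaps the bond market was closed on a day when the stock market was open, and so on), so I have two data sets that are no longer aligned. Before I start asking how to fill the gaps (I will use an average when I get to that bridge), I need to know how to get MATLAB to line up the dates of the two securities, so that I at least know where the gaps are for each one, i.e. on which dates a security is missing a price. I am thinking of creating a calendar column of my own (or using the dates from one of the securities) and using it as the reference date column for the final output, matching the prices to it... Maybe this is the wrong way to think about it, but any help would be appreciated :)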
Answer 0 (score: 9)
Basically, you want to perform a full outer merge of the two data sets, using the date as the key.
Consider the following example:
%# vector of dates (serial datetime)
days = datenum( num2str((1:31)','2011-10-%02d') ); %# one month (October 2011)
%# let's build two datasets similar to what you described
idx1 = rand(size(days)) > 0.2; %# randomly pick dates for 1st
M1 = [days(idx1) rand(sum(idx1),2)*1000]; %# stocks: days,opening,closing
idx2 = rand(size(days)) > 0.5; %# randomly pick dates for 2nd
M2 = [days(idx2) rand(sum(idx2),2)*1000]; %# bonds: days,opening,closing
%# get the full range of dates, and convert them to indices starting at 1
[allDays,~,ind] = unique( [M1(:,1);M2(:,1)] );
indM1 = ind(1:size(M1,1));
indM2 = ind(size(M1,1)+1:end);
%# merge the two datasets (days,opening,closing,opening,closing)
M = nan(numel(allDays),size(M1,2)+size(M2,2)-1);
M(:,1) = allDays; %# available days from both
M(indM1,2:3) = M1(:,2:3); %# insert 1st dataset values
M(indM2,4:5) = M2(:,2:3); %# insert 2nd dataset values
%# final merged dataset formatted
C = [cellstr(datestr(M(:,1),'yyyy-mm-dd')) num2cell(M(:,2:end))]
Result:
C =
'2011-10-01' [ NaN] [ NaN] [332.5714] [241.5017]
'2011-10-03' [941.9189] [ 86.8151] [ NaN] [ NaN]
'2011-10-04' [655.9138] [429.3973] [ NaN] [ NaN]
'2011-10-05' [451.9457] [257.2828] [853.0636] [243.1452]
'2011-10-06' [839.6974] [297.5554] [ NaN] [ NaN]
'2011-10-07' [532.6235] [424.8584] [ NaN] [ NaN]
'2011-10-09' [553.8871] [119.2073] [ NaN] [ NaN]
'2011-10-11' [680.0655] [495.0669] [442.3979] [154.1594]
'2011-10-13' [367.1899] [706.4072] [904.3555] [956.4164]
'2011-10-14' [ NaN] [ NaN] [ 33.1794] [935.6614]
'2011-10-15' [239.2906] [243.5734] [ NaN] [ NaN]
'2011-10-16' [578.9235] [785.0701] [532.4265] [818.7144]
'2011-10-17' [866.8871] [ 74.0896] [716.4973] [728.2618]
'2011-10-18' [406.7768] [393.8834] [179.3018] [175.8117]
'2011-10-19' [112.6151] [ 3.3941] [336.5329] [360.3710]
'2011-10-20' [443.8458] [220.6769] [ NaN] [ NaN]
'2011-10-21' [ NaN] [ NaN] [187.7129] [188.7900]
'2011-10-22' [300.1844] [ 1.3006] [ NaN] [ NaN]
'2011-10-23' [401.3869] [189.1797] [ NaN] [ NaN]
'2011-10-24' [833.3636] [142.4841] [321.9272] [ 1.1984]
'2011-10-25' [ NaN] [ NaN] [403.8567] [316.4195]
'2011-10-26' [403.6287] [268.0760] [ NaN] [ NaN]
'2011-10-27' [390.1759] [174.8921] [ NaN] [ NaN]
'2011-10-28' [ NaN] [ NaN] [548.5663] [699.6170]
'2011-10-29' [360.4489] [138.6490] [ 48.7386] [625.2552]
'2011-10-30' [140.2554] [598.8856] [552.7321] [543.0622]
'2011-10-31' [260.1302] [901.0579] [274.8114] [439.0372]
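The merged result contains the opening/closing prices from both data sets. When one of them is unavailable for a particular date, it is replaced with NaN. Note that some days do not appear in the result at all; that is because neither data set lists a price for those days.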
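To get back to the original question of where the gaps are, the NaN entries of the merged matrix can be used directly. This is only a minimal sketch: it assumes a merged matrix laid out like M above (column 1 serial dates, columns 2:3 the first series, columns 4:5 the second), and the small M below is made-up illustration data, as are the names missingStocks, missingBonds, stockGaps and bondGaps.

```matlab
% Made-up merged matrix: [date, stock_open, stock_close, bond_open, bond_close]
M = [datenum(2011,10,3)  941.9  86.8   NaN    NaN
     datenum(2011,10,4)  NaN    NaN    332.6  241.5
     datenum(2011,10,5)  451.9  257.3  853.1  243.1];

missingStocks = isnan(M(:,2));   % rows where the first series has no price
missingBonds  = isnan(M(:,4));   % rows where the second series has no price

% Dates on which each series is missing, as readable strings
stockGaps = datestr(M(missingStocks,1), 'yyyy-mm-dd');
bondGaps  = datestr(M(missingBonds,1),  'yyyy-mm-dd');
```

The logical masks can also be reused later to decide which rows need to be filled in.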
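Alternatively, you can look at the dataset class from the Statistics Toolbox (designed for exactly this kind of situation). Using the same example: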
%# build dataset object for the two sets
varNames1 = {'days' 'stock_open' 'stock_close'};
varNames2 = {'days' 'bond_open' 'bond_close'};
d1 = dataset({M1, varNames1{:}}); %# cell: matrix followed by one name per column
d2 = dataset({M2, varNames2{:}});
%# join on days (full-outer join)
d = join(d1,d2, 'keys','days', 'type','fullouter', 'MergeKeys',true);
d.days = datestr(d.days,'yyyy-mm-dd'); %# format the days column as string
Result:
d =
days stock_open stock_close bond_open bond_close
2011-10-01 NaN NaN 332.57 241.5
2011-10-03 941.92 86.815 NaN NaN
2011-10-04 655.91 429.4 NaN NaN
2011-10-05 451.95 257.28 853.06 243.15
2011-10-06 839.7 297.56 NaN NaN
2011-10-07 532.62 424.86 NaN NaN
2011-10-09 553.89 119.21 NaN NaN
2011-10-11 680.07 495.07 442.4 154.16
2011-10-13 367.19 706.41 904.36 956.42
2011-10-14 NaN NaN 33.179 935.66
2011-10-15 239.29 243.57 NaN NaN
2011-10-16 578.92 785.07 532.43 818.71
2011-10-17 866.89 74.09 716.5 728.26
2011-10-18 406.78 393.88 179.3 175.81
2011-10-19 112.62 3.3941 336.53 360.37
2011-10-20 443.85 220.68 NaN NaN
2011-10-21 NaN NaN 187.71 188.79
2011-10-22 300.18 1.3006 NaN NaN
2011-10-23 401.39 189.18 NaN NaN
2011-10-24 833.36 142.48 321.93 1.1984
2011-10-25 NaN NaN 403.86 316.42
2011-10-26 403.63 268.08 NaN NaN
2011-10-27 390.18 174.89 NaN NaN
2011-10-28 NaN NaN 548.57 699.62
2011-10-29 360.45 138.65 48.739 625.26
2011-10-30 140.26 598.89 552.73 543.06
2011-10-31 260.13 901.06 274.81 439.04
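If the Statistics Toolbox is not available, a similar join can probably be done with the built-in table type and outerjoin (introduced in R2013b, so newer than this answer). The sketch below is not part of the original answer; the data are made up and the variable names T1, T2 and T are only illustrative.

```matlab
% Made-up data; column 1 holds serial date numbers
M1 = [datenum(2011,10,3) 941.9 86.8; datenum(2011,10,5) 451.9 257.3];
M2 = [datenum(2011,10,4) 332.6 241.5; datenum(2011,10,5) 853.1 243.1];

T1 = array2table(M1, 'VariableNames', {'days','stock_open','stock_close'});
T2 = array2table(M2, 'VariableNames', {'days','bond_open','bond_close'});

% Full outer join on the date key; rows without a match are padded with NaN
T = outerjoin(T1, T2, 'Keys','days', 'MergeKeys',true);
```

With 'MergeKeys' set to true the two key columns are combined into a single days column, mirroring the dataset-based join.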
Suppose you have the following two data files, bonds.csv and stocks.csv:
bonds.csv
10/6/1977 7.72 7.72
10/7/1977 7.73 7.73
10/11/1977 7.77 7.77
10/12/1977 7.79 7.79
10/13/1977 7.79 7.79
10/14/1977 7.79 7.79
10/17/1977 7.79 7.79
10/18/1977 7.8 7.8
stocks.csv
10/06/77 95.68 96.05
10/07/77 96.05 95.97
10/10/77 95.97 95.75
10/11/77 95.75 94.93
10/12/77 94.82 94.04
10/13/77 94.04 93.46
10/14/77 93.46 93.56
10/17/77 93.56 93.47
You can read the data with the TEXTSCAN function:
%# read bonds data
fid = fopen('bonds.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
M1 = [datenum(C{1},'mm/dd/yyyy') C{2}];
%# read stocks data
fid = fopen('stocks.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
M2 = [datenum(C{1},'mm/dd/yy',1950) C{2}]; %# pivot year 1950, so '77' is always read as 1977
Now you can use the same code as above (starting from "get the full range of dates...", or using the DATASET class). After joining, this gives me:
C =
'1977-10-06' [7.72] [7.72] [95.68] [96.05]
'1977-10-07' [7.73] [7.73] [96.05] [95.97]
'1977-10-10' [ NaN] [ NaN] [95.97] [95.75]
'1977-10-11' [7.77] [7.77] [95.75] [94.93]
'1977-10-12' [7.79] [7.79] [94.82] [94.04]
'1977-10-13' [7.79] [7.79] [94.04] [93.46]
'1977-10-14' [7.79] [7.79] [93.46] [93.56]
'1977-10-17' [7.79] [7.79] [93.56] [93.47]
'1977-10-18' [ 7.8] [ 7.8] [ NaN] [ NaN]
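Answer 1 (score: 3)
If you only use the dates from one of the series, you may run into problems, because each series can have dates that are missing from the other. What I would do is start with a clean 3-column matrix that contains every business day in the date range. This post on the Mathworks blog may give some insight into how to do that. Then fill the other two columns with the values from the two data series. That way you can be sure all the values are in the matrix, and it will make your life much simpler if you decide to add more data later.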
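One way the calendar approach could look is sketched below. It is not taken from the linked blog post: it simply treats every Monday to Friday as a business day (market holidays are not removed), and the bonds and stocks matrices are made-up two-column stand-ins for the real series.

```matlab
% Made-up series: [serial date, value]
bonds  = [datenum(1977,10,6) 7.72; datenum(1977,10,7) 7.73; datenum(1977,10,11) 7.77];
stocks = [datenum(1977,10,6) 96.05; datenum(1977,10,7) 95.97; datenum(1977,10,10) 95.75];

% Calendar of all Monday-Friday dates spanning both series
allDays = (min([bonds(:,1); stocks(:,1)]) : max([bonds(:,1); stocks(:,1)]))';
wd = weekday(allDays);                 % 1 = Sunday, 7 = Saturday
allDays = allDays(wd ~= 1 & wd ~= 7);

% 3-column matrix: date, bond value, stock value (NaN where missing)
M = [allDays nan(numel(allDays), 2)];
[tf, loc] = ismember(bonds(:,1), allDays);
M(loc(tf), 2) = bonds(tf, 2);
[tf, loc] = ismember(stocks(:,1), allDays);
M(loc(tf), 3) = stocks(tf, 2);
```

Adding a third series later only means appending one more NaN column and repeating the ismember step.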
As for filling in the missing dates, you can use the 1-D interpolation function (interp1).
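A minimal sketch of that idea, assuming the gaps are marked with NaN as in the merged matrices above; t, x, ok and xFilled are made-up names and values used only for illustration.

```matlab
t = (1:6)';                              % stand-in for serial dates
x = [7.72; 7.73; NaN; 7.77; NaN; 7.79];  % series with two missing values

ok = ~isnan(x);                          % rows with an observed value
xFilled = x;
% Linearly interpolate the missing points from the observed ones
xFilled(~ok) = interp1(t(ok), x(ok), t(~ok), 'linear');
```

For a one-day gap, linear interpolation gives the average of the two neighbouring values, which matches the averaging the question had in mind. Gaps at the very start or end of the series stay NaN, because interp1 does not extrapolate by default with the 'linear' method.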