MATLAB: dates do not match between two datasets. Help!

Time: 2011-10-27 16:24:02

Tags: matlab date join merge dataset


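Please excuse the simplicity of the question, but this is my first day.

I am working with two time series: 1) S&P 500 prices since 1977 (daily close and date), and 2) bond yields since 1977 (daily close and date).

The problem is that after a few months the dates no longer agree with each other (perhaps the bond market was closed on a day when the stock market was open, and so on), so I have two datasets that are no longer correctly aligned. Before I start asking how to fill the gaps (I will use an average when I get to that bridge), I need to know how to get MATLAB to align the dates of the two securities, so that I at least know where the gaps are for each security, i.e. on which dates a security is missing a price. I was thinking of creating a calendar column of my own (or using the dates of one of the securities) and then using it as the reference date column to index the final output and match the prices to it... Maybe this is the wrong way to think about it, but any help would be greatly appreciated :)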

2 answers:

Answer 0 (score: 9)

Basically, you want to perform a full outer merge of the two datasets, using the date as the key.

Consider the following example:

%# vector of dates (serial datetime)
days = datenum( num2str((1:31)','2011-10-%02d') );   %# one month (October 2011)

%# lets build two datasets similar to what you described
idx1 = rand(size(days)) > 0.2;                %# randomly pick dates for 1st
M1 = [days(idx1) rand(sum(idx1),2)*1000];     %# stocks: days,opening,closing
idx2 = rand(size(days)) > 0.5;                %# randomly pick dates for 2nd
M2 = [days(idx2) rand(sum(idx2),2)*1000];     %# bonds: days,opening,closing

%# get the full range of dates, and convert them to indices starting at 1
[allDays,~,ind] = unique( [M1(:,1);M2(:,1)] );
indM1 = ind(1:size(M1,1));
indM2 = ind(size(M1,1)+1:end);

%# merge the two datasets (days,opening,closing,opening,closing)
M = nan(numel(allDays),size(M1,2)+size(M2,2)-1);
M(:,1) = allDays;                   %# available days from both
M(indM1,2:3) = M1(:,2:3);           %# insert 1st dataset values
M(indM2,4:5) = M2(:,2:3);           %# insert 2nd dataset values

%# final merged dataset formatted
C = [cellstr(datestr(M(:,1),'yyyy-mm-dd')) num2cell(M(:,2:end))]

Result:

C = 
    '2011-10-01'    [     NaN]    [     NaN]    [332.5714]    [241.5017]
    '2011-10-03'    [941.9189]    [ 86.8151]    [     NaN]    [     NaN]
    '2011-10-04'    [655.9138]    [429.3973]    [     NaN]    [     NaN]
    '2011-10-05'    [451.9457]    [257.2828]    [853.0636]    [243.1452]
    '2011-10-06'    [839.6974]    [297.5554]    [     NaN]    [     NaN]
    '2011-10-07'    [532.6235]    [424.8584]    [     NaN]    [     NaN]
    '2011-10-09'    [553.8871]    [119.2073]    [     NaN]    [     NaN]
    '2011-10-11'    [680.0655]    [495.0669]    [442.3979]    [154.1594]
    '2011-10-13'    [367.1899]    [706.4072]    [904.3555]    [956.4164]
    '2011-10-14'    [     NaN]    [     NaN]    [ 33.1794]    [935.6614]
    '2011-10-15'    [239.2906]    [243.5734]    [     NaN]    [     NaN]
    '2011-10-16'    [578.9235]    [785.0701]    [532.4265]    [818.7144]
    '2011-10-17'    [866.8871]    [ 74.0896]    [716.4973]    [728.2618]
    '2011-10-18'    [406.7768]    [393.8834]    [179.3018]    [175.8117]
    '2011-10-19'    [112.6151]    [  3.3941]    [336.5329]    [360.3710]
    '2011-10-20'    [443.8458]    [220.6769]    [     NaN]    [     NaN]
    '2011-10-21'    [     NaN]    [     NaN]    [187.7129]    [188.7900]
    '2011-10-22'    [300.1844]    [  1.3006]    [     NaN]    [     NaN]
    '2011-10-23'    [401.3869]    [189.1797]    [     NaN]    [     NaN]
    '2011-10-24'    [833.3636]    [142.4841]    [321.9272]    [  1.1984]
    '2011-10-25'    [     NaN]    [     NaN]    [403.8567]    [316.4195]
    '2011-10-26'    [403.6287]    [268.0760]    [     NaN]    [     NaN]
    '2011-10-27'    [390.1759]    [174.8921]    [     NaN]    [     NaN]
    '2011-10-28'    [     NaN]    [     NaN]    [548.5663]    [699.6170]
    '2011-10-29'    [360.4489]    [138.6490]    [ 48.7386]    [625.2552]
    '2011-10-30'    [140.2554]    [598.8856]    [552.7321]    [543.0622]
    '2011-10-31'    [260.1302]    [901.0579]    [274.8114]    [439.0372]

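The merged result contains the opening/closing prices of both datasets. Where one of them has no price on a particular date, the value is NaN. Note that some days do not appear in the result at all; that is because neither dataset lists a price for those days.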
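Since the question asks on which dates each security is missing a price, the NaN entries of the merged matrix can be turned into a list of dates. The following is only a minimal sketch on a small hand-made matrix; the names `stockGaps` and `bondGaps` and the three-column layout (date, stock price, bond price) are assumptions made for illustration, not part of the code above.

```matlab
%# hand-made merged matrix: serial date, stock price, bond price
M = [datenum(2011,10,3)  941.9  NaN  ;
     datenum(2011,10,4)  NaN    332.6;
     datenum(2011,10,5)  451.9  853.1];

missingStock = isnan(M(:,2));     %# rows where the stock price is missing
missingBond  = isnan(M(:,3));     %# rows where the bond price is missing

%# dates of the gaps, formatted as strings
stockGaps = cellstr(datestr(M(missingStock,1),'yyyy-mm-dd'));
bondGaps  = cellstr(datestr(M(missingBond,1),'yyyy-mm-dd'));
```

With the merged matrix from the example above, the same logical indexing would be applied to columns 2:3 and 4:5 respectively.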


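Alternatively, you can look at the dataset class from the Statistics Toolbox, which is designed for exactly this kind of situation. Using the same example: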

%# build dataset object for the two sets
varNames1 = {'days' 'stock_open' 'stock_close'};
varNames2 = {'days' 'bond_open' 'bond_close'};
d1 = dataset([{M1} varNames1]);     %# wrap M1 in a cell: {matrix, name1, name2, ...}
d2 = dataset([{M2} varNames2]);

%# join on days (full-outer join)
d = join(d1,d2, 'keys','days', 'type','fullouter', 'MergeKeys',true);
d.days = datestr(d.days,'yyyy-mm-dd');   %# format the days column as string

Result:

d = 
    days          stock_open    stock_close    bond_open    bond_close
    2011-10-01       NaN           NaN         332.57        241.5    
    2011-10-03    941.92        86.815            NaN          NaN    
    2011-10-04    655.91         429.4            NaN          NaN    
    2011-10-05    451.95        257.28         853.06       243.15    
    2011-10-06     839.7        297.56            NaN          NaN    
    2011-10-07    532.62        424.86            NaN          NaN    
    2011-10-09    553.89        119.21            NaN          NaN    
    2011-10-11    680.07        495.07          442.4       154.16    
    2011-10-13    367.19        706.41         904.36       956.42    
    2011-10-14       NaN           NaN         33.179       935.66    
    2011-10-15    239.29        243.57            NaN          NaN    
    2011-10-16    578.92        785.07         532.43       818.71    
    2011-10-17    866.89         74.09          716.5       728.26    
    2011-10-18    406.78        393.88          179.3       175.81    
    2011-10-19    112.62        3.3941         336.53       360.37    
    2011-10-20    443.85        220.68            NaN          NaN    
    2011-10-21       NaN           NaN         187.71       188.79    
    2011-10-22    300.18        1.3006            NaN          NaN    
    2011-10-23    401.39        189.18            NaN          NaN    
    2011-10-24    833.36        142.48         321.93       1.1984    
    2011-10-25       NaN           NaN         403.86       316.42    
    2011-10-26    403.63        268.08            NaN          NaN    
    2011-10-27    390.18        174.89            NaN          NaN    
    2011-10-28       NaN           NaN         548.57       699.62    
    2011-10-29    360.45        138.65         48.739       625.26    
    2011-10-30    140.26        598.89         552.73       543.06    
    2011-10-31    260.13        901.06         274.81       439.04   
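In more recent MATLAB releases the same full outer join can also be written with the built-in `table` type and `outerjoin`, without the Statistics Toolbox. This was not part of the original answer; it is a hedged sketch on toy data, and the variable names and prices below are made up for illustration.

```matlab
%# toy inputs: serial date in column 1, then opening/closing prices
d  = datenum(2011,10,1:3)';
M1 = [d(1:2) [10 11; 12 13]];       %# "stocks": present on days 1 and 2
M2 = [d(2:3) [20 21; 22 23]];       %# "bonds":  present on days 2 and 3

t1 = array2table(M1, 'VariableNames',{'days','stock_open','stock_close'});
t2 = array2table(M2, 'VariableNames',{'days','bond_open','bond_close'});

%# full outer join on the date key; unmatched numeric entries become NaN
t = outerjoin(t1, t2, 'Keys','days', 'MergeKeys',true);
```

Here `t` has one row per date that occurs in either input, with NaN in the stock columns on the bond-only day and vice versa.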

Edit:

Suppose you have the following two files containing the data:

bonds.csv

10/6/1977 7.72 7.72
10/7/1977 7.73 7.73
10/11/1977 7.77 7.77
10/12/1977 7.79 7.79
10/13/1977 7.79 7.79
10/14/1977 7.79 7.79
10/17/1977 7.79 7.79
10/18/1977 7.8 7.8

stocks.csv

10/06/77 95.68 96.05
10/07/77 96.05 95.97
10/10/77 95.97 95.75
10/11/77 95.75 94.93
10/12/77 94.82 94.04
10/13/77 94.04 93.46
10/14/77 93.46 93.56
10/17/77 93.56 93.47

You can read the data using the TEXTSCAN function:

%# read bonds data
fid = fopen('bonds.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
M1 = [datenum(C{1},'mm/dd/yyyy') C{2}];

%# read stocks data
fid = fopen('stocks.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
M2 = [datenum(C{1},'mm/dd/yy',1950) C{2}];   %# pivot year 1950, so '77' is read as 1977

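Now you can use the same code as above (starting from "get the full range of dates...", or using the DATASET class). Note that here M1 holds the bonds and M2 the stocks, so the bond columns come first. After joining, this gives me: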

C = 
    '1977-10-06'    [7.72]    [7.72]    [95.68]    [96.05]
    '1977-10-07'    [7.73]    [7.73]    [96.05]    [95.97]
    '1977-10-10'    [ NaN]    [ NaN]    [95.97]    [95.75]
    '1977-10-11'    [7.77]    [7.77]    [95.75]    [94.93]
    '1977-10-12'    [7.79]    [7.79]    [94.82]    [94.04]
    '1977-10-13'    [7.79]    [7.79]    [94.04]    [93.46]
    '1977-10-14'    [7.79]    [7.79]    [93.46]    [93.56]
    '1977-10-17'    [7.79]    [7.79]    [93.56]    [93.47]
    '1977-10-18'    [ 7.8]    [ 7.8]    [  NaN]    [  NaN]

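Answer 1 (score: 3)

If you use only the dates from one of the series, you may run into problems, because each series may contain dates that are missing from the other. What I would do is start from a clean 3-column matrix that contains every weekday in the date range. This post on the Mathworks blog can provide some insight into how to do that. Then fill the other two columns with the values from the two data series. That way you can be sure that all values are in the matrix, and it will make your life much easier if you decide to add more data later.

As for filling in the missing dates, you can use: the 1-D interpolate function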
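The weekday-calendar idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the method from the linked blog post: the two input matrices `stocks` and `bonds` are hypothetical [serial date, price] arrays, and only weekends are excluded (market holidays are not handled).

```matlab
%# hypothetical inputs: [serial date, price] for each series
stockDays = datenum(2011,10,[3 4 5 7])';
bondDays  = datenum(2011,10,[3 5 6 7])';
stocks = [stockDays, [100; 101; 102; 103]];
bonds  = [bondDays,  [2.0; 2.1; 2.2; 2.3]];

%# every calendar day in the range covered by either series
firstDay = min([stocks(:,1); bonds(:,1)]);
lastDay  = max([stocks(:,1); bonds(:,1)]);
allDays  = (firstDay:lastDay)';

%# keep Monday..Friday only (weekday returns 1 for Sunday, 7 for Saturday)
wd = weekday(allDays);
bizDays = allDays(wd ~= 1 & wd ~= 7);

%# 3-column matrix: date, stock price, bond price (NaN where missing)
M = nan(numel(bizDays), 3);
M(:,1) = bizDays;
[tf, loc] = ismember(stocks(:,1), bizDays);
M(loc(tf),2) = stocks(tf,2);
[tf, loc] = ismember(bonds(:,1), bizDays);
M(loc(tf),3) = bonds(tf,2);
```

Adding a third series later only requires one more column and one more `ismember` lookup against the same `bizDays` calendar.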
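In MATLAB the 1-D interpolation function is `interp1`. A minimal sketch of filling NaN gaps by linear interpolation between the neighbouring observed prices, on made-up data, could look like this (the variable names are illustrative only):

```matlab
t = (1:6)';                          %# stand-in for serial dates
p = [10; NaN; 14; NaN; NaN; 20];     %# prices with gaps

ok = ~isnan(p);                      %# observed points
pFilled = p;
%# interpolate only at the missing dates, using the observed points
pFilled(~ok) = interp1(t(ok), p(ok), t(~ok), 'linear');
```

Gaps at the very start or end of the series lie outside the observed range and stay NaN with this call, so they would need separate handling.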