R:将不完整的时间序列转换为矩阵

时间:2016-11-07 20:20:33

标签: r matrix time-series

我有一个月度时间序列,应该转换成矩阵。 (6年,但过去4个月不见了!)

       Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
1 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366 77.97062 71.37903 67.08182 69.10599
2 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650 77.56661 67.44390 67.05218 66.43690
3 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479 72.43492 64.42046 66.87742 66.31345
4 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685 71.38322 66.42558 67.62292 70.25223
5 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698 79.07911 75.84108 75.82478 79.42484
6 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389 

时间序列的最后一年缺少一些值。 转换为矩阵时,该函数使用时间序列中的第一个替换缺失值。

> matrix(xx,nrow=6,ncol=12,byrow = TRUE)
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]    [,11]    [,12]
[1,] 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366 77.97062 71.37903 67.08182 69.10599
[2,] 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650 77.56661 67.44390 67.05218 66.43690
[3,] 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479 72.43492 64.42046 66.87742 66.31345
[4,] 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685 71.38322 66.42558 67.62292 70.25223
[5,] 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698 79.07911 75.84108 75.82478 79.42484
[6,] 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389 62.17416 66.34748 69.58154 64.92033

如何避免这种情况或用“NA”替换最后的值?

1 个答案:

答案 0 :(得分:1)

假设xx是最后注释中给出的时间序列。

1)然后我们可以将其转换为矩阵,而不知道有多少缺少的项目如下:

tt <- time(xx)
tapply(xx, list(floor(tt), cycle(tt)), c)

给出这个6 x 12矩阵:

            1        2        3        4        5        6        7        8
2011 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366
2012 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650
2013 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479
2014 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685
2015 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698
2016 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389
            9       10       11       12
2011 77.97062 71.37903 67.08182 69.10599
2012 77.56661 67.44390 67.05218 66.43690
2013 72.43492 64.42046 66.87742 66.31345
2014 71.38322 66.42558 67.62292 70.25223
2015 79.07911 75.84108 75.82478 79.42484
2016       NA       NA       NA       NA

2)另一种可能性是将长度强制转换为12的下一个最高倍数,以便它完全填充矩阵并重新形成它。这仅适用于要添加的NA最终的情况,如问题中的情况。

m <- xx # copy xx in case we still need it later
freq <- frequency(xx)  # 12
length(m) <- freq * ceiling(length(m) / freq)  # pads with NAs to achieve length
m <- matrix(m,, freq, byrow = TRUE) # reshape m into a matrix

,并提供:

> m
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
[1,] 62.17416 66.34748 69.58154 64.92033 70.51246 70.83153 74.19475 72.29366
[2,] 59.41197 61.44036 68.97711 72.43558 69.92715 74.20776 79.19150 71.48650
[3,] 58.79278 63.26696 70.97837 71.79423 72.04037 71.61184 72.64526 70.38479
[4,] 58.06906 60.35554 70.91920 65.76937 68.85464 68.77921 70.09373 66.03685
[5,] 60.59774 64.66411 73.27853 73.11031 73.57735 73.77766 78.45258 74.64698
[6,] 69.30907 70.80211 78.38037 77.63537 79.67396 80.15397 80.45571 76.67389
         [,9]    [,10]    [,11]    [,12]
[1,] 77.97062 71.37903 67.08182 69.10599
[2,] 77.56661 67.44390 67.05218 66.43690
[3,] 72.43492 64.42046 66.87742 66.31345
[4,] 71.38322 66.42558 67.62292 70.25223
[5,] 79.07911 75.84108 75.82478 79.42484
[6,]       NA       NA       NA       NA

注意:可重复形式的输入时间序列xx为:

xx <- ts(c(62.17416, 66.34748, 69.58154, 64.92033, 70.51246, 
70.83153, 74.19475, 72.29366, 77.97062, 71.37903, 67.08182, 69.10599, 
59.41197, 61.44036, 68.97711, 72.43558, 69.92715, 74.20776, 79.1915, 
71.4865, 77.56661, 67.4439, 67.05218, 66.4369, 58.79278, 63.26696, 
70.97837, 71.79423, 72.04037, 71.61184, 72.64526, 70.38479, 72.43492, 
64.42046, 66.87742, 66.31345, 58.06906, 60.35554, 70.9192, 65.76937, 
68.85464, 68.77921, 70.09373, 66.03685, 71.38322, 66.42558, 67.62292, 
70.25223, 60.59774, 64.66411, 73.27853, 73.11031, 73.57735, 73.77766, 
78.45258, 74.64698, 79.07911, 75.84108, 75.82478, 79.42484, 69.30907, 
70.80211, 78.38037, 77.63537, 79.67396, 80.15397, 80.45571, 76.67389
), start = c(2011, 1), freq = 12)