Aggregating data using a moving window approach

Time: 2019-09-20 02:59:20

Tags: r loops aggregate

I have the following matrix:

 mat<- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
       2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
       0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
       0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
       0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
       1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow=16, ncol=6)
 dimnames(mat)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 
              c("1", "2", "3", "4", "5", "6"))

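I need to aggregate the columns using a moving window approach. To start, the window size is 2, so each window spans 2 columns. The aggregation is the row sum across the columns in the window. The window then moves one step and the sum is taken again. For the example matrix provided, the first window covers columns 1 & 2, the second covers columns 2 & 3, then 3 & 4, 4 & 5, and 5 & 6.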

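These results (the row sums of each aggregation) are placed in a matrix. In this matrix the rows are preserved, and the columns now represent the result of each aggregation.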

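Next, the window size increases to 3, so that 3 columns are merged (summed) at a time. Again, the window moves 1 step at a time. For the example matrix provided, the first window covers columns 1-2-3, the second covers columns 2-3-4, then 3-4-5 and 4-5-6. The results are placed in a separate matrix.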

The window size keeps increasing until the window spans all columns. In this example, the largest window merges all 6 columns.
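
To make the windowing concrete, the column indices covered by each window can be listed as in the sketch below. It is only an illustration of the description above; `n_col`, `window_indices` and `all_windows` are names made up here and do not come from the question.

```r
# Illustrative sketch: enumerate the column indices of every window.
n_col <- 6  # number of columns in the example matrix

# For a window of width w there are n - w + 1 start positions,
# and each window covers the columns start, start + 1, ..., start + w - 1.
window_indices <- function(w, n) {
  lapply(seq_len(n - w + 1), function(start) start:(start + w - 1))
}

# Widths 2 through n_col, as described in the text.
all_windows <- lapply(2:n_col, window_indices, n = n_col)
names(all_windows) <- paste0("width_", 2:n_col)

# Number of windows per width: 5, 4, 3, 2 and 1 for widths 2 to 6.
lengths(all_windows)

# Columns covered by the last width-3 window: 4, 5, 6.
all_windows[["width_3"]][[4]]
```

So with 6 columns there is one result matrix per width from 2 to 6, and the matrix for width w has 6 - w + 1 columns.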

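Given the example matrix mat above, below are the result matrices for window sizes 2 and 3. The columns are named after the columns that were added together.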

#Window length =2 
mat1<- matrix( c(3,0,0,0,1,0,1,0,0,0,0,0,0,0,3,0,
         2,0,1,1,2,0,0,0,0,0,0,0,0,0,1,0,
         0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,
         0,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,
         1,1,0,0,1,0,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat1)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 
              c("1_2", "2_3", "3_4", "4_5", "5_6"))

 #Window length 3
 mat8<- matrix( c(3,0,1,1,2,0,1,0,0,0,0,0,0,0,3,0,
         2,1,1,1,2,1,0,0,0,0,0,0,0,0,1,0,
         0,1,1,1,2,1,0,1,0,1,1,0,0,1,0,1,
         1,2,0,0,1,1,0,1,1,1,2,2,1,1,0,1), nrow=16)
 dimnames(mat8)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 
              c("1_2_3", "2_3_4", "3_4_5", "4_5_6"))

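In my example I have 6 columns, so there are 5 result matrices in total. If I had 600 columns of data, I think a loop would be the most efficient way to iterate over such a large dataset.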

1 Answer:

Answer 0: (score: 2)

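Here is one way in base R. The outer index j is the window width minus one (j = 1 gives the width-2 windows), and the inner index i is the first column of each window.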

lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind, 
   lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))


#[[1]]
#  [,1] [,2] [,3] [,4] [,5]
#a    3    2    0    0    1
#c    0    0    1    1    1
#f    0    1    1    0    0
#h    0    1    1    0    0
#i    1    2    1    1    1
#j    0    0    1    1    0
#l    1    0    0    0    0
#m    0    0    0    1    1
#p    0    0    0    0    1
#q    0    0    0    1    1
#s    0    0    0    1    2
#t    0    0    0    0    2
#u    0    0    0    0    1
#v    0    0    0    1    1
#x    3    1    0    0    0
#z    0    0    0    1    1

#[[2]]
#  [,1] [,2] [,3] [,4]
#a    3    2    0    1
#c    0    1    1    2
#f    1    1    1    0
#h    1    1    1    0
#i    2    2    2    1
#j    0    1    1    1
#l    1    0    0    0
#m    0    0    1    1
#p    0    0    0    1
#q    0    0    1    1
#s    0    0    1    2
#t    0    0    0    2
#u    0    0    0    1
#v    0    0    1    1
#x    3    1    0    0
#z    0    0    1    1
#....
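
The output above has unnamed columns, whereas the question asks for columns named after the columns that were summed (for example "1_2"). The following is a minimal sketch of one way to add such names on top of the same `rowSums` approach; the helper names `win_sums` and `res`, and the `width_` labels on the list, are assumptions introduced here for illustration.

```r
# Example matrix from the question.
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m",
                        "p", "q", "s", "t", "u", "v", "x", "z"),
                      as.character(1:6))

# Hypothetical helper: row sums for every window of width w,
# with each column named after the original columns it combines.
win_sums <- function(m, w) {
  starts <- seq_len(ncol(m) - w + 1)
  out <- do.call(cbind, lapply(starts, function(i) {
    rowSums(m[, i:(i + w - 1), drop = FALSE])
  }))
  win_names <- vapply(starts, function(i) {
    paste(colnames(m)[i:(i + w - 1)], collapse = "_")
  }, character(1))
  dimnames(out) <- list(rownames(m), win_names)
  out
}

# One result matrix per window width, from 2 up to all columns.
res <- lapply(2:ncol(mat), function(w) win_sums(mat, w))
names(res) <- paste0("width_", 2:ncol(mat))

colnames(res[["width_2"]])  # "1_2" "2_3" "3_4" "4_5" "5_6"
colnames(res[["width_3"]])  # "1_2_3" "2_3_4" "3_4_5" "4_5_6"
```

Naming the list elements by width makes it easier to pick out a particular result, for example `res[["width_3"]]`, when there are many widths.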

Since this is a rolling operation, we can also use rollapply from zoo with a variable window width

lapply(2:ncol(mat), function(j)
    t(zoo::rollapply(seq_len(ncol(mat)), j, function(x) rowSums(mat[,x]))))
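
Since the question mentions a much wider matrix (600 columns), here is a sketch of a different technique, not taken from the answer above: compute row-wise cumulative sums once and obtain every window sum as a difference of two cumulative-sum columns. The names `cs`, `window_sums` and `res_cs` are made up for this illustration, and whether this is actually faster on real data has not been measured here.

```r
# Example matrix from the question.
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m",
                        "p", "q", "s", "t", "u", "v", "x", "z"),
                      as.character(1:6))
n <- ncol(mat)

# Row-wise cumulative sums with a leading column of zeros,
# so that cs[, k + 1] holds the sum of columns 1..k of mat.
cs <- cbind(0, t(apply(mat, 1, cumsum)))

# Sum over columns (e - w + 1)..e equals cs[, e + 1] - cs[, e - w + 1].
window_sums <- function(w) {
  ends <- w:n
  out <- cs[, ends + 1, drop = FALSE] - cs[, ends - w + 1, drop = FALSE]
  dimnames(out) <- list(rownames(mat), NULL)
  out
}

res_cs <- lapply(2:n, window_sums)

# Cross-check against the rowSums-based approach from the answer.
ref <- lapply(seq_len(n - 1), function(j) do.call(cbind,
  lapply(seq_len(n - j), function(i) rowSums(mat[, i:(i + j)]))))
all.equal(res_cs, ref, check.attributes = FALSE)
```

The design idea is that the cumulative sums are computed once, after which each window width needs only a single matrix subtraction instead of a separate `rowSums` call per window.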