I have the following matrix:
mat<- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow=16, ncol=6)
dimnames(mat)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1", "2", "3", "4", "5", "6"))
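I need to aggregate columns using a moving-window approach. First, the window size is 2, so each window consists of 2 adjacent columns, and the aggregation is the row sum over those columns. The window then moves one step and the sum is taken again. For the example matrix, the first window aggregates columns 1 and 2, the second combines columns 2 and 3, then 3 and 4, then 4 and 5, and finally 5 and 6.
These results (the row sums of each aggregation) are placed in a matrix. In this matrix the rows are preserved, and each column now holds the result of one aggregation.
Next, the window size increases to 3, so 3 columns are combined (summed) at a time. Again the window moves one step at a time. For the example matrix, the first window aggregates columns 1-2-3, the second combines columns 2-3-4, then 3-4-5, then 4-5-6. The results go into a separate matrix.
The window size keeps increasing until the window spans all columns. In this example, the largest window combines all 6 columns.
Given the example matrix mat above, the result matrices for window sizes 2 and 3 are shown below. Each column is named after the columns that were summed.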
# Window length = 2
mat1<- matrix( c(3,0,0,0,1,0,1,0,0,0,0,0,0,0,3,0,
2,0,1,1,2,0,0,0,0,0,0,0,0,0,1,0,
0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,1,0,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat1)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1_2", "2_3", "3_4", "4_5", "5_6"))
# Window length = 3
mat8<- matrix( c(3,0,1,1,2,0,1,0,0,0,0,0,0,0,3,0,
2,1,1,1,2,1,0,0,0,0,0,0,0,0,1,0,
0,1,1,1,2,1,0,1,0,1,1,0,0,1,0,1,
1,2,0,0,1,1,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat8)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1_2_3", "2_3_4", "3_4_5", "4_5_6"))
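In my example I have 6 columns, so there are 5 result matrices in total (window sizes 2 through 6). My real data could have far more columns, say 600, so I think a loop would be the most efficient way to iterate over a large dataset.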
Answer 0 (score: 2)
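Here is one way in base R. The outer index j is the window width minus one, so j = 1 gives the windows of 2 columns: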
lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind,
lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))
#[[1]]
# [,1] [,2] [,3] [,4] [,5]
#a 3 2 0 0 1
#c 0 0 1 1 1
#f 0 1 1 0 0
#h 0 1 1 0 0
#i 1 2 1 1 1
#j 0 0 1 1 0
#l 1 0 0 0 0
#m 0 0 0 1 1
#p 0 0 0 0 1
#q 0 0 0 1 1
#s 0 0 0 1 2
#t 0 0 0 0 2
#u 0 0 0 0 1
#v 0 0 0 1 1
#x 3 1 0 0 0
#z 0 0 0 1 1
#[[2]]
# [,1] [,2] [,3] [,4]
#a 3 2 0 1
#c 0 1 1 2
#f 1 1 1 0
#h 1 1 1 0
#i 2 2 2 1
#j 0 1 1 1
#l 1 0 0 0
#m 0 0 1 1
#p 0 0 0 1
#q 0 0 1 1
#s 0 0 1 2
#t 0 0 0 2
#u 0 0 0 1
#v 0 0 1 1
#x 3 1 0 0
#z 0 0 1 1
#....
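The output above has unnamed columns, while the question asks for names such as 1_2 and 1_2_3. A minimal sketch of how the same idea could carry those labels follows. The helper name window_sums and the small matrix m are made up for illustration and are not part of the original answer; the helper falls back to column positions when the input has no column names.

```r
# Hypothetical helper: row sums over every window of `width` adjacent columns,
# with each result column labelled by the source columns it combines.
window_sums <- function(m, width) {
  stopifnot(width >= 1, width <= ncol(m))
  labels <- if (is.null(colnames(m))) seq_len(ncol(m)) else colnames(m)
  starts <- seq_len(ncol(m) - width + 1)
  # drop = FALSE keeps a matrix even when width is 1
  out <- do.call(cbind, lapply(starts, function(i) {
    rowSums(m[, i:(i + width - 1), drop = FALSE])
  }))
  colnames(out) <- vapply(starts, function(i) {
    paste(labels[i:(i + width - 1)], collapse = "_")
  }, character(1))
  out
}

# Small made-up matrix (not the data from the question)
m <- matrix(1:8, nrow = 2,
            dimnames = list(c("a", "b"), c("1", "2", "3", "4")))

# One result matrix per window width, from 2 up to all columns
widths <- 2:ncol(m)
res <- lapply(widths, function(w) window_sums(m, w))
names(res) <- paste0("width_", widths)
res$width_2
```

Naming the list elements by width makes it easier to pick out one result later, for example res$width_3.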
Since this is a rolling operation, we can also use rollapply from zoo with a varying window width:
lapply(2:ncol(mat), function(j)
t(zoo::rollapply(seq_len(ncol(mat)), j, function(x) rowSums(mat[,x]))))
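For wide inputs such as the 600 columns mentioned in the question, both versions above re-add the same columns for every window. One possible alternative, sketched here as an assumption rather than as part of the original answer, is to compute running totals across columns once and obtain each window sum as a difference of two running totals. The function name window_sums_cumsum is hypothetical, the sketch assumes at least 2 columns, and it has not been benchmarked, so whether it is faster in practice would need to be measured. With non-integer data, differences of running totals can also pick up small rounding errors.

```r
# Hypothetical helper: window sums via differences of cumulative sums.
# Assumes m is a numeric matrix with at least 2 columns.
window_sums_cumsum <- function(m, width) {
  stopifnot(ncol(m) >= 2, width >= 1, width <= ncol(m))
  # cs[, k + 1] holds the sum of columns 1..k; the leading zero column
  # covers windows that start at column 1.
  cs <- cbind(0, t(apply(m, 1, cumsum)))
  ends <- width:ncol(m)
  out <- cs[, ends + 1, drop = FALSE] - cs[, ends - width + 1, drop = FALSE]
  dimnames(out) <- list(rownames(m), NULL)
  out
}

# Compare against the direct rowSums approach on made-up data
set.seed(1)
m <- matrix(rpois(5 * 8, 1), nrow = 5)
fast <- window_sums_cumsum(m, 3)
slow <- sapply(1:6, function(i) rowSums(m[, i:(i + 2)]))
all.equal(fast, slow, check.attributes = FALSE)
```

The running totals depend only on the matrix, so when looping over many widths they could be computed once outside the function and reused.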