Selecting columns in R using a binary table

Time: 2019-03-09 15:05:40

Tags: r

I have a data frame x laid out as follows:

             date       c1   c2   c3  c4    c5    c6   c7   c8   c9
             Jan-08     12   23   12  11    10    1    49   34   23    
             Feb-08     14   33   11  11    20    11   29   44   23    

and so on...

I also have a binary matrix, whose rows 1 to 9 correspond to the columns c1 to c9 of x:

                     1    3    6
              1      0    0    1
              2      0    0    0 
              3      0    1    0
              4      1    0    0
              5      0    1    0 
              6      1    0    0  
              7      0    0    0
              8      1    1    0
              9      0    1    1

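I want to go through my binary matrix and create a new table for each of its columns, so that each new table holds only those columns of data frame x whose row in the binary matrix is 1. So here we would create 3 data frames, namely data_frame_1, data_frame_3 and data_frame_6, where data_frame_1 would look like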

                     date    c4    c6     c8        
                     Jan-08  11    1      34 
                     Feb-08  11    11     44

data_frame_3 would be

                     date    c3    c5     c8   c9        
                     Jan-08  12    10     34   23 
                     Feb-08  11    20     44   23

2 answers:

Answer 0: (score: 1)

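Using lapply, we can loop over the columns of the binary matrix mat, convert each column to a logical vector, and use that logical vector to subset the columns of the data frame x. The date column is set aside with x[1] and bound back on with cbind, so the logical vector only indexes c1 to c9.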

# for each column i of mat: keep date, plus the c-columns flagged with 1 in mat[, i]
lapply(seq_len(ncol(mat)), function(i) cbind(x[1], x[-1][as.logical(mat[, i])]))

#[[1]]
#    date c4 c6 c8
#1 Jan-08 11  1 34
#2 Feb-08 11 11 44

#[[2]]
#    date c3 c5 c8 c9
#1 Jan-08 12 10 34 23
#2 Feb-08 11 20 44 23

#[[3]]
#    date c1 c9
#1 Jan-08 12 23
#2 Feb-08 14 23
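
The question asks for results named data_frame_1, data_frame_3 and data_frame_6, while the call above returns an unnamed list. A minimal, self-contained sketch of one way to get those names is shown here. It assumes that x holds only the two rows shown in the question and that mat carries the column names "1", "3" and "6"; the name frames is made up for this illustration.

```r
# data from the question (only the two rows that are shown there)
x <- data.frame(
  date = c("Jan-08", "Feb-08"),
  c1 = c(12, 14), c2 = c(23, 33), c3 = c(12, 11),
  c4 = c(11, 11), c5 = c(10, 20), c6 = c(1, 11),
  c7 = c(49, 29), c8 = c(34, 44), c9 = c(23, 23)
)

# binary matrix from the question, typed in column by column
mat <- matrix(c(0, 0, 0, 1, 0, 1, 0, 1, 0,
                0, 0, 1, 0, 1, 0, 0, 1, 1,
                1, 0, 0, 0, 0, 0, 0, 0, 1),
              nrow = 9, dimnames = list(NULL, c("1", "3", "6")))

# same subsetting as in the answer, collected in a list
frames <- lapply(seq_len(ncol(mat)), function(i) {
  cbind(x[1], x[-1][as.logical(mat[, i])])
})

# name each element after the matrix column it came from
names(frames) <- paste0("data_frame_", colnames(mat))

frames$data_frame_6
```

Keeping the tables in one named list is usually easier to work with than separate variables. If separate variables are really wanted, list2env(frames, envir = globalenv()) should create them from the list.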

Answer 1: (score: 0)

You can use apply to loop over the columns of the binary matrix bin and subset the data frame dat with each one:

# create test data
set.seed(1)
dat <- as.data.frame(matrix(rnorm(18), nrow=2))
colnames(dat) <- paste0('c', 1:9)

dat
#           c1         c2         c3        c4         c5        c6         c7          c8
# 1 -0.6264538 -0.8356286  0.3295078 0.4874291  0.5757814 1.5117812 -0.6212406  1.12493092
# 2  0.1836433  1.5952808 -0.8204684 0.7383247 -0.3053884 0.3898432 -2.2146999 -0.04493361
#            c9
# 1 -0.01619026
# 2  0.94383621

bin <- matrix(sample(0:1, 27, replace = TRUE), nrow = 9)

bin
#       [,1] [,2] [,3]
#  [1,]    1    1    0
#  [2,]    0    0    0
#  [3,]    1    0    0
#  [4,]    0    1    1
#  [5,]    1    1    1
#  [6,]    1    0    0
#  [7,]    1    1    1
#  [8,]    1    0    0
#  [9,]    1    0    0

# subset columns of dat, using binary vector columns defined in bin;
# drop = FALSE is included to prevent any columns with only a single "1" from
# being cast to a vector
apply(bin, 2, function(x) { dat[, as.logical(x), drop = FALSE] })
# [[1]]
#           c1         c3         c5        c6         c7          c8          c9
# 1 -0.6264538  0.3295078  0.5757814 1.5117812 -0.6212406  1.12493092 -0.01619026
# 2  0.1836433 -0.8204684 -0.3053884 0.3898432 -2.2146999 -0.04493361  0.94383621
# 
# [[2]]
#           c1        c4         c5         c7
# 1 -0.6264538 0.4874291  0.5757814 -0.6212406
# 2  0.1836433 0.7383247 -0.3053884 -2.2146999
# 
# [[3]]
#          c4         c5         c7
# 1 0.4874291  0.5757814 -0.6212406
# 2 0.7383247 -0.3053884 -2.2146999
#
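
One caveat when rerunning this: the default algorithm behind sample() changed in R 3.6.0, so on newer versions set.seed(1) followed by the sample() call above may produce a different bin than the one printed. As a sketch only, the version below types the printed matrix in literally so that the subsetting step can be checked against the output shown; res and flag are names chosen for this illustration.

```r
# same test data as above
set.seed(1)
dat <- as.data.frame(matrix(rnorm(18), nrow = 2))
colnames(dat) <- paste0("c", 1:9)

# the bin matrix exactly as printed above, entered column by column
bin <- matrix(c(1, 0, 1, 0, 1, 1, 1, 1, 1,
                1, 0, 0, 1, 1, 0, 1, 0, 0,
                0, 0, 0, 1, 1, 0, 1, 0, 0), nrow = 9)

# one data frame per column of bin; drop = FALSE keeps single-column results as data frames
res <- apply(bin, 2, function(flag) dat[, as.logical(flag), drop = FALSE])

# list the column names selected for each element
lapply(res, names)
```

The selected column names should then match the three tables printed above.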