Question

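I have a data frame of samples (rows) by species (columns), and a column in another data frame that codes the samples into groups. I want to select every column in which all samples of at least one group have non-zero values.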

Species data frame:

structure(list(Otu000132 = c(0L, 56L, 30L, 52L, 1L, 4L, 31L, 4L, 17L, 9L, 4L), 
               Otu000144 = c(191L, 14L, 58L, 137L, 127L, 222L, 26L, 175L, 133L, 107L, 43L),
               Otu000146 = c(0L, 0L, 0L, 0L, 16L, 62L, 41L, 16L, 60L, 32L, 0L), 
               Otu000147 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
               Otu000151 = c(2L, 9L, 4L, 1L, 0L, 4L, 4L, 2L, 3L, 0L, 0L),
               Otu000162 = c(2L, 1L, 0L, 0L, 1L, 1L, 0L, 2L, 1L, 0L, 0L), 
               Otu000164 = c(2L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
               Otu000174 = c(0L, 0L, 3L, 1L, 0L, 2L, 0L, 1L, 2L, 1L, 0L), 
               Otu000176 = c(1L, 9L, 0L, 1L, 2L, 5L, 3L, 3L, 8L, 2L, 2L), 
               Otu000186 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
               Otu000190 = c(1L, 1L, 1L, 0L, 0L, 5L, 1L, 2L, 7L, 0L, 0L)),
          .Names = c("Otu000132", "Otu000144", "Otu000146", "Otu000147", 
                     "Otu000151", "Otu000162", "Otu000164", "Otu000174", 
                     "Otu000176", "Otu000186", "Otu000190"),
          row.names = 30:40, class = "data.frame")

Grouping data frame:

structure(c(30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
            40, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), 
          .Dim = c(11L, 2L))

Desired output:

structure(list(Otu000132 = c(0L, 56L, 30L, 52L, 1L, 4L, 31L, 4L, 17L, 9L, 4L), 
               Otu000144 = c(191L, 14L, 58L, 137L, 127L, 222L, 26L, 175L, 133L, 107L, 43L), 
               Otu000151 = c(2L, 9L, 4L, 1L, 0L, 4L, 4L, 2L, 3L, 0L, 0L), 
               Otu000176 = c(1L, 9L, 0L, 1L, 2L, 5L, 3L, 3L, 8L, 2L, 2L),
               Otu000190 = c(1L, 1L, 1L, 0L, 0L, 5L, 1L, 2L, 7L, 0L, 0L)), 
          .Names = c("Otu000132", "Otu000144",  "Otu000151", 
                     "Otu000176", "Otu000190"),
          row.names = 30:40, class = "data.frame")

I feel this should be something I can do with dplyr select, but I can't work out how. Does anyone have a suggestion to get me started?

Answer 1

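This can indeed be done with dplyr, and fairly simply. As others have pointed out, "Otu000146" does not meet the criterion you described and will not be included in the final column selection.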
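The code for this answer is not shown here, so the following is only a minimal sketch of one dplyr approach, not necessarily the answerer's own. It assumes dplyr 1.0.0 or later (for where() inside select()), and the object names species, group, grp and out are chosen for illustration.

```r
library(dplyr)

# Question data: samples 30..40 in rows, OTUs in columns
species <- data.frame(
  Otu000132 = c(0, 56, 30, 52, 1, 4, 31, 4, 17, 9, 4),
  Otu000144 = c(191, 14, 58, 137, 127, 222, 26, 175, 133, 107, 43),
  Otu000146 = c(0, 0, 0, 0, 16, 62, 41, 16, 60, 32, 0),
  Otu000147 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  Otu000151 = c(2, 9, 4, 1, 0, 4, 4, 2, 3, 0, 0),
  Otu000162 = c(2, 1, 0, 0, 1, 1, 0, 2, 1, 0, 0),
  Otu000164 = c(2, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0),
  Otu000174 = c(0, 0, 3, 1, 0, 2, 0, 1, 2, 1, 0),
  Otu000176 = c(1, 9, 0, 1, 2, 5, 3, 3, 8, 2, 2),
  Otu000186 = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  Otu000190 = c(1, 1, 1, 0, 0, 5, 1, 2, 7, 0, 0),
  row.names = 30:40
)

# Grouping matrix: column 1 = sample ID, column 2 = group code
group <- cbind(30:40, c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))

# Group code for each row of species, aligned by sample ID
grp <- group[match(rownames(species), group[, 1]), 2]

# Keep a column if, in at least one group, every sample is non-zero
out <- species %>%
  select(where(function(x) any(tapply(x != 0, grp, all))))

names(out)
# "Otu000132" "Otu000144" "Otu000151" "Otu000176" "Otu000190"
```

Here tapply() tests each group separately with all(), and any() then accepts the column as soon as one group passes.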


Answer 2

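We split the first column of the grouping dataset ('gp1') by its second column (gp1[,2]) into a list. Looping over that list, we subset the rows of the species dataset by matching its row names against each list element, take the column sums of the logical matrix (x1==0), and check whether each sum is greater than 0. We then combine the corresponding elements of the list with & (via Reduce) and negate (!) the result, so that TRUE becomes FALSE and vice versa, to get the index used to subset the columns of the species dataset.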
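The code for this answer is not shown here. The steps described above can be sketched in base R roughly as follows; this is reconstructed from the description and is not the answerer's exact code. The names species, gp1, has_zero and out are assumptions.

```r
# Question data: samples 30..40 in rows, OTUs in columns
species <- data.frame(
  Otu000132 = c(0, 56, 30, 52, 1, 4, 31, 4, 17, 9, 4),
  Otu000144 = c(191, 14, 58, 137, 127, 222, 26, 175, 133, 107, 43),
  Otu000146 = c(0, 0, 0, 0, 16, 62, 41, 16, 60, 32, 0),
  Otu000147 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  Otu000151 = c(2, 9, 4, 1, 0, 4, 4, 2, 3, 0, 0),
  Otu000162 = c(2, 1, 0, 0, 1, 1, 0, 2, 1, 0, 0),
  Otu000164 = c(2, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0),
  Otu000174 = c(0, 0, 3, 1, 0, 2, 0, 1, 2, 1, 0),
  Otu000176 = c(1, 9, 0, 1, 2, 5, 3, 3, 8, 2, 2),
  Otu000186 = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  Otu000190 = c(1, 1, 1, 0, 0, 5, 1, 2, 7, 0, 0),
  row.names = 30:40
)

# Grouping matrix: column 1 = sample ID, column 2 = group code
gp1 <- cbind(30:40, c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))

# One logical vector per group: TRUE where the column has a zero in that group
has_zero <- lapply(split(gp1[, 1], gp1[, 2]), function(ids) {
  x1 <- species[match(ids, rownames(species)), , drop = FALSE]
  colSums(x1 == 0) > 0
})

# A column is dropped only if it has a zero in every group;
# negating that index gives the columns to keep
out <- species[!Reduce(`&`, has_zero)]

names(out)
# "Otu000132" "Otu000144" "Otu000151" "Otu000176" "Otu000190"
```

Reduce with & folds the per-group vectors element by element, so the result is TRUE only for columns that fail in all three groups.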

Answer 3

You can use dplyr, or just base functions:

species = merge(species, group, by.x=c("row.names"), by.y=c("V1"))

#Find the lowest values in each grouping
check = aggregate(species[,c("Otu000132", "Otu000144", "Otu000146", 
                   "Otu000147", "Otu000151", "Otu000162", "Otu000164", 
                   "Otu000174", "Otu000176", "Otu000186", "Otu000190")], 
                    by=list(species$V2), min)

#sum across the groupings
vars = apply(check, 2, function(x) sum(x))

#retain variables where sum > 0, indicating at least one grouping has no zero observations
vars = vars[vars!=0]

#extract the variable names
vars = names(vars)[-1]

#subset dataset to select variables identified above
out = species[vars]

out
#   Otu000132 Otu000144 Otu000151 Otu000176 Otu000190
#1          0       191         2         1         1
#2         56        14         9         9         1
#3         30        58         4         0         1
#4         52       137         1         1         0
#5          1       127         0         2         0
#6          4       222         4         5         5
#7         31        26         4         3         1
#8          4       175         2         3         2
#9         17       133         3         8         7
#10         9       107         0         2         0
#11         4        43         0         2         0

Select columns from a data frame that are non-zero for groups of samples

3 answers: