Select only columns that contain values > 5 from a data frame

Time: 2016-03-06 20:48:52

Tags: r select

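I'm very new to R, so this has been a struggle: I have a data frame imported from a csv. The first column contains the row names (genes), and the second column contains the group assignment (whether the gene is in group 1, group 4, and so on). The next 100 columns contain gene pathway measurements (ranging from -20 to +20). I want to select only the rows in group 1, and then show only the columns in which at least one of those group 1 rows has a value greater than 10.

Example data: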

NAME    Group   path1   path2   path3   path4   path5
gene1   8       -19.1   -26.6    3.0     0.8     -5.1
gene2   1        -2.8    22.8   -1.2    20.8     -9.6
gene3   4        -5.4    -4.0    2.7     5.8     -6.8
gene4   1        -9.9   -24.6    7.3    -2.1    -18.9
gene5   2        -4.7    -9.4   -3.1     0.6    -10.1
gene6   1        14.0    -5.8   -1.6    -2.5    -18.7
gene7   5        -6.4    -3.8    2.0    -2.1     -8.6
gene8   1        -9.9    -4.8    5.2     2.0    -17.5
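
To make the example reproducible, the sample rows can be read straight from text. This is only a sketch: the name `DT` is borrowed from the first answer below, and the real data would come from `read.csv()` on the actual file instead.

```r
# Rebuild the sample table as a data frame.
# `DT` is an assumed name (it matches the first answer); real data would be
# loaded with read.csv() rather than from inline text.
DT <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
NAME  Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6  3.0  0.8  -5.1
gene2 1  -2.8  22.8 -1.2 20.8  -9.6
gene3 4  -5.4  -4.0  2.7  5.8  -6.8
gene4 1  -9.9 -24.6  7.3 -2.1 -18.9
gene5 2  -4.7  -9.4 -3.1  0.6 -10.1
gene6 1  14.0  -5.8 -1.6 -2.5 -18.7
gene7 5  -6.4  -3.8  2.0 -2.1  -8.6
gene8 1  -9.9  -4.8  5.2  2.0 -17.5
")

# Inspect the column types: NAME is character, the rest are numeric.
str(DT)
```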

I tried this approach but could not adapt it to my data: Subset columns in R with specific values

Any help would be greatly appreciated!

2 Answers:

Answer 0: (score: 1)

Reshape the data with tidyr and dplyr to make this easier. This puts your column names into a single column; you can then filter on group and value.

library(tidyr)
library(dplyr)
DT %>% 
  gather("Path", "value", -NAME, -Group) %>%
  filter(Group == 1, value > 10)
#>    NAME Group  Path value
#> 1 gene6     1 path1  14.0
#> 2 gene2     1 path2  22.8
#> 3 gene2     1 path4  20.8
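
In current tidyr releases `gather()` is superseded by `pivot_longer()`. The same filter might be sketched as follows, assuming tidyr 1.0.0 or later and the sample data from the question rebuilt inline as `DT`; `hits` is just an illustrative name.

```r
library(tidyr)
library(dplyr)

# Sample data from the question, rebuilt inline so the block runs on its own.
DT <- data.frame(
  NAME  = paste0("gene", 1:8),
  Group = c(8, 1, 4, 1, 2, 1, 5, 1),
  path1 = c(-19.1, -2.8, -5.4, -9.9, -4.7, 14.0, -6.4, -9.9),
  path2 = c(-26.6, 22.8, -4.0, -24.6, -9.4, -5.8, -3.8, -4.8),
  path3 = c(3.0, -1.2, 2.7, 7.3, -3.1, -1.6, 2.0, 5.2),
  path4 = c(0.8, 20.8, 5.8, -2.1, 0.6, -2.5, -2.1, 2.0),
  path5 = c(-5.1, -9.6, -6.8, -18.9, -10.1, -18.7, -8.6, -17.5)
)

# Stack every path column into Path/value pairs, then keep the
# Group 1 measurements that exceed 10.
hits <- DT %>%
  pivot_longer(cols = -c(NAME, Group), names_to = "Path", values_to = "value") %>%
  filter(Group == 1, value > 10)

hits
```

The same three gene/path pairs come back, although `pivot_longer()` orders the rows by gene rather than by path.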

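If you want all rows with Group == 1 together with all of their values in the selected columns, just keep the names of those columns and subset: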

library(tidyr)
library(dplyr)
colname <- DT %>% 
  gather("Path", "value", -NAME, -Group) %>%
  filter(Group == 1, value > 10) %>%
  distinct(Path)  # unique names only; a repeated name would select that column twice

DT[DT$Group == 1, c("NAME", "Group", colname$Path)]
#>    NAME Group path1 path2 path4
#> 2 gene2     1  -2.8  22.8  20.8
#> 4 gene4     1  -9.9 -24.6  -2.1
#> 6 gene6     1  14.0  -5.8  -2.5
#> 8 gene8     1  -9.9  -4.8   2.0
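
The reshape step can also be skipped by testing each column directly. A minimal sketch, assuming dplyr 1.0.0 or later (for `where()`) and the sample data rebuilt inline; `res` is an illustrative name.

```r
library(dplyr)

# Sample data from the question, rebuilt inline so the block runs on its own.
DT <- data.frame(
  NAME  = paste0("gene", 1:8),
  Group = c(8, 1, 4, 1, 2, 1, 5, 1),
  path1 = c(-19.1, -2.8, -5.4, -9.9, -4.7, 14.0, -6.4, -9.9),
  path2 = c(-26.6, 22.8, -4.0, -24.6, -9.4, -5.8, -3.8, -4.8),
  path3 = c(3.0, -1.2, 2.7, 7.3, -3.1, -1.6, 2.0, 5.2),
  path4 = c(0.8, 20.8, 5.8, -2.1, 0.6, -2.5, -2.1, 2.0),
  path5 = c(-5.1, -9.6, -6.8, -18.9, -10.1, -18.7, -8.6, -17.5)
)

res <- DT %>%
  filter(Group == 1) %>%                        # keep only the Group 1 rows
  select(NAME, Group,                           # always keep the id columns
         where(~ is.numeric(.x) && any(.x > 10, na.rm = TRUE)))

res
```

Because the filter runs first, the `any()` test only sees the Group 1 rows. `na.rm = TRUE` keeps the test from returning `NA` if a column has missing values.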

Answer 1: (score: 0)

Staying within base R, and building on the question you linked to, we can do:

## Data
df <- data.frame(NAME = c("gene1","gene2","gene3","gene4"),
                 Group = c(8,1,4,1),
                 path1 = c(-19.1, -2.8, -5.4, -9.9),
                 path2 = c(-26.6, 22.8, -4, -24.6))

drops <- c("NAME", "Group")
keeps <- names(df)[!names(df) %in% drops]

## Subset the data by the groups of interest first
df_1 <- df[df$Group == 1,]

## This next step is similar to your linked question, 
## it just uses `any` in place of `all`, and only on a subset of the columns

cbind(df_1[, drops],
      do.call(cbind, lapply(df_1[, keeps], function(x) if (any(x >= 5)) x)))

## Or alternatively, select by column name
## (test df_1, the Group 1 rows, rather than the full df)
df_1[, c(drops, unlist(lapply(keeps, function(x) if (any(df_1[, x] >= 5)) x)))]

which gives

   NAME Group path2
2 gene2     1  22.8
4 gene4     1 -24.6
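
The answer above tests `>= 5`, whereas the question asks for values greater than 10. One way to make the group and threshold adjustable is to wrap the base R steps in a small function. This is only a sketch: `keep_cols_above` is a hypothetical helper, not part of any package, and it assumes the grouping column is called `Group`.

```r
# Hypothetical helper (illustrative name): keep the id columns plus every other
# column in which at least one row of the chosen group exceeds `threshold`.
# Assumes the grouping column is named "Group".
keep_cols_above <- function(data, group, threshold, id_cols = c("NAME", "Group")) {
  rows <- data[data$Group == group, , drop = FALSE]   # rows of the chosen group
  value_cols <- setdiff(names(rows), id_cols)         # measurement columns only
  hit <- vapply(rows[value_cols],
                function(x) any(x > threshold, na.rm = TRUE),
                logical(1))
  rows[, c(id_cols, value_cols[hit]), drop = FALSE]
}

## Same small data frame as in the answer
df <- data.frame(NAME = c("gene1", "gene2", "gene3", "gene4"),
                 Group = c(8, 1, 4, 1),
                 path1 = c(-19.1, -2.8, -5.4, -9.9),
                 path2 = c(-26.6, 22.8, -4, -24.6))

out <- keep_cols_above(df, group = 1, threshold = 10)
out
```

`vapply()` always returns a logical vector, so the selection also works when every column, or none of them, passes the test.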