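I'm very new to R, so this has been a struggle: I have a data frame imported from a CSV. The first column contains the row names (genes), and the second column contains the group assignment (whether a gene is in group 1, group 4, and so on). The next 100 columns contain gene pathway measurements (ranging from -20 to +20). I want to select only the rows in group 1, and then display only the columns in which at least one group 1 row has a value greater than 10.
Example data: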
NAME Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6 3.0 0.8 -5.1
gene2 1 -2.8 22.8 -1.2 20.8 -9.6
gene3 4 -5.4 -4.0 2.7 5.8 -6.8
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
I tried this approach but couldn't adapt it to my data: Subset columns in R with specific values
Any help would be greatly appreciated!
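Answer 0 (score: 1)
Use tidyr and dplyr to reshape the data into long format, which simplifies the operation: the column names end up in a single column, and you can then filter on group and value.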
library(tidyr)
library(dplyr)
DT %>%
gather("Path", "value", -NAME, -Group) %>%
filter(Group == 1, value > 10)
#> NAME Group Path value
#> 1 gene6 1 path1 14.0
#> 2 gene2 1 path2 22.8
#> 3 gene2 1 path4 20.8
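If you want all the rows with Group == 1 and every value in the selected columns, keep just the names of the matching columns and subset with them: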
library(tidyr)
library(dplyr)
colname <- DT %>%
gather("Path", "value", -NAME, -Group) %>%
filter(Group == 1, value > 10) %>%
select(Path)
DT[DT$Group == 1, c("NAME", "Group", colname$Path)]
#> NAME Group path1 path2 path4
#> 2 gene2 1 -2.8 22.8 20.8
#> 4 gene4 1 -9.9 -24.6 -2.1
#> 6 gene6 1 14.0 -5.8 -2.5
#> 8 gene8 1 -9.9 -4.8 2.0
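In current tidyr releases gather() is superseded by pivot_longer(), so here is a minimal sketch of the same idea with it, assuming tidyr 1.0.0 or later. DT is rebuilt from the question's sample data with read.table() so the snippet runs on its own; `hits`, `keep` and `result` are names made up for illustration. intersect() is used so that each path is listed only once (even if several Group 1 genes exceed 10 in it) and the original column order is kept.

```r
library(tidyr)
library(dplyr)

# Sample data from the question (stands in for the CSV import)
DT <- read.table(header = TRUE, text = "
NAME Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6 3.0 0.8 -5.1
gene2 1 -2.8 22.8 -1.2 20.8 -9.6
gene3 4 -5.4 -4.0 2.7 5.8 -6.8
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
")

# Long format: one row per (gene, path) pair, then keep Group 1 values above 10
hits <- DT %>%
  pivot_longer(cols = -c(NAME, Group), names_to = "Path", values_to = "value") %>%
  filter(Group == 1, value > 10)

# Unique path names, in the order they appear in DT
keep <- intersect(names(DT), hits$Path)

# All Group 1 rows, restricted to the identifier columns and the matching paths
result <- DT[DT$Group == 1, c("NAME", "Group", keep)]
print(result)
```

On the sample data this should select the same rows and columns as the output shown above.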
Answer 1 (score: 0)
Staying within base R, and building on the question you linked to, we can do:
## Data
df <- data.frame(NAME = c("gene1","gene2","gene3","gene4"),
Group = c(8,1,4,1),
path1 = c(-19.1, -2.8, -5.4, -9.9),
path2 = c(-26.6, 22.8, -4, -24.6))
drops <- c("NAME", "Group")
keeps <- names(df)[!names(df) %in% drops]
## Subset the data by the groups of interest first
df_1 <- df[df$Group == 1,]
## This next step is similar to your linked question,
## it just uses `any` in place of `all`, and only on a subset of the columns
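## The threshold is > 10, as asked in the question
cbind(df_1[, drops],
      do.call(cbind, lapply(df_1[, keeps], function(x) if (any(x > 10)) x)))
## Or alternatively (the test runs on df_1, the Group 1 subset, not on df;
## unlist() drops the NULLs returned for columns that do not qualify)
df_1[, c(drops, unlist(lapply(keeps, function(x) if (any(df_1[, x] > 10)) x)))]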
which gives
NAME Group path2
2 gene2 1 22.8
4 gene4 1 -24.6
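For comparison, here is a sketch of the same base R idea applied to the full sample data from the question, using the question's threshold of greater than 10. The names `grp1`, `path_cols`, `has_hit`, `keep` and `result` are made up for illustration, and read.table(text = ...) stands in for the CSV import.

```r
# Sample data from the question (stands in for the CSV import)
df <- read.table(header = TRUE, text = "
NAME Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6 3.0 0.8 -5.1
gene2 1 -2.8 22.8 -1.2 20.8 -9.6
gene3 4 -5.4 -4.0 2.7 5.8 -6.8
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
")

# Step 1: keep only the Group 1 rows
grp1 <- df[df$Group == 1, ]

# Step 2: every column except the identifiers is a pathway measurement
path_cols <- setdiff(names(df), c("NAME", "Group"))

# Step 3: flag the pathway columns with at least one Group 1 value above 10
has_hit <- vapply(grp1[path_cols], function(x) any(x > 10), logical(1))
keep <- path_cols[has_hit]

# Step 4: identifier columns plus the flagged pathways;
# drop = FALSE keeps a data frame even when no pathway qualifies
result <- grp1[, c("NAME", "Group", keep), drop = FALSE]
print(result)
```

vapply() is used rather than sapply() so the result is always a logical vector, whatever the number of columns that qualify. If the real data contain missing values, any(x > 10, na.rm = TRUE) may be the safer test.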