I am starting with a df that looks like this:
sample_id target_id length eff_length est_counts tpm class
1 SRR3884838C (A)n 69 70 0 0.00000 0
2 SRR3884838C (AC)n 69 70 0 0.00000 0
3 SRR3884838C (AG)n 69 70 0 0.00000 0
4 SRR3884838C (AT)n 69 70 5 15.98870 0
I want to use dplyr's select function to keep only the sample_id values that end with the letter C, along with target_id and tpm.
Example data:
> dput(droplevels(head(te,4)))
structure(list(sample_id = structure(c(1L, 1L, 1L, 1L), .Label = "SRR3884838C", class = "factor"),
target_id = structure(1:4, .Label = c("(A)n", "(AC)n", "(AG)n",
"(AT)n"), class = "factor"), length = c(69L, 69L, 69L, 69L
), eff_length = c(70L, 70L, 70L, 70L), est_counts = c(0,
0, 0, 5), tpm = c(0, 0, 0, 15.9887), class = c(0L, 0L, 0L,
0L)), .Names = c("sample_id", "target_id", "length", "eff_length",
"est_counts", "tpm", "class"), row.names = c(NA, 4L), class = "data.frame")
I tried the following:
teC <- select(te, (sample_id, ends_with("C")), target_id, tpm)
This gives me sample_id, target_id and tpm, but it does not restrict the rows to sample_id values ending in C, for example:
sample_id target_id tpm
9759 SRR3884843CxS Tigger15a 0.00000e+00
9760 SRR3884843CxS Tigger16a 0.00000e+00
9761 SRR3884843CxS Tigger16b 0.00000e+00
Am I using select incorrectly? I was able to work through the example data from a tutorial site without any problems.
Answer 0 (score: 2)
select is used to keep variables (read: columns) by name, which is what you are doing with sample_id, target_id, and tpm. If you want to further filter on the values in sample_id, add filter:
teC <- te %>% select(sample_id, target_id, tpm) %>% filter(grepl("C$", sample_id))
The regular expression "C$" matches strings that end with "C"; "CxS$" matches strings that end with "CxS"; and "(C|CxS)$" matches either.
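To make the column/row distinction concrete, here is a minimal sketch on a toy data frame. The data frame toy, its length values, and the name toy_c are hypothetical and only loosely mimic the question's data. Note that ends_with() is a select helper that matches column names, not cell values, which is why it cannot restrict rows here.

```r
library(dplyr)

# Hypothetical toy data with two sample_id suffixes (not the asker's real data)
toy <- data.frame(
  sample_id = factor(c("SRR3884838C", "SRR3884838C",
                       "SRR3884843CxS", "SRR3884843CxS")),
  target_id = c("(A)n", "(AT)n", "Tigger15a", "Tigger16a"),
  length    = c(69L, 69L, 100L, 100L),  # made-up values for illustration
  tpm       = c(0, 15.9887, 0, 0)
)

# select() chooses columns by name; filter() keeps the rows whose
# sample_id value ends in "C" (grepl() coerces the factor to character)
toy_c <- toy %>%
  select(sample_id, target_id, tpm) %>%
  filter(grepl("C$", sample_id))

# The three patterns from the answer, checked against both id styles
ids <- c("SRR3884838C", "SRR3884843CxS")
grepl("C$", ids)        # TRUE FALSE
grepl("CxS$", ids)      # FALSE TRUE
grepl("(C|CxS)$", ids)  # TRUE TRUE
```

In this sketch toy_c keeps only the two SRR3884838C rows and the three selected columns. Because sample_id is a factor, the unused level remains after filtering; droplevels(toy_c) would remove it if that matters downstream.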