Question

I am starting with a df that looks like this:

  sample_id        target_id length     eff_length est_counts    tpm   class
1 SRR3884838C      (A)n     69          70          0        0.00000     0
2 SRR3884838C     (AC)n     69          70          0        0.00000     0
3 SRR3884838C     (AG)n     69          70          0        0.00000     0
4 SRR3884838C     (AT)n     69          70          5        15.98870    0

I would like to use dplyr's select function to pick only the sample_ids that end with the letter C, along with target_id and tpm.

Example data:

> dput(droplevels(head(te,4)))
structure(list(sample_id = structure(c(1L, 1L, 1L, 1L), .Label = "SRR3884838C", class = "factor"), 
target_id = structure(1:4, .Label = c("(A)n", "(AC)n", "(AG)n", 
"(AT)n"), class = "factor"), length = c(69L, 69L, 69L, 69L
), eff_length = c(70L, 70L, 70L, 70L), est_counts = c(0, 
0, 0, 5), tpm = c(0, 0, 0, 15.9887), class = c(0L, 0L, 0L, 
0L)), .Names = c("sample_id", "target_id", "length", "eff_length", 
"est_counts", "tpm", "class"), row.names = c(NA, 4L), class = "data.frame")

I tried the following:

  teC <- select(te, (sample_id, ends_with("C")), target_id, tpm)

This gives me sample_id, target_id and tpm, but it does not restrict the rows to sample_ids that end in C, for example:

      sample_id        target_id  tpm
9759  SRR3884843CxS   Tigger15a   0.00000e+00
9760  SRR3884843CxS   Tigger16a   0.00000e+00
9761  SRR3884843CxS   Tigger16b   0.00000e+00

Am I using select incorrectly? I was able to work through the example data from the tutorial site without any problems.

Answer 1

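select is for keeping variables (read: columns) by name, which is what you did with sample_id, target_id and tpm. If you want to further filter the rows based on the values in sample_id, add a filter: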

teC <- te %>% select(sample_id, target_id, tpm) %>% filter(grepl("C$", sample_id))

The regular expression "C$" matches strings that end with "C"; "CxS$" matches strings that end with "CxS"; and "(C|CxS)$" matches either.
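
To see why this works, it may help to remember that ends_with() is a selection helper that matches column names, whereas grepl() tests the values inside a column. The sketch below checks the three patterns using base R only, so it runs without dplyr. The IDs are modeled on the ones in the question, except "SRR3884840S", which is a made-up value added here as a non-matching case, and te_toy is a hypothetical stand-in for te.

```r
# Sample IDs modeled on the question; "SRR3884840S" is a made-up
# value added only to show a string that matches none of the patterns.
ids <- c("SRR3884838C", "SRR3884843CxS", "SRR3884840S")

# "$" anchors the pattern to the end of the string.
ends_c      <- grepl("C$", ids)        # ends with "C"
ends_cxs    <- grepl("CxS$", ids)      # ends with "CxS"
ends_either <- grepl("(C|CxS)$", ids)  # ends with "C" or "CxS"

# Toy stand-in for te (hypothetical values). grepl() coerces the
# factor column to character before matching.
te_toy <- data.frame(sample_id = factor(ids), tpm = c(15.9887, 0, 0))

# Base R equivalent of select(sample_id, tpm) followed by
# filter(grepl("C$", sample_id)): subset rows, then keep two columns.
teC_toy <- te_toy[grepl("C$", te_toy$sample_id), c("sample_id", "tpm")]
print(teC_toy)
```

For a fixed suffix with no regex metacharacters, base R's endsWith(as.character(ids), "C") should give the same result as grepl("C$", ids).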
