Extract substrings from a character vector

Time: 2015-04-30 16:45:56

Tags: r substring

I have a character vector:

ids <- c("NM_006690.2_PROBE1","333212.1_PROBE1","7602049CB1_PROBE1","NM_018065.1_PROBE1","1539036CB1_PROBE1","NM_021019.1_PROBE1","1440608CB1_PROBE1","NM_031270.1_PROBE1","613678CB1_PROBE1")

There has already been a lot of discussion on this here: extract a substring in R according to a pattern

I want to remove everything after the dot (.), and also everything from _PROBE onward. I managed to remove the part after the . with:
read.table(text = ids, sep = ".", as.is = TRUE, fill=TRUE)$V1

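I am now left with strings like 613678CB1_PROBE1, where I still need to remove everything from the _ before PROBE. My desired output for that one is 613678CB1. How can I do that?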

Desired output:

"NM_006690", "333212", "7602049CB1", "NM_018065", "1539036CB1", "NM_021019", "1440608CB1", "NM_031270", "613678CB1"

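Note: there are two kinds of _ in these strings, one that belongs to NM and one that belongs to PROBE. I only want everything from _PROBE onward removed.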

2 answers:

Answer 0 (score: 6)

It seems you are asking for:

gsub("\\..*|_PROBE.*", "", ids)

Demo:

gsub("\\..*|_PROBE.*", "", ids)
# [1] "NM_006690"  "333212"     "7602049CB1" "NM_018065"  "1539036CB1"
# [6] "NM_021019"  "1440608CB1" "NM_031270"  "613678CB1" 
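
To see why the single pattern works, here is a minimal sketch that applies its two alternatives one at a time on a shortened copy of the data. The variable names `no_version` and `cleaned` are made up for illustration and are not part of the answer.

```r
ids <- c("NM_006690.2_PROBE1", "333212.1_PROBE1", "7602049CB1_PROBE1")

# Step 1: "\\..*" matches a literal dot and everything after it.
no_version <- sub("\\..*", "", ids)

# Step 2: "_PROBE.*" matches "_PROBE" and everything after it, which
# handles IDs that contain no dot (e.g. "7602049CB1_PROBE1").
# The "_" inside "NM_006690" is untouched because it is not followed by "PROBE".
cleaned <- sub("_PROBE.*", "", no_version)

# The one-call version joins both alternatives with "|".
stopifnot(identical(cleaned, gsub("\\..*|_PROBE.*", "", ids)))
print(cleaned)
```

Either way, whichever alternative matches first in a string consumes the rest of it, so each ID is cut at its first dot or at _PROBE, whichever comes earlier.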

Answer 1 (score: 2)

Is this really what you want?

ids <- c("NM_006690.2_PROBE1", "333212.1_PROBE1"  , "7602049CB1_PROBE1" , "NM_018065.1_PROBE1",
         "1539036CB1_PROBE1",  "NM_021019.1_PROBE1", "1440608CB1_PROBE1",  "NM_031270.1_PROBE1",
         "613678CB1_PROBE1")
ids <- read.table(text = ids, sep = ".", as.is = TRUE, fill=TRUE)$V1

library(stringr)
ids <- str_replace(ids, "_PROBE1", "")

Which gives you this:

"NM_006690"  "333212"     "7602049CB1" "NM_018065"  "1539036CB1" "NM_021019"  "1440608CB1" "NM_031270"  "613678CB1"
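
One thing to watch: the literal pattern "_PROBE1" only strips that exact suffix. If the data ever contained other probe numbers (an assumption; the question only shows PROBE1), a pattern anchored at the end of the string would be needed. A rough base R sketch, using invented IDs and a hypothetical variable name `stripped`:

```r
# Hypothetical IDs: the "_PROBE2" and "_PROBE10" suffixes are invented
# for illustration and do not appear in the question's data.
x <- c("NM_006690_PROBE1", "613678CB1_PROBE2", "333212_PROBE10")

# "_PROBE[0-9]+$" matches "_PROBE" followed by one or more digits,
# but only at the very end of the string.
stripped <- sub("_PROBE[0-9]+$", "", x)
print(stripped)
```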