I am working with NCBI Reference Sequence accession numbers, such as those in variable a:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
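To get information from the biomaRt package, I need to remove the .1, .2, etc. after the accession numbers. I usually do this with the following code: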
b <- sub("..*", "", a)
# [1] "" "" "" "" "" ""
But as you can see, this is not the right approach for this variable. Can anyone help me with this?
Answer 0 (score: 72)
You just need to escape the period:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
gsub("\\..*","",a)
# [1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
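To see why the escape matters, here is a small sketch comparing the unescaped and escaped patterns. The two-element vector ids and the result names are made up for illustration; the bracketed form [.] is just another common way to write a literal period.

```r
# Illustrative input (a subset of the accessions from the question).
ids <- c("NM_020506.1", "NM_001030297.2")

# Unescaped: "." matches any character, so "..*" swallows the whole string.
unescaped <- sub("..*", "", ids)

# Escaped: "\\." matches only a literal period, so only the version is removed.
escaped <- sub("\\..*", "", ids)

# A character class is an equivalent way to spell a literal period.
bracketed <- sub("[.].*", "", ids)

print(unescaped)
print(escaped)
print(bracketed)
```

Since each accession contains a single period, sub and gsub should give the same result here.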
Answer 1 (score: 8)
We can pretend they are file names and remove the extension:
tools::file_path_sans_ext(a)
# [1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
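A quick check of how this behaves on inputs the question does not show, namely an accession with no version and one with a two-digit version. Both extra values are invented for illustration, so treat this as a sketch rather than a guarantee for every accession format.

```r
# Hypothetical inputs: versioned, unversioned, and a two-digit version.
ids <- c("NM_020506.1", "NM_020506", "NM_001030297.12")

# file_path_sans_ext drops a trailing ".<alphanumeric>" part and
# leaves strings without such a part unchanged.
stripped <- tools::file_path_sans_ext(ids)

print(stripped)
```

The tools package ships with R, so no extra installation should be needed.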
Answer 2 (score: 5)
You could do it like this:
sub("\\.[0-9]+$", "", a)
or
library(stringr)
str_sub(a, start=1, end=-3)  # assumes the suffix is exactly two characters, e.g. ".1"
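One caveat with cutting a fixed number of characters from the end: it only works while every version suffix is exactly two characters long. The sketch below uses base R's substr with nchar as a stand-in for the negative-index call (so it runs without stringr), and a made-up accession with a two-digit version.

```r
# Hypothetical inputs: one single-digit and one two-digit version.
ids <- c("NM_020506.1", "NM_001030297.12")

# Fixed-width trim: drop the last two characters of each string.
fixed_width <- substr(ids, 1, nchar(ids) - 2)

# Pattern-based trim: drop a period followed by digits at the end of the string.
by_pattern <- sub("\\.[0-9]+$", "", ids)

print(fixed_width)  # the second element keeps a trailing period
print(by_pattern)
```

If versions with two or more digits can occur in your data, the pattern-based form is probably the safer choice.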
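Answer 3 (score: 1)
If the strings were of fixed length, substr from base R could be used directly. Since they are not, we can get the position of the . with regexpr and use it in substr: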
substr(a, 1, regexpr("\\.", a)-1)
#[1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
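Note that regexpr returns -1 when there is no match, so an accession without a version would come back as an empty string. A minimal sketch of a guard, assuming such unversioned IDs can appear (the second input below is invented for illustration):

```r
# Hypothetical inputs: one versioned and one unversioned accession.
ids <- c("NM_020506.1", "NM_020506")

# Position of the first literal period, or -1 if there is none.
# as.integer drops the extra attributes that regexpr attaches.
pos <- as.integer(regexpr(".", ids, fixed = TRUE))

# Unguarded: a position of -1 makes substr return "".
naive <- substr(ids, 1, pos - 1)

# Guarded: only trim the strings that actually contain a period.
guarded <- ids
has_dot <- pos > 0
guarded[has_dot] <- substr(ids[has_dot], 1, pos[has_dot] - 1)

print(naive)
print(guarded)
```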