Suppose I have a character vector ids, as follows:
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
I want to go through each element and remove all letters, all special characters, and the "01" when it ends the element. So ids would become:
ids_replaced <- c("3670250", "3417960", "1301692", "130250509", "1300699551")
I'm fairly close, but this doesn't do what I intend:
gsub("(.*?)(\\d+?)(01$)", "\\2", ids, perl = TRUE)
Answer 0: (score: 2)
You can use
gsub("01$|\\D", "", ids)
# [1] "3670250" "3417960" "1301692" "130250509" "1300699551"
identical(gsub("01$|\\D", "", ids), ids_replaced)
# [1] TRUE
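As a minimal sketch of how the two alternatives in this pattern interact: the `01$` branch is tested against the end of the original string, not against what remains after the non-digits are removed. The helper names `clean_ids` and `clean_ids_two_pass` and the extra test strings are hypothetical, introduced only for illustration. The two-pass variant is an alternative that may or may not be what the question wants.

```r
# Hypothetical helper wrapping the one-pass substitution from this answer.
clean_ids <- function(x) gsub("01$|\\D", "", x)

# Hypothetical two-pass variant: strip non-digits first,
# then drop a single trailing "01" from what is left.
clean_ids_two_pass <- function(x) sub("01$", "", gsub("\\D", "", x))

ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

# Both variants agree on the ids from the question.
stopifnot(identical(clean_ids(ids), clean_ids_two_pass(ids)))

# They differ when "01" is followed only by non-digit characters,
# because the one-pass pattern anchors "01$" to the original string.
extra <- c("A01B", "7701-X")   # made-up inputs, not from the question
clean_ids(extra)               # returns c("01", "7701")
clean_ids_two_pass(extra)      # returns c("", "77")
```

For the ids in the question the distinction does not matter, since every trailing "01" already sits at the very end of its element.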
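Regular Expression Explanation:
01 matches the literal characters "01"
$ asserts the position at the end of the string (or just before a trailing \n)
| or
\D matches any non-digit character (anything other than 0-9)
Answer 1: (score: 1)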
Using the rex package may make this kind of task a little simpler.
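library(rex)  # provides rex() and re_substitutes()
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
# Remove every run of non-digits, or a "01" at the end of the string
re_substitutes(ids,
  rex(non_digits %or% list("01", end)),
  '',
  global = TRUE)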
#> [1] "3670250" "3417960" "1301692" "130250509" "1300699551"
Answer 2: (score: 0)