Col
WBU-ARGU*06:03:04
WBU-ARDU*08:01:01
WBU-ARFU*11:03:05
WBU-ARFU*03:456
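I have a column with 75 rows of values like Col above. I'm not sure how to use gsub or sub to keep everything up to and including the integer after the first colon.
Expected output: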
Col
WBU-ARGU*06:03
WBU-ARDU*08:01
WBU-ARFU*11:03
WBU-ARFU*03:456
I tried this, but it doesn't seem to work:
gsub("*..:","", df$col)
Answer 0: (score: 3)
The following may also help you.
sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)
The output is as follows.
> sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)
[1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456b"
The input data frame is built as follows.
dat <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456b")
df <- data.frame(dat)
Explanation: the annotated breakdown below is for illustration only; it is not runnable as written.
sub( ## sub() replaces only the first match in each element (gsub() would replace every match).
"([^:]*) ## Capture group 1: everything up to, but not including, the first colon.
: ## A literal colon.
([^:]*) ## Capture group 2: everything up to, but not including, the next colon.
.*" ## The rest of the string, from the second colon onward (empty if there is no second colon).
,"\\1:\\2" ## Replacement: the contents of group 1, a colon, then the contents of group 2.
,df$dat) ## The vector to operate on: the dat column of df.
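To see what each capture group actually holds, the pattern can be run through regexec() and regmatches() on a single value. This is only a sketch for inspection; the variable names (x, pattern, parts, group1, group2, rebuilt) are made up for the illustration and are not part of the answer above.

```r
# Inspect the capture groups of the pattern on one sample value.
x <- "WBU-ARGU*06:03:04"
pattern <- "([^:]*):([^:]*).*"

# regexec() returns the whole match followed by each capture group.
parts <- regmatches(x, regexec(pattern, x))[[1]]
whole_match <- parts[1]  # the full matched text
group1 <- parts[2]       # text before the first colon
group2 <- parts[3]       # text between the first and second colon

# Rebuilding "group1:group2" by hand mirrors the "\\1:\\2" replacement.
rebuilt <- paste0(group1, ":", group2)
print(parts)
print(rebuilt)
```

The rebuilt string should be the same as what sub(pattern, "\\1:\\2", x) returns for that value.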
Answer 1: (score: 2)
You can use
df$col <- sub("(\\d:\\d+):\\d+$", "\\1", df$col)
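See the regex demo.
Details
- (\\d:\\d+) - capturing group 1 (its value is available as \\1 in the replacement pattern): a digit, a colon, and 1 or more digits
- : - a colon
- \\d+ - 1 or more digits
- $ - the end of the string.
col <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456")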
sub("(\\d:\\d+):\\d+$", "\\1", col)
## => [1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456"
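Because this pattern is anchored with $, it only strips one trailing all-digit field, so it behaves differently from the [^:]* pattern of Answer 0 on values outside the sample. The sketch below compares the two on two hypothetical values that do not appear in the question (a four-field value and one ending in a letter); the names x, anchored, and unanchored are made up for the illustration.

```r
# Hypothetical inputs, not from the question: a four-field value
# and a value whose last field ends in a letter.
x <- c("WBU-ARGU*06:03:04:05", "WBU-ARGU*06:03:04b")

# Anchored pattern: removes only a final ":digits" field at the very end.
anchored <- sub("(\\d:\\d+):\\d+$", "\\1", x)

# Pattern from Answer 0: keeps the first two colon-separated fields.
unanchored <- sub("([^:]*):([^:]*).*", "\\1:\\2", x)

print(anchored)    # the second value is left unchanged (no match)
print(unanchored)
```

Which behaviour is preferable depends on whether the real column can contain more than three fields or non-digit suffixes.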
An alternative approach:
df$col <- sub("^(.*?:\\d+).*", "\\1", df$col)
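See the regex demo.
Here,
- ^ - the start of the string
- (.*?:\\d+) - group 1: any 0 or more characters, as few as possible (because of the lazy *? quantifier), then a : and 1 or more digits
- .* - the rest of the string.
However, this pattern should be used with the PCRE regex engine, enabled by passing perl=TRUE: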
col <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456")
sub("^(.*?:\\d+).*", "\\1", col, perl=TRUE)
## => [1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456"
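The lazy quantifier is what makes the match stop at the first colon. As a small illustration (the variable names x, lazy, and greedy are assumptions for this sketch), compare it with the greedy form of the same pattern:

```r
x <- "WBU-ARGU*06:03:04"

# Lazy .*? expands as little as possible, so group 1 ends after the
# digits that follow the FIRST colon.
lazy <- sub("^(.*?:\\d+).*", "\\1", x, perl = TRUE)

# Greedy .* grabs as much as possible, so group 1 ends after the
# digits that follow the LAST colon and nothing is removed.
greedy <- sub("^(.*:\\d+).*", "\\1", x, perl = TRUE)

print(lazy)    # "WBU-ARGU*06:03"
print(greedy)  # "WBU-ARGU*06:03:04"
```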
See the R online demo.
Answer 2: (score: 1)
sub("(\\d+:\\d+):\\d+$", "\\1", df$Col)
[1] "WBU-ARGU*06:03" "WBU-ARDU*08:01" "WBU-ARFU*11:03" "WBU-ARFU*03:456"
Or, with stringi, match what you want (rather than removing what you don't need):
stringi::stri_extract_first(df$Col, regex = "[A-Z-\\*]+\\d+:\\d+")
More concisely, with stringr:
stringr::str_extract(df$Col, "[A-Z-\\*]+\\d+:\\d+")
# or
stringr::str_extract(df$Col, "[\\w-*]+\\d+:\\d+")
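If adding a package is not an option, the same "extract what you want" idea can be sketched in base R with regexpr() and regmatches(). This is a swapped-in base R variant, not the stringi/stringr code above: the pattern "^[^:]*:\\d+" is a simplification chosen for this sketch, the value "no-colon-here" is hypothetical, and the names col, m, hit, and extracted are illustrative.

```r
# Two values from the question plus one hypothetical non-matching value.
col <- c("WBU-ARGU*06:03:04", "WBU-ARFU*03:456", "no-colon-here")

# Match from the start: everything before the first colon, the colon,
# and the digits that follow it.
m <- regexpr("^[^:]*:\\d+", col)

# regmatches() drops elements with no match, so fill a result vector
# of the original length and leave NA where nothing matched.
hit <- as.vector(m) != -1
extracted <- rep(NA_character_, length(col))
extracted[hit] <- regmatches(col, m)

print(extracted)
```

Keeping the result the same length as the input means it can be assigned straight back to a data frame column, which mirrors how str_extract() returns NA for non-matches.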