Here is a short example of the data:
x <- c("WB (16)", "CT (14)WB (15)", "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")
My question is how to make sub(".*?)", "", x) (or some other function) work so that the result is:
x <- c("WB (16)", "WB (15)", "TN-SE (17)")
instead of:
x <- c("", "WB (15)")
I have many different letter codes (not just WB, CT and TN-SE), for example:
"NBIO(15)" "CITG-TP(08)" "BK-AR(10)"
so the solution needs to be general... Thanks!
Answer 0 (score: 2)
Please try the following.
sub(".*[0-9]+[^)]\\)?([^)$])", "\\1", x)
The output is as follows.
[1] "WB (16)" "WB (15)" "TN-SE (17)"
The input is as follows.
> x
[1] "WB (16)" "CT (14)WB (15)"
[3] "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)"
Explanation: the following is for explanation purposes only (it is not meant to be run as is).
sub(" ##sub() from base R replaces the first match of a pattern in each element of a vector.
##Usage: sub(pattern, replacement, x)
.*[0-9]+[^)]\\) ##.* is greedy and runs as far right as possible; [0-9]+[^)] needs at least one digit followed by one more character that is not ")" (for "(16)" that is "1" then "6"); \\) matches a literal ")". There is no look-ahead here, only ordinary backtracking, so this part ends at the ")" of the last "(NN)" that still has text after it.
? ##? makes the preceding ")" optional.
([^)$])", ##(...) is a capture group: it stores the matched text so the replacement can reuse it. [^)$] matches one character that is neither ")" nor a literal "$" (inside brackets, $ is not an end-of-string anchor), i.e. the first character of the last entry.
"\\1", ##Replace the whole match with capture group 1, i.e. that single character, which deletes everything before the last entry.
x) ##The vector to operate on.
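To see what "\\1" actually holds, here is a small sketch using base R's regexec() and regmatches() (these calls are not part of the original answer). It prints capture group 1 for each element; an element where the pattern does not match at all gets NA, which is why sub() leaves "WB (16)" unchanged.

```r
x <- c("WB (16)", "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")

# Same pattern as in the answer above
pattern <- ".*[0-9]+[^)]\\)?([^)$])"

# regexec() records the positions of the whole match and of each group;
# regmatches() turns them into strings: element 1 is the full match,
# element 2 is capture group 1 (what "\\1" refers to in sub()).
m <- regmatches(x, regexec(pattern, x))

# Pull out group 1, using NA where the pattern did not match at all
captured <- vapply(m, function(v) if (length(v) > 1) v[2] else NA_character_,
                   character(1))
print(captured)
```

For the second and third elements the captured character is the first letter of the last entry ("W" and "T"), so replacing the whole match with it leaves "WB (15)" and "TN-SE (17)".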
Answer 1 (score: 1)
I think I understand what you need. This certainly works on your example.
sub(".*?([^()]+\\(\\d+\\))$", "\\1", x)
[1] "WB (16)" "WB (15)" "TN-SE (17)"
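Details: this looks for something of the form SomeStuff (Numbers) at the end of the string and throws away everything before it. SomeStuff is not allowed to contain parentheses.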
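Since the question asks for something that works on arbitrary labels, one way to package this pattern is a small helper; the name last_entry() is hypothetical. The sketch below also runs it on the question's no-space labels joined into one string, and on a made-up single-digit case ("AB (8)CD (9)") where Answer 0's pattern finds no match.

```r
# Hypothetical helper: keep only the final "Name (digits)" entry of each string.
# [^()]+ cannot contain parentheses, so the captured entry must start
# right after the last ")" that precedes it.
last_entry <- function(x) {
  sub(".*?([^()]+\\(\\d+\\))$", "\\1", x)
}

x <- c("WB (16)", "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")
print(last_entry(x))

# Labels without a space before "(" also work: when a space is present
# it is simply part of the [^()]+ match.
y <- "NBIO(15)CITG-TP(08)BK-AR(10)"
print(last_entry(y))

# Single-digit numbers are fine here. Answer 0's pattern needs a digit
# followed by a non-")" character, so it leaves this string unchanged.
z <- "AB (8)CD (9)"
print(last_entry(z))
print(sub(".*[0-9]+[^)]\\)?([^)$])", "\\1", z))
```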