How to use sub only when there are multiple values

Date: 2018-08-01 00:52:07

Tags: r regex dataframe gsub

Here is a short example of the data:

x<- c("WB (16)","CT (14)WB (15)","ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")

My question is how to make sub(".*?)", "", x) (or some other function) work so that the result is

x<-c("WB (16)","WB (15)","TN-SE(17)")

instead of

x<-c("","WB (15)")

The labels come in many different forms (not only WB, CT and TN-SE), for example:

 "NBIO(15)"    "CITG-TP(08)" "BK-AR(10)" 

So it needs to be a general solution... Thanks!

2 answers:

Answer 0: (score: 2)

Could you please try the following.

sub(".*[0-9]+[^)]\\)?([^)$])", "\\1", x)

The output is as follows.

[1] "WB (16)"    "WB (15)"    "TN-SE (17)"

The input is as follows.

> x
[1] "WB (16)"                                   "CT (14)WB (15)"                           
[3] "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)"

Explanation: the following breakdown is for explanation purposes only and is not meant to be run as written.

sub("                 ## sub() from base R, called as sub(pattern, replacement, x); it replaces the first match in each element of x.
.*                    ## Greedy "anything": it takes as much as it can, so the rest of the pattern matches at the last possible place.
[0-9]+[^)]            ## One or more digits followed by one character that is not ")"; in practice these are the digits of a number such as 14.
\\)?                  ## An optional literal ")", the bracket that closes that number.
([^)$])",             ## Capture group 1: a single character that is neither ")" nor "$" (inside [] the $ is literal). This is the first character of the last entry.
"\\1",                ## Replace the matched text with group 1, so everything before the last entry is dropped and its first character is put back.
x)                    ## The input vector.
                      ## An element with a single entry, such as "WB (16)", has nothing after its ")", so the pattern does not match and the element is returned unchanged.
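
To see which part of each string the pattern actually consumes, the match can be inspected with regexpr() and regmatches() before substituting. This is only a sketch built on the sample data from the question; the variable names pattern, m, matched and result are made up for illustration.

```r
x <- c("WB (16)", "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")

# The pattern from the answer above, unchanged.
pattern <- ".*[0-9]+[^)]\\)?([^)$])"

# regexpr() gives the start of the first match per element, or -1 for no match.
m <- regexpr(pattern, x)

# regmatches() returns only the elements that matched, so fill the rest with NA.
matched <- rep(NA_character_, length(x))
matched[m != -1] <- regmatches(x, m)
matched  # the text that sub() will replace with group 1

# Replace the matched text with the captured first character of the last entry.
result <- sub(pattern, "\\1", x)
result
# [1] "WB (16)"    "WB (15)"    "TN-SE (17)"
```

One caveat, as far as I can tell from reading the pattern: [0-9]+[^)] needs at least two characters before the closing bracket, so an earlier entry with a single-digit number, such as "CT (8)WB (15)", would not match and would be returned unchanged.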

Answer 1: (score: 1)

I think I understand what you want. This certainly works for your example.

sub(".*?([^()]+\\(\\d+\\))$", "\\1", x)
[1] "WB (16)"    "WB (15)"    "TN-SE (17)"

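Details: this looks for something of the form SomeStuff (Numbers) at the end of the string and discards everything before it. SomeStuff is not allowed to contain parentheses.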
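
Since the question also mentions labels without a space before the bracket, such as "BK-AR(10)", here is a small sketch that wraps this pattern in a helper and tries it on such a string. The function name keep_last_entry and the combined test string are hypothetical, and perl = TRUE is an addition of mine (not part of the answer) so that the lazy .*? follows Perl-style matching rules.

```r
# Hypothetical helper: keep only the last "Label(Number)" entry of each string.
keep_last_entry <- function(x) {
  # .*?            lazily skip as little as possible from the start
  # ([^()]+        group 1: a label without any parentheses ...
  #  \\(\\d+\\))   ... followed by a number in parentheses
  # $              and this must be the end of the string
  sub(".*?([^()]+\\(\\d+\\))$", "\\1", x, perl = TRUE)
}

x <- c("WB (16)", "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")
keep_last_entry(x)
# [1] "WB (16)"    "WB (15)"    "TN-SE (17)"

# Made-up string joining the extra labels from the question, with no spaces.
keep_last_entry("NBIO(15)CITG-TP(08)BK-AR(10)")
# [1] "BK-AR(10)"
```

Because the label part is written as [^()]+, the space before the bracket is optional, which is why both "WB (15)" and "BK-AR(10)" style entries are handled.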