Using R

Asked: 2017-02-27 21:49:19

Tags: r regex

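I have a character vector in R. Some of the strings have a "(number)" pattern appended. I am trying to remove this "(number)" part with a regular expression, but I cannot figure it out. I can get the rows where the string contains a space followed by other characters, but there must be a way to find just these number strings.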

    dat <- c("Alabama-Birmingham", "Arizona State", "Canisius", "UCF", "George Washington",
             "Green Bay", "Iona", "Louisville (7)", "UMass", "Memphis", "Michigan State",
             "Milwaukee", "Nebraska", "Niagara", "Northern Kentucky", "Notre Dame (21)",
             "Quinnipiac", "Siena", "Tulsa", "Washington State", "Wright State",
             "Xavier")

    rows <- grep(" (.*)", dat)
    fixed <- gsub(" (.*)","",games[rows,])
    dat = fixed

2 Answers:

Answer 0 (score: 2)

First, you need to escape the parentheses, and it is better to be more specific about what is inside them:

gsub("\\s+\\(\\d+\\)", "", dat)
 [1] "Alabama-Birmingham" "Arizona State"      "Canisius"          
 [4] "UCF"                "George Washington"  "Green Bay"         
 [7] "Iona"               "Louisville"         "UMass"             
[10] "Memphis"            "Michigan State"     "Milwaukee"         
[13] "Nebraska"           "Niagara"            "Northern Kentucky" 
[16] "Notre Dame"         "Quinnipiac"         "Siena"             
[19] "Tulsa"              "Washington State"   "Wright State"      
[22] "Xavier" 
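
To see why escaping matters, here is a minimal sketch on a four-element sample taken from the question's data (the variable names `sample_names`, `unescaped` and `escaped` are only illustrative). Without the backslashes, `(.*)` is a capture group rather than a literal pair of parentheses, so the pattern means "a space followed by anything" and also cuts plain two-word names.

```r
# A small sample taken from the question's data
sample_names <- c("Arizona State", "Louisville (7)", "Notre Dame (21)", "UCF")

# Unescaped: "(.*)" is a capture group, so " (.*)" matches the first space
# and everything after it, including the second word of ordinary names.
unescaped <- gsub(" (.*)", "", sample_names)

# Escaped: "\\(" and "\\)" match literal parentheses, "\\d+" matches one or
# more digits, and "\\s+" matches the whitespace before the parenthesis.
escaped <- gsub("\\s+\\(\\d+\\)", "", sample_names)

print(unescaped)  # "Arizona" "Louisville" "Notre" "UCF"
print(escaped)    # "Arizona State" "Louisville" "Notre Dame" "UCF"
```

Note that `gsub` works on the whole vector at once and leaves non-matching elements unchanged, so a separate `grep` step to find the affected rows should not be needed.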

Answer 1 (score: 0)

We can do this with sub:
sub("\\s*\\(.*", "", dat)
#[1] "Alabama-Birmingham" "Arizona State"      "Canisius"          
#[4] "UCF"                "George Washington"  "Green Bay"         
#[7] "Iona"               "Louisville"         "UMass"             
#[10] "Memphis"            "Michigan State"     "Milwaukee"         
#[13] "Nebraska"           "Niagara"            "Northern Kentucky" 
#[16] "Notre Dame"         "Quinnipiac"         "Siena"             
#[19] "Tulsa"              "Washington State"   "Wright State"      
#[22] "Xavier"
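
The two patterns in this thread give the same result on the question's data, but they may differ on other inputs. The sketch below uses hypothetical names that are not in the question (`x`, "Miami (OH)") to show the difference: the digit-specific pattern removes only "(number)" groups, while `\\(.*` removes everything from the first opening parenthesis onward.

```r
# Hypothetical inputs, not from the question, chosen to show the difference
x <- c("Louisville (7)", "Miami (OH)", "Miami (OH) (12)")

# Digit-specific pattern: removes only a parenthesised run of digits
digits_only <- gsub("\\s+\\(\\d+\\)", "", x)

# Open-ended pattern: removes everything from the first "(" to the end
from_paren <- sub("\\s*\\(.*", "", x)

print(digits_only)  # "Louisville" "Miami (OH)" "Miami (OH)"
print(from_paren)   # "Louisville" "Miami" "Miami"
```

Which behaviour is preferable depends on the data; if parentheses can hold something other than a number, the digit-specific pattern is probably the safer choice.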