Replace whole strings based on how they begin

Time: 2017-01-27 15:41:10

Tags: r regex string

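I am trying to change elements of a string vector based on the first characters of each string. I need to do it this way, rather than matching the whole string, because I scrape this data often and the second half of the strings changes frequently.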

Here is an example of my strings:

teams <- structure(c(3L, 14L, 4L, 5L, 15L, 10L, 7L, 2L, 12L, 13L, 9L, 
        8L, 1L, 11L, 6L, 21L, 29L, 17L, 16L, 30L, 23L, 19L, 20L, 25L, 
        22L, 26L, 28L, 27L, 24L, 18L), .Label = c("Dallas Mavericks (13)Â", 
                                                  "Denver Nuggets (8)Â", "Golden State Warriors (1)Â", "Houston Rockets (3)Â", 
                                                  "Los Angeles Clippers (4)Â", "Los Angeles Lakers (15)Â", "Memphis Grizzlies (7)Â", 
                                                  "Minnesota Timberwolves (12)Â", "New Orleans Pelicans (11)Â", 
                                                  "Oklahoma City Thunder (6)Â", "Phoenix Suns (14)Â", "Portland Trail Blazers (9)Â", 
                                                  "Sacramento Kings (10)Â", "San Antonio Spurs (2)Â", "Utah Jazz (5)Â", 
                                                  "Atlanta Hawks (4)Â", "Boston Celtics (3)Â", "Brooklyn Nets (15)Â", 
                                                  "Charlotte Hornets (7)Â", "Chicago Bulls (8)Â", "Cleveland Cavaliers (1)Â", 
                                                  "Detroit Pistons (10)Â", "Indiana Pacers (6)Â", "Miami Heat (14)Â", 
                                                  "Milwaukee Bucks (9)Â", "New York Knicks (11)Â", "Orlando Magic (13)Â", 
                                                  "Philadelphia 76ers (12)Â", "Toronto Raptors (2)Â", "Washington Wizards (5)Â"
        ), class = "factor")

I want to replace "Golden State Warriors (1)Â" with "GSW". I tried:

teams <- gsub("Golden", "GSW", teams)

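which turns that string into "GSW State Warriors (1)Â", i.e. it replaces only the first part of the element rather than the whole string. I also tried sub, and every other function I found by calling ?gsub (e.g. grep), but clearly I don't understand regular expressions.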

Any help would be greatly appreciated.

3 answers:

Answer 0 (score: 3)

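We can use sub to capture the first 3 characters as a group (^(.{3})), followed by the remaining characters (.*), and replace the whole string with a backreference to that captured group (\\1):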

sub("^(.{3}).*", "\\1", teams)
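To see why the original gsub("Golden", "GSW", teams) left the rest of the string in place, here is a small comparison on a hypothetical two-element vector x (plain ASCII stand-ins for the scraped labels). gsub() only replaces the text the pattern matched; anchoring with ^ and consuming the remainder with .* makes the match cover the whole element, so whatever follows "Golden" no longer matters:

```r
# Hypothetical sample in the same shape as the scraped labels (ASCII only here)
x <- c("Golden State Warriors (1)", "San Antonio Spurs (2)")

# gsub() replaces only the matched text, so the rest of the string survives
partial <- gsub("Golden", "GSW", x)
# partial: "GSW State Warriors (1)" "San Antonio Spurs (2)"

# Anchor at the start (^) and consume everything after it (.*):
# the whole element is replaced, regardless of what follows "Golden"
whole <- sub("^Golden.*", "GSW", x)
# whole: "GSW" "San Antonio Spurs (2)"

# The answer's pattern: capture the first three characters, drop the rest
first3 <- sub("^(.{3}).*", "\\1", x)
# first3: "Gol" "San"
```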

Update

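Based on the new information, we use a regex lookbehind to match one or more non-uppercase characters ([^A-Z]+) that follow an uppercase letter sitting at a word boundary ((?<=\\b[A-Z])), and replace them with an empty string (""):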

gsub("(?<=\\b[A-Z])[^A-Z]+", "", teams, perl = TRUE)
#[1] "GSW" "SAS" "HR"  "LAC" "UJ"  "OCT" "MG"  "DN"  "PTB" "SK"  "NOP" 
#[12] "MT"  "DM"  "PS"  "LAL" "CC"  "TR"  "BC"  "AH"  "WW"  "IP"  "CH" 
#[23] "CB"  "MB"  "DP"  "NYK" "P"   "OM"  "MH"  "BN" 
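A minimal check of the lookbehind pattern on a few hypothetical inputs. Note that perl = TRUE is needed because the default regex engine does not support (?<=...), and that "Philadelphia 76ers" collapses to "P" because "76ers" starts with a digit, so it contributes no capital letter:

```r
x <- c("Golden State Warriors (1)", "New York Knicks (11)", "Philadelphia 76ers (12)")
# Delete every run of non-capitals that follows a word-initial capital letter;
# perl = TRUE is required for the (?<=...) lookbehind
abbr <- gsub("(?<=\\b[A-Z])[^A-Z]+", "", x, perl = TRUE)
abbr
# [1] "GSW" "NYK" "P"
```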

Answer 1 (score: 1)

Initial response:

To answer your question using a regular expression:

gsub("\\Â .*", "", teams)

## Store in object and print
teams2 <- gsub("\\Â .*", "", teams)
head(teams2)
## [1] "Golden State Warriors" "San Antonio Spurs"     "Houston Rockets"       "Los Angeles Clippers" 
## [5] "Utah Jazz"             "Oklahoma City Thunder"

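You were on the right track, but my strategy is to 1) find the element common to every string ("Â" followed by a space) and then 2) drop that element and everything after it.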
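The same "find the common element, drop it and everything after it" idea can be sketched with a different anchor: the seed in parentheses, which every label appears to carry. This is an alternative pattern, not the one used above, and it assumes the labels always end in "(digits)" plus optional junk. The vector x is hypothetical, with "\u00c2" standing in for the stray "Â":

```r
# Hypothetical sample; "\u00c2" stands in for the stray "Â" in the scraped labels
x <- c("Golden State Warriors (1)\u00c2", "Utah Jazz (5)\u00c2")
# Common element: optional spaces, "(digits)", then anything up to the end
clean <- sub("[[:space:]]*\\([0-9]+\\).*$", "", x)
clean
# [1] "Golden State Warriors" "Utah Jazz"
```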

Note that you can run gsub as many times as necessary. For example:

teams <- gsub("\\Â .*", "", teams)
teams <- gsub("PATTERN2", "", teams)
teams <- gsub("PATTERN3", "", teams)

And so on.
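If the list of cleanup steps grows, the repeated gsub calls can be folded over a vector of patterns with base R's Reduce(). The three patterns below are hypothetical placeholders for "PATTERN2", "PATTERN3", etc.; they are applied left to right, exactly like successive assignments:

```r
x <- c("Golden State Warriors (1)", "Utah Jazz (5)")
# Hypothetical cleanup steps, applied in order like repeated gsub() calls:
# 1) drop the seed " (n)", 2) drop lowercase letters, 3) drop remaining spaces
patterns <- c("[[:space:]]*\\([0-9]+\\)", "[[:lower:]]+", " ")
out <- Reduce(function(s, p) gsub(p, "", s), patterns, x)
out
# [1] "GSW" "UJ"
```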

Update

To return only the abbreviated form of each string, I applied the "multiple gsub" approach I suggested in my initial post, as follows:

teams <- gsub("\\Â .*", "", teams)
teams <- abbreviate(teams, named = FALSE) # useful function to consider
teams <- gsub("[a-z]", "", teams)
## continue as needed
head(teams)
## [1] "GSW" "SAS" "HR"  "LAC" "UJ"  "OCT"

Answer 2 (score: 1)

Here is an idea using stringi:
library(stringi) 
sapply(strsplit(stri_replace_last_regex(teams, '\\s+', ''), ' '), function(i)
                                            paste(substring(i, 1, 1), collapse = ''))
 #[1] "GSW" "SAS" "HR"  "LAC" "UJ"  "OCT" "MG"  "DN"  "PTB" "SK"  "NOP" "MT"  "DM"  "PS"  "LAL" "CC"  "TR"  "BC"  "AH"  "WW"  "IP"  "CH"  "CB" 
#[24] "MB"  "DP"  "NYK" "P7"  "OM"  "MH"  "BN" 
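For comparison, a base-R sketch of the same "first letter of each word" idea, without stringi. One small design difference: instead of deleting the last whitespace (which glues the seed onto the last word), this drops the last space-separated token entirely, assuming that token is always the seed. It still yields "P7" for Philadelphia, since "76ers" begins with a digit:

```r
x <- c("Golden State Warriors (1)", "Philadelphia 76ers (12)")
# Drop the last space-separated token (assumed to be the seed), then split into words
words <- strsplit(sub(" [^ ]*$", "", x), " ", fixed = TRUE)
# Keep the first character of each word and glue them together
initials <- vapply(words, function(w) paste(substring(w, 1, 1), collapse = ""),
                   character(1))
initials
# [1] "GSW" "P7"
```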

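Or, to get the desired output (ind is not defined in the answer; it is assumed here to be a vector of replacement abbreviations, one per element of teams):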

mapply(stri_replace_first_regex, teams, '\\w+', ind)
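What that mapply call does can be sketched in base R: for each element, the first word is replaced by the corresponding entry of ind. Here both x and ind are hypothetical, since the answer does not show how ind is built:

```r
x <- c("Golden State Warriors (1)", "Utah Jazz (5)")
ind <- c("GSW", "UJ")  # hypothetical replacement for each element
# Replace the first word of each element with the matching entry of ind
out <- mapply(function(s, r) sub("^[[:alnum:]]+", r, s), x, ind, USE.NAMES = FALSE)
out
# [1] "GSW State Warriors (1)" "UJ Jazz (5)"
```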