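Using regular expressions in R

Time: 2013-08-13 15:21:09

Tags: r regex

I'm wondering whether anyone has used regular expressions to parse text in R. In the example below, I want to parse the string and extract the account number, the vehicle name, and the maintenance type.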

string[0]: 3423423 

string[1]: Nissan

string[2]: Sparkplugs

 string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs" 

2 answers:

Answer 0 (score: 2)

A bit clunky, but it works:

string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
cuts <- c("Account: ", "vehicle ", "Maint: ")

sapply(cuts, function(x){sapply(strsplit(unlist(strsplit(string, x))[2]," "),"[",1)})

   Account:      vehicle       Maint:  
   "3423423"     "Nissan" "Sparkplugs"
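To see why the nested one-liner works, it can be unpacked for a single marker. This is only an illustrative sketch; the intermediate names (after_marker, words, first_word) are made up here and are not part of the answer above.

```r
string <- "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"

# Step 1: split on the marker; the text that follows it is the second piece.
after_marker <- unlist(strsplit(string, "Account: "))[2]

# Step 2: split that remainder on single spaces (strsplit returns a list).
words <- strsplit(after_marker, " ")

# Step 3: take the first word of each list element.
first_word <- sapply(words, "[", 1)
first_word
```

Note that strsplit treats the marker as a regular expression, so a marker containing characters such as "." or "(" would need escaping or fixed = TRUE.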

Answer 1 (score: 2)

This gives you every match rather than just one, and it allows any pattern.

You define the starting point, item:

string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"

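getter <- function(item, string) {
  # Find every occurrence of item followed by a run of non-space characters
  g <- gregexpr(paste0(item, "[^ ]+"), string)
  # The value starts right after item and ends where the match ends
  start <- g[[1]] + nchar(item)
  end <- g[[1]] + attr(g[[1]], "match.length") - 1
  res <- mapply(substr, string, start, end)
  names(res) <- NULL  # drop the names that mapply attaches
  res
}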

account <- getter("Account: ", string)
vehicle <- getter("vehicle ", string)
maint <- getter("Maint: ", string)

Or, to make it more automated:

items <- c("Account: ", "vehicle ", "Maint: ")
sapply(items, function(x) getter(x, string))
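As a quick check of the "all matches" claim, here is a small sketch that runs getter on a string containing the same marker twice. The input two_accounts and the variable found are hypothetical and only for illustration; getter is repeated so the snippet runs on its own.

```r
getter <- function(item, string) {
  g <- gregexpr(paste0(item, "[^ ]+"), string)
  start <- g[[1]] + nchar(item)
  end <- g[[1]] + attr(g[[1]], "match.length") - 1
  res <- mapply(substr, string, start, end)
  names(res) <- NULL
  res
}

# Hypothetical input with two occurrences of the same marker
two_accounts <- "Account: 111 and Account: 222"

# Both values are returned, one per match
found <- getter("Account: ", two_accounts)
found
```

One caveat worth keeping in mind: item is pasted into a regular expression, while the offset uses nchar(item), so this appears to be reliable only when item is literal text whose match is exactly nchar(item) characters long.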