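I'd like to know whether anyone uses regular expressions to parse text in R. In the example below, I want to parse the string and extract the account number, the vehicle name, and the maintenance type.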
string[0]: 3423423
string[1]: Nissan
string[2]: Sparkplugs
string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
Answer 0 (score: 2)
A bit clunky, but it works:
string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
cuts <- c("Account: ", "vehicle ", "Maint: ")
sapply(cuts, function(x){sapply(strsplit(unlist(strsplit(string, x))[2]," "),"[",1)})
Account: vehicle Maint:
"3423423" "Nissan" "Sparkplugs"
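For comparison, the same three fields could also be pulled out with a single pattern and capture groups, using base R's regexec() and regmatches(). This is only a sketch: it assumes the fields always appear in this order and that each value contains no spaces. The names pattern and fields are made up for illustration.

```r
string <- "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"

# One pattern with three capture groups (assumes this fixed field order)
pattern <- "Account: ([^ ]+).*vehicle ([^ ]+).*Maint: ([^ ]+)"

# regexec() records the full match plus each group; regmatches() extracts them
m <- regmatches(string, regexec(pattern, string))[[1]]

# Drop the first element (the full match) and label the captured groups
fields <- setNames(m[-1], c("account", "vehicle", "maint"))
fields
```

Unlike the split-based approach, this returns nothing useful if any one of the three labels is missing, so it suits input with a predictable layout.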
Answer 1 (score: 2)
This gives you all matches rather than just the first one, and it works with any pattern.
You define the starting point, item:
string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
# Return every token that immediately follows `item` in `string`
getter <- function(item, string) {
  # Match `item` followed by a run of non-space characters
  g <- gregexpr(paste0(item, "[^ ]+"), string)
  # Skip past `item` (assumes it matches exactly nchar(item) characters)
  start <- g[[1]] + nchar(item)
  end <- g[[1]] + attr(g[[1]], "match.length") - 1
  res <- mapply(substr, string, start, end)
  names(res) <- NULL
  res
}
account <- getter("Account: ", string)
vehicle <- getter("vehicle ", string)
maint <- getter("Maint: ", string)
Or, to automate it further:
items <- c("Account: ", "vehicle ", "Maint: ")
sapply(items, function(x) getter(x, string))
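To illustrate the "all matches" behavior, here is a small check on a made-up string that mentions two accounts. The input two_accounts is hypothetical and not from the question; getter is repeated so the snippet runs on its own.

```r
getter <- function(item, string) {
  g <- gregexpr(paste0(item, "[^ ]+"), string)
  start <- g[[1]] + nchar(item)
  end <- g[[1]] + attr(g[[1]], "match.length") - 1
  res <- mapply(substr, string, start, end)
  names(res) <- NULL
  res
}

# Hypothetical input containing two occurrences of each label
two_accounts <- "Account: 111 vehicle Nissan Maint: Oil and Account: 222 vehicle Ford Maint: Brakes"

# gregexpr() finds every occurrence, so one value per match comes back
accounts <- getter("Account: ", two_accounts)
accounts
# [1] "111" "222"
```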