How to extract text between quotes in R

Time: 2015-08-31 21:14:50

Tags: regex r

I have this string:

"MYDATA[, \"TYUO\"]"

How can I extract the text between the quotes? The result should be just TYUO.

2 answers:

Answer 0 (score: 6)

Use stringi with a lookahead and a lookbehind:

> stringi::stri_extract_all_regex(s, '(?<=").*?(?=")')
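To try this end to end, `s` first has to hold the string from the question. The sketch below assumes that, and uses base R's `regmatches()` and `regexpr()` with `perl = TRUE` as a stand-in so that it runs without stringi installed; the pattern is the same lookbehind/lookahead pair.

```r
# Assumption: `s` is the string from the question.
s <- "MYDATA[, \"TYUO\"]"

# (?<=") requires a quote just before the match, (?=") a quote just after;
# .*? lazily takes the characters in between.
pattern <- '(?<=").*?(?=")'

# regexpr() locates the first match only; regmatches() extracts it.
result <- regmatches(s, regexpr(pattern, s, perl = TRUE))
print(result)
```

For this input, `result` is the single string "TYUO".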

Answer 1 (score: 5)

The rm_between function from qdapRegex (a package I coauthored with Jason Gray, aka @hwnd) was made for this:

x <- c("MYDATA[, \"TYUO\"]", 'a "second" with "multiple" quotes')

library(qdapRegex)
rm_between(x, '"', '"', extract=TRUE)

## [[1]]
## [1] "TYUO"
## 
## [[2]]
## [1] "second"   "multiple"

EDIT

@BenBolker asked for a base R solution. This is not as pretty as I had hoped, but it gets the job done in base R:

lapply(regmatches(x, gregexpr('(\").*?(\")', x, perl = TRUE)), function(y) gsub("^\"|\"$", "", y))

## [[1]]
## [1] "TYUO"
## 
## [[2]]
## [1] "second"   "multiple"

I don't like stripping off the leading and trailing quotes with lapply and gsub, but if we try the standard lookahead/lookbehind instead, the result is not what we want:

regmatches(x, gregexpr("(?<=\")(.*?)(?=\")", x, perl = TRUE))

## [[1]]
## [1] "TYUO"
## 
## [[2]]
## [1] "second"   " with "   "multiple"
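The extra " with " appears because lookarounds do not consume the quotes, so the closing quote of "second" is free to act as the opening quote of the next match. One way around this, sketched here as an alternative rather than as this answer's own method, is to let the match consume both quotes and then trim them by position. The name `extract_quoted` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: for each element of x, return the text found
# inside each pair of double quotes.
extract_quoted <- function(x) {
  # "[^"]*" consumes both quotes, so a closing quote cannot be reused
  # as the opening quote of the next match.
  quoted <- regmatches(x, gregexpr('"[^"]*"', x))
  # Drop the first and last character (the quotes) of every match.
  lapply(quoted, function(m) substr(m, 2, nchar(m) - 1))
}

x <- c("MYDATA[, \"TYUO\"]", 'a "second" with "multiple" quotes')
res <- extract_quoted(x)
print(res)
```

With the `x` defined above, the first element yields "TYUO" and the second yields "second" and "multiple", without the stray " with ".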