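I have an odd request involving regular expressions in R. I have a vector of strings, some of which end in multiple trailing periods. I want to replace each of those periods with a space. The example and the desired result below should make clear what I'm after (maybe I need to attack it with the replacement argument I give rather than with gsub's pattern argument):
Example and attempts: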
x <- c("good", "little.bad", "really.ugly......")
gsub("\\.$", " ", x)
#produces this
#[1] "good" "little.bad" "really.ugly..... "
gsub("\\.+$", " ", x)
#produces this
#[1] "good" "little.bad" "really.ugly "
Desired result:
[1] "good" "little.bad" "really.ugly      "
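So the last string in the original vector (x) ends with 6 periods, and I want 6 spaces in their place, without touching the period between "really" and "ugly". I know $ anchors the match to the end of the string, but I can't get beyond that.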
Answer 0 (score: 16)
Try this:
gsub("\\.(?=\\.*$)", " ", x, perl=TRUE)
Explanation:
\. # Match a dot
(?= # only if followed by
\.* # zero or more dots
$ # until the end of the string
) # End of lookahead assertion.
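A quick way to check the lookahead answer against the question's vector is the sketch below. It uses only base R (`gsub` with `perl = TRUE`); the names `res` and `expected` are just for illustration. A period is replaced only when everything after it, up to the end of the string, is also periods, so the single period inside "little.bad" and "really.ugly" is left alone.

```r
x <- c("good", "little.bad", "really.ugly......")

# Replace each "." that is followed only by more "." up to the end.
# The lookahead (?=\\.*$) checks without consuming characters, so every
# trailing period is matched on its own and becomes its own space.
res <- gsub("\\.(?=\\.*$)", " ", x, perl = TRUE)

# Six trailing periods should become six trailing spaces.
expected <- c("good", "little.bad", paste0("really.ugly", strrep(" ", 6)))
stopifnot(identical(res, expected))
print(res)
```

The lookahead is what makes this a one-call solution: `"\\.+$"` consumes the whole run at once and so can only emit a single replacement.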
Answer 1 (score: 2)
While waiting for a sensible regex solution, I decided to come up with an absurd way of solving the problem:
messy.sol <- function(x) {
  paste(unlist(list(gsub("\\.+$", "", x),
                    rep(" ", nchar(x) - nchar(gsub("\\.+$", "", x))))),
        collapse = "")
}
sapply(x, messy.sol, USE.NAMES = FALSE)
I'd say Tim's is a bit prettier :)
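The same strip-count-pad idea from the answer above can be vectorized so that `sapply` is not needed. This is a minimal sketch assuming R 3.3.0 or later (for `strrep`); `pad_trailing_dots` is a hypothetical helper name, not something from the answer.

```r
# Hypothetical helper: strip the run of trailing periods, then append
# one space per removed period. Works on a whole vector at once.
pad_trailing_dots <- function(x) {
  stripped <- sub("\\.+$", "", x)           # drop the trailing run of periods
  n_removed <- nchar(x) - nchar(stripped)   # how many periods were dropped
  paste0(stripped, strrep(" ", n_removed))  # add that many spaces back
}

x <- c("good", "little.bad", "really.ugly......")
res <- pad_trailing_dots(x)
stopifnot(all(nchar(res) == nchar(x)))  # lengths are preserved
print(res)
```

`strrep` is vectorized over its `times` argument, and a count of 0 yields `""`, so strings without trailing periods pass through unchanged.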
Answer 2 (score: 2)
Liberal use of regmatches helps us here:
x <- c("good", "little.bad", "really.ugly......")
# Get an object with 'match data' to feed into regmatches
# Here we match on any number of periods at the end of a string
out <- regexpr("\\.*$", x)
# On the right hand side we extract the pieces of the strings
# that match our pattern with regmatches and then replace
# all the periods with spaces. Then we use assignment
# to store that into the spots in our strings that match the
# regular expression.
regmatches(x, out) <- gsub("\\.", " ", regmatches(x, out))
x
#[1] "good" "little.bad" "really.ugly      "
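To see what the assignment line is working with, the sketch below inspects the intermediate objects from the regmatches answer (base R only; `pieces` is an illustrative name). Because `"\\.*$"` can match zero characters, every string gets a match, possibly an empty one, which is why the replacement can be applied to all three strings uniformly.

```r
x <- c("good", "little.bad", "really.ugly......")
out <- regexpr("\\.*$", x)

# Match lengths: strings with no trailing period get a zero-length
# match at the very end; the last string matches its six periods.
as.integer(attr(out, "match.length"))

# The matched pieces: "" for the first two strings, "......" for the third.
pieces <- regmatches(x, out)
stopifnot(identical(pieces, c("", "", "......")))

# Replacing each period in the pieces gives the strings to write back.
gsub("\\.", " ", pieces)
```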
So it's not as clean as a single regex, but I never really got around to learning those "lookaheads" in Perl regexes.