Replace trailing periods with spaces

Time: 2012-08-31 21:29:08

Tags: regex r

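I have an odd regex request in R. I have vectors of character strings, some of which have multiple trailing periods. I want to replace each of these trailing periods with a space. The example and desired outcome below should make clear what I'm after (maybe I need to attack it through the replacement argument of gsub rather than the pattern argument):

Example and attempts: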

x <- c("good", "little.bad", "really.ugly......")
gsub("\\.$", " ", x)
  #produces this
  #[1] "good"              "little.bad"        "really.ugly..... "
gsub("\\.+$", " ", x)
  #produces this
  #[1] "good"         "little.bad"   "really.ugly "

Desired outcome

[1] "good"              "little.bad"        "really.ugly      "

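So the last string of the original vector (x) ends with 6 periods, and I want 6 spaces in their place without touching the period between "really" and "ugly". I know $ anchors the match to the end of the string, but I can't get past that.

3 answers:

Answer 0: (score: 16)

Try this: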

gsub("\\.(?=\\.*$)", " ", mystring, perl=TRUE)

Explanation

\.   # Match a dot
(?=  # only if followed by
 \.* # zero or more dots
 $   # until the end of the string
)    # End of lookahead assertion.
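
As a quick check of the pattern above, here is a minimal sketch that applies it to the vector x from the question (using x in place of mystring is my assumption, and res is just an illustrative name). Because the lookahead consumes no characters, each trailing dot is matched and replaced on its own, so the string lengths should not change.

```r
x <- c("good", "little.bad", "really.ugly......")

# Replace every dot that is followed only by more dots up to the end of
# the string. perl = TRUE is needed for the lookahead syntax.
res <- gsub("\\.(?=\\.*$)", " ", x, perl = TRUE)
res
#[1] "good"              "little.bad"        "really.ugly      "

# One space per trailing dot, so the lengths are unchanged
all(nchar(res) == nchar(x))
#[1] TRUE
```

The dot inside "little.bad" and the one between "really" and "ugly" are left alone, since each is followed by letters rather than by a run of dots reaching the end of the string.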

Answer 1: (score: 2)

While I waited for a regex solution that makes sense, I decided to come up with a nonsensical way to solve this:

messy.sol <- function(x) {
paste(unlist(list(gsub("\\.+$", "", x), 
    rep(" ", nchar(x) -  nchar(gsub("\\.+$", "", x))))),collapse="")
}

sapply(x, messy.sol, USE.NAMES = FALSE)

I'd say Tim's is a bit prettier :)
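
For what it's worth, the same strip-and-pad idea can probably be written without sapply. This is only a sketch and not part of the original answer: it assumes R 3.3.0 or later for strrep(), and the names stripped, n_dots and padded are made up for illustration.

```r
x <- c("good", "little.bad", "really.ugly......")

stripped <- sub("\\.+$", "", x)          # drop the run of trailing dots
n_dots   <- nchar(x) - nchar(stripped)   # how many dots were dropped per string

# strrep() is vectorized over its 'times' argument, so each string gets
# back exactly as many spaces as it lost dots (zero for the first two).
padded <- paste0(stripped, strrep(" ", n_dots))
padded
#[1] "good"              "little.bad"        "really.ugly      "
```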

Answer 2: (score: 2)

Tim's solution is clearly better, but I figured I'd try my hand at an alternate way. Liberal use of regmatches helps us out here:

x <- c("good", "little.bad", "really.ugly......")
# Get an object with 'match data' to feed into regmatches
# Here we match on any number of periods at the end of a string
out <- regexpr("\\.*$", x)

# On the right hand side we extract the pieces of the strings
# that match our pattern with regmatches and then replace
# all the periods with spaces.  Then we use assignment
# to store that into the spots in our strings that match the
# regular expression.
regmatches(x, out) <- gsub("\\.", " ", regmatches(x, out))
x
#[1] "good"              "little.bad"        "really.ugly      "

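So it's not quite as clean as the single regular expression. But I've never really gotten around to learning those "lookaheads" in Perl regular expressions.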
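
To see what regmatches is working with before the assignment, it can help to look at the match data directly. A small sketch, assuming a fresh copy of x from the question (the names out, lens and pieces are just for illustration):

```r
x <- c("good", "little.bad", "really.ugly......")
out <- regexpr("\\.*$", x)

# Length of the trailing run of dots found in each string. The pattern
# allows zero dots, so strings without trailing dots still match, with
# an empty match at the end.
lens <- as.integer(attr(out, "match.length"))
lens
#[1] 0 0 6

# The pieces that the assignment form of regmatches would overwrite
pieces <- regmatches(x, out)
pieces
#[1] ""       ""       "......"
```

Since every element has a match (possibly empty), the right-hand side of the assignment has one replacement per string, which is why the gsub result lines up with x.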