Insert a character at multiple positions in a string at once

Date: 2014-12-22 18:46:19

Tags: regex r text gsub

Let's say I have a string

"ABCDEFGHI56dfsdfd"

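What I want to do is insert a space character at multiple positions in one go.

For example, I want to insert a space at two randomly chosen positions, say 4 and 8.

So the output should be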

"ABCD EFGH I56dfsdfd" 

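What is the most efficient way to do this, given that the string can contain any kind of character (not just letters)?

3 answers:

Answer 0 (score: 6)

Here is a regex-based solution: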

vec <- "ABCDEFGHI56dfsdfd"

# sample two random positions
pos <- sample(nchar(vec), 2)
# [1] 6 4

# generate regex pattern
pat <- paste0("(?=.{", nchar(vec) - pos, "}$)", collapse = "|")
# [1] "(?=.{11}$)|(?=.{13}$)"

# insert spaces at (after) positions
gsub(pat, " ", vec, perl = TRUE)
# [1] "ABCD EF GHI56dfsdfd"

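This approach is based on a positive lookahead, e.g. (?=.{11}$). A lookahead matches a position rather than any characters, so gsub inserts the replacement without removing anything. In this example, a space is inserted at the point where exactly 11 characters remain before the end of the string ($).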
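To see why a lookahead inserts rather than replaces, here is a minimal sketch using the same example string. The variable names are illustrative only. The last part rests on the assumption that the input may contain line breaks: with perl = TRUE, "." does not match a newline by default, so the count has to be allowed to span it with (?s).

```r
x <- "ABCDEFGHI56dfsdfd"

# A lookahead is zero-width: it matches a position, not characters,
# so the replacement is inserted and nothing is consumed.
one <- sub("(?=.{11}$)", " ", x, perl = TRUE)
print(one)  # "ABCDEF GHI56dfsdfd"

# Several positions can be combined with "|" into a single pattern.
two <- gsub("(?=.{11}$)|(?=.{13}$)", " ", x, perl = TRUE)
print(two)  # "ABCD EF GHI56dfsdfd"

# With perl = TRUE, "." skips newlines by default, so a count that has
# to span a line break finds no match unless (?s) is prepended.
y <- "AB\nCD"
plain  <- gsub("(?=.{3}$)", " ", y, perl = TRUE)
dotall <- gsub("(?s)(?=.{3}$)", " ", y, perl = TRUE)
```

If the strings never contain newlines, the pattern from the answer works unchanged.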

Answer 1 (score: 0)

A bit more brute-force than Sven's:

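randomSpaces <- function(txt) {
  # pick two distinct positions and put them in ascending order
  pos <- sort(sample(nchar(txt), 2))
  # cut the string after each position and rejoin the three pieces with spaces
  paste0(substr(txt, 1, pos[1]), " ",
         substr(txt, pos[1] + 1, pos[2]), " ",
         substr(txt, pos[2] + 1, nchar(txt)))
}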

for (i in 1:10) print(randomSpaces("ABCDEFGHI56dfsdfd"))

## [1] "ABCDEFG HI56 dfsdfd"
## [1] "ABC DEFGHI5 6dfsdfd"
## [1] "AB CDEFGHI56dfsd fd"
## [1] "ABCDEFGHI 5 6dfsdfd"
## [1] "ABCDEF GHI56dfsdf d"
## [1] "ABC DEFGHI56dfsdf d"
## [1] "ABCD EFGHI56dfsd fd"
## [1] "ABCDEFGHI56d fsdfd "
## [1] "AB CDEFGH I56dfsdfd"
## [1] "A BCDE FGHI56dfsdfd"
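The same cut-and-rejoin idea can be extended to any number of positions. The sketch below is one possible generalisation, not part of the answer above: insert_at is a hypothetical helper name, and it assumes every position lies between 1 and nchar(txt) and that a space goes after each given position.

```r
# Hypothetical helper: insert `ins` after each position in `pos`.
insert_at <- function(txt, pos, ins = " ") {
  pos <- sort(unique(pos))      # ascending, no duplicates
  starts <- c(1, pos + 1)       # each piece starts right after a cut
  ends <- c(pos, nchar(txt))    # and ends at the next cut (or at the end)
  # substring() is vectorised over starts/ends; rejoin the pieces with `ins`
  paste(substring(txt, starts, ends), collapse = ins)
}

res <- insert_at("ABCDEFGHI56dfsdfd", c(4, 8))
print(res)  # "ABCD EFGH I56dfsdfd"
```

Because it relies on substring() rather than a regex, this version does not depend on how "." treats special characters.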

Answer 2 (score: 0)

Building on the accepted answer, here is a function that wraps this approach:

##insert pattern in string at position
substrins <- function(ins, x, ..., pos=NULL, offset=0){
    stopifnot(is.numeric(pos), 
              is.numeric(offset), 
              !is.null(pos))
    offset <- offset[1]
    pat <- paste0("(?=.{", nchar(x) - pos - (offset-1), "}$)", collapse = "|")
    gsub(pattern = pat, replacement = ins, x = x, ..., perl = TRUE)
}

# insert space at position 10
substrins(" ", "ABCDEFGHI56dfsdfd", pos = 10)
##[1] "ABCDEFGHI 56dfsdfd"

# insert pattern before position 10 (i.e. at position 9)
substrins(" ", "ABCDEFGHI56dfsdfd", pos = 10, offset=-1)
##[1] "ABCDEFGH I56dfsdfd"

# insert pattern after position 10 (i.e. at position 11)
substrins(" ", "ABCDEFGHI56dfsdfd", pos = 10, offset=1)
##[1] "ABCDEFGHI5 6dfsdfd"

Now, to do what the OP wanted:

# insert space at position 4 and 8
substrins(" ", "ABCDEFGHI56dfsdfd", pos = c(4,8))
##[1] "ABC DEFG HI56dfsdfd"

# insert space after position 4 and 8 (as per OP's desired output)
substrins(" ", "ABCDEFGHI56dfsdfd", pos = c(4,8), offset=1)
##[1] "ABCD EFGH I56dfsdfd"
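The arithmetic inside substrins can be checked by hand. The sketch below only restates the expression nchar(x) - pos - (offset - 1) from the function body; chars_after is a hypothetical name introduced here for illustration.

```r
x <- "ABCDEFGHI56dfsdfd"
pos <- c(4, 8)

# Hypothetical helper: how many characters must remain after the insertion point.
chars_after <- function(x, pos, offset = 0) nchar(x) - pos - (offset - 1)

chars_after(x, pos)              # 14 10: the space lands before characters 4 and 8
chars_after(x, pos, offset = 1)  # 13  9: the space lands after characters 4 and 8

# Feeding the second result into the lookahead pattern gives the OP's output.
pat <- paste0("(?=.{", chars_after(x, pos, offset = 1), "}$)", collapse = "|")
res <- gsub(pat, " ", x, perl = TRUE)
print(res)  # "ABCD EFGH I56dfsdfd"
```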

To replicate the other, more brute-force answer:

set.seed(123)
x <- "ABCDEFGHI56dfsdfd"
for (i in 1:10) print(substrins(" ", x, pos = sample(nchar(x), 2)))
##[1] "ABCD EFGHI56d fsdfd"
##[1] "ABCDEF GHI56dfs dfd"
##[1] " ABCDEFGHI56dfsd fd"
##[1] "ABCDEFGH I56dfs dfd"
##[1] "ABCDEFG HI 56dfsdfd"
##[1] "ABCDEFG HI56dfsdf d"
##[1] "ABCDEFGHI 56 dfsdfd"
##[1] "A BCDEFGHI56dfs dfd"
##[1] " ABCD EFGHI56dfsdfd"
##[1] "ABCDE FGHI56dfsd fd"