Relocating and duplicating strings in R

Date: 2017-02-04 14:06:38

Tags: r

My goal is to relocate words and then copy and paste them in a specific pattern.

a = 'blahblah (Peter|Sally|Tom)'
b = 'word (apple|grape|tomato) vocabulary (rice|mice|lice)'
c = 'people person (you|me|us) do not know how (it|them) works'

I can relocate the string that precedes '(' using gsub:

gsub('\\s*(\\S+)\\s*\\(', '(\\1 ', a)

Using that call, I can create the set of strings below.

a
[1]'(blahblah Peter|Sally|Tom)'
b
[1]'(word apple|grape|tomato) (vocabulary rice|mice|lice)'
c
[1]'people (person you|me|us) do not know (how it|them) works'
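A note on the pattern: the leading `\\s*` also consumes the space in front of the relocated word, so for `b` and `c` the call above would drop the space before each opening parenthesis. A minimal sketch that keeps that space, assuming the outputs shown above are the intended ones, simply omits the leading `\\s*` (the helper name `relocate` is made up for illustration):

```r
a <- 'blahblah (Peter|Sally|Tom)'
b <- 'word (apple|grape|tomato) vocabulary (rice|mice|lice)'
c_str <- 'people person (you|me|us) do not know how (it|them) works'

# Hypothetical helper: move the word before "(" inside the parentheses,
# leaving any whitespace in front of that word untouched.
relocate <- function(x) gsub('(\\S+)\\s*\\(', '(\\1 ', x)

relocate(a)
relocate(b)
relocate(c_str)
```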

However, I don't know how to copy '\\1' and paste it after each '|', so that the result looks like this:

a
[1]'(blahblah Peter|blahblah Sally|blahblah Tom)'
b
[1]'(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)'
c
[1]'people (person you|person me|person us) do not know (how it|how them) works'

Is there a way to do this?

2 answers:

Answer 0: (score: 3)

We can use strsplit:

sapply(strsplit(a, "[| ]|\\(|\\)"), function(x) {
        x1 <- x[nzchar(x)]
        paste0("(", paste(x1[1], x1[-1], collapse="|"), ")")})
#[1] "(blahblah Peter|blahblah Sally|blahblah Tom)"
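To make the one-liner above easier to follow, here is the same idea unrolled into steps. This is only a sketch for the single-group case; the intermediate names `parts`, `prefix`, and `alts` are introduced here for illustration.

```r
a <- 'blahblah (Peter|Sally|Tom)'

# Split on spaces, pipes, and either parenthesis.
parts <- strsplit(a, "[| ]|\\(|\\)")[[1]]

# Adjacent delimiters (" " followed by "(") leave an empty string behind,
# which is why the answer filters with nzchar().
parts <- parts[nzchar(parts)]
# parts is now: "blahblah" "Peter" "Sally" "Tom"

prefix <- parts[1]    # the word that preceded the parentheses
alts   <- parts[-1]   # the alternatives that were separated by "|"

# paste() recycles the prefix over every alternative, then collapses with "|".
result <- paste0("(", paste(prefix, alts, collapse = "|"), ")")
result
```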

For multiple groups in one string:

paste(sapply(strsplit(b, "(?<=\\))\\s+", perl = TRUE)[[1]],
             function(x) sapply(strsplit(x, "[| ]|\\(|\\)"), function(y) {
                 x1 <- y[nzchar(y)]
                 paste0("(", paste(x1[1], x1[-1], collapse = "|"), ")")
             })),
      collapse = " ")
#[1] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"

Another option is str_extract_all from stringr:

library(stringr)
m1 <- matrix(str_extract_all(b, "\\w+")[[1]], ncol=2)
do.call(sprintf, c(do.call(paste, c(as.data.frame(matrix(paste(m1[1,][col(m1[-1,])],
    m1[-1,]), nrow=2, byrow=TRUE)), sep="|")), list(fmt = "(%s) (%s)")))
#[1] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"

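Update

Based on the new pattern shown in the OP's post, here is a more general approach (it still uses str_extract_all, so stringr must be loaded):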

funPaste <- function(str1){
     v1 <- strsplit(str1, "\\s+")[[1]]
     i1 <- grep("\\(", v1)
     v1[i1] <- mapply(function(x,y) paste0("(", paste(x, y, collapse="|"), ")"),
                    v1[i1-1], str_extract_all(v1[i1], "\\w+"))
     paste(v1[-(i1-1)], collapse=" ")
}

funPaste(a)
#[1] "(blahblah Peter|blahblah Sally|blahblah Tom)"
funPaste(b)
#[1] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"
funPaste(c)
#[1] "people (person you|person me|person us) do not know (how it|how them) works"

Update 2

We can also use gsubfn:

library(gsubfn)
funPaste2 <- function(str1) {
    gsubfn("(\\w+)\\s+[(]([^)]+)[)]", function(x, y)
        paste0("(", paste(x, unlist(strsplit(y, "[|]")), collapse = "|"), ")"), str1)
}

funPaste2(c(a, b, c))
#[1] "(blahblah Peter|blahblah Sally|blahblah Tom)"                                         
#[2] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"
#[3] "people (person you|person me|person us) do not know (how it|how them) works"    
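If installing gsubfn is not an option, the same match-and-rebuild idea can be sketched in base R with gregexpr and the replacement form of regmatches. This is not part of the answer above, just one possible equivalent; the function name `dup_prefix` is made up for illustration.

```r
# Hypothetical base-R counterpart of funPaste2 (no extra packages).
dup_prefix <- function(x) {
  pattern <- "(\\w+)\\s+\\(([^)]+)\\)"
  m <- gregexpr(pattern, x, perl = TRUE)
  # regmatches(x, m) is a list with one character vector of matches per input
  # string; each match is rebuilt and then written back in place.
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, function(h) {
      prefix <- sub(pattern, "\\1", h, perl = TRUE)   # word before "("
      inner  <- sub(pattern, "\\2", h, perl = TRUE)   # text inside "( )"
      alts   <- strsplit(inner, "|", fixed = TRUE)[[1]]
      paste0("(", paste(prefix, alts, collapse = "|"), ")")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

a <- 'blahblah (Peter|Sally|Tom)'
b <- 'word (apple|grape|tomato) vocabulary (rice|mice|lice)'
c_str <- 'people person (you|me|us) do not know how (it|them) works'
dup_prefix(c(a, b, c_str))
```

Strings without a matching group are returned unchanged, since there is nothing to replace.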

Answer 1: (score: 1)

Another approach (using as little regex as possible, since I don't know it well :)

c = unlist(strsplit(b, " "))[c(T,F)] # extract the single words (odd positions)
# c
# [1] "blahblah"                                   (when run on a)
# [1] "word"       "vocabulary"                    (when run on b)
d = unlist(strsplit(b, " "))[c(F,T)] # extract the grouped words (even positions)
# d
# [1] "(Peter|Sally|Tom)"                          (when run on a)
# [1] "(apple|grape|tomato)" "(rice|mice|lice)"    (when run on b)

# now iterate over each element of 'd': split it on `|`, strip the parentheses, and paste each piece with the matching element of 'c'
sapply(seq_along(d), function(x) paste("(", paste(c[x],gsub("(\\(|\\))", "",unlist(strsplit(d[x], "\\|"))), 
                                              collapse = "|"),")"))

# [1] "( blahblah Peter|blahblah Sally|blahblah Tom )"
# [1] "( word apple|word grape|word tomato )"  "( vocabulary rice|vocabulary mice|vocabulary lice )"
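The output above carries extra spaces just inside the parentheses because paste() separates its arguments with a space, and the groups come back as separate elements. A small variation, sketched here with paste0 and made-up names (`heads`, `groups`, `pieces`), avoids both. Like the answer, it assumes the input strictly alternates word, group, word, group, so it would not handle the string `c` from the question.

```r
b <- 'word (apple|grape|tomato) vocabulary (rice|mice|lice)'

tokens <- strsplit(b, " ", fixed = TRUE)[[1]]
heads  <- tokens[c(TRUE, FALSE)]   # single words (odd positions)
groups <- tokens[c(FALSE, TRUE)]   # parenthesised groups (even positions)

pieces <- mapply(function(h, g) {
  # drop the parentheses, then split the alternatives on a literal "|"
  alts <- strsplit(gsub("[()]", "", g), "|", fixed = TRUE)[[1]]
  paste0("(", paste(h, alts, collapse = "|"), ")")
}, heads, groups, USE.NAMES = FALSE)

result <- paste(pieces, collapse = " ")
result
```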