My goal is to reposition words and then copy and paste them in a specific pattern.
a = 'blahblah (Peter|Sally|Tom)'
b = 'word (apple|grape|tomato) vocabulary (rice|mice|lice)'
c = 'people person (you|me|us) do not know how (it|them) works'
I can move the word that sits before '(' inside the parentheses using gsub:
gsub('\\s*(\\S+)\\s*\\(', '(\\1 ', a)
Applying that call to each string gives the set of strings below.
a
[1]'(blahblah Peter|Sally|Tom)'
b
[1]'(word apple|grape|tomato) (vocabulary rice|mice|lice)'
c
[1]'people (person you|me|us) do not know (how it|them) works'
However, I don't know how to copy '\\1' and paste it after every '|', so that the result looks like this:
a
[1]'(blahblah Peter|blahblah Sally|blahblah Tom)'
b
[1]'(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)'
c
[1]'people (person you|person me|person us) do not know (how it|how them) works'
Is there a way to make this possible?
Answer 0 (score: 3)
We can use strsplit:
sapply(strsplit(a, "[| ]|\\(|\\)"), function(x) {
  x1 <- x[nzchar(x)]
  paste0("(", paste(x1[1], x1[-1], collapse="|"), ")")
})
#[1] "(blahblah Peter|blahblah Sally|blahblah Tom)"
For a string with multiple groups:
paste(sapply(strsplit(b, "(?<=\\))\\s+", perl = TRUE)[[1]],
function(x) sapply(strsplit(x, "[| ]|\\(|\\)"), function(y) {
x1 <- y[nzchar(y)]
paste0("(", paste(x1[1], x1[-1], collapse="|"), ")") })), collapse=' ')
#[1] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"
Another option is str_extract_all from stringr:
library(stringr)
m1 <- matrix(str_extract_all(b, "\\w+")[[1]], ncol=2)
do.call(sprintf, c(do.call(paste, c(as.data.frame(matrix(paste(m1[1,][col(m1[-1,])],
m1[-1,]), nrow=2, byrow=TRUE)), sep="|")), list(fmt = "(%s) (%s)")))
#[1] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"
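Based on the new pattern shown in the OP's post, here is a more general approach (it uses str_extract_all from stringr, loaded above):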
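funPaste <- function(str1){
  v1 <- strsplit(str1, "\\s+")[[1]]   # split into whitespace-separated tokens
  i1 <- grep("\\(", v1)               # positions of the "(x|y|z)" tokens
  # prefix each alternative with the token that precedes its group
  v1[i1] <- mapply(function(x,y) paste0("(", paste(x, y, collapse="|"), ")"),
                   v1[i1-1], str_extract_all(v1[i1], "\\w+"))
  paste(v1[-(i1-1)], collapse=" ")    # drop the now-duplicated prefix tokens
}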
funPaste(a)
#[1] "(blahblah Peter|blahblah Sally|blahblah Tom)"
funPaste(b)
#[1] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"
funPaste(c)
#[1] "people (person you|person me|person us) do not know (how it|how them) works"
We can also use gsubfn, which works on a whole vector of strings at once:
library(gsubfn)
funPaste2 <- function(str1){
  gsubfn("(\\w+)\\s+[(]([^)]+)[)]", function(x,y)
    paste0("(", paste(x, unlist(strsplit(y, "[|]")), collapse="|"), ")"), str1)
}
funPaste2(c(a, b, c))
#[1] "(blahblah Peter|blahblah Sally|blahblah Tom)"
#[2] "(word apple|word grape|word tomato) (vocabulary rice|vocabulary mice|vocabulary lice)"
#[3] "people (person you|person me|person us) do not know (how it|how them) works"
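If installing gsubfn is not an option, the same idea can be sketched in base R with gregexpr and the replacement form of regmatches. This is only a sketch under the assumption that every group is written as a word, whitespace, then a parenthesised list separated by '|'; the function name expand_prefix is made up for illustration and does not come from the answer above.

```r
# Hypothetical helper (name is illustrative): base R only, no extra packages.
expand_prefix <- function(x) {
  # word, whitespace, then "(alt1|alt2|...)"
  pattern <- "(\\w+)\\s+\\(([^)]+)\\)"
  m <- gregexpr(pattern, x)
  # Replace every match with its expanded form; elements without a match are left as-is.
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, function(hit) {
      prefix <- sub(pattern, "\\1", hit)                       # the word before "("
      alts <- strsplit(sub(pattern, "\\2", hit), "|", fixed = TRUE)[[1]]
      paste0("(", paste(prefix, alts, collapse = "|"), ")")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

expand_prefix(c("blahblah (Peter|Sally|Tom)",
                "people person (you|me|us) do not know how (it|them) works"))
```

Like funPaste2, this accepts a character vector, so all three strings can be passed in one call.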
Answer 1 (score: 1)
Another approach, using as little regex as possible, since I don't know it well :)
words <- unlist(strsplit(b, " "))[c(TRUE, FALSE)]  # extract all the single words
# words
# [1] "blahblah"            # for a
# [1] "word" "vocabulary"   # for b
groups <- unlist(strsplit(b, " "))[c(FALSE, TRUE)]  # extract the grouped words
# groups
# [1] "(Peter|Sally|Tom)"                         # for a
# [1] "(apple|grape|tomato)" "(rice|mice|lice)"   # for b
# now iterate over each element of 'groups': split it on `|`, strip the `()`, and paste the pieces onto the matching element of 'words'
sapply(seq_along(groups), function(i) paste0("(", paste(words[i], gsub("(\\(|\\))", "", unlist(strsplit(groups[i], "\\|"))),
collapse = "|"), ")"))
# [1] "(blahblah Peter|blahblah Sally|blahblah Tom)"   # for a
# [1] "(word apple|word grape|word tomato)" "(vocabulary rice|vocabulary mice|vocabulary lice)"   # for b
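Note that picking every other token only works when single words and groups strictly alternate, which holds for the first two strings but not for the third one in the question ('people person (you|me|us) ...'). A possible way around that, still with almost no regex, is to locate the group tokens and pair each one with the token just before it. The sketch below assumes tokens are separated by single spaces, that every group token has the exact form "(x|y|z)", and that a group is never the first token; the name expand_tokens is made up for illustration.

```r
# Hypothetical helper (name is illustrative): pairs each "(x|y|z)" token
# with the token immediately before it. Takes a single string.
expand_tokens <- function(s) {
  tokens <- strsplit(s, " ", fixed = TRUE)[[1]]
  idx <- which(startsWith(tokens, "("))        # positions of the group tokens
  if (length(idx) == 0) return(s)              # nothing to expand
  for (i in idx) {
    inner <- substr(tokens[i], 2, nchar(tokens[i]) - 1)   # drop "(" and ")"
    alts <- strsplit(inner, "|", fixed = TRUE)[[1]]
    tokens[i] <- paste0("(", paste(tokens[i - 1], alts, collapse = "|"), ")")
  }
  paste(tokens[-(idx - 1)], collapse = " ")    # remove the prefix words that were copied in
}

expand_tokens("people person (you|me|us) do not know how (it|them) works")
```

The early return matters: with no groups, negative indexing by an empty vector would otherwise drop every token.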