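Following up on this question, I have another example to which I cannot apply the accepted answer.
This time, I want to find each element of the vector labs in which the EXACT group (an element of groups) occurs twice.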
labs <- c("Beijing T0 - BC-89 + CN --vs-- Zhangjiakou T0 - BC-89 + CN",
"Beijing T24 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 + CN",
"Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing T0 - BC-89 + CN --vs-- Beijing T24 - BC-89 + CN",
"Zhangjiakou T0 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 + CN",
"Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing T0 - BC-89 + CN --vs-- Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing T0 - BC-89 + CN --vs-- Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing T24 - BC-89 + CN --vs-- Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing T24 - BC-89 + CN --vs-- Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou T0 - BC-89 + CN --vs-- Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou T0 - BC-89 + CN --vs-- Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou T24 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou T24 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN")
labs
groups <- c("BC-89 + CN", "BC-89 + CN with 2% DD + 1.6% ZC", "BC-89 with 2% Puricare + 5% Merquat + CN")
groups
Here is my attempt, which does not work:
A <- grep(gsub("\\+", "\\\\+", paste0(groups[1], "{2}")), labs, value=TRUE) #only elements with exactly "BC-89 + CN" appearing twice
B <- grep(gsub("\\+", "\\\\+", paste0(groups[2], "{2}")), labs, value=TRUE) #only elements with exactly "BC-89 + CN with 2% DD + 1.6% ZC" appearing twice
C <- grep(gsub("\\+", "\\\\+", paste0(groups[3], "{2}")), labs, value=TRUE) #only elements with exactly "BC-89 with 2% Puricare + 5% Merquat + CN" appearing twice
The desired output is (note that I want the exact groups, so "BC-89 + CN" should not find "BC-89 + CN with 2% DD + 1.6% ZC"):
> A
[1] "Beijing T0 - BC-89 + CN --vs-- Zhangjiakou T0 - BC-89 + CN"
[2] "Beijing T24 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 + CN"
[3] "Beijing T0 - BC-89 + CN --vs-- Beijing T24 - BC-89 + CN"
[4] "Zhangjiakou T0 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 + CN"
> B
[1] "Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC"
[2] "Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC"
[3] "Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC"
[4] "Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC"
> C
[1] "Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN"
[2] "Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN"
[3] "Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN"
[4] "Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN"
Answer 0 (score: 1)
You should use paste0(groups[1], ".*", groups[1]) or sprintf("(%s.*){2}", groups[1]):
a <- grep(gsub("\\+", "\\\\+", sprintf("(%s.*){2}", groups[1])), labs)
b <- grep(gsub("\\+", "\\\\+", sprintf("(%s.*){2}", groups[2])), labs)
c <- grep(gsub("\\+", "\\\\+", sprintf("(%s.*){2}", groups[3])), labs)
Output:
> print(list(a, b, c))
# [[1]]
# [1] 1 2 3 4 7 8 9 10 13 16 19 22
#
# [[2]]
# [1] 3 4 9 10
#
# [[3]]
# [1] 5 6 11 12
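Taking groups[1] ("BC-89 + CN") as an example: in your pattern, {2} binds only to the final "N", so it looks for "BC-89 + CNN". Even with the whole group in parentheses, it would only find elements containing the back-to-back "BC-89 + CNBC-89 + CN", whereas other characters can appear between the two occurrences of the string you want.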
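To make the difference between the three patterns concrete, here is a minimal sketch on a made-up vector (x and the result names are hypothetical and not part of the question's data), assuming R's default regex engine:

```r
# Hypothetical mini-example, not taken from labs.
x <- c("BC-89 + CNN",                    # only the last character is doubled
       "BC-89 + CNBC-89 + CN",           # two back-to-back copies
       "BC-89 + CN --vs-- BC-89 + CN")   # two copies with text in between

# {2} binds only to the preceding "N".
last_char_only <- grepl("BC-89 \\+ CN{2}", x)

# Parentheses make {2} apply to the whole string, but the copies must be adjacent.
adjacent_only <- grepl("(BC-89 \\+ CN){2}", x)

# ".*" inside the group allows arbitrary text between the two copies.
with_gap <- grepl("(BC-89 \\+ CN.*){2}", x)

print(rbind(last_char_only, adjacent_only, with_gap))
```

Only the third pattern matches the "--vs--" style element, which is the shape of the strings in labs.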
Edit:
Since the group "BC-89 + CN" should not include "BC-89 + CN with 2% DD + 1.6% ZC", one more step is needed:
a <- a[!a %in% b]
Output:
> print(a)
# [1] 1 2 7 8 13 16 19 22
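Note that indices such as 13 remain in this result: labs[13] compares "BC-89 + CN" with "BC-89 + CN with 2% DD + 1.6% ZC", so the short group string occurs twice while the long one occurs only once. The sketch below reproduces this on a three-element subset; small, pat and a_only are hypothetical names introduced only for illustration, and setdiff(a, b) is used as an equivalent of a[!a %in% b] for these duplicate-free index vectors.

```r
groups <- c("BC-89 + CN", "BC-89 + CN with 2% DD + 1.6% ZC")

# Hypothetical subset of labs: same group twice, longer group twice, mixed.
small <- c(
  "Beijing T0 - BC-89 + CN --vs-- Zhangjiakou T0 - BC-89 + CN",
  "Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC",
  "Beijing T0 - BC-89 + CN --vs-- Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC"
)

# Build the "(group.*){2}" pattern and escape the "+" signs, as in the answer.
pat <- function(g) gsub("\\+", "\\\\+", sprintf("(%s.*){2}", g))

a <- grep(pat(groups[1]), small)  # all three elements contain "BC-89 + CN" twice
b <- grep(pat(groups[2]), small)  # only the second contains the longer group twice

a_only <- setdiff(a, b)           # the mixed comparison (element 3) is still kept
print(a_only)
```

If only same-group comparisons are wanted, the two sides of "--vs--" have to be checked separately.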
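Edit 2:
I noticed that you probably want to check whether the group string appears both before and after "--vs--" (that is, twice), so here is another approach.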
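# Return `ele` if both sides of " --vs-- " end with `group`, otherwise NULL.
check_group <- function(ele, group) {
  x <- strsplit(ele, " --vs-- ")[[1]]
  # Escape "-" and "+" in the group name so that they are matched literally.
  group <- gsub("\\-", "\\\\-", group)
  group <- gsub("\\+", "\\\\+", group)
  # Anchor at the end of the string so that a longer group does not match.
  group <- paste0(group, "$")
  if (grepl(group, x[[1]]) && grepl(group, x[[2]])) {
    return(ele)
  } else {
    return(NULL)
  }
}

# Drop the NULL entries and flatten the list into a character vector.
remove_null <- function(x) {
  return(unlist(x[!sapply(x, is.null)]))
}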
a1 <- remove_null(lapply(labs, check_group, groups[1]))
a2 <- remove_null(lapply(labs, check_group, groups[2]))
a3 <- remove_null(lapply(labs, check_group, groups[3]))
Output:
> print(list(a1, a2, a3))
# [[1]]
# [1] "Beijing T0 - BC-89 + CN --vs-- Zhangjiakou T0 - BC-89 + CN" "Beijing T24 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 + CN"
# [3] "Beijing T0 - BC-89 + CN --vs-- Beijing T24 - BC-89 + CN" "Zhangjiakou T0 - BC-89 + CN --vs-- Zhangjiakou T24 - BC-89 + CN"
#
# [[2]]
# [1] "Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC"
# [2] "Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC"
# [3] "Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Beijing T24 - BC-89 + CN with 2% DD + 1.6% ZC"
# [4] "Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T24 - BC-89 + CN with 2% DD + 1.6% ZC"
#
# [[3]]
# [1] "Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN"
# [2] "Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN"
# [3] "Beijing T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Beijing T24 - BC-89 with 2% Puricare + 5% Merquat + CN"
# [4] "Zhangjiakou T0 - BC-89 with 2% Puricare + 5% Merquat + CN --vs-- Zhangjiakou T24 - BC-89 with 2% Puricare + 5% Merquat + CN"
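As a possible variation on the same idea, the check can be written without any regex escaping by comparing literal text with base R's endsWith() (available since R 3.3.0). This is only a sketch under the assumption that every label has exactly one " --vs-- " separator; match_both_sides and small are hypothetical names, not part of the original answer.

```r
# Keep the labels whose two sides (split on " --vs-- ") both end with `group`.
# endsWith() compares literal text, so "+", "-" and "." need no escaping.
match_both_sides <- function(labels, group) {
  sides <- strsplit(labels, " --vs-- ", fixed = TRUE)
  keep <- vapply(
    sides,
    function(s) length(s) == 2 && all(endsWith(s, group)),
    logical(1)
  )
  labels[keep]
}

# Hypothetical subset of labs: same group twice, longer group twice, mixed.
small <- c(
  "Beijing T0 - BC-89 + CN --vs-- Zhangjiakou T0 - BC-89 + CN",
  "Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC --vs-- Zhangjiakou T0 - BC-89 + CN with 2% DD + 1.6% ZC",
  "Beijing T0 - BC-89 + CN --vs-- Beijing T0 - BC-89 + CN with 2% DD + 1.6% ZC"
)

short_group <- match_both_sides(small, "BC-89 + CN")
long_group  <- match_both_sides(small, "BC-89 + CN with 2% DD + 1.6% ZC")
print(short_group)
print(long_group)
```

On the full data this would be called as match_both_sides(labs, groups[1]) and so on; the mixed comparison is dropped for both groups because its two sides end differently.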