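Given two vectors, pattern and trail: how often does pattern occur in trail as a subsequence, meaning its elements appear in the same order but not necessarily next to each other? Example: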
pattern <- c(1,2,3)
trail <- c(7,1,4,2,9,2,3)
The correct answer is 2 (1, 2, 3 and 1, 2, 3: the 2 in the middle occurs twice, so the subsequence can be formed in two ways).
What I tried:
getPerformance <- function(pattern,trail) {
  tmp <- 0
  for(i in 1:length(pattern)) {
    for(j in 1:length(trail)) {
      if(pattern[i]==trail[j]) {
        if(i<length(pattern)) {
          sum(pattern[i:length(pattern)])
        }
        tmp <- 1 * getPerformance(pattern[i:length(pattern)],trail[j:length(trail)])
      }
    }
  }
  return(tmp)
}
But this function never terminates. Non-recursive solutions are of course welcome too. Thanks for your help!
Answer 0 (score: 7)
n_subseq = function(trail, pattern) {
  # generate all subsets of length(pattern) of the elements of `trail`
  # that are in `pattern`, preserving order (as combn does),
  # and count those that are equal to `pattern`
  sum(combn(
    x = trail[trail %in% pattern],
    m = length(pattern),
    FUN = function(x) all(x == pattern)
  ))
}
n_subseq(trail = c(7, 1, 4, 2, 9, 2, 3), pattern = 1:3)
# [1] 2
n_subseq(c(1, 2, 2, 3, 3), 1:3)
# [1] 4
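To see what n_subseq() enumerates, here is the question's example split into steps. The intermediate names (kept, candidates, matches) are illustrative and not part of the answer. Without FUN, combn() returns one order-preserving candidate per column, and the answer's FUN simply tests each column against pattern.

```r
# Step-by-step version of what n_subseq() computes, on the question's example
pattern <- c(1, 2, 3)
trail <- c(7, 1, 4, 2, 9, 2, 3)

# keep only trail elements that occur in pattern, in their original order
kept <- trail[trail %in% pattern]
kept
# [1] 1 2 2 3

# without FUN, combn() returns every order-preserving choice of
# length(pattern) elements, one per column:
# (1,2,2), (1,2,3), (1,2,3), (2,2,3)
candidates <- combn(kept, length(pattern))

# a column is an occurrence if it equals pattern element by element
matches <- apply(candidates, 2, function(col) all(col == pattern))
sum(matches)
# [1] 2

# number of candidates examined: choose(n, m)
choose(length(kept), length(pattern))
# [1] 4
```

Because combn() visits choose(n, m) candidates, the cost grows quickly with the length of the filtered trail: choose(100, 5) is already 75,287,520, which likely explains why n_subseq() is much slower than the recursive answers in the benchmark.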
Answer 1 (score: 4)
First, we can ignore the elements that do not appear in pattern:
tt = trail[trail %in% pattern]
Then I would use the following recursive solution:
count_patt = function(p, v){
  # stop if done searching
  if (length(p) == 0L) return(0L)
  # find matches
  w = which(v == p[1L])
  # report matches if done searching
  if (length(p) == 1L) return(length(w))
  # otherwise, search for subsequent matches
  pn = p[-1L]
  sum(vapply(w, function(wi) count_patt(pn, tail(v, -wi)), FUN.VALUE = 0L))
}
count_patt(pattern, tt)
# [1] 2
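To see how count_patt() branches, the sketch below adds a hypothetical depth argument (not part of the answer) that is used only to indent the printed calls. The counting logic is unchanged. Run on the filtered example c(1, 2, 2, 3), it shows that the single 1 leads to two branches, one for each 2, and each branch finds exactly one 3.

```r
# Traced variant of count_patt(): same logic, plus a hypothetical `depth`
# argument used only to indent the printed calls
count_patt_trace <- function(p, v, depth = 0L) {
  cat(strrep("  ", depth), "p = (", toString(p), "), v = (", toString(v), ")\n",
      sep = "")
  if (length(p) == 0L) return(0L)
  w <- which(v == p[1L])
  if (length(p) == 1L) return(length(w))
  pn <- p[-1L]
  sum(vapply(w, function(wi) count_patt_trace(pn, tail(v, -wi), depth + 1L),
             FUN.VALUE = 0L))
}

count_patt_trace(c(1, 2, 3), c(1, 2, 2, 3))
# p = (1, 2, 3), v = (1, 2, 2, 3)
#   p = (2, 3), v = (2, 2, 3)
#     p = (3), v = (2, 3)
#     p = (3), v = (3)
# [1] 2
```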
Another recursive idea:
count_patt2 = function(p, v){
  # succeed if there's nothing to search for
  if (length(p) == 0L) return(1L)
  # find the first match
  w = match(p[1L], v)
  # fail if not found
  if (is.na(w)) return(0L)
  # if found, define rest of searchable vector
  tv = tail(v, -w)
  # count if same pattern is found later
  count_same = count_patt2(p, tv)
  # or if rest of pattern is found later
  count_next = count_patt2(p[-1L], tv)
  count_same + count_next
}
count_patt2(pattern, trail)
# [1] 2
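The recursion in count_patt2() splits the occurrences of p in v by whether they use the first element of v that equals p_1. Writing N(p, v) for the number of occurrences, m for the length of p, w for the position of that first match, and v_{>w} for the elements after it:

```latex
N(p, v) =
\begin{cases}
1 & \text{if } m = 0,\\
0 & \text{if } p_1 \text{ does not occur in } v,\\
N(p, v_{>w}) + N(p_{2:m}, v_{>w}) & \text{otherwise.}
\end{cases}
```

Occurrences that do not use position w must start after it, because no earlier element equals p_1; that is the first term (count_same). Occurrences that do use position w need the rest of the pattern somewhere after it; that is the second term (count_next).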
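If the elements of pattern are distinct, I think this also works for inputs like the example (it is not fully general: for trail = c(1, 2, 1, 2) and pattern = 1:2 it does not return the true count of 3):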
v = na.omit(match(trail, pattern))
w = v[v == cummax(v)]
prod(table(w)) * (length(unique(w)) == length(pattern))
# [1] 2
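The cummax() shortcut keeps only matches that are at least as large as every earlier match, so it drops any occurrence that starts again after a later pattern element has been seen. The sketch below compares it with a brute-force count built on combn(), as in n_subseq(). Both function names are hypothetical. The brute force runs combn() over indices because combn() expands a single positive number x to seq_len(x).

```r
# Brute-force reference count: try every order-preserving choice of
# length(pattern) kept elements (indices avoid combn()'s seq_len() expansion)
count_brute <- function(pattern, trail) {
  kept <- trail[trail %in% pattern]
  m <- length(pattern)
  if (length(kept) < m) return(0L)
  sum(combn(seq_along(kept), m, FUN = function(i) all(kept[i] == pattern)))
}

# The cummax() shortcut as a function; the completeness check is done on the
# kept matches
count_shortcut <- function(pattern, trail) {
  v <- na.omit(match(trail, pattern))
  w <- v[v == cummax(v)]
  prod(table(w)) * (length(unique(w)) == length(pattern))
}

# question's example: both agree
count_brute(1:3, c(7, 1, 4, 2, 9, 2, 3))
# [1] 2
count_shortcut(1:3, c(7, 1, 4, 2, 9, 2, 3))
# [1] 2

# distinct pattern elements, but the shortcut misses an occurrence
count_brute(1:2, c(1, 2, 1, 2))
# [1] 3
count_shortcut(1:2, c(1, 2, 1, 2))
# [1] 2
```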
A simple benchmark (so far, @Gregor's function is the only other answer included):
set.seed(1)
v0 = 1:9
nv = 200L
np = 5L
vec = sample(v0, nv, replace=TRUE)
patt = sample(v0, np, replace=TRUE)
system.time(res_count2 <- count_patt2(patt, vec))
# user system elapsed
# 0.56 0.00 0.56
system.time(res_count1 <- count_patt(patt, vec))
# user system elapsed
# 0.60 0.00 0.61
system.time(res_subseq <- n_subseq(vec, patt))
# user system elapsed
# 25.89 0.15 26.16
length(unique(c(res_subseq, res_count1, res_count2))) == 1L
# [1] TRUE
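Remarks: I find Gregor's n_subseq more readable than my functions, although it is much slower in this benchmark. I'm sure there are more efficient recursive solutions.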
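The question also welcomes non-recursive solutions. One option, not taken from any of the answers here, is the standard dynamic-programming count of subsequence occurrences. It makes a single pass over trail and does O(length(trail) × length(pattern)) work overall. A minimal sketch, with count_dp as a hypothetical name:

```r
# Non-recursive count of the ways `pattern` occurs as a subsequence of `trail`.
# ways[k + 1] = number of ways the first k pattern elements occur in the part
# of `trail` scanned so far (ways[1] = 1 for the empty prefix).
count_dp <- function(pattern, trail) {
  m <- length(pattern)
  ways <- c(1, numeric(m))  # doubles, so large counts do not overflow R integers
  for (x in trail) {
    # go through matching pattern positions from last to first, so that each
    # update uses the counts from before this trail element was seen
    for (k in rev(which(pattern == x))) {
      ways[k + 1] <- ways[k + 1] + ways[k]
    }
  }
  ways[m + 1]
}

count_dp(c(1, 2, 3), c(7, 1, 4, 2, 9, 2, 3))
# [1] 2
count_dp(1:3, c(1, 2, 2, 3, 3))
# [1] 4
```

The recursive versions do work that grows with the number of (partial) matches they enumerate. This version's cost depends only on the input lengths. It also handles repeated pattern elements: count_dp(c(1, 1), c(1, 1, 1)) gives 3.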
Answer 2 (score: 3)
You can use rle as a proxy:
max(rle(trail[trail %in% pattern])$lengths)
# [1] 2
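The rle() proxy only looks at the longest run of a single value among the kept elements. It therefore matches the true count in simple cases like the question's example, where exactly one pattern element repeats and everything appears in pattern order. The sketch below wraps the answer's expression in a function (rle_proxy is a hypothetical name) and shows cases where it differs from the true count:

```r
# The rle() proxy from this answer as a function
rle_proxy <- function(pattern, trail) {
  max(rle(trail[trail %in% pattern])$lengths)
}

# question's example: true count 2
rle_proxy(1:3, c(7, 1, 4, 2, 9, 2, 3))
# [1] 2

# two elements repeated: true count 4 (2 choices of 2 times 2 choices of 3)
rle_proxy(1:3, c(1, 2, 2, 3, 3))
# [1] 2

# wrong order: true count 0
rle_proxy(1:3, c(3, 2, 1))
# [1] 1
```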