Question

For this question, I am not looking for a single valley, but for a way to identify repeated "valleys" in a sequence of numbers.

I have this data:

x <- c(1,1,2,2,1,1,2,2,3,3,3,2,2,2,3)

So I tried the following:

library(dplyr)  # provides %>% and mutate()

test <- data.frame(x)

test <- test %>% mutate(Lag = c(tail(x, -1), NA))

which(test$x > test$Lag)+1

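This gives me positions 5 and 12.

The question is how to get the code to identify the remaining "valley" positions in the sequence. The expected output would identify positions 5, 6 and 12, 13, 14.

It is somewhat similar to finding local minima in a time series, but that is not quite what I want.

I would also like to treat these as blocks; for example, positions 5, 6 belong to category 1 and positions 12, 13, 14 belong to category 2.

Many thanks in advance!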

Answer 1

This can be solved with cummax. Using x,

cummax(x)
#  [1] 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3
which(x != cummax(x))
# [1]  5  6 12 13 14
x[x != cummax(x)]
# [1] 1 1 2 2 2

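You get 5-6 and 12-14, and from there you know which category each position belongs to. You can use split or some binning function to group them more meaningfully.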
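As a minimal sketch of the grouping step mentioned above, consecutive flagged positions can be numbered as blocks and passed to split. The names idx, block and valleys are only illustrative and are not part of the original answer. Note that x != cummax(x) flags every element below the running maximum, so this assumes each valley climbs back to at least the previous peak, as in the example data.

```r
x <- c(1,1,2,2,1,1,2,2,3,3,3,2,2,2,3)

# Positions that sit below the running maximum
idx <- which(x != cummax(x))

# Start a new block whenever the flagged positions stop being consecutive
block <- cumsum(c(1, diff(idx) != 1))

# One list element per valley, named by block number
valleys <- split(idx, block)
valleys
```

The block vector can also be kept next to idx in a data frame if a category column is more convenient than a list.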

Answer 2

We can also do this with rle from base R:

v1 <- seq_along(x)*inverse.rle(within.list(rle(x),
         {i1 <- c(0, diff(values))<0; values <- i1}))
v1[v1!=0]
#[1]  5  6 12 13 14
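The one-liner above can be unpacked into steps, which also makes it easy to keep the valleys as separate blocks. This is a sketch of the same idea rather than the answerer's exact code; r, is_drop, run_id and pos are illustrative names. It flags any run whose value is lower than the run before it, whether or not the sequence rises again afterwards.

```r
x <- c(1,1,2,2,1,1,2,2,3,3,3,2,2,2,3)

r <- rle(x)

# TRUE for each run whose value is lower than the preceding run's value
is_drop <- c(FALSE, diff(r$values) < 0)

# Run number for every element of x
run_id <- rep(seq_along(r$values), r$lengths)

# Positions belonging to a "drop" run
pos <- which(rep(is_drop, r$lengths))

# Group positions by the run they came from
blocks <- split(pos, run_id[pos])
blocks
```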

Answer 3

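We can use a regular expression (a valley means a negative slope, followed by zero slopes, followed by a positive slope in x; this assumes the slopes are -1, 0 and 1 as in the input data, but we can generalize):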

pattern <- 'N([0]+)P' # \_.._/
txt <- gsub('1', 'P', gsub('-1', 'N', paste(diff(x), collapse='')))
matched <- gregexpr(pattern,txt)
positions <- unlist(matched) + 1
lengths <- attr(matched[[1]], "match.length") - 2 # exclude N & P
valley.points <- lapply(seq_along(positions), function(i) seq(positions[i], positions[i] + lengths[i], 1))
valley.points

#[[1]]
#[1] 5 6

#[[2]]
#[1] 12 13 14

plot(x, type='l')
points(unlist(valley.points), x[unlist(valley.points)], pch=19, col='red')
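One possible way to generalize to slopes of any size is to encode only the sign of each difference, so that every step becomes exactly one character. The sketch below is an assumption about what such a generalization could look like, not the answerer's code; the letters N, Z, P and the variable names are illustrative. Using Z* instead of a one-or-more quantifier also lets a single-point valley match.

```r
x <- c(1,1,2,2,1,1,2,2,3,3,3,2,2,2,3)

# Encode each step as N (down), Z (flat) or P (up), one character per step
s <- sign(diff(x))
txt <- paste(c("N", "Z", "P")[s + 2], collapse = "")

# A valley: one down step, any number of flat steps, one up step
m <- gregexpr("NZ*P", txt)[[1]]

# A match starting at step p with length L covers x positions (p + 1):(p + L - 1)
starts <- as.integer(m) + 1L
ends <- as.integer(m) + attr(m, "match.length") - 1L

# gregexpr returns -1 when nothing matches
valleys <- if (m[1] == -1) list() else Map(seq, starts, ends)
valleys
```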
