Question

I have a dataset that is basically this:

p       t
0       35.6
0       34
0.08    33.9
0       33.9
0.72    33.9
0.82    33.9
0.78    33.9
0.78    33.9
0.02    33.9
0.81    33.9
0.81    33.9
0.81    33.9
0.77    28.6
0.71    21
0.16    20.2
0       33.9

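and I would like to trim the dataset down to the entries between the point where p first rises above .1 and the point where t first drops below the value it had when p tripped that start threshold.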

The syntax I tried is:

dataset$delete <- 0
dataset$p <- as.numeric(as.character(dataset$p))
for (i in seq(along=dataset$p)) {if (dataset$p[i] < .1) {dataset$delete <- 1} else {break("done")}}

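and I can't figure out why it doesn't work, in particular why I get a report that the loop has stopped, yet when I go in and look, the delete column has been set to 1 for every observation.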

I feel like this comes down to me forgetting how loops work in R, but I can't work it out. Any tips?
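One likely cause, sketched here under the assumption that p is already numeric: inside the loop, `dataset$delete <- 1` assigns 1 to the whole delete column rather than to row i, so the very first row with p < .1 flags every observation. Indexing the assignment with `[i]` flags only the leading rows, and a plain `break` still stops the loop at the first row with p >= .1. The `dataset` rebuilt below is just the example data from above.

```r
# Example data from the question, assumed already numeric
dataset <- data.frame(
  p = c(0, 0, 0.08, 0, 0.72, 0.82, 0.78, 0.78, 0.02, 0.81, 0.81, 0.81,
        0.77, 0.71, 0.16, 0),
  t = c(35.6, 34, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9,
        28.6, 21, 20.2, 33.9)
)

dataset$delete <- 0
for (i in seq_along(dataset$p)) {
  if (dataset$p[i] < 0.1) {
    # index with [i] so only this row is flagged, not the whole column
    dataset$delete[i] <- 1
  } else {
    break  # stop at the first row where p is 0.1 or more
  }
}

# Loop-free equivalent: the running count of p >= 0.1 is zero
# only for the rows before the first such p
delete_vec <- as.numeric(cumsum(dataset$p >= 0.1) == 0)
dataset$delete
```

The vectorised line avoids the loop entirely and produces the same 0/1 flags.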

Answer 1

dat <- read.table(header=TRUE, text = "p       t
0       35.6
0       34
0.08    33.9
0       33.9
0.72    33.9
0.82    33.9
0.78    33.9
0.78    33.9
0.02    33.9
0.81    33.9
0.81    33.9
0.81    33.9
0.77    28.6
0.71    21
0.16    20.2
0       33.9")


## i0: row index when p first rises above .1
thresh.p <- 0.1
i0 <- min(which(dat$p > thresh.p))

## thresh.t: value of t when p trips the start threshold
thresh.t <- dat$t[i0]
## trick: reset values of t to thresh.t for i<=i0,
## so that the first t to drop below thresh.t has row index larger than i0
dat2 <- dat
dat2$t[1:i0] <- thresh.t
i1 <- min(which(dat2$t < thresh.t))

dat[i0:i1, ]
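The same two-step logic can be wrapped in a reusable function. In this sketch the name `trim_by_threshold` is hypothetical, and the `dat2` copy is replaced by a row-index condition (only rows after i0 may end the window). Like `dat[i0:i1, ]`, it returns rows i0 through i1 inclusive. The behaviour when t never drops (keep everything to the end) is an assumption; in the code above, `min()` of an empty `which()` would give `Inf` instead.

```r
trim_by_threshold <- function(dat, thresh.p = 0.1) {
  # i0: first row where p exceeds the start threshold
  i0 <- which(dat$p > thresh.p)[1]
  if (is.na(i0)) return(dat[0, ])  # p never trips the threshold
  thresh.t <- dat$t[i0]
  # i1: first row after i0 where t drops below thresh.t
  after <- seq_len(nrow(dat)) > i0
  i1 <- which(after & dat$t < thresh.t)[1]
  if (is.na(i1)) i1 <- nrow(dat)  # assumption: keep to the end if t never drops
  dat[i0:i1, ]
}

dat <- data.frame(
  p = c(0, 0, 0.08, 0, 0.72, 0.82, 0.78, 0.78, 0.02, 0.81, 0.81, 0.81,
        0.77, 0.71, 0.16, 0),
  t = c(35.6, 34, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9,
        28.6, 21, 20.2, 33.9)
)
res <- trim_by_threshold(dat)
res
```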

Answer 2

A fairly short but idiomatic dplyr solution:

 library(dplyr)
 df %>% filter(p>.1) %>% filter(t >= t[1])

This gives the expected result:

     p    t
1 0.72 33.9
2 0.82 33.9
3 0.78 33.9
4 0.78 33.9
5 0.81 33.9
6 0.81 33.9
7 0.81 33.9
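Note that `filter(p>.1)` also drops row 9 (p = 0.02), which lies inside the window, and `filter(t >= t[1])` would keep any later row whose t climbs back to the starting value. If one contiguous block is wanted, cumulative flags can express "from the first p > .1 until t first drops". The sketch below does this in base R. dplyr's `cumany()` and `cumall()` express the same idea inside `filter()`. Unlike `dat[i0:i1, ]` in Answer 1, it stops before the row where t drops instead of including that row.

```r
dat <- data.frame(
  p = c(0, 0, 0.08, 0, 0.72, 0.82, 0.78, 0.78, 0.02, 0.81, 0.81, 0.81,
        0.77, 0.71, 0.16, 0),
  t = c(35.6, 34, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9, 33.9,
        28.6, 21, 20.2, 33.9)
)

# TRUE from the first row where p exceeds 0.1 onwards (like dplyr::cumany)
started <- cumsum(dat$p > 0.1) > 0
after_start <- dat[started, ]

# TRUE only until t first drops below its value at the start row (like dplyr::cumall)
still_high <- cumprod(after_start$t >= after_start$t[1]) == 1
kept <- after_start[still_high, ]
kept
```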

Trying to trim a dataset based on the first time a threshold is passed
