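I have a data frame, organized in groups, where the values decrease consecutively; in some groups they decrease, then increase, and then decrease again.
I need to extract the consecutive decreasing part and remove the non-systematic increases and decreases!
Here is test data showing what I mean: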
test=data.frame(set=gl(3,9),vals=c(c(10,10,10, 9.9, 8.1, 1, 1,1,1),c(10,10,10, 9.9,6.1,1, 2,1,1),c(10,10,10, 7,6,1,2,0,1)))
> test
set vals
1 1 10.0
2 1 10.0
3 1 10.0
4 1 9.9
5 1 8.1
6 1 1.0
7 1 1.0
8 1 1.0
9 1 1.0
10 2 10.0
11 2 10.0
12 2 10.0
13 2 9.9
14 2 6.1
15 2 1.0
16 2 2.0
17 2 1.0
18 2 1.0
19 3 10.0
20 3 10.0
21 3 10.0
22 3 7.0
23 3 6.0
24 3 1.0
25 3 2.0
26 3 0.0
27 3 1.0
I wrote a simple function, slice_it, to find the consecutive decreases in the data:
slice_it <- function(x){
  # difference from the previous value; the first element gets 0
  c(0, diff(x))
}
library(dplyr)
test %>%
  group_by(set) %>%
  mutate(diff_x = slice_it(vals))
which gives:
set vals diff_x
1 1 10.0 0.0 #remove
2 1 10.0 0.0 #remove
3 1 10.0 0.0 #remove
4 1 9.9 -0.1 #keep
5 1 8.1 -1.8 #keep
6 1 1.0 -7.1 #keep
7 1 1.0 0.0 #remove
8 1 1.0 0.0 #remove
9 1 1.0 0.0 #remove
10 2 10.0 0.0 #remove
11 2 10.0 0.0 #remove
12 2 10.0 0.0 #remove
13 2 9.9 -0.1 #keep
14 2 6.1 -3.8 #keep
15 2 1.0 -5.1 #keep
16 2 2.0 1.0 #remove
17 2 1.0 -1.0 #remove
18 2 1.0 0.0 #remove
19 3 10.0 0.0 #remove
20 3 10.0 0.0 #remove
21 3 10.0 0.0 #remove
22 3 7.0 -3.0 #keep
23 3 6.0 -1.0 #keep
24 3 1.0 -5.0 #keep
25 3 2.0 1.0 #remove
26 3 0.0 -2.0 #remove
27 3 1.0 1.0 #remove
If I add the following filter to the dplyr chain:
filter(diff_x < 0)
I get:
# A tibble: 11 x 3
# Groups: set [3]
set vals diff_x
1 1 9.9 -0.1000 #keep
2 1 8.1 -1.8 #keep
3 1 1 -7.1 #keep
4 2 9.9 -0.1000 #keep
5 2 6.1 -3.8 #keep
6 2 1 -5.1 #keep
7 2 1 -1 #remove
8 3 7 -3 #keep
9 3 6 -1 #keep
10 3 1 -5 #keep
11 3 0 -2 #remove
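The rows I marked with #remove are still kept because their diff is < 0. But each of them comes right after an increase in the previous value, so they should be removed!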
The expected result should look like this:
set vals diff_x
4 1 9.9 -0.1 #keep
5 1 8.1 -1.8 #keep
6 1 1.0 -7.1 #keep
13 2 9.9 -0.1 #keep
14 2 6.1 -3.8 #keep
15 2 1.0 -5.1 #keep
22 3 7.0 -3.0 #keep
23 3 6.0 -1.0 #keep
24 3 1.0 -5.0 #keep
How can I achieve this? Thanks!
P.S. Slicing rows off the end won't help, because the number of rows to drop from the bottom is not fixed.
Answer 0 (score: 1)
Assuming I understood your situation correctly, we can apply a second lag to the diffed values; this reproduces your expected output:
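library(dplyr)
library(tibble)  # rowid_to_column() comes from tibble
test %>%
  rowid_to_column("row") %>%            # keep the original row numbers
  group_by(set) %>%
  mutate(
    diff = c(0, diff(vals)),            # change from the previous value
    diff2 = c(0, diff(lag(vals)))) %>%  # the change one step earlier
  filter(diff < 0 & diff2 <= 0) %>%     # decreasing now, not increasing just before
  select(-diff2)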
## A tibble: 9 x 4
## Groups: set [3]
# row set vals diff
# <int> <fct> <dbl> <dbl>
#1 4 1 9.90 -0.1000
#2 5 1 8.10 -1.80
#3 6 1 1.00 -7.10
#4 13 2 9.90 -0.1000
#5 14 2 6.10 -3.80
#6 15 2 1.00 -5.10
#7 22 3 7.00 -3.00
#8 23 3 6.00 -1.00
#9 24 3 1.00 -5.00
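To see why the second lag works, here is a small check on the set-2 values alone. This is only an illustrative sketch: the vector v is copied from the test data, and dplyr's lag() is called directly on a plain vector instead of inside the pipeline. The lagged diff holds, for each row, the change that happened one step earlier, so the row that drops from 2.0 back to 1.0 is rejected because the step before it was an increase.

```r
library(dplyr)

# values of set 2 from the test data
v <- c(10, 10, 10, 9.9, 6.1, 1, 2, 1, 1)

d1 <- c(0, diff(v))        # change from the previous value
d2 <- c(0, diff(lag(v)))   # the same changes, shifted down one row (row 2 is NA)

# keep rows that are decreasing now and were not increasing one step before;
# which() treats any NA coming from lag() as "not kept", much like filter() does
keep <- which(d1 < 0 & d2 <= 0)
print(keep)
```

Positions 4, 5 and 6 correspond to rows 13, 14 and 15 of the full test data, matching the expected output for set 2.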
To reuse your slice_it function:
slice_it <- function(x) c(0, diff(x))
test %>%
  group_by(set) %>%
  mutate(diff_x = slice_it(vals)) %>%
  # keep rows that decrease and whose previous step did not increase
  filter(diff_x < 0 & slice_it(lag(vals)) <= 0)
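The lag-based filter only looks one step back. If a later bounce were followed by two consecutive drops (for example 1, 2, 0, -1), it would keep the final drop, because both that step and the one before it are decreases. A stricter sketch, not part of the original answer, keeps only the first unbroken run of decreases in each group. The helper name first_decreasing_run is hypothetical, and the approach assumes "the consecutive decreasing part" means the first such run.

```r
library(dplyr)

test <- data.frame(set = gl(3, 9),
                   vals = c(c(10, 10, 10, 9.9, 8.1, 1, 1, 1, 1),
                            c(10, 10, 10, 9.9, 6.1, 1, 2, 1, 1),
                            c(10, 10, 10, 7, 6, 1, 2, 0, 1)))

# Hypothetical helper: TRUE only for rows in the first unbroken run of decreases
first_decreasing_run <- function(x) {
  dec <- c(FALSE, diff(x) < 0)           # is this row lower than the previous one?
  started <- cumsum(dec) > 0             # has the first decrease happened yet?
  broken <- cumsum(started & !dec) > 0   # has a non-decreasing step followed it?
  dec & !broken
}

res <- test %>%
  mutate(row = row_number()) %>%         # global row numbers, before grouping
  group_by(set) %>%
  filter(first_decreasing_run(vals)) %>%
  ungroup()

print(res$row)
```

Because the run is cut off at the first non-decreasing step, it does not matter how many rows follow at the bottom of each group, which is the concern raised in the P.S.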