Conditionally extract consecutive decreasing numbers

Asked: 2018-06-28 21:51:13

Tags: r dplyr

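I have a data frame in which, within each group, the values decrease steadily in some places and in others decrease, increase, and then decrease again.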

I need to extract the continuously decreasing part and drop the parts where the values go up and down unsystematically!

Here is test data showing what I mean

test=data.frame(set=gl(3,9),vals=c(c(10,10,10, 9.9, 8.1, 1, 1,1,1),c(10,10,10, 9.9,6.1,1, 2,1,1),c(10,10,10, 7,6,1,2,0,1)))

> test
   set vals
1    1 10.0
2    1 10.0
3    1 10.0
4    1  9.9
5    1  8.1
6    1  1.0
7    1  1.0
8    1  1.0
9    1  1.0
10   2 10.0
11   2 10.0
12   2 10.0
13   2  9.9
14   2  6.1
15   2  1.0
16   2  2.0
17   2  1.0
18   2  1.0
19   3 10.0
20   3 10.0
21   3 10.0
22   3  7.0
23   3  6.0
24   3  1.0
25   3  2.0
26   3  0.0
27   3  1.0

I wrote a simple function, slice_it, to find the consecutive decreases in the data

slice_it <- function(x){
  # first difference, padded with a leading 0 so the result has the same length as x
  c(0, diff(x))
}

library(dplyr)
test%>%
  group_by(set)%>%
  mutate(diff_x=slice_it(vals))

which gives

  set vals diff_x
1    1 10.0    0.0  #remove
2    1 10.0    0.0  #remove
3    1 10.0    0.0  #remove
4    1  9.9   -0.1  #keep 
5    1  8.1   -1.8  #keep 
6    1  1.0   -7.1  #keep
7    1  1.0    0.0  #remove 
8    1  1.0    0.0  #remove
9    1  1.0    0.0  #remove
10   2 10.0    0.0  #remove
11   2 10.0    0.0  #remove
12   2 10.0    0.0  #remove
13   2  9.9   -0.1  #keep
14   2  6.1   -3.8  #keep
15   2  1.0   -5.1  #keep
16   2  2.0    1.0  #remove
17   2  1.0   -1.0  #remove 
18   2  1.0    0.0  #remove
19   3 10.0    0.0  #remove
20   3 10.0    0.0  #remove
21   3 10.0    0.0  #remove
22   3  7.0   -3.0  #keep
23   3  6.0   -1.0  #keep
24   3  1.0   -5.0  #keep
25   3  2.0    1.0  #remove
26   3  0.0   -2.0  #remove
27   3  1.0    1.0  #remove

Adding the following filter to the dplyr chain gives

filter(diff_x<0)

# A tibble: 11 x 3
# Groups:   set [3]
  set    vals  diff_x
 1 1       9.9 -0.1000 #keep
 2 1       8.1 -1.8    #keep
 3 1       1   -7.1    #keep 
 4 2       9.9 -0.1000 #keep 
 5 2       6.1 -3.8    #keep
 6 2       1   -5.1    #keep 
 7 2       1   -1      #remove
 8 3       7   -3      #keep
 9 3       6   -1      #keep
10 3       1   -5      #keep
11 3       0   -2      #remove

The rows I marked with #remove are still kept because diff_x < 0. But each of them comes right after an increase, so it should be removed!

The expected output should look like

  set vals diff_x

4    1  9.9   -0.1  #keep 
5    1  8.1   -1.8  #keep 
6    1  1.0   -7.1  #keep
13   2  9.9   -0.1  #keep
14   2  6.1   -3.8  #keep
15   2  1.0   -5.1  #keep
22   3  7.0   -3.0  #keep
23   3  6.0   -1.0  #keep
24   3  1.0   -5.0  #keep

How can I achieve this? Thanks!

P.S. Slicing from the end will not help, because the number of rows to drop from the bottom is not fixed.

1 answer:

Answer 0: (score: 1)

Assuming I understood your case correctly, we can use a second diff on the lagged values; this reproduces your expected output

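library(dplyr)
library(tibble)   # for rowid_to_column()

test %>%
    rowid_to_column("row") %>%
    group_by(set) %>%
    mutate(
        diff = c(0, diff(vals)),                # change from the previous row
        diff2 = c(0, diff(lag(vals)))) %>%      # the previous row's change
    # keep a decrease only if the previous step was not an increase
    filter(diff < 0 & diff2 <= 0) %>%
    select(-diff2)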
## A tibble: 9 x 4
## Groups:   set [3]
#    row set    vals    diff
#  <int> <fct> <dbl>   <dbl>
#1     4 1      9.90 -0.1000
#2     5 1      8.10 -1.80
#3     6 1      1.00 -7.10
#4    13 2      9.90 -0.1000
#5    14 2      6.10 -3.80
#6    15 2      1.00 -5.10
#7    22 3      7.00 -3.00
#8    23 3      6.00 -1.00
#9    24 3      1.00 -5.00
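
To see why the second condition removes the unwanted rows, it may help to look at a single group by hand. The following is only a sketch in base R: it uses the values of set 2 from the question, and the names d, d_prev and keep are made up for this illustration. It is not identical to the dplyr code above (the second element of d_prev is 0 here rather than NA), but it follows the same idea.

```r
# Values of set 2 from the question's test data
x <- c(10, 10, 10, 9.9, 6.1, 1, 2, 1, 1)

d      <- c(0, diff(x))      # change from the previous value
d_prev <- c(0, head(d, -1))  # the previous row's change (base R stand-in for lag)

# A decrease is kept only when the step before it was not an increase
keep <- d < 0 & d_prev <= 0
x[keep]
# 9.9 6.1 1.0
```

The eighth value (1, right after the jump to 2) has d = -1 but d_prev = 1, so it is dropped even though it is itself a decrease.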

Update

To reuse your slice_it function

slice_it <-  function(x) c(0, diff(x))
test %>%
    group_by(set) %>%
    mutate(diff_x = slice_it(vals)) %>%
    filter(diff_x < 0 & slice_it(lag(vals)) <= 0)
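
Note that the lag-based filter only rejects a decrease that comes immediately after an increase. If the goal is to keep just the first uninterrupted run of decreases in each group (which is my reading of the question, not something it states outright), a cumsum-based mask is one possible alternative. This is a rough sketch; first_decline is a hypothetical helper name, not part of dplyr or of the original post.

```r
library(dplyr)

test <- data.frame(
  set  = gl(3, 9),
  vals = c(10, 10, 10, 9.9, 8.1, 1, 1, 1, 1,
           10, 10, 10, 9.9, 6.1, 1, 2, 1, 1,
           10, 10, 10, 7, 6, 1, 2, 0, 1)
)

# Hypothetical helper: TRUE only for elements that belong to the first
# uninterrupted run of decreases in x.
first_decline <- function(x) {
  d       <- c(0, diff(x))
  started <- cumsum(d < 0) > 0               # TRUE from the first decrease on
  ended   <- cumsum(started & d >= 0) > 0    # TRUE from the first non-decrease after that
  started & !ended
}

result <- test %>%
  group_by(set) %>%
  filter(first_decline(vals)) %>%
  ungroup()
```

On the test data this keeps the same nine rows as the expected output. The two approaches differ on input such as c(3, 2, 3, 2, 1): the lag-based filter keeps the 2 and the final 1, whereas first_decline keeps only the first 2.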