Filter factor by values

Posted: 2020-09-30 18:34:11

Tags: r dplyr

I want to filter a data frame that has the variables id, year and value:

structure(list(id = c(
  70101L, 70101L, 70101L, 70102L, 70102L,
  70102L, 70102L, 70102L, 70103L, 70103L, 70103L, 70103L, 70103L,
  70103L, 70104L, 70104L, 70104L, 70104L, 70104L, 70104L
), year = c(
  2013L,
  2014L, 2015L, 2013L, 2014L, 2015L, 2016L, 2017L, 2013L, 2014L,
  2015L, 2016L, 2017L, 2018L, 2013L, 2014L, 2015L, 2016L, 2017L,
  2018L
), value = c(
  4.68, 4.76, 5.14, 4.48, 4.71, 4.24, 5.13, 5.22,
  5.13, 5.05, 4.96, 5.09, 8.09, 7.82, 3.57, 7.96, 1.83, 4.56, 11,
  10.6
)), row.names = c(NA, -20L), class = "data.frame")

Goal

Keep only the ids that have complete information for every year from 2013 to 2018:

     id      year value
     <fct>  <dbl> <dbl>
1   070103  2013  5.13
2   070103  2014  5.05
3   070103  2015  4.96
4   070103  2016  5.09
5   070103  2017  8.09
6   070103  2018  7.82
7   070104  2013  3.57
8   070104  2014  7.96
9   070104  2015  1.83
10  070104  2016  4.56
11  070104  2017 11.0 
12  070104  2018 10.6 
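The target output shows id as a factor with a leading zero (070103) and year as a double, whereas the sample data stores both as integers. Assuming the ids are six-digit codes that lost their leading zero when read as integers (an assumption, not stated in the question), a minimal sketch that filters and then reproduces that column format could look like this:

```r
library(dplyr)

# Same 20 rows as the question's data (id and year stored as integers)
d <- data.frame(
  id = rep(c(70101L, 70102L, 70103L, 70104L), times = c(3, 5, 6, 6)),
  year = c(2013:2015, 2013:2017, 2013:2018, 2013:2018),
  value = c(4.68, 4.76, 5.14, 4.48, 4.71, 4.24, 5.13, 5.22,
            5.13, 5.05, 4.96, 5.09, 8.09, 7.82, 3.57, 7.96, 1.83, 4.56, 11, 10.6)
)

res <- d %>%
  group_by(id) %>%
  # keep an id only if all six years 2013-2018 are present
  filter(all(2013:2018 %in% year)) %>%
  ungroup() %>%
  # Assumption: ids are six-digit codes, so pad with a leading zero
  mutate(id = factor(sprintf("%06d", id)),
         year = as.numeric(year))
res
```

If id is already a factor in the real data, the filter step works unchanged and the sprintf() conversion can be dropped.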

3 Answers:

Answer 0 (score: 1)

This can be done as follows:

library(dplyr)

d <- structure(list(id = c(
  70101L, 70101L, 70101L, 70102L, 70102L,
  70102L, 70102L, 70102L, 70103L, 70103L, 70103L, 70103L, 70103L,
  70103L, 70104L, 70104L, 70104L, 70104L, 70104L, 70104L
), year = c(
  2013L,
  2014L, 2015L, 2013L, 2014L, 2015L, 2016L, 2017L, 2013L, 2014L,
  2015L, 2016L, 2017L, 2018L, 2013L, 2014L, 2015L, 2016L, 2017L,
  2018L
), value = c(
  4.68, 4.76, 5.14, 4.48, 4.71, 4.24, 5.13, 5.22,
  5.13, 5.05, 4.96, 5.09, 8.09, 7.82, 3.57, 7.96, 1.83, 4.56, 11,
  10.6
)), row.names = c(NA, -20L), class = "data.frame")

# Within each id, keep the group only if every year 2013-2018 is present
d %>%
  group_by(id) %>%
  filter(all(2013:2018 %in% year))
#> # A tibble: 12 x 3
#> # Groups:   id [2]
#>       id  year value
#>    <int> <int> <dbl>
#>  1 70103  2013  5.13
#>  2 70103  2014  5.05
#>  3 70103  2015  4.96
#>  4 70103  2016  5.09
#>  5 70103  2017  8.09
#>  6 70103  2018  7.82
#>  7 70104  2013  3.57
#>  8 70104  2014  7.96
#>  9 70104  2015  1.83
#> 10 70104  2016  4.56
#> 11 70104  2017 11   
#> 12 70104  2018 10.6
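To see why the filter keeps only 70103 and 70104, it can help to evaluate the same condition once per group with summarise(). This is a sketch for inspection, not part of the original answer; the n_years column is an extra added for illustration.

```r
library(dplyr)

# Same 20 rows as the question's data
d <- data.frame(
  id = rep(c(70101L, 70102L, 70103L, 70104L), times = c(3, 5, 6, 6)),
  year = c(2013:2015, 2013:2017, 2013:2018, 2013:2018),
  value = c(4.68, 4.76, 5.14, 4.48, 4.71, 4.24, 5.13, 5.22,
            5.13, 5.05, 4.96, 5.09, 8.09, 7.82, 3.57, 7.96, 1.83, 4.56, 11, 10.6)
)

# One row per id: how many distinct years, and are all of 2013-2018 present?
coverage <- d %>%
  group_by(id) %>%
  summarise(
    n_years  = n_distinct(year),
    complete = all(2013:2018 %in% year)
  )
coverage
```

As the printed output shows (`# Groups: id [2]`), filter() leaves its result grouped by id; append ungroup() if later steps should operate on the whole table.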

Answer 1 (score: 1)

Another approach is to use helper variables that check whether each id's years form a consecutive sequence:

library(dplyr)
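# Diff is 1 for the first row of each id, then the gap to the previous year;
# Index is the sum of those values, i.e. 6 for a run of six consecutive years
df <- df %>% group_by(id) %>%
  mutate(Diff = c(1, diff(year)),
         Index = sum(Diff)) %>%
  filter(Index == 6) %>% select(-c(Index, Diff))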

Output:

# A tibble: 12 x 3
# Groups:   id [2]
      id  year value
   <int> <int> <dbl>
 1 70103  2013  5.13
 2 70103  2014  5.05
 3 70103  2015  4.96
 4 70103  2016  5.09
 5 70103  2017  8.09
 6 70103  2018  7.82
 7 70104  2013  3.57
 8 70104  2014  7.96
 9 70104  2015  1.83
10 70104  2016  4.56
11 70104  2017 11   
12 70104  2018 10.6 

Data used:

#Data
df <- structure(list(id = c(70101L, 70101L, 70101L, 70102L, 70102L, 
70102L, 70102L, 70102L, 70103L, 70103L, 70103L, 70103L, 70103L, 
70103L, 70104L, 70104L, 70104L, 70104L, 70104L, 70104L), year = c(2013L, 
2014L, 2015L, 2013L, 2014L, 2015L, 2016L, 2017L, 2013L, 2014L, 
2015L, 2016L, 2017L, 2018L, 2013L, 2014L, 2015L, 2016L, 2017L, 
2018L), value = c(4.68, 4.76, 5.14, 4.48, 4.71, 4.24, 5.13, 5.22, 
5.13, 5.05, 4.96, 5.09, 8.09, 7.82, 3.57, 7.96, 1.83, 4.56, 11, 
10.6)), row.names = c(NA, -20L), class = "data.frame")
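Because sum(c(1, diff(year))) telescopes to 1 + (last year - first year), Index == 6 only says that the first and last years of an id are five apart; it does not check for gaps in between, and it assumes the years are sorted and start at 2013. A minimal sketch with made-up toy data (hypothetical ids 1 and 2) shows an id with missing years passing that test while the explicit check from Answer 0 rejects it:

```r
library(dplyr)

# Hypothetical id 1 has only 2013 and 2018 (four years missing);
# hypothetical id 2 has every year from 2013 to 2018.
toy <- data.frame(
  id   = c(1L, 1L, rep(2L, 6)),
  year = c(2013L, 2018L, 2013:2018)
)

chk <- toy %>%
  group_by(id) %>%
  summarise(
    index    = sum(c(1, diff(year))),    # the Index from the answer above
    complete = all(2013:2018 %in% year)  # explicit check used in Answer 0
  )
chk
```

Both ids get index 6, but only id 2 is complete, so the Index approach is safe only when every id's years are known to be unique and gap-free.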

Answer 2 (score: 1)

Using base R functions, you can do it like this:

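# Split once into one data frame per id
groups <- split(df, df$id)
# TRUE for the ids whose years cover all of 2013:2018
keep <- sapply(groups, function(x) all(2013:2018 %in% x$year))
new_df <- do.call("rbind", groups[keep])
rownames(new_df) <- NULL
new_df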
#       id year value
# 1  70103 2013  5.13
# 2  70103 2014  5.05
# 3  70103 2015  4.96
# 4  70103 2016  5.09
# 5  70103 2017  8.09
# 6  70103 2018  7.82
# 7  70104 2013  3.57
# 8  70104 2014  7.96
# 9  70104 2015  1.83
# 10 70104 2016  4.56
# 11 70104 2017 11.00
# 12 70104 2018 10.60
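A possible base R alternative, not taken from the answers above, avoids split() and rbind() by using ave(), which applies a function per group and recycles the result over that group's rows, so the original row order is kept. A sketch, assuming the same data as the question:

```r
# Same 20 rows as the question's data
df <- data.frame(
  id = rep(c(70101L, 70102L, 70103L, 70104L), times = c(3, 5, 6, 6)),
  year = c(2013:2015, 2013:2017, 2013:2018, 2013:2018),
  value = c(4.68, 4.76, 5.14, 4.48, 4.71, 4.24, 5.13, 5.22,
            5.13, 5.05, 4.96, 5.09, 8.09, 7.82, 3.57, 7.96, 1.83, 4.56, 11, 10.6)
)

# One value per row: 1 if that row's id has every year 2013-2018, else 0.
# ave() writes the per-group result back into a copy of df$year, so the
# logical result is stored as integer 1/0; as.logical() turns it back.
keep <- as.logical(ave(df$year, df$id, FUN = function(y) all(2013:2018 %in% y)))
new_df2 <- df[keep, ]
rownames(new_df2) <- NULL
new_df2
```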