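I have a data frame with three columns: the first is an ID, the second is a year, and the third is the value associated with that ID in that year: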
df.in <- data.frame("id"=c(1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
"yr"=c(2005,2006,2007,2008,2010, 2001,2002,2003,2006,2008,2009, 2001, 2002,2003,2004,2005,2007,2009),
"val"=c(5,6,7,8,10, 1,2,3,6,8,10, 1,2,3,4,5,7,9))
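I want to remove the rows where the year is more than 1 later than the previous year. In other words, I want to keep only those rows where the years follow one another in increments of 1: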
df.out <- data.frame("id"=c(1,1,1,1, 2,2,2, 3,3,3,3,3),
"yr"=c(2005,2006,2007,2008, 2001,2002,2003,2001, 2002,2003,2004,2005),
"val"=c(5,6,7,8, 1,2,3, 1,2,3,4,5))
Is there a way to do this in R using dplyr? If possible, I would also like a data frame containing all the discarded years:
df.discard <- data.frame("id"=c(1, 2,2,2, 3,3),
"yr"=c(2010, 2006,2008,2009, 2007,2009),
"val"=c(10, 6,8,10, 7,9))
Answer 0 (score: 3)
Use lag:
library(dplyr)
df.in %>% filter(val - lag(val) > 1)
Following @Sotos and @akrun, I changed the code to use yr instead of val:
# Input data; note that the last yr for id 2 is 2010 here, whereas the question has 2009
df.in <- data.frame("id"=c(1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
"yr"=c(2005,2006,2007,2008,2010, 2001,2002,2003,2006,2008,2010, 2001, 2002,2003,2004,2005,2007,2009),
"val"=c(5,6,7,8,10, 1,2,3,6,8,10, 1,2,3,4,5,7,9))
# Expected result
df.out <- data.frame("id"=c(1,1,1,1, 2,2,2, 3,3,3,3,3),
"yr"=c(2005,2006,2007,2008, 2001,2002,2003, 2001,2002,2003,2004,2005),
"val"=c(5,6,7,8, 1,2,3, 1,2,3,4,5))
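# Kept rows: within each id, the year is at most 1 later than the previous row
# (the first row of each id is compared with itself, so it is always kept)
df.out <- df.in %>% group_by(id) %>% filter(yr - lag(yr, default = yr[1]) <= 1)
df.out
# Ignored rows: the year is more than 1 later than the previous row
df.ignored <- df.in %>% group_by(id) %>% filter(yr - lag(yr, default = yr[1]) > 1)
df.ignored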
Output:
> df.out
# A tibble: 12 x 3
# Groups:   id [3]
      id    yr   val
   <dbl> <dbl> <dbl>
 1  1.00  2005  5.00
 2  1.00  2006  6.00
 3  1.00  2007  7.00
 4  1.00  2008  8.00
 5  2.00  2001  1.00
 6  2.00  2002  2.00
 7  2.00  2003  3.00
 8  3.00  2001  1.00
 9  3.00  2002  2.00
10  3.00  2003  3.00
11  3.00  2004  4.00
12  3.00  2005  5.00
> df.ignored
# A tibble: 6 x 3
# Groups:   id [3]
     id    yr   val
  <dbl> <dbl> <dbl>
1  1.00  2010 10.0
2  2.00  2006  6.00
3  2.00  2008  8.00
4  2.00  2010 10.0
5  3.00  2007  7.00
6  3.00  2009  9.00
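One caveat: with the question's original data (id 2 ends in 2009, not 2010), the lag comparison would keep the 2009 row, because it directly follows 2008, whereas the requested df.discard lists it. Below is a minimal sketch, assuming the goal is to keep only the first unbroken run of years per id and discard everything after the first gap. The `flagged` data frame and its `keep` column are hypothetical helper names, not part of the question or the answer above.

```r
library(dplyr)

# Original data from the question (last yr for id 2 is 2009)
df.in <- data.frame(
  id  = c(1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
  yr  = c(2005,2006,2007,2008,2010, 2001,2002,2003,2006,2008,2009,
          2001,2002,2003,2004,2005,2007,2009),
  val = c(5,6,7,8,10, 1,2,3,6,8,10, 1,2,3,4,5,7,9)
)

# Flag each row: keep is TRUE until the first gap larger than 1 within an id.
# cumsum() counts the gaps seen so far, so every row after a gap stays FALSE.
flagged <- df.in %>%
  arrange(id, yr) %>%
  group_by(id) %>%
  mutate(keep = cumsum(c(0, diff(yr)) > 1) == 0) %>%
  ungroup()

# Split into kept and discarded rows, dropping the helper column
df.out     <- flagged %>% filter(keep)  %>% select(-keep)
df.discard <- flagged %>% filter(!keep) %>% select(-keep)
```

On the question's data this should leave 12 rows in df.out and put the years 2010, 2006, 2008, 2009, 2007 and 2009 into df.discard. Computing the flag once and filtering on it twice also avoids repeating the condition for the kept and discarded sets.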