I want to remove duplicates and, for each duplicated group, keep the row with the largest value of year. My data looks like this:
id name year position
1 Jane 1990 Sales
1 Jane 1991 Sales
1 Jane 1992 Sales
1 Jane 1993 Boss
1 Jane 1994 CEO
2 Tom 1978 HR
2 Tom 1979 Sales
2 Tom 1980 PR
2 Tom 1981 Boss
3 Jim 1981 Sales
3 Jim 1982 Sales
3 Jim 1983 PR
The desired output is:
id name year position
1 Jane 1992 Sales
1 Jane 1993 Boss
1 Jane 1994 CEO
2 Tom 1978 HR
2 Tom 1979 Sales
2 Tom 1980 PR
2 Tom 1981 Boss
3 Jim 1982 Sales
3 Jim 1983 PR
Is there a way to code this? I tried the following, but it did not work:
new<-ddply(df, df$position=="Sales", function(df) return(df[df$year==max(df$year),]))
Answer 0 (score: 3)
library(plyr)
ddply(df, .(id, name, position), summarize, year = max(year))
If you want the result sorted:
arrange(ddply(df, .(id, name, position), summarize, year = max(year)), id, year)
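For readers who want to check the grouping logic without installing a package, here is a minimal base R sketch of the same idea: take the maximum year within each id/name/position group, then sort. The data frame is rebuilt from the table in the question. The name res and the final column reordering are my own additions for illustration, not part of the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  id = c(rep(1, 5), rep(2, 4), rep(3, 3)),
  name = c(rep("Jane", 5), rep("Tom", 4), rep("Jim", 3)),
  year = c(1990:1994, 1978:1981, 1981:1983),
  position = c("Sales", "Sales", "Sales", "Boss", "CEO",
               "HR", "Sales", "PR", "Boss",
               "Sales", "Sales", "PR"),
  stringsAsFactors = FALSE
)

# Largest year within each (id, name, position) group
res <- aggregate(year ~ id + name + position, data = df, FUN = max)

# Sort by id and year, and restore the column order used in the question
res <- res[order(res$id, res$year), c("id", "name", "year", "position")]
rownames(res) <- NULL
print(res)
```

Like the ddply call above, this only returns the grouping columns plus year. If the data had further columns that must be carried along, a row-wise selection would be needed instead of a summary.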
I do recommend plyr's successor, dplyr:
library(dplyr)
df %>% group_by(id, name, position) %>% summarise(year=max(year)) %>% arrange(id, year)
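A possible variation, in case the real data has more columns than the four shown: rather than summarising, keep the whole row that holds the largest year in each group. The sketch below assumes dplyr 1.0.0 or later, where slice_max() is available. The name res2 is hypothetical, and with_ties = FALSE is my choice so that exactly one row per group is kept if two rows share the same maximum year.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  id = c(rep(1, 5), rep(2, 4), rep(3, 3)),
  name = c(rep("Jane", 5), rep("Tom", 4), rep("Jim", 3)),
  year = c(1990:1994, 1978:1981, 1981:1983),
  position = c("Sales", "Sales", "Sales", "Boss", "CEO",
               "HR", "Sales", "PR", "Boss",
               "Sales", "Sales", "PR"),
  stringsAsFactors = FALSE
)

res2 <- df %>%
  group_by(id, name, position) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%  # keep the full row with the largest year
  ungroup() %>%
  arrange(id, year)

print(res2)
```

Because whole rows are kept, the original column order (id, name, year, position) is preserved and any extra columns would come along unchanged.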