Remove duplicates in R

Time: 2016-04-22 12:31:32

Tags: r sum duplicates

I have a dataset, df, that looks like this but has several million rows:

Date        AD  Runway  MTOW  nr.flights
2008-01-01  A   18      376   2
2008-01-01  A   18      376   2
2008-01-01  D   36      190   1
2008-01-02  D   09      150   2
2008-01-02  A   36      280   1
2008-01-02  A   36      280   1

I would like it to look like this:

Date        AD  Runway  MTOW  nr.flights
2008-01-01  A   18      752   4
2008-01-01  D   36      190   1
2008-01-02  D   9       150   2
2008-01-02  A   36      560   2

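Basically, I want to combine all rows that share the same Date, AD and Runway so that every duplicate is removed. At the same time, I want MTOW and nr.flights to be summed for each specific combination of Date, AD and Runway.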

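I tried this:

vals <- expand.grid(Date = unique(df$Date), Runway = unique(df$Runway), AD = unique(df$AD))

so that I could merge it with the original dataset df, but that did not work. I also tried several combinations of group_by, but those did not give me the result I want either.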

Reproducible example:

df <- data.frame(Date=c("2008-01-01","2008-01-01","2008-01-01","2008-01-02","2008-01-02","2008-01-02"),
              AD = c("A", "A", "D", "D", "A", "A"), Runway = c(18, 18, 36, 09, 36,36), 
              MTOW = c(376, 376, 190, 150, 280, 280), nr.flights = c(2,2,1,2,1,1))

Any help would be much appreciated!

2 answers:

Answer 0 (score: 2)

Using the dplyr library, you can do the following:

library(dplyr)

df %>% group_by(Date, AD, Runway) %>% summarise(MTOW = sum(MTOW), nr.flights = sum(nr.flights))
Source: local data frame [4 x 5]
Groups: Date, AD [?]

        Date     AD Runway  MTOW nr.flights
      (fctr) (fctr)  (dbl) (dbl)      (dbl)
1 2008-01-01      A     18   752          4
2 2008-01-01      D     36   190          1
3 2008-01-02      A     36   560          2
4 2008-01-02      D      9   150          2

I think there are already many posts and examples like this one.
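
For comparison, here is a minimal base R sketch of the same grouping and summing, in case adding a package is not an option. It assumes the example df from the question; the name totals is introduced here purely for illustration.

```r
# Example data from the question
df <- data.frame(
  Date = c("2008-01-01", "2008-01-01", "2008-01-01",
           "2008-01-02", "2008-01-02", "2008-01-02"),
  AD = c("A", "A", "D", "D", "A", "A"),
  Runway = c(18, 18, 36, 9, 36, 36),
  MTOW = c(376, 376, 190, 150, 280, 280),
  nr.flights = c(2, 2, 1, 2, 1, 1)
)

# Sum MTOW and nr.flights within each Date/AD/Runway combination.
# cbind() on the left-hand side aggregates both columns in one call.
totals <- aggregate(cbind(MTOW, nr.flights) ~ Date + AD + Runway,
                    data = df, FUN = sum)
print(totals)
```

The row order of the result may differ from the dplyr output. Also note that the formula method of aggregate drops rows containing NA by default (na.action = na.omit), which may or may not be what you want on real data.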

Answer 1 (score: 0)

Here is one using the plyr package.
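
The code that accompanied this answer is not shown here. As a minimal sketch of one way plyr could handle it (an assumption, not necessarily what the answerer wrote), ddply can split df by the three key columns and apply summarise to each piece. The name totals_plyr is hypothetical, and the functions are called with the plyr:: prefix to avoid clashes if dplyr is also loaded.

```r
# Example data from the question
df <- data.frame(
  Date = c("2008-01-01", "2008-01-01", "2008-01-01",
           "2008-01-02", "2008-01-02", "2008-01-02"),
  AD = c("A", "A", "D", "D", "A", "A"),
  Runway = c(18, 18, 36, 9, 36, 36),
  MTOW = c(376, 376, 190, 150, 280, 280),
  nr.flights = c(2, 2, 1, 2, 1, 1)
)

# Split by Date/AD/Runway, then sum the two numeric columns per group.
# Assumption: this is one possible plyr approach, not the original answer's code.
totals_plyr <- plyr::ddply(
  df, c("Date", "AD", "Runway"), plyr::summarise,
  MTOW = sum(MTOW),
  nr.flights = sum(nr.flights)
)
print(totals_plyr)
```

With several million rows, plyr is likely to be noticeably slower than dplyr, so this is mostly useful on smaller data.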