Question

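I have a large data frame in which some rows have duplicate values in some columns. I want to keep the duplicated values and sum the ones that differ. Below is a sample of my data: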

data<-data.frame(season=c(2008,2009,2010,2011,2011,2012,2000,2001),
             lic=c(132228,140610,149215,158559,158559,944907,37667,45724),
             client=c(174,174,174,174,174,174,175,175),
             qtty=c(31,31,31,31,31,31,36,26),
             held=c(60,65,58,68,68,70,29,23),
             catch=c(7904,6761,9236,9323.2,801,NA,2330,3594.5),
             potlift=c(2715,2218,3000,3887,750,NA,2314,3472))

season  lic     client  qtty  held  catch   potlift
2008    132228  174     31    60    7904    2715
2009    140610  174     31    65    6761    2218
2010    149215  174     31    58    9236    3000
2011    158559  174     31    68    9323.2  3887
2011    158559  174     31    68    801     750
2012    944907  174     31    70    NA      NA
2000    37667   175     36    29    2330    2314
2001    45724   175     26    23    3594.5  3472

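Note that season 2011 is repeated: every variable (client ... held) is identical in both rows, except catch and potlift. I need to keep the values of (client ... held) and sum catch and potlift, so my new data frame should look like the example below: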

season  lic     client  qtty  held  catch    potlift
2008    132228  174     31    60    7904     2715
2009    140610  174     31    65    6761     2218
2010    149215  174     31    58    9236     3000
2011    158559  174     31    68    10124.2  4637
2012    944907  174     31    70    NA       NA
2000    37667   175     36    29    2330     2314
2001    45724   175     26    23    3594.5   3472

I tried to do this with aggregate, but that function sums everything. Any help would be appreciated.

Answer 1

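# Replace catch and potlift with their per-group sums; rows that share
# lic, client, qtty and held form one group
data$catch <- with(data, ave(catch, list(lic, client, qtty, held), FUN = sum))
data$potlift <- with(data, ave(potlift, list(lic, client, qtty, held), FUN = sum))
# The rows of a group are now identical, so unique() keeps one row per group
unique(data)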
  season    lic client qtty held   catch potlift
1   2008 132228    174   31   60  7904.0    2715
2   2009 140610    174   31   65  6761.0    2218
3   2010 149215    174   31   58  9236.0    3000
4   2011 158559    174   31   68 10124.2    4637
6   2012 944907    174   31   70      NA      NA
7   2000  37667    175   36   29  2330.0    2314
8   2001  45724    175   26   23  3594.5    3472

Answer 2

aggregate seems to work fine for me, though I'm not sure what you tried:

> aggregate(cbind(catch, potlift) ~ ., data, sum, na.action = "na.pass")
  season    lic client qtty held   catch potlift
1   2001  45724    175   26   23  3594.5    3472
2   2000  37667    175   36   29  2330.0    2314
3   2010 149215    174   31   58  9236.0    3000
4   2008 132228    174   31   60  7904.0    2715
5   2009 140610    174   31   65  6761.0    2218
6   2011 158559    174   31   68 10124.2    4637
7   2012 944907    174   31   70      NA      NA

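Here, cbind identifies the columns to aggregate. You can then either list all the other columns explicitly, or just use . to mean "use all other columns not mentioned in the cbind call".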
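
For comparison, here is a sketch of the explicit form, with the grouping columns spelled out instead of using the . shorthand. The data is the sample from the question; the name res is only illustrative, and na.pass is passed as a function rather than as a string, which I would expect to behave the same way:

```r
# Sample data from the question
data <- data.frame(season  = c(2008, 2009, 2010, 2011, 2011, 2012, 2000, 2001),
                   lic     = c(132228, 140610, 149215, 158559, 158559, 944907, 37667, 45724),
                   client  = c(174, 174, 174, 174, 174, 174, 175, 175),
                   qtty    = c(31, 31, 31, 31, 31, 31, 36, 26),
                   held    = c(60, 65, 58, 68, 68, 70, 29, 23),
                   catch   = c(7904, 6761, 9236, 9323.2, 801, NA, 2330, 3594.5),
                   potlift = c(2715, 2218, 3000, 3887, 750, NA, 2314, 3472))

# Left of ~ : the columns to sum. Right of ~ : the grouping columns, named explicitly.
# na.action = na.pass keeps the 2012 row, whose catch and potlift are NA.
res <- aggregate(cbind(catch, potlift) ~ season + lic + client + qtty + held,
                 data = data, FUN = sum, na.action = na.pass)
res
```

This should return the same seven rows as the . version, with the two 2011 rows collapsed into one. Naming the columns explicitly is mostly useful when the data frame has extra columns that should not take part in the grouping.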

Summing the cells of some rows and columns
