Question

I have a dataset that looks like this.

  year recipient amount  id
1 1973    AG      17      7
2 1973    AG      18      7
3 1974    BE      20      9
4 1974    BE      22      9
5 1975    AG      20      7
6 1975    AG      25      7

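I am trying to collapse the rows so that there is only one row per recipient per year. I want the amount variable to equal the sum of all amounts for that recipient in that year. My ideal result looks like this: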

  year recipient amount id
1 1973    AG      35     7
2 1974    BE      42     9
3 1975    AG      45     7

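I tried writing a loop to do this, but I suspect there is a simpler way that I'm not familiar with. Perhaps there is a map or flatten function in some package?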

Answer 1

Try:

library(dplyr)
df %>% group_by(year, recipient, id) %>% summarise(amount=sum(amount))
Source: local data frame [3 x 4]
Groups: year, recipient

  year recipient id amount
1 1973        AG  7     35
2 1974        BE  9     42
3 1975        AG  7     45
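
For a reproducible check, the sample data from the question can be rebuilt by hand and aggregated with base R's aggregate(). This is a swapped-in alternative to the dplyr call above, not part of the original answer, and the name df is only assumed to match the data frame used there. A minimal sketch:

```r
# Rebuild the sample data from the question (values copied from the post).
df <- data.frame(
  year      = c(1973, 1973, 1974, 1974, 1975, 1975),
  recipient = c("AG", "AG", "BE", "BE", "AG", "AG"),
  amount    = c(17, 18, 20, 22, 20, 25),
  id        = c(7, 7, 9, 9, 7, 7)
)

# Sum amount within each year/recipient/id combination using base R only.
totals <- aggregate(amount ~ year + recipient + id, data = df, FUN = sum)

# Sort by year so the rows follow the same order as in the question.
totals <- totals[order(totals$year), ]
print(totals)
```

Including id in the formula keeps it in the result, which mirrors grouping by id in the dplyr version.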

Answer 2

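This may be more than you need for such a simple example, but for this kind of thing I like the sqldf library, which lets you manipulate data frames as if you were using SQL. In your case: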

library(sqldf)
newdf <- sqldf("SELECT year,recipient,id,sum(amount) as amount from olddf group by year,recipient,id")

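By default it uses SQLite as the backend, so it can handle fairly complex SQL statements. I often find R's data manipulation syntax a bit confusing and always have to look up how to do what I want, so being able to use SQL is very convenient.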

Answer 3

Here is an option using data.table:

library(data.table)
setDT(df1)[, list(amount=sum(amount), id= id[1L]) ,.(year, recipient)]
#   year recipient amount id
#1: 1973        AG     35  7
#2: 1974        BE     42  9
#3: 1975        AG     45  7

Or, if "id" should also be a grouping variable:

setDT(df1)[, list(amount=sum(amount)), .(year, recipient, id)]
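
The two data.table calls give the same result on the question's data because id is constant within each year/recipient pair. They differ only when a group contains more than one id. The sketch below illustrates this with hypothetical data (the table dt and its values are made up for illustration and are not from the question):

```r
library(data.table)

# Hypothetical data: the 1973/AG group contains two different ids.
dt <- data.table(
  year      = c(1973, 1973, 1974),
  recipient = c("AG", "AG", "BE"),
  amount    = c(17, 18, 20),
  id        = c(7, 8, 9)
)

# Keep the first id per group: one row per year/recipient, amounts pooled.
first_id <- dt[, .(amount = sum(amount), id = id[1L]), by = .(year, recipient)]

# Group by id as well: rows with different ids stay separate.
by_id <- dt[, .(amount = sum(amount)), by = .(year, recipient, id)]

print(first_id)
print(by_id)
```

Which form is appropriate depends on whether id is meant to identify the recipient (first form is safe) or to distinguish separate records (second form).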

Flatten rows in an R data frame by matching columns
