Question

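Perhaps I haven't defined the problem well. I don't seem to understand what R returns from sapply. I have a large hierarchical data frame. About half of the columns are factors and half are numeric. I want to get a new data frame that keeps some of the factors and sums the numeric columns, but I want the sums kept separate by factor level.

For example, from the sample data below, I want to build a data frame with the same state, district, and branch, but with the data summed over orders of the same type that differ only in colour. I thought applying sapply iteratively would do this, but I can't seem to get it to work.

Sample data: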

state  district  branch    order   colour  number  cost   amount
CA     central   newtown   shoes   black   6       25.50  127.40
CA     central   newtown   shoes   brown   3       32.12   75.40
CA     central   newtown   gloves  blue    15      12.20  157.42
CA     central   newtown   gloves  black   9        8.70   65.37
CA     central   columbus  shoes   black   12      30.75  316.99
CA     central   columbus  shoes   brown   1       40.98   45.00
CA     central   columbus  gloves  blue    47      11.78  498.32
CA     central   columbus  gloves  black   23       7.60  135.50

Answer 1

Another job for aggregate. Calling your data frame dat:

aggregate(cbind(cost, amount) ~ state+district+branch+order, data=dat, FUN=sum)

##   state district   branch  order  cost amount
## 1    CA  central columbus gloves 19.38 633.82
## 2    CA  central  newtown gloves 20.90 222.79
## 3    CA  central columbus  shoes 71.73 361.99
## 4    CA  central  newtown  shoes 57.62 202.80

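On the left-hand side of ~, cbind indicates that we want each column summed separately. If cost + amount were specified instead, it would mean the arithmetic sum of the two, since they are numeric. On the right-hand side of ~ we have factors, so + means we aggregate by each level of each factor.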
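To make the difference between the two left-hand sides concrete, here is a minimal sketch. It assumes the sample table from the question is typed in by hand as dat; the names separate and combined are only illustrative.

```r
# Rebuild the sample data from the question (values copied from the table).
dat <- data.frame(
  state    = "CA",
  district = "central",
  branch   = rep(c("newtown", "columbus"), each = 4),
  order    = rep(c("shoes", "shoes", "gloves", "gloves"), times = 2),
  colour   = rep(c("black", "brown", "blue", "black"), times = 2),
  number   = c(6, 3, 15, 9, 12, 1, 47, 23),
  cost     = c(25.50, 32.12, 12.20, 8.70, 30.75, 40.98, 11.78, 7.60),
  amount   = c(127.40, 75.40, 157.42, 65.37, 316.99, 45.00, 498.32, 135.50)
)

# cbind() on the left: one summed column per variable.
separate <- aggregate(cbind(cost, amount) ~ state + district + branch + order,
                      data = dat, FUN = sum)

# cost + amount on the left: the two columns are added row by row first,
# so only a single combined column is summed per group.
combined <- aggregate(cost + amount ~ state + district + branch + order,
                      data = dat, FUN = sum)

print(separate)
print(combined)
```

With cbind the result keeps cost and amount as two columns; with cost + amount there is a single value column per group. Adding number inside cbind should sum that column in the same way.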

Answer 2

I always find SQL the most intuitive for aggregation :)

    library(sqldf)

    # write a full aggregation command, grouping by your specified columns
    # (yourdata is the name of your data frame; order is a reserved word in SQL,
    # so the column name is quoted with square brackets)
    x <- sqldf("select state, district, branch, [order], sum(cost) as sumcost, sum(amount) as sumamount from yourdata group by state, district, branch, [order]")

    # print your result
    x

Here is an explanation of aggregate() and tapply(), and an explanation of SQL within R for aggregation.
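Since tapply() is mentioned, here is a minimal base-R sketch of how it could be applied to one numeric column. It assumes a cut-down copy of the sample data with only branch, order, and amount; the name totals is illustrative.

```r
# Cut-down copy of the sample data: only the columns needed here.
dat <- data.frame(
  branch = rep(c("newtown", "columbus"), each = 4),
  order  = rep(c("shoes", "shoes", "gloves", "gloves"), times = 2),
  amount = c(127.40, 75.40, 157.42, 65.37, 316.99, 45.00, 498.32, 135.50)
)

# tapply() sums one vector over the cross-classification of the grouping
# factors and returns a branch-by-order matrix rather than a data frame.
totals <- tapply(dat$amount, list(branch = dat$branch, order = dat$order), sum)

print(totals)
```

Unlike aggregate, tapply handles one numeric vector at a time and returns a table-shaped result, so aggregate is likely the closer fit when a data frame with several summed columns is wanted.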
