R - Aggregating a data frame using list()

Asked: 2017-09-10 08:13:55

Tags: r dataframe aggregate

I have this data frame:

> head(DF, 10)
         DATE    USER    CATEGORY     QTY
1  2017-09-04     A79 Footwear       2167
2  2017-08-31     A41 Accessories     342
3  2017-08-27     A34 Accessories     828
4  2017-08-22     A68 Accessories    1292
5  2017-08-23     A68 Accessories    1297
6  2017-08-23     A68 Footwear       1944
7  2017-08-25     A68 Accessories      60
8  2017-08-25     A68 Footwear          5
9  2017-08-25     A68 Apparel        2454
10 2017-08-29     A68 Accessories    2521

What I want is:

> head(DF1, 10)
         DATE    USER                               CATEGORIES   QTY_SUM
1  2017-09-04     A79 Footwear                                      2167
2  2017-08-31     A41 Accessories                                    342
3  2017-08-27     A34 Accessories                                    828
4  2017-08-22     A68 Accessories                                   1292
5  2017-08-23     A68 Accessories-1297, Footwear-1944               3241
6  2017-08-25     A68 Accessories-60, Footwear-5, Apparel-2454      2519
7  2017-08-29     A68 Accessories                                   2521

I tried using aggregate, but it did not work well. I think the solution might look something like this:

DF1 <- data.table(DF, key=c('DATE', 'USER_ID'))
DF1 <- DF1[, list(CATEGORIES=paste0(CATEGORY, "-", QTY), QTY=sum(QTY)), by=c('DATE', 'USER_ID')]
> head(x, 10) #getting this
         DATE    USER         CATEGORY     QTY
1  2017-09-04     A79 Footwear-2167       2167
2  2017-08-31     A41 Accessories-342      342
3  2017-08-27     A34 Accessories-828      828
4  2017-08-22     A68 Accessories-1292    1292
5  2017-08-23     A68 Accessories-1297    1297
6  2017-08-23     A68 Footwear-1944       1944
7  2017-08-25     A68 Accessories-60        60
8  2017-08-25     A68 Footwear-5             5
9  2017-08-25     A68 Apparel-2454        2454
10 2017-08-29     A68 Accessories         2521

What am I doing wrong? Please suggest a better way to do this, if there is one.

1 Answer:

Answer 0: (score: 4)

With dplyr, you can do this:

df <- read.table(text="
DATE    USER    CATEGORY     QTY
1  2017-09-04     A79 Footwear       2167
2  2017-08-31     A41 Accessories     342
3  2017-08-27     A34 Accessories     828
4  2017-08-22     A68 Accessories    1292
5  2017-08-23     A68 Accessories    1297
6  2017-08-23     A68 Footwear       1944
7  2017-08-25     A68 Accessories      60
8  2017-08-25     A68 Footwear          5
9  2017-08-25     A68 Apparel        2454
10 2017-08-29     A68 Accessories    2521")

library(dplyr)

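First we group_by DATE and USER (my guess at the intended grouping), then paste each CATEGORY value together with its QTY, separated by a hyphen, and collapse the results into one string per group. Finally, we ungroup the data.frame (a tibble here, but still a data.frame):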

df %>% 
  group_by(DATE, USER) %>% 
  summarise(CATEGORIES=paste(CATEGORY, QTY, sep="-", collapse=","),
            QTY_SUM=sum(QTY)) %>% 
  ungroup()

# A tibble: 7 x 4
        DATE   USER                             CATEGORIES QTY_SUM
      <fctr> <fctr>                                  <chr>   <int>
1 2017-08-22    A68                       Accessories-1292    1292
2 2017-08-23    A68         Accessories-1297,Footwear-1944    3241
3 2017-08-25    A68 Accessories-60,Footwear-5,Apparel-2454    2519
4 2017-08-27    A34                        Accessories-828     828
5 2017-08-29    A68                       Accessories-2521    2521
6 2017-08-31    A41                        Accessories-342     342
7 2017-09-04    A79                          Footwear-2167    2167
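
The key piece is the collapse argument of paste. As a small illustration (the vectors category and qty below are made up for this sketch and mirror the 2017-08-23 / A68 group), compare the result with and without it:

```r
# Hypothetical two-row group, mirroring 2017-08-23 / A68 from the question
category <- c("Accessories", "Footwear")
qty <- c(1297L, 1944L)

# Without collapse: paste works element-wise and returns one string per row
per_row <- paste(category, qty, sep = "-")

# With collapse: the per-row strings are joined into a single string
per_group <- paste(category, qty, sep = "-", collapse = ", ")

print(per_row)    # [1] "Accessories-1297" "Footwear-1944"
print(per_group)  # [1] "Accessories-1297, Footwear-1944"
```

Without collapse, a grouped summary gets as many values as there are rows in the group, so the rows cannot be folded into one.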

这是你想要的吗?
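
If you would rather stay with data.table, the same idea should carry over. The following is only a sketch under a few assumptions: it uses a small hand-typed subset of the data from the question, it groups by USER (the column shown in the printed data) rather than the USER_ID used in the attempt, and it names the sum QTY_SUM as in the desired output. The attempt in the question calls paste0 without collapse, which likely explains why it kept one row per input row.

```r
library(data.table)

# Hand-typed subset of the question's data (two groups for user A68)
DF <- data.frame(
  DATE     = c("2017-08-23", "2017-08-23", "2017-08-25", "2017-08-25", "2017-08-25"),
  USER     = "A68",
  CATEGORY = c("Accessories", "Footwear", "Accessories", "Footwear", "Apparel"),
  QTY      = c(1297L, 1944L, 60L, 5L, 2454L),
  stringsAsFactors = FALSE
)

DT <- as.data.table(DF)

# One row per (DATE, USER): collapse the "CATEGORY-QTY" strings and sum QTY
DF1 <- DT[, .(CATEGORIES = paste(CATEGORY, QTY, sep = "-", collapse = ", "),
              QTY_SUM    = sum(QTY)),
          by = .(DATE, USER)]

print(DF1)
```

On this subset the result has two rows, with QTY_SUM 3241 for 2017-08-23 and 2519 for 2017-08-25.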