Reshape with duplicate rows

Time: 2016-04-10 08:24:52

Tags: r reshape

I have a data frame that looks like this:

master_bill_no  category
SBA5100008  CONDOMS
SBA5100008  HAND CREAM
SBA5100009  PREGNANCY TESTS
SBA5100010  MULTI VITAMINS & MIN
SBA5100010  CALCIUM PREPARATIONS
SBA5100010  VITAMINS
SBA5100010  BETABLOCKERS

Here is a reproducible example:

df <- structure(list(master_bill_no = c("SBA5100008", "SBA5100008", 
"SBA5100009", "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"
), category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS", "MULTI VITAMINS & MIN", 
"CALCIUM PREPARATIONS", "VITAMINS", "BETABLOCKERS")), .Names = c("master_bill_no", 
"category"), class = "data.frame", row.names = c(NA, -7L))

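For each unique master_bill_no, I'm trying to reshape the category column to wide, so that all of a bill's categories end up on one row.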

For example, the desired output is:

master_bill_no  category
SBA5100008  CONDOMS,HAND CREAM
SBA5100009  PREGNANCY TESTS
SBA5100010  MULTI VITAMINS & MIN,CALCIUM PREPARATIONS,VITAMINS,BETABLOCKERS

I used the base reshape function, but it just drops the category column:

reshape(df, idvar = "master_bill_no", timevar = "category", direction = "wide")
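For reference, reshape(direction = "wide") creates one column per distinct value of timevar and fills it from the v.names columns (by default, every column other than idvar and timevar). With timevar = "category" no value column is left to spread, which would explain why only master_bill_no survives. A minimal sketch of a true wide reshape, assuming a hypothetical helper column item that numbers the rows within each bill (built here with ave()); note that it yields separate category.1, category.2, ... columns rather than one comma-separated string:

```r
df <- data.frame(
  master_bill_no = c("SBA5100008", "SBA5100008", "SBA5100009",
                     "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"),
  category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS", "MULTI VITAMINS & MIN",
               "CALCIUM PREPARATIONS", "VITAMINS", "BETABLOCKERS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column 'item': position of each row within its bill,
# i.e. 1, 2, 1, 1, 2, 3, 4 for the data above
df$item <- ave(seq_along(df$master_bill_no), df$master_bill_no, FUN = seq_along)

# 'item' is now the time variable, so 'category' becomes the value column
# that gets spread into category.1 ... category.4
wide <- reshape(df, idvar = "master_bill_no", timevar = "item", direction = "wide")
print(wide)
```

Bills with fewer categories than the longest bill get NA in the trailing columns.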

I tried the aggregate function:

aggregate(df, master_bill_no, FUN = paste(category, sep = ","))

This returns the error message "object 'category' not found".
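The error most likely arises because aggregate() does not look up bare column names such as master_bill_no or category inside df, and FUN expects a function, not a call like paste(category, sep = ","). Also, sep only separates multiple arguments to paste(); joining the elements of a single vector needs collapse. A minimal sketch using the formula interface, which evaluates the column names inside data:

```r
df <- data.frame(
  master_bill_no = c("SBA5100008", "SBA5100008", "SBA5100009",
                     "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"),
  category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS", "MULTI VITAMINS & MIN",
               "CALCIUM PREPARATIONS", "VITAMINS", "BETABLOCKERS"),
  stringsAsFactors = FALSE
)

# Formula interface: names on both sides of ~ are looked up in 'data';
# collapse = "," is forwarded to paste() to join each group's categories
res <- aggregate(category ~ master_bill_no, data = df, FUN = paste, collapse = ",")
print(res)
```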

I'm sure the reason is that reshape is trying to fill in missing values. Can anyone help?

1 answer:

Answer 0 (score: 0)

IMHO, it's best to use base functions such as aggregate. The correct syntax is:

aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
# arguments: the field, the list of 'group by' columns, the function applied to the field

> df
  master_bill_no             category
1     SBA5100008              CONDOMS
2     SBA5100008           HAND CREAM
3     SBA5100009      PREGNANCY TESTS
4     SBA5100010 MULTI VITAMINS & MIN
5     SBA5100010 CALCIUM PREPARATIONS
6     SBA5100010             VITAMINS
7     SBA5100010         BETABLOCKERS


> aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
     Group.1                                                                  x
1 SBA5100008                                                CONDOMS, HAND CREAM
2 SBA5100009                                                    PREGNANCY TESTS
3 SBA5100010 MULTI VITAMINS & MIN, CALCIUM PREPARATIONS, VITAMINS, BETABLOCKERS
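Because paste() without collapse returns its input vector unchanged and the groups have different sizes, the x column above is most likely a list column; "CONDOMS, HAND CREAM" is just how R prints a list element. A minimal sketch, assuming a plain character column matching the desired output is wanted, passes collapse = "," through aggregate() to paste() and names the grouping column via the by list:

```r
df <- data.frame(
  master_bill_no = c("SBA5100008", "SBA5100008", "SBA5100009",
                     "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"),
  category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS", "MULTI VITAMINS & MIN",
               "CALCIUM PREPARATIONS", "VITAMINS", "BETABLOCKERS"),
  stringsAsFactors = FALSE
)

# The answer's call: groups have 2, 1 and 4 items, so the results cannot be
# simplified to one vector and x stays a list column
res_list <- aggregate(df$category, by = list(df$master_bill_no), FUN = paste)
print(is.list(res_list$x))

# Extra arguments after FUN are forwarded to paste(); naming the 'by' list
# names the grouping column in the result
res <- aggregate(df$category, by = list(master_bill_no = df$master_bill_no),
                 FUN = paste, collapse = ",")
names(res)[names(res) == "x"] <- "category"
print(res)
```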