I have a data frame that looks like this:
master_bill_no category
SBA5100008 CONDOMS
SBA5100008 HAND CREAM
SBA5100009 PREGNANCY TESTS
SBA5100010 MULTI VITAMINS & MIN
SBA5100010 CALCIUM PREPARATIONS
SBA5100010 VITAMINS
SBA5100010 BETABLOCKERS
A reproducible example is given below:
df <- structure(list(master_bill_no = c("SBA5100008", "SBA5100008",
"SBA5100009", "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"
), category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS", "MULTI VITAMINS & MIN",
"CALCIUM PREPARATIONS", "VITAMINS", "BETABLOCKERS")), .Names = c("master_bill_no",
"category"), class = "data.frame", row.names = c(NA, -7L))
For each unique master_bill_no, I am trying to reshape the category column into wide format.
For example, the desired output is:
master_bill_no category
SBA5100008 CONDOMS,HAND CREAM
SBA5100009 PREGNANCY TESTS
SBA5100010 MULTI VITAMINS & MIN,CALCIUM PREPARATIONS,VITAMINS,BETABLOCKERS
I used the base reshape function, but it just dropped the category column.
reshape(df, idvar = "master_bill_no", timevar = "category", direction = "wide")
I also tried the aggregate function:
aggregate(df, master_bill_no, FUN = paste(category, sep = ","))
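This returns the error message "object 'category' not found".
I am sure the reason is that reshape is looking for values to fill in the gaps. Can anyone help?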
Answer 0: (score: 0)
IMHO it is best to use a base function such as aggregate. The correct syntax should be:
aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
(the field, the list of 'group by' variables, the function to apply to the field)
> df
master_bill_no category
1 SBA5100008 CONDOMS
2 SBA5100008 HAND CREAM
3 SBA5100009 PREGNANCY TESTS
4 SBA5100010 MULTI VITAMINS & MIN
5 SBA5100010 CALCIUM PREPARATIONS
6 SBA5100010 VITAMINS
7 SBA5100010 BETABLOCKERS
> aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
Group.1 x
1 SBA5100008 CONDOMS, HAND CREAM
2 SBA5100009 PREGNANCY TESTS
3 SBA5100010 MULTI VITAMINS & MIN, CALCIUM PREPARATIONS, VITAMINS, BETABLOCKERS
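One caveat with the call above: since the groups have different sizes, plain paste leaves x as a list column holding one character vector per bill, which only prints as if it were comma-separated. If a single string per bill is wanted, as in the desired output of the question, a minimal sketch would pass collapse through to paste. The formula interface and the names res and res_list are my own choices for illustration, not part of the original answer.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  master_bill_no = c("SBA5100008", "SBA5100008", "SBA5100009",
                     "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"),
  category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS",
               "MULTI VITAMINS & MIN", "CALCIUM PREPARATIONS",
               "VITAMINS", "BETABLOCKERS"),
  stringsAsFactors = FALSE
)

# The call from the answer: x comes back as a list column,
# one character vector per master_bill_no.
res_list <- aggregate(df$category, by = list(df$master_bill_no), FUN = paste)
is.list(res_list$x)

# Extra arguments to aggregate() are forwarded to FUN, so collapse = ","
# makes paste() return one string per group.
res <- aggregate(category ~ master_bill_no, data = df,
                 FUN = paste, collapse = ",")
print(res)
```

Here res keeps the original column names (master_bill_no and category) instead of Group.1 and x, and category is an ordinary character column, so it can be written out with write.csv or matched against other strings directly.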