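I know many people will think this question has been asked many times before, but I don't believe it has. Basically, I want to flatten my data completely, meaning I want a single record per person. Here is a reproducible example of my data: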
   id      BusinessUnit var1 var2 var3
1   1 Risk & Compliance    8    7    7
2   1       Investments    7    8    7
3   1      Credit Cards    8    9    7
4   2       Investments    9   10    8
5   2      Credit Cards    9   10    8
6   3 Risk & Compliance    9   10    9
7   3      Credit Cards   10    9   10
8   3       Call Center    6    9   10
9   4       Investments    7    6   10
10  4      Call Centers    7    5    9
11  5 Risk & Compliance   10    7    9
12  6 Risk & Compliance    6    8    9
13  6      Credit Cards    5   10    6
What I would like to end up with is something like this:
  id     BusinessUnit1 var1_1 var2_1 var3_1 BusinessUnit2 var1_2 var2_2 var3_2
1  1 Risk & Compliance      8      7      7   Investments      7      8      7
2  2       Investments      9     10      8  Credit Cards      9     10      8
  BusinessUnit3 var1_3 var2_3 var3_3
1  Credit Cards      8      9      7
2          <NA>     NA     NA     NA
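I tried the cast() function from the reshape2 package, but it wants me to aggregate the data, which I don't want to do. I also don't want a separate record for each business unit, because that would just put me back where I already am. Is there a different approach that avoids a for loop?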
Answer 0 (score: 6)
reshape() from base R is a better fit for this than dcast(). Just add a "time" variable first:
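# Number the rows within each id (1, 2, 3, ...) to serve as the "time" variable
mydf$time <- ave(rep(1, nrow(mydf)), mydf$id, FUN = seq_along)
# timevar defaults to "time"; it is spelled out here for clarity
reshape(mydf, idvar = "id", timevar = "time", direction = "wide")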
#    id    BusinessUnit.1 var1.1 var2.1 var3.1 BusinessUnit.2 var1.2 var2.2 var3.2
# 1   1 Risk & Compliance      8      7      7    Investments      7      8      7
# 4   2       Investments      9     10      8   Credit Cards      9     10      8
# 6   3 Risk & Compliance      9     10      9   Credit Cards     10      9     10
# 9   4       Investments      7      6     10   Call Centers      7      5      9
# 11  5 Risk & Compliance     10      7      9           <NA>     NA     NA     NA
# 12  6 Risk & Compliance      6      8      9   Credit Cards      5     10      6
#    BusinessUnit.3 var1.3 var2.3 var3.3
# 1    Credit Cards      8      9      7
# 4            <NA>     NA     NA     NA
# 6     Call Center      6      9     10
# 9            <NA>     NA     NA     NA
# 11           <NA>     NA     NA     NA
# 12           <NA>     NA     NA     NA
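For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data and applies the same reshape() approach. The data frame name mydf follows the answer; the object name wide, the use of sep = "_" to get underscore-separated names closer to the ones the question asks for, and the row-name reset are my own additions, not part of the original answer.

```r
# Rebuild the sample data from the question (values copied from the table above)
mydf <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6),
  BusinessUnit = c("Risk & Compliance", "Investments", "Credit Cards",
                   "Investments", "Credit Cards",
                   "Risk & Compliance", "Credit Cards", "Call Center",
                   "Investments", "Call Centers",
                   "Risk & Compliance",
                   "Risk & Compliance", "Credit Cards"),
  var1 = c(8, 7, 8, 9, 9, 9, 10, 6, 7, 7, 10, 6, 5),
  var2 = c(7, 8, 9, 10, 10, 10, 9, 9, 6, 5, 7, 8, 10),
  var3 = c(7, 7, 7, 8, 8, 9, 10, 10, 10, 9, 9, 9, 6),
  stringsAsFactors = FALSE
)

# Within-id row counter: 1, 2, 3 for id 1; 1, 2 for id 2; and so on
mydf$time <- ave(seq_along(mydf$id), mydf$id, FUN = seq_along)

# sep = "_" (an assumption about the desired naming) yields var1_1, var1_2, ...
# instead of the default var1.1, var1.2, ...
wide <- reshape(mydf, idvar = "id", timevar = "time",
                direction = "wide", sep = "_")

# Optional: drop the leftover row names (1, 4, 6, ...) inherited from the long data
rownames(wide) <- NULL
```

This gives one row per id, with NA filling the slots for ids that have fewer business units than the maximum. Note that the business unit columns come out as BusinessUnit_1, BusinessUnit_2, and so on, so a rename would still be needed to match BusinessUnit1 exactly.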