Reshaping my data with dcast

Time: 2015-02-25 01:17:38

Tags: r reshape2

I have been trying to use dcast(). I have this example:

class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)

The output is:

> mydf
   class       myvar
1     A1 -0.27256423
2     A1  1.98435540
3     A1 -1.38193488
4     B2 -0.20843958
5     B2 -0.08651873
6     B2  1.34213192
7     B2  1.32848845
8     B2  2.26547847
9     C3 -0.60518721
10    C3  1.98786369
11    D4 -1.16306103
12    D4  1.09872582
13    D4  0.15150502
14    D4  0.49064154

I would like it to look like this:

A1              B2           C3           D4
-0.27256423  -0.20843958   -0.60518721   -1.16306103
1.98435540   -0.08651873   1.98786369    1.09872582
-1.38193488  1.34213192                  0.15150502
             1.32848845                  0.49064154
             2.26547847

3 Answers:

Answer 0 (score: 2)

Expanding on my comment: just add a secondary ID (the index position of each value within "class") and use it as the LHS of the formula in dcast.

library(splitstackshape) ## provides getanID()
library(data.table)      ## provides dcast.data.table()
set.seed(1) ## To make a reproducible example
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast.data.table(getanID(mydf, "class"), .id ~ class, value.var = "myvar")
#    .id         A1         B2         C3         D4
# 1:   1 -0.6264538  1.5952808  0.5757814  1.5117812
# 2:   2  0.1836433  0.3295078 -0.3053884  0.3898432
# 3:   3 -0.8356286 -0.8204684         NA -0.6212406
# 4:   4         NA  0.4874291         NA -2.2146999
# 5:   5         NA  0.7383247         NA         NA
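The same helper-ID idea can be sketched in base R. This is a swap, not the answer's code: dcast.data.table() is replaced by tapply() to avoid the extra packages, and the running index is built with ave() instead of getanID(). The column name `id` is an illustrative choice.

```r
# Base-R sketch of the "helper ID" idea (tapply() swapped in for dcast)
set.seed(1)
class <- c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar <- rnorm(14)
mydf <- data.frame(class, myvar)

# Running index of each value within its class: 1..3, 1..5, 1..2, 1..4
mydf$id <- ave(seq_along(mydf$class), mydf$class, FUN = seq_along)

# (id, class) identifies each value uniquely, so every cell holds one value;
# combinations that never occur (e.g. id 4 for A1) are left as NA
wide <- tapply(mydf$myvar, list(mydf$id, mydf$class), identity)
wide
```

The secondary ID is what makes the reshape well defined: without it, all values of a class would compete for the same cell.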

Answer 1 (score: 1)

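Here is one approach. With spread(), I put the data into a wider format, then used lapply() to keep only the complete cases in each column. I want to credit @Richard Scriven for the last step; it is something I learned from him. The last step pads each vector with NAs: max(vapply(foo, length, 1L)) finds the maximum length, which is 5 (for $B2), and every list item is extended to length 5. For example, $C3 has two elements, so sapply() with `length<-` appends three NAs to it.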

library(tidyr)
library(magrittr)

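# Widen: one column per class
spread(mydf, class, myvar) %>%
  # keep only the non-missing values of each column
  lapply(., function(x) x[complete.cases(x)]) -> foo
# pad every vector with NA up to the longest length, then rebuild a data frame
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))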

#          A1          B2         C3         D4
#1 -0.2725642 -0.20843958 -0.6051872 -1.1630610
#2  1.9843554 -0.08651873  1.9878637  1.0987258
#3 -1.3819349  1.34213192         NA  0.1515050
#4         NA  1.32848845         NA  0.4906415
#5         NA  2.26547847         NA         NA
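The padding step relies on the replacement function `length<-`: setting a vector's length larger than it currently is appends NA values. A minimal sketch with a hypothetical two-element list `foo` (the numbers are made up for illustration):

```r
# Hypothetical list of unequal-length numeric vectors (made-up values)
foo <- list(A1 = c(1.5, -0.2, 0.7), C3 = c(2.1, -1.0))

# Longest vector length; vapply() guarantees an integer result
n <- max(vapply(foo, length, 1L))

# `length<-`(x, n) returns x extended to length n, padded with NA;
# since all results now have length n, sapply() simplifies them to a matrix
padded <- sapply(foo, `length<-`, n)
padded
```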

Edit

After seeing @djas's comment, I did the following. I think this is better.

split(mydf, mydf$class) %>%
lapply(., function(x) x[,2]) -> foo
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))
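A variant of the split() idea (an alternative to `length<-`, not the answer's code): indexing past the end of a vector returns NA, so `x[seq_len(5)]` on a length-2 vector gives its two values followed by three NAs. A base-R sketch, reusing the question's construction of `mydf`:

```r
class <- c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar <- rnorm(14)
mydf <- data.frame(class, myvar)

# One numeric vector per class, in the original row order
groups <- split(mydf$myvar, mydf$class)

# Out-of-range indices yield NA, which pads every group to the longest one
longest <- max(lengths(groups))
wide <- as.data.frame(lapply(groups, `[`, seq_len(longest)))
wide
```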

Here is another idea using dplyr and tidyr.

library(dplyr) ## provides mutate_each(), funs(), filter()

spread(mydf, class, myvar) %>%
mutate_each(funs(c(.[complete.cases(.)], .[!complete.cases(.)]))) %>%
filter(rowSums(., na.rm = TRUE) != 0)
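The logic of this pipeline (move each column's non-missing values to the top, then drop the rows left entirely NA) can be sketched in base R on a small hypothetical table with made-up values. One design note: the filter above keeps rows whose non-NA sum is nonzero, so a row whose values happen to sum to exactly 0 would also be dropped; the sketch tests missingness directly instead.

```r
# Hypothetical wide table with NA gaps scattered in each column (made-up values)
wide <- data.frame(A1 = c(NA, 1, NA, 2), C3 = c(3, NA, NA, NA))

# Move the non-missing values of each column to the top, keeping their order
wide[] <- lapply(wide, function(x) c(x[!is.na(x)], x[is.na(x)]))

# Drop rows that are entirely NA (checks missingness, not a zero sum)
wide <- wide[rowSums(!is.na(wide)) > 0, , drop = FALSE]
wide
```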

DATA

mydf <- structure(list(class = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("A1", "B2", "C3", "D4"
), class = "factor"), myvar = c(-0.27256423, 1.9843554, -1.38193488, 
-0.20843958, -0.08651873, 1.34213192, 1.32848845, 2.26547847, 
-0.60518721, 1.98786369, -1.16306103, 1.09872582, 0.15150502, 
0.49064154)), .Names = c("class", "myvar"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14"))

Answer 2 (score: 0)

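In your case, because the classes have different lengths, you can use "..." to spit out the full table, filled in with NAs. If the NAs are a problem, I'd say @djas's suggestion of split() is your best option.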

library(reshape2)
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast(mydf,myvar~...)

         myvar         A1          B2          C3          D4
1  -2.66688596         NA          NA          NA -2.66688596
2  -1.65370213         NA -1.65370213          NA          NA
3  -1.53464694 -1.5346469          NA          NA          NA
4  -1.34557734         NA -1.34557734          NA          NA
5  -0.92107697         NA          NA          NA -0.92107697
6  -0.85066517 -0.8506652          NA          NA          NA
7  -0.23682480         NA -0.23682480          NA          NA
8  -0.02716902         NA          NA -0.02716902          NA
9   0.06063714         NA  0.06063714          NA          NA
10  0.07434025         NA          NA          NA  0.07434025
11  0.25034532         NA          NA          NA  0.25034532
12  0.70988347         NA  0.70988347          NA          NA
13  1.66455350         NA          NA  1.66455350          NA
14  2.61991105  2.6199110          NA          NA          NA
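Why the result has 14 rows: using myvar itself as the row key means every (continuous, hence distinct) value gets its own row, holding exactly one non-NA entry under its class. A base-R sketch of the same shape, with tapply() swapped in for dcast() to check that property:

```r
set.seed(1)
class <- c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar <- rnorm(14)

# Row key = the value itself, column key = class; rows come out in ascending order of myvar
sparse <- tapply(myvar, list(myvar, class), identity)

# Count of non-NA entries per row
table(rowSums(!is.na(sparse)))
```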