我一直在尝试使用dcast()。我有这个例子:
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
输出结果为:
> mydf
class myvar
1 A1 -0.27256423
2 A1 1.98435540
3 A1 -1.38193488
4 B2 -0.20843958
5 B2 -0.08651873
6 B2 1.34213192
7 B2 1.32848845
8 B2 2.26547847
9 C3 -0.60518721
10 C3 1.98786369
11 D4 -1.16306103
12 D4 1.09872582
13 D4 0.15150502
14 D4 0.49064154
我希望看起来像这样:
A1 B2 C3 D4
-0.27256423 -0.20843958 -0.60518721 -1.16306103
1.98435540 -0.08651873 1.98786369 1.09872582
-1.38193488 1.34213192 0.15150502
1.32848845 0.49064154
2.26547847
答案 0 :(得分:2)
扩展我的评论,只需添加一个辅助ID(" class&#34中每个值的索引位置;)并将其用作dcast
中公式的LHS。
library(splitstackshape)
set.seed(1) ## To make a reproducible example
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast.data.table(getanID(mydf, "class"), .id ~ class, value.var = "myvar")
# .id A1 B2 C3 D4
# 1: 1 -0.6264538 1.5952808 0.5757814 1.5117812
# 2: 2 0.1836433 0.3295078 -0.3053884 0.3898432
# 3: 3 -0.8356286 -0.8204684 NA -0.6212406
# 4: 4 NA 0.4874291 NA -2.2146999
# 5: 5 NA 0.7383247 NA NA
答案 1 :(得分:1)
这是一种方法。使用spread()
,我将数据放在更宽的格式中。我使用lapply()
获取了每列中的所有完整案例。我想对@Richard Scriven的最后一步表示赞赏。这是我从他那里学到的东西。最后一步为每个向量添加NA。 max(vapply(foo, length, 1L))
查找最大长度,即$B2
为5。您创建长度为5的每个列表项。例如,$C3
有两个元素。因此,您使用sapply()
添加三个NAs。
library(tidyr)
library(magrittr)
spread(mydf, class, myvar) %>%
lapply(., function(x) x[complete.cases(x)]) -> foo
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))
# A1 B2 C3 D4
#1 -0.2725642 -0.20843958 -0.6051872 -1.1630610
#2 1.9843554 -0.08651873 1.9878637 1.0987258
#3 -1.3819349 1.34213192 NA 0.1515050
#4 NA 1.32848845 NA 0.4906415
#5 NA 2.26547847 NA NA
修改
看到@djas的评论,我做了以下事情。我认为这更好。
split(mydf, mydf$class) %>%
lapply(., function(x) x[,2]) -> foo
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))
以下是dplyr
和tidyr
的另一个想法。
spread(mydf, class, myvar) %>%
mutate_each(funs(c(.[complete.cases(.)], .[!complete.cases(.)]))) %>%
filter(rowSums(., na.rm = TRUE) != 0)
DATA
mydf <- structure(list(class = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("A1", "B2", "C3", "D4"
), class = "factor"), myvar = c(-0.27256423, 1.9843554, -1.38193488,
-0.20843958, -0.08651873, 1.34213192, 1.32848845, 2.26547847,
-0.60518721, 1.98786369, -1.16306103, 1.09872582, 0.15150502,
0.49064154)), .Names = c("class", "myvar"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14"))
答案 2 :(得分:0)
在您的情况下,因为类具有不同的长度,您可以使用&#34; ...&#34;用NAs吐出完整的清单。如果NAs有问题,我会说@djas对split()
的建议是你最好的选择。
library(reshape2)
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast(mydf,myvar~...)
myvar A1 B2 C3 D4
1 -2.66688596 NA NA NA -2.66688596
2 -1.65370213 NA -1.65370213 NA NA
3 -1.53464694 -1.5346469 NA NA NA
4 -1.34557734 NA -1.34557734 NA NA
5 -0.92107697 NA NA NA -0.92107697
6 -0.85066517 -0.8506652 NA NA NA
7 -0.23682480 NA -0.23682480 NA NA
8 -0.02716902 NA NA -0.02716902 NA
9 0.06063714 NA 0.06063714 NA NA
10 0.07434025 NA NA NA 0.07434025
11 0.25034532 NA NA NA 0.25034532
12 0.70988347 NA 0.70988347 NA NA
13 1.66455350 NA NA 1.66455350 NA
14 2.61991105 2.6199110 NA NA NA