I am new to R, and my question will probably seem easy to most of you. I have data like this:
> data.frame(table(dat),total)
AGEintervals mytest.G_B_FLAG Freq total
1 (1,23] 0 5718 5912
2 (23,26] 0 5249 5579
3 (26,28] 0 3105 3314
4 (28,33] 0 6277 6693
5 (33,37] 0 4443 4682
6 (37,41] 0 4277 4514
7 (41,46] 0 4904 5169
8 (46,51] 0 4582 4812
9 (51,57] 0 4039 4236
10 (57,76] 0 3926 4031
11 (1,23] 1 194 5912
12 (23,26] 1 330 5579
13 (26,28] 1 209 3314
14 (28,33] 1 416 6693
15 (33,37] 1 239 4682
16 (37,41] 1 237 4514
17 (41,46] 1 265 5169
18 (46,51] 1 230 4812
19 (51,57] 1 197 4236
20 (57,76] 1 105 4031
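As you may have noticed, the age intervals start repeating at row 11. I only need 10 rows, with the counts for 0 and 1 in separate columns, like this: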
AGEintervals 1 0 total
1 (1,23] 194 5718 5912
2 (23,26] 330 5249 5579
3 (26,28] 209 3105 3314
4 (28,33] 416 6277 6693
5 (33,37] 239 4443 4682
6 (37,41] 237 4277 4514
7 (41,46] 265 4904 5169
8 (46,51] 230 4582 4812
9 (51,57] 197 4039 4236
10 (57,76] 105 3926 4031
Thanks a lot.
Answer 0 (score: 2)
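This is a simple "long" to "wide" conversion, which is easy to do with base R's reshape (here mydf is the data frame shown in the question):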
reshape(mydf, idvar = c("AGEintervals", "total"),
timevar = "mytest.G_B_FLAG", direction = "wide")
# AGEintervals total Freq.0 Freq.1
# 1 (1,23] 5912 5718 194
# 2 (23,26] 5579 5249 330
# 3 (26,28] 3314 3105 209
# 4 (28,33] 6693 6277 416
# 5 (33,37] 4682 4443 239
# 6 (37,41] 4514 4277 237
# 7 (41,46] 5169 4904 265
# 8 (46,51] 4812 4582 230
# 9 (51,57] 4236 4039 197
# 10 (57,76] 4031 3926 105
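The reshape output uses the column names Freq.0 and Freq.1, whereas the question asks for columns named 1 and 0, in that order. A minimal sketch of the extra renaming step follows. It rebuilds the data from the question under the assumed name mydf; the factor types chosen for the first two columns are also an assumption, since the question only shows printed output.

```r
# Rebuild the data from the question; `mydf` and the column types are assumptions
ages <- c("(1,23]", "(23,26]", "(26,28]", "(28,33]", "(33,37]",
          "(37,41]", "(41,46]", "(46,51]", "(51,57]", "(57,76]")
mydf <- data.frame(
  AGEintervals    = factor(rep(ages, 2), levels = ages),
  mytest.G_B_FLAG = factor(rep(c(0, 1), each = 10)),
  Freq  = c(5718, 5249, 3105, 6277, 4443, 4277, 4904, 4582, 4039, 3926,
            194, 330, 209, 416, 239, 237, 265, 230, 197, 105),
  total = rep(c(5912, 5579, 3314, 6693, 4682, 4514, 5169, 4812, 4236, 4031), 2)
)

# Long to wide: one row per (AGEintervals, total), one Freq column per flag value
wide <- reshape(mydf, idvar = c("AGEintervals", "total"),
                timevar = "mytest.G_B_FLAG", direction = "wide")

# Rename and reorder the columns to match the layout requested in the question
names(wide)[names(wide) == "Freq.0"] <- "0"
names(wide)[names(wide) == "Freq.1"] <- "1"
wide <- wide[, c("AGEintervals", "1", "0", "total")]
wide
```

Because 0 and 1 are not syntactic R names, the renamed columns have to be accessed with backticks or strings, for example wide$`1` or wide[["1"]].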
Other alternatives include:
reshape2
library(reshape2)
dcast(mydf, ... ~ mytest.G_B_FLAG, value.var='Freq')
tidyr
library(tidyr)
spread(mydf, mytest.G_B_FLAG, Freq)
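In current versions of tidyr (1.0.0 and later), spread is superseded by pivot_wider. The sketch below shows the equivalent call on the first three age intervals from the question; mydf and the column types are assumptions made to keep the example self-contained.

```r
library(tidyr)

# First three age intervals from the question, in long form (assumed name `mydf`)
mydf <- data.frame(
  AGEintervals    = rep(c("(1,23]", "(23,26]", "(26,28]"), 2),
  mytest.G_B_FLAG = rep(c(0, 1), each = 3),
  Freq            = c(5718, 5249, 3105, 194, 330, 209),
  total           = rep(c(5912, 5579, 3314), 2)
)

# One new column per value of mytest.G_B_FLAG, filled with the matching Freq
wide <- pivot_wider(mydf, names_from = mytest.G_B_FLAG, values_from = Freq)
wide
```

The remaining columns (AGEintervals and total) act as identifiers, so each age interval ends up on a single row.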
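The problem could have been avoided in the first place: calling data.frame on a table is what produces the long format, whereas as.data.frame.matrix keeps the wide layout.
Run the following sample code and compare the output at each stage: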
## Create some sample data
set.seed(1)
dat <- data.frame(V1 = sample(letters[1:3], 20, TRUE),
V2 = sample(c(0, 1), 20, TRUE))
## View the output
dat
## Look what happens when we use `data.frame` on a `table`
data.frame(table(dat))
## Compare it with `as.data.frame.matrix`
as.data.frame.matrix(table(dat))
## The total can be added automatically with `addmargins`
as.data.frame.matrix(addmargins(table(dat), 2, sum))
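If only the already-tabulated long data is available, the same idea can still be applied by rebuilding the contingency table with xtabs first. This is a sketch under the assumption that the long data frame is called mydf and has the columns shown in the question; only the first three age intervals are included here.

```r
# First three age intervals from the question, in long form (assumed name `mydf`)
mydf <- data.frame(
  AGEintervals    = rep(c("(1,23]", "(23,26]", "(26,28]"), 2),
  mytest.G_B_FLAG = rep(c(0, 1), each = 3),
  Freq            = c(5718, 5249, 3105, 194, 330, 209)
)

# Rebuild the two-way table from the counts, using Freq as the cell values
tab <- xtabs(Freq ~ AGEintervals + mytest.G_B_FLAG, data = mydf)

# Add a row-total column, then convert to a wide data frame
wide <- as.data.frame.matrix(addmargins(tab, 2))
wide
```

Here the age intervals become row names rather than a column, and the total is recomputed from the counts instead of being carried along as a separate variable.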