如何将特定行转换为R中的列

时间:2014-11-27 05:31:42

标签: r dataframe reshape

我是R的新手,对我们大多数人来说,我的问题似乎很容易。我有这样的数据

> data.frame(table(dat),total)
   AGEintervals mytest.G_B_FLAG Freq total
1        (1,23]               0 5718  5912
2       (23,26]               0 5249  5579
3       (26,28]               0 3105  3314
4       (28,33]               0 6277  6693
5       (33,37]               0 4443  4682
6       (37,41]               0 4277  4514
7       (41,46]               0 4904  5169
8       (46,51]               0 4582  4812
9       (51,57]               0 4039  4236
10      (57,76]               0 3926  4031
11       (1,23]               1  194  5912
12      (23,26]               1  330  5579
13      (26,28]               1  209  3314
14      (28,33]               1  416  6693
15      (33,37]               1  239  4682
16      (37,41]               1  237  4514
17      (41,46]               1  265  5169
18      (46,51]               1  230  4812
19      (51,57]               1  197  4236
20      (57,76]               1  105  4031

您可能已经注意到年龄间隔开始重复11行。 我只需要获得10行0和0以及1'在不同的列中。喜欢这个

AGEintervals              1       0   total
1        (1,23]          194      5718  5912
2       (23,26]          330      5249  5579
3       (26,28]          209      3105  3314
4       (28,33]          416      6277  6693
5       (33,37]          239      4443  4682
6       (37,41]          237      4277  4514
7       (41,46]          265      4904  5169
8       (46,51]          230      4582  4812
9       (51,57]          197      4039  4236
10      (57,76]          105      3926  4031

非常感谢

1 个答案:

答案 0 :(得分:2)

这是一个简单的“长”到“广泛”的转换,很容易通过基础R的reshape来实现:

reshape(mydf, idvar = c("AGEintervals", "total"), 
        timevar = "mytest.G_B_FLAG", direction = "wide")
#    AGEintervals total Freq.0 Freq.1
# 1        (1,23]  5912   5718    194
# 2       (23,26]  5579   5249    330
# 3       (26,28]  3314   3105    209
# 4       (28,33]  6693   6277    416
# 5       (33,37]  4682   4443    239
# 6       (37,41]  4514   4277    237
# 7       (41,46]  5169   4904    265
# 8       (46,51]  4812   4582    230
# 9       (51,57]  4236   4039    197
# 10      (57,76]  4031   3926    105

其他替代方案包括:

reshape2

library(reshape2)
dcast(mydf, ... ~ mytest.G_B_FLAG, value.var='Freq')

tidyr

library(tidyr)
spread(df, mytest.G_B_FLAG, Freq)

更新

首先可以避免这个问题。

运行以下示例代码并比较每个阶段的输出:

## Create some sample data
set.seed(1)
dat <- data.frame(V1 = sample(letters[1:3], 20, TRUE),
                  V2 = sample(c(0, 1), 20, TRUE))

## View the output
dat

## Look what happens when we use `data.frame` on a `table`
data.frame(table(dat))

## Compare it with `as.data.frame.matrix`
as.data.frame.matrix(table(dat))

## The total can be added automatically with `addmargins`
as.data.frame.matrix(addmargins(table(dat), 2, sum))