I created a small dataset to illustrate the problem I'm running into.
My data looks like this:
   id       time act
1   1      time1   a
2   1      time2   a
3   1      time3   a
4   1    time101   a
5   1    time103   a
6   1   time1001   b
7   1   time1003   b
8   1   time1004   b
9   1  time10000   b
10  1 time100010   c
What I want is to spread the time column so that the data comes out in the correct order, like this:
id 1 2 3 101 103 1001 1003 1004 10000 100010
1 a a a a a b b b b c
Here is the part I don't fully understand. When I spread my data, I get something like this:
library(dplyr)
library(tidyr)
dt %>% spread(time, act)
id time1 time10000 time100010 time1001 time1003 time1004 time101 time103 time2 time3
1 1 a b c b b b a a a a
So R seems to recognize some kind of numeric order, but it places time10000 before time2 and time3.
Why does this happen, and how can I fix it?
What I want is:
id time1 time2 time3 time101 time103 time1001 time1003 time1004 time10000 time100010
1 1 a a a a a b b b b c
Data
dt = structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
time = structure(c(1L, 9L, 10L, 7L, 8L, 4L, 5L, 6L, 2L, 3L
), .Label = c("time1", "time10000", "time100010", "time1001",
"time1003", "time1004", "time101", "time103", "time2", "time3"
), class = "factor"), act = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor")), .Names = c("id",
"time", "act"), class = "data.frame", row.names = c(NA, -10L))
Answer 0 (score: 4)
Reorder your factor levels so they follow the order in which the values appear in the data:
> # unique() keeps this working when a time value occurs more than once
> dt$time <- factor(dt$time, levels = unique(as.character(dt$time)))
> dt %>% spread(time, act)
  id time1 time2 time3 time101 time103 time1001 time1003 time1004 time10000 time100010
1  1     a     a     a       a       a        b        b        b         b          c
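
As for the "why": spread() lays the columns out in the order of the factor levels, and factor() sorts its levels as character strings by default, so "time10000" lands before "time2" because the strings are compared character by character. The fix above relies on the rows already being in the desired order. If they are not, one option is to order the levels by their numeric suffix. The following is only a minimal sketch: the toy data frame is a shortened stand-in for the question's dt, and it assumes every level has the form "time<number>".

```r
library(tidyr)

# Shortened stand-in for the question's data; `time` is a factor,
# so its levels default to character (alphabetical) order.
dt <- data.frame(
  id   = 1L,
  time = factor(c("time1", "time2", "time3", "time101", "time10000")),
  act  = c("a", "a", "a", "a", "b")
)

# Assumption: each level is "time" followed by digits.
# Strip the prefix, convert to numeric, and order the levels by that number.
lv  <- levels(dt$time)
num <- as.numeric(sub("^time", "", lv))
dt$time <- factor(dt$time, levels = lv[order(num)])

levels(dt$time)
# "time1" "time2" "time3" "time101" "time10000"

# The spread columns now follow the numeric order of the levels.
wide <- spread(dt, time, act)
names(wide)
```

Because the ordering comes from the parsed numbers rather than from row position, this version does not depend on how the input rows happen to be sorted.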