For the dataset below, I want to convert my data from long format to wide format. I used the reshape function to do this.
id status timestamp
1 assigned 2017-01-02
1 done 2017-01-03
1 locked 2017-01-04
2 assigned 2017-01-02
2 done 2017-01-03
2 assigned 2017-01-03
2 done 2017-01-04
2 locked 2017-01-05
3 assigned 2017-01-02
3 done 2017-01-03
3 locked 2017-01-04
...
# reshape function to convert long format to Wide.
temp <- reshape(temp, idvar = "id", timevar = "status", direction = "wide")
Result:
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
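When I do this, some rows are dropped. For example, id 2 has multiple rows with status = assigned, and reshape keeps only the first one.
How can I convert to wide format without dropping rows? Basically, I do not want to lose any data.
Expected result: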
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
or
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 NA
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
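To see which rows reshape() would drop, it can help to list the rows that repeat an (id, status) combination. This is a minimal sketch that rebuilds the sample data by hand; storing the timestamps as character strings is an assumption made only to keep the example short.

```r
# Rebuild the sample data from the question (timestamps kept as strings)
temp <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  status = c("assigned", "done", "locked",
             "assigned", "done", "assigned", "done", "locked",
             "assigned", "done", "locked"),
  timestamp = c("2017-01-02", "2017-01-03", "2017-01-04",
                "2017-01-02", "2017-01-03", "2017-01-03", "2017-01-04", "2017-01-05",
                "2017-01-02", "2017-01-03", "2017-01-04"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose (id, status) pair already appeared earlier
dup <- duplicated(temp[c("id", "status")])

# These are the rows that cannot fit into a single wide row per id
temp[dup, ]
```

For the sample data this flags the second assigned and done rows of id 2, which are exactly the rows missing from the result above.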
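Answer 0 (score: 0)
One thing you can do is add a variable that gives each new job a unique value. You can then use that variable as an additional id when reshaping your data.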
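# Running counter: incremented every time a new job starts
i <- 0
temp$key <- sapply(temp$status, function(x) {
  # "assigned" marks the start of a new job; other statuses keep the current key
  if (x == "assigned") i <<- i + 1
  i
})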
temp
id status timestamp key
1 1 assigned 2017-01-02 1
2 1 done 2017-01-03 1
3 1 locked 2017-01-04 1
4 2 assigned 2017-01-02 2
5 2 done 2017-01-03 2
6 2 assigned 2017-01-03 3
7 2 done 2017-01-04 3
8 2 locked 2017-01-05 3
9 3 assigned 2017-01-02 4
10 3 done 2017-01-03 4
11 3 locked 2017-01-04 4
temp2 <- reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")
temp2
id key timestamp.assigned timestamp.done timestamp.locked
1 1 1 2017-01-02 2017-01-03 2017-01-04
4 2 2 2017-01-02 2017-01-03 <NA>
6 2 3 2017-01-03 2017-01-04 2017-01-05
9 3 4 2017-01-02 2017-01-03 2017-01-04
Answer 1 (score: 0)
Esther's approach of assigning a number to each new job is the way to go. However, R already has the cumsum() function, which can be used for this purpose:
temp$key <- cumsum(temp$status == "assigned")
reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")
id key timestamp.assigned timestamp.done timestamp.locked
1: 1 1 2017-01-02 2017-01-03 2017-01-04
2: 2 2 2017-01-02 2017-01-03 <NA>
3: 2 3 2017-01-03 2017-01-04 2017-01-05
4: 3 4 2017-01-02 2017-01-03 2017-01-04
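The same steps can be run end to end on a self-contained copy of the sample data. This is a minimal sketch: the data frame is rebuilt by hand with character timestamps, and the final count of non-missing cells is an extra check added for illustration, not part of the original answer.

```r
# Sample data from the question (timestamps kept as strings for brevity)
temp <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  status = c("assigned", "done", "locked",
             "assigned", "done", "assigned", "done", "locked",
             "assigned", "done", "locked"),
  timestamp = c("2017-01-02", "2017-01-03", "2017-01-04",
                "2017-01-02", "2017-01-03", "2017-01-03", "2017-01-04", "2017-01-05",
                "2017-01-02", "2017-01-03", "2017-01-04"),
  stringsAsFactors = FALSE
)

# Each "assigned" row starts a new job, so the running count labels the jobs
temp$key <- cumsum(temp$status == "assigned")

# Using key as an extra id variable gives one wide row per job, not per id
wide <- reshape(temp, idvar = c("key", "id"), timevar = "status",
                direction = "wide")

# Count the timestamps that survived the reshape
ts_cols <- grep("^timestamp\\.", names(wide), value = TRUE)
kept <- sum(sapply(wide[ts_cols], function(col) sum(!is.na(col))))
```

If kept equals nrow(temp), every timestamp from the long data has a cell in the wide result. Note that this relies on the rows being ordered so that each job begins with an assigned row.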
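Although this solves the OP's original problem, key numbers the assignments across all ids. If the OP prefers the assignments to be numbered separately for each id, cumsum() needs to be applied grouped by id.
One way to achieve this is to use data.table syntax: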
library(data.table)
setDT(temp)[, key := cumsum(status == "assigned"), by = id]
dcast(temp, id + key ~ status, value.var = "timestamp")
id key assigned done locked
1: 1 1 2017-01-02 2017-01-03 <NA>
2: 2 1 2017-01-02 2017-01-03 <NA>
3: 2 2 2017-01-03 2017-01-04 2017-01-05
4: 3 1 2017-01-02 2017-01-03 2017-01-04
dcast() is a replacement for base R's reshape(..., direction = "wide") function and is available from the reshape2 and data.table packages.
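If an extra package is not wanted, the per-id numbering can also be sketched in base R with ave(), which applies a function within groups. This is an alternative offered for illustration rather than the method of the answer; the sample data is rebuilt by hand with character timestamps.

```r
# Sample data from the question (timestamps kept as strings for brevity)
temp <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  status = c("assigned", "done", "locked",
             "assigned", "done", "assigned", "done", "locked",
             "assigned", "done", "locked"),
  timestamp = c("2017-01-02", "2017-01-03", "2017-01-04",
                "2017-01-02", "2017-01-03", "2017-01-03", "2017-01-04", "2017-01-05",
                "2017-01-02", "2017-01-03", "2017-01-04"),
  stringsAsFactors = FALSE
)

# Running count of "assigned" rows, restarted for every id
temp$key <- ave(as.integer(temp$status == "assigned"), temp$id, FUN = cumsum)

# One wide row per (id, key) pair, i.e. per job within each id
wide <- reshape(temp, idvar = c("id", "key"), timevar = "status",
                direction = "wide")
wide
```

Here key restarts at 1 for each id, so id 2 gets keys 1 and 2 while ids 1 and 3 only have key 1.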
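The formula interface of data.table's dcast() also accepts expressions. This makes it unnecessary to modify temp by appending a key column before reshaping. Instead, the key can be computed on the fly while reshaping: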
dcast(temp, id + ave(key <- status == "assigned", id, FUN = cumsum) ~ paste0("timestamp.", status))
id key timestamp.assigned timestamp.done timestamp.locked
1: 1 1 2017-01-02 2017-01-03 2017-01-04
2: 2 1 2017-01-02 2017-01-03 <NA>
3: 2 2 2017-01-03 2017-01-04 2017-01-05
4: 3 1 2017-01-02 2017-01-03 2017-01-04