I am working with a data frame like this:
idno 08:00 08:05 08:10 08:15 08:20 08:25
1 1 Domestic Domestic Domestic Domestic Domestic Domestic
2 2 Leisure Leisure Leisure Leisure Leisure Leisure
3 3 Eat Eat Eat Eat Eat Eat
4 4 Paid Paid Paid Paid Paid Paid
5 5 Sleep Sleep Sleep Sleep Sleep Sleep
6 6 Eat Eat Eat Missing Missing Missing
7 7 Sleep Sleep Sleep Sleep Sleep Sleep
8 8 Paid Paid Paid Paid Paid Paid
9 9 Sleep Sleep Sleep Sleep Sleep Sleep
10 10 Child Care Child Care Child Care Travel Travel Travel
I would like to summarise this data frame as follows (desired output):
idno `Child Care` Domestic Eat Leisure Missing Paid Sleep Travel
* <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 6 0 0 0 0 0 0
2 2 0 0 0 6 0 0 0 0
3 3 0 0 6 0 0 0 0 0
4 4 0 0 0 0 0 6 0 0
5 5 0 0 0 0 0 0 6 0
6 6 0 0 3 0 3 0 0 0
7 7 0 0 0 0 0 0 6 0
8 8 0 0 0 0 0 6 0 0
9 9 0 0 0 0 0 0 6 0
10 10 3 0 0 0 0 0 0 3
I usually just do this:
melt(df, id.vars = 'idno') %>% count(idno, value) %>% spread(value, n, 0)
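However, I wonder whether there is a more straightforward way to do it. My problem is that I am working with a very large dataset, and calling melt, then count, then spread can be a bit slow.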
Is there a direct way to count the values across the columns of each row (the distribution of the variable), preferably using data.table?
setDT(df)[, .N, by = ] # by the columns of each row?
df = structure(list(idno = 1:10, `08:00` = c("Domestic", "Leisure",
"Eat", "Paid", "Sleep", "Eat", "Sleep", "Paid", "Sleep", "Child Care"
), `08:05` = c("Domestic", "Leisure", "Eat", "Paid", "Sleep",
"Eat", "Sleep", "Paid", "Sleep", "Child Care"), `08:10` = c("Domestic",
"Leisure", "Eat", "Paid", "Sleep", "Eat", "Sleep", "Paid", "Sleep",
"Child Care"), `08:15` = c("Domestic", "Leisure", "Eat", "Paid",
"Sleep", "Missing", "Sleep", "Paid", "Sleep", "Travel"), `08:20` = c("Domestic",
"Leisure", "Eat", "Paid", "Sleep", "Missing", "Sleep", "Paid",
"Sleep", "Travel"), `08:25` = c("Domestic", "Leisure", "Eat",
"Paid", "Sleep", "Missing", "Sleep", "Paid", "Sleep", "Travel"
)), .Names = c("idno", "08:00", "08:05", "08:10", "08:15", "08:20",
"08:25"), row.names = c(NA, 10L), class = "data.frame")
Answer 0: (score: 4)
You can use mtabulate from the qdapTools package:
library(qdapTools)
mtabulate(split(df[-1], seq(nrow(df))))
# Child Care Domestic Eat Leisure Missing Paid Sleep Travel
#1 0 6 0 0 0 0 0 0
#2 0 0 0 6 0 0 0 0
#3 0 0 6 0 0 0 0 0
#4 0 0 0 0 0 6 0 0
#5 0 0 0 0 0 0 6 0
#6 0 0 3 0 3 0 0 0
#7 0 0 0 0 0 0 6 0
#8 0 0 0 0 0 6 0 0
#9 0 0 0 0 0 0 6 0
#10 3 0 0 0 0 0 0 3
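Since the question asks for data.table, here is a minimal sketch of one possible alternative, not benchmarked against the approaches above. It reshapes to long format with data.table's own melt and then counts while casting back to wide with dcast, using length as the aggregate. The object df_small and the column names time and activity are made up for illustration; df_small is a reduced copy of the question's data (rows 1, 6 and 10) so that the block runs on its own.

```r
library(data.table)

# Reduced copy of the question's data (rows 1, 6 and 10 only);
# `df_small` is an illustrative name, not part of the original post.
df_small <- data.frame(
  idno    = c(1L, 6L, 10L),
  `08:00` = c("Domestic", "Eat", "Child Care"),
  `08:05` = c("Domestic", "Eat", "Child Care"),
  `08:10` = c("Domestic", "Eat", "Child Care"),
  `08:15` = c("Domestic", "Missing", "Travel"),
  `08:20` = c("Domestic", "Missing", "Travel"),
  `08:25` = c("Domestic", "Missing", "Travel"),
  check.names = FALSE, stringsAsFactors = FALSE
)

dt <- as.data.table(df_small)

# Long format: one row per (idno, time slot) pair.
long <- melt(dt, id.vars = "idno",
             variable.name = "time", value.name = "activity",
             variable.factor = FALSE)

# Wide format: one column per activity; each cell is the number of
# time slots in which that idno reported the activity.
counts <- dcast(long, idno ~ activity,
                fun.aggregate = length, value.var = "activity")
print(counts)
```

On the full data, replacing df_small with df should give one row per idno and one column per activity. Whether this is faster than mtabulate or the melt/count/spread pipeline on a very large table would need to be measured.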