I have the following dataset, which I want to sum by end.station.id and convert into a matrix with 317 rows and 72 columns:
> sapply(df, class)
$end.station.id
[1] "integer"

$stoptime
[1] "POSIXct" "POSIXt"

$interval
[1] "POSIXct" "POSIXt"

> dim(df)
[1] 8256    3
> length(unique(df$end.station.id))
[1] 317
> length(unique(df$interval))
[1] 72
> head(df)
      end.station.id            stoptime            interval
14785            437 2014-08-18 21:08:36 2014-08-18 21:00:00
16980            406 2014-08-18 20:34:22 2014-08-18 20:30:00
20200            372 2014-08-18 22:53:33 2014-08-18 22:50:00
20935           2000 2014-08-18 22:43:18 2014-08-18 22:40:00
22610            499 2014-08-18 20:51:28 2014-08-18 20:50:00
22678            401 2014-08-18 20:05:54 2014-08-18 20:00:00
I can't get this to work with dplyr:
library(dplyr); library(tidyr)
> matrix <- df %>%
+   group_by(end.station.id, interval) %>%
+   summarise(sum = nrow) %>%
+   spread(end.station.id, nrow)
Error: not a vector
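I thought about assigning a unique integer to each interval, but because the column is in POSIXct format, the data gets lost when I try to extract the interval column and sort it with sort(x, decreasing = FALSE).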
In the end, the result should look like the matrix below, but populated with the totals for each station in each interval.
> head(m) station_id 2014-08-18 20:00:00 2014-08-18 20:10:00 2014-08-18 20:20:00 1 302 0 0 0 2 487 0 0 0 3 218 0 0 0 4 465 0 0 0 5 160 0 0 0 6 291 0 0 0 2014-08-18 20:30:00 2014-08-18 20:40:00 2014-08-18 20:50:00 2014-08-18 21:00:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-18 21:10:00 2014-08-18 21:20:00 2014-08-18 21:30:00 2014-08-18 21:40:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-18 21:50:00 2014-08-18 22:00:00 2014-08-18 22:10:00 2014-08-18 22:20:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-18 22:30:00 2014-08-18 22:40:00 2014-08-18 22:50:00 2014-08-18 23:00:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-18 23:10:00 2014-08-18 23:20:00 2014-08-18 23:30:00 2014-08-18 23:40:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-18 23:50:00 2014-08-19 00:00:00 2014-08-19 00:10:00 2014-08-19 00:20:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 00:30:00 2014-08-19 00:40:00 2014-08-19 00:50:00 2014-08-19 01:00:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 01:10:00 2014-08-19 01:20:00 2014-08-19 01:30:00 2014-08-19 01:40:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 01:50:00 2014-08-19 02:00:00 2014-08-19 02:10:00 2014-08-19 02:20:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 02:30:00 2014-08-19 02:40:00 2014-08-19 02:50:00 2014-08-19 03:00:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 03:10:00 2014-08-19 03:20:00 2014-08-19 03:30:00 2014-08-19 03:40:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 03:50:00 2014-08-19 04:00:00 2014-08-19 04:10:00 2014-08-19 04:20:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 04:30:00 2014-08-19 04:40:00 2014-08-19 04:50:00 2014-08-19 05:00:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 05:10:00 
2014-08-19 05:20:00 2014-08-19 05:30:00 2014-08-19 05:40:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 05:50:00 2014-08-19 06:00:00 2014-08-19 06:10:00 2014-08-19 06:20:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 06:30:00 2014-08-19 06:40:00 2014-08-19 06:50:00 2014-08-19 07:00:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 07:10:00 2014-08-19 07:20:00 2014-08-19 07:30:00 2014-08-19 07:40:00 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 2014-08-19 07:50:00 1 0 2 0 3 0 4 0 5 0 6 0
Answer 0 (score: 1)
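Change the line summarize(sum = nrow) to summarize(sum = n()), and change the line spread(end.station.id, nrow) to spread(end.station.id, sum). nrow is a function rather than a per-group count, which is why the pipeline fails with "not a vector"; n() returns the number of rows in the current group, and spread() needs the name of the value column, which here is sum.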
Finally, if you want the intervals across the top, transpose the result with t().
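Put together, the corrected pipeline might look like the sketch below. It runs on a small made-up data frame rather than the asker's df, and three things are my own assumptions rather than part of the answer above: the fill = 0 argument (so station/interval pairs with no rows become 0 instead of NA), the ungroup() call, and the use of format() to build the column names.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the asker's df (values are made up for illustration)
df <- data.frame(
  end.station.id = c(437L, 406L, 437L, 372L, 437L),
  interval = as.POSIXct(
    c("2014-08-18 21:00:00", "2014-08-18 20:30:00", "2014-08-18 21:00:00",
      "2014-08-18 22:50:00", "2014-08-18 20:30:00"),
    tz = "UTC"
  )
)

counts <- df %>%
  group_by(end.station.id, interval) %>%
  summarise(sum = n()) %>%                 # n() counts the rows in each group
  ungroup() %>%                            # assumption: drop grouping before reshaping
  spread(end.station.id, sum, fill = 0)    # one column per station; 0 where no rows exist

# Drop the interval column, then transpose so that stations become rows
# and intervals become columns
m <- t(as.matrix(counts[, -1]))
colnames(m) <- format(counts$interval, "%Y-%m-%d %H:%M:%S")
```

Here m has one row per station and one column per interval, so on the real data it should come out as 317 x 72. In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which can be used for the same reshaping.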