Spreading a column across a data frame based on a factor column

Time: 2015-10-28 17:25:01

Tags: r tidyr

I have a data frame with 3 columns.


The lag column holds the delay at a particular site during a particular time period.

> str(lagdf)
'data.frame':   2208 obs. of  3 variables:
 $ time: POSIXct, format: "2015-10-27 00:00:00" "2015-10-27 00:15:00" "2015-10-27 00:30:00" "2015-10-27 00:45:00" ...
 $ site: Factor w/ 23 levels "2001","2002",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ lag : int  8 8 8 8 8 8 8 8 7 8 ...

I would like to spread lag so that the values for each site become columns.

> head(lagdf,14)
                  time site lag
1  2015-10-27 00:00:00 2001   8
2  2015-10-27 00:15:00 2001   8
3  2015-10-27 00:30:00 2001   8
4  2015-10-27 00:45:00 2001   8
5  2015-10-27 01:00:00 2001   8
6  2015-10-27 01:15:00 2001   8
7  2015-10-27 01:30:00 2001   8
8  2015-10-27 01:45:00 2001   8
9  2015-10-27 02:00:00 2001   7
10 2015-10-27 02:15:00 2001   8
11 2015-10-27 02:30:00 2001   9
12 2015-10-27 02:45:00 2001   9
13 2015-10-27 03:00:00 2001   9
14 2015-10-27 03:15:00 2001   8

The time column does not need to be kept.

Using tidyr on its own did not help.

1 Answer:

Answer 0: (score: 2)

You can indeed do this by combining tidyr with dplyr:

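library(tidyr)
library(dplyr)
# Number the observations within each site, then spread them into lag1, lag2, ...
lagdf %>% group_by(site) %>%
          select(-time) %>%                              # drop the time column
          mutate(row = paste0("lag",row_number())) %>%   # per-site row label
          spread(row, lag)                               # one column per label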

Source: local data frame [1 x 15]

   site  lag1 lag10 lag11 lag12 lag13 lag14  lag2  lag3  lag4  lag5  lag6  lag7  lag8  lag9
  (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int) (int)
1  2001     8     8     9     9     9     8     8     8     8     8     8     8     8     7
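
Note that the columns above come out in alphabetical order (lag1, lag10, lag11, ...). The following is a minimal sketch of one way to get them in numeric order, assuming tidyr 1.0.0 or later, where pivot_wider() supersedes spread(). The data frame toy and its values are made up for illustration and only mimic the structure of lagdf; the zero-padded labels are a choice of this sketch, not part of the original answer. The last step shows the other possible reading of the question, one column per site with time kept as the row key.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for lagdf: 2 sites, 2 time stamps each (made-up values)
toy <- data.frame(
  time = rep(as.POSIXct(c("2015-10-27 00:00:00", "2015-10-27 00:15:00"),
                        tz = "UTC"), times = 2),
  site = factor(rep(c("2001", "2002"), each = 2)),
  lag  = c(8L, 8L, 7L, 9L)
)

# Zero-pad the per-site row number so the labels read lag01, lag02, ...
wide <- toy %>%
  group_by(site) %>%
  mutate(row = sprintf("lag%02d", row_number())) %>%
  ungroup() %>%
  select(-time) %>%
  pivot_wider(names_from = row, values_from = lag)

# Alternative reading: keep time and give every site its own column
by_site <- toy %>%
  pivot_wider(names_from = site, values_from = lag)

print(wide)
print(by_site)
```

With more than 99 observations per site, the padding width in sprintf() would need to be increased accordingly.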