Unnesting a list column of dates in dplyr

Asked: 2018-04-26 12:32:10

Tags: r dplyr tidyr

I have been trying to use the unnest() function from tidyr on a column that contains lists of dates.

library(dplyr)
library(tidyr)

x <- seq(from = as.POSIXct("2011-01-01 14:00:00"), length.out = 100, by = "hour")
y <- seq(from = as.POSIXct("2012-01-01 14:00:00"), length.out = 100, by = "hour")
df <- data.frame(x, y)

When I try to create a list for each row and then unnest it, I get the following error.

df %>% rowwise() %>% mutate(sequence = list(seq.POSIXt(x,y,"10 min"))) %>% unnest(sequence)
  

Error: Each column must either be a list of vectors or a list of data frames [sequence]

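Can anyone help? I have done this with numbers, and unnest() works as expected there. However, it does not seem to work with lists that contain dates or date-times.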

3 Answers:

Answer 0: (score: 1)

Coerce the result of seq.POSIXt() to a data frame and wrap that in list(), so the list column holds one data frame per row...

x <- seq(from= as.POSIXct('2011-01-01 14:00:00'),length.out=100,by = "hour")
y <- seq(from= as.POSIXct('2012-01-01 14:00:00'),length.out=100,by = "hour")
df <- data.frame(x,y)

library(dplyr)
library(tidyr)

df %>% 
  rowwise() %>% 
  mutate(sequence = list(data.frame(seq.POSIXt(x, y, "10 min")))) %>% 
  unnest(sequence)

# # A tibble: 5,256,100 x 3
#    x                   y                   seq.POSIXt.x..y...10.min..
#    <dttm>              <dttm>              <dttm>                    
#  1 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:00:00       
#  2 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:10:00       
#  3 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:20:00       
#  4 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:30:00       
#  5 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:40:00       
#  6 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:50:00       
#  7 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:00:00       
#  8 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:10:00       
#  9 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:20:00       
# 10 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:30:00       
# # ... with 5,256,090 more rows
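
The auto-generated column name seq.POSIXt.x..y...10.min.. in the output above comes from wrapping an unnamed vector in data.frame(). The following is a minimal sketch rather than part of the original answer. It assumes a recent dplyr and tidyr, and it uses a small three-row stand-in for the question's data. The UTC time zone and the name `result` are assumptions made for illustration. It suggests that naming the column inside data.frame() gives the unnested column a cleaner name:

```r
library(dplyr)
library(tidyr)

# Small stand-in data (assumption): 3 rows, each spanning one hour
x <- seq(from = as.POSIXct("2011-01-01 14:00:00", tz = "UTC"),
         length.out = 3, by = "hour")
y <- x + 3600  # one hour (3600 seconds) after each start
df <- data.frame(x, y)

result <- df %>%
  rowwise() %>%
  # Name the inner column so the unnested column is called `sequence`
  mutate(sequence = list(data.frame(sequence = seq(x, y, by = "10 min")))) %>%
  ungroup() %>%
  unnest(sequence)

names(result)
nrow(result)
```

Each one-hour span yields seven 10-minute steps, so the three input rows expand to 21 rows here.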

Answer 1: (score: 0)

If I remember correctly, data.frame does not fully support list columns. Try replacing df <- data.frame(x,y) with df <- tibble::tibble(x, y)


library(dplyr)
library(tidyr)
x <- seq(from= as.POSIXct('2011-01-01 14:00:00'),length.out=100,by = "hour")

y <- seq(from= as.POSIXct('2012-01-01 14:00:00'),length.out=100,by = "hour")
df <- tibble::tibble(x,y)


df %>% rowwise() %>% mutate(sequence = list(seq.POSIXt(x,y,"10 min"))) %>% unnest(sequence)
#> # A tibble: 5,256,100 x 3
#>    x                   y                   sequence           
#>    <dttm>              <dttm>              <dttm>             
#>  1 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:00:00
#>  2 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:10:00
#>  3 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:20:00
#>  4 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:30:00
#>  5 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:40:00
#>  6 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 14:50:00
#>  7 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:00:00
#>  8 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:10:00
#>  9 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:20:00
#> 10 2011-01-01 14:00:00 2012-01-01 14:00:00 2011-01-01 15:30:00
#> # ... with 5,256,090 more rows
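
To see what the list column holds before it is unnested, the pipeline can be split into two steps. This is only a sketch on a small stand-in tibble, not the answerer's code. The UTC time zone and the names `nested` and `unnested` are assumptions made for illustration, and a recent dplyr and tidyr are assumed:

```r
library(dplyr)
library(tidyr)

# Small stand-in data (assumption): 3 rows, each spanning one hour
x <- seq(from = as.POSIXct("2011-01-01 14:00:00", tz = "UTC"),
         length.out = 3, by = "hour")
y <- x + 3600
df <- tibble::tibble(x, y)

# Step 1: build the list column, one POSIXct vector per row
nested <- df %>%
  rowwise() %>%
  mutate(sequence = list(seq(x, y, by = "10 min"))) %>%
  ungroup()

is.list(nested$sequence)     # the column itself is a list
lengths(nested$sequence)     # number of time steps stored in each row
class(nested$sequence[[1]])  # class of one element of the list

# Step 2: expand the list column into one row per time step
unnested <- unnest(nested, sequence)
```

Inspecting the intermediate `nested` object this way can help confirm that every element is a plain date-time vector before unnest() is called.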

Answer 2: (score: 0)

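I could not reproduce the error, but an alternative approach might help. (df is the sample data given at the end of this answer.)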

library(dplyr)
library(tidyr)

df %>% 
  rowwise() %>% 
  mutate(sequence = paste(seq.POSIXt(x, y, "10 min"), collapse=",")) %>%
  ungroup() %>%
  separate_rows(sequence, sep=",") %>%
  mutate(sequence = as.POSIXct(sequence))

Or, if you want to use unnest():

df %>% 
  rowwise() %>% 
  mutate(sequence = list(seq.POSIXt(x, y, "10 min"))) %>% 
  ungroup() %>%
  unnest(sequence)

Output:

   x                   y                   sequence           
   <dttm>              <dttm>              <dttm>             
 1 2011-01-01 14:00:00 2011-01-02 14:00:00 2011-01-01 14:00:00
 2 2011-01-01 14:00:00 2011-01-02 14:00:00 2011-01-01 14:10:00
 3 2011-01-01 14:00:00 2011-01-02 14:00:00 2011-01-01 14:20:00
 4 2011-01-01 14:00:00 2011-01-02 14:00:00 2011-01-01 14:30:00
 5 2011-01-01 14:00:00 2011-01-02 14:00:00 2011-01-01 14:40:00
...
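
The two variants in this answer can be checked against each other on a small input. The sketch below is an illustration, not the answerer's code. It assumes a recent dplyr and tidyr, a three-row stand-in data frame, and a UTC time zone. It also passes an explicit format string (the hypothetical helper variable `fmt`) in both directions, so the round trip through text does not depend on R's default date-time to string conversion:

```r
library(dplyr)
library(tidyr)

# Small stand-in data (assumption): 3 rows, each spanning one hour
x <- seq(from = as.POSIXct("2011-01-01 14:00:00", tz = "UTC"),
         length.out = 3, by = "hour")
y <- x + 3600
df <- data.frame(x, y)

fmt <- "%Y-%m-%d %H:%M:%S"  # explicit format used for writing and parsing

# Variant 1: collapse to one string per row, split into rows, parse back
via_strings <- df %>%
  rowwise() %>%
  mutate(sequence = paste(format(seq(x, y, by = "10 min"),
                                 format = fmt, tz = "UTC"),
                          collapse = ",")) %>%
  ungroup() %>%
  separate_rows(sequence, sep = ",") %>%
  mutate(sequence = as.POSIXct(sequence, format = fmt, tz = "UTC"))

# Variant 2: keep the values as a list column and unnest it
via_unnest <- df %>%
  rowwise() %>%
  mutate(sequence = list(seq(x, y, by = "10 min"))) %>%
  ungroup() %>%
  unnest(sequence)

# Compare the underlying timestamps of both results
all.equal(as.numeric(via_strings$sequence), as.numeric(via_unnest$sequence))
```

The unnest() variant avoids the detour through character strings, which is one reason to prefer it when the list column can be built directly.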

Sample data:

df <- structure(list(x = structure(c(1293870600L, 1293874200L, 1293877800L, 
1293881400L, 1293885000L, 1293888600L, 1293892200L, 1293895800L, 
1293899400L, 1293903000L), class = c("POSIXct", "POSIXt"), tzone = ""), 
    y = structure(c(1293957000L, 1293960600L, 1293964200L, 1293967800L, 
    1293971400L, 1293975000L, 1293978600L, 1293982200L, 1293985800L, 
    1293989400L), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("x", 
"y"), row.names = c(NA, -10L), class = "data.frame")