Does regular slicing not work on a data frame that contains a Date column?

Asked: 2018-11-19 03:54:26

Tags: r

I create a data frame whose date column is built like this:

yrmonday=as.Date(sapply(2000:2017,function(x) {
     seq(as.Date(paste0(as.character(x),'-01-01')),by='8 day',length=46)}),
     origin='1970-01-01')

df <- data.frame(date=yrmonday,
                 fid=rep(1:46,time=18),
                 dayorder=rep(seq(1,365,8),time=18),
                 value=runif(length(yrmonday))
                 )

Regular operations do not work on this data frame.

> tail(df)
Error in `[.default`(xj, i, , drop = FALSE) : subscript out of bounds
> df[1:100,]
Error in `[.default`(xj, i, , drop = FALSE) : subscript out of bounds
> head(df)
        date fid dayorder      value
1 2000-01-01   1        1 0.92817146
2 2000-01-09   2        9 0.59638497
3 2000-01-17   3       17 0.72256721
4 2000-01-25   4       25 0.04086397
5 2000-02-02   5       33 0.01346682
6 2000-02-10   6       41 0.57895922
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
broken data frame:....

I am confused about why these errors are reported.

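1 answer:

Answer 0 (score: 1)

Rather than coercing the Date class to integer and then converting it back, a better option is to build the sequences as a list and combine them using do.call with c: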

yrmonday <- do.call(`c`, lapply(2000:2017,function(x) 
     seq(as.Date(paste0(as.character(x),'-01-01')),by='8 day',length=46)) )
str(yrmonday)
#  Date[1:828], format: "2000-01-01" "2000-01-09" "2000-01-17" "2000-01-25" "2000-02-02" "2000-02-10" "2000-02-18" "2000-02-26" ...

dput(head(yrmonday))
structure(c(10957L, 10965L, 10973L, 10981L, 10989L, 10997L), class = "Date")

Using the above as a column in the data.frame:

df <- data.frame(date=yrmonday,fid=rep(1:46,time=18),
                 dayorder=rep(seq(1,365,8),time=18),
                 value=runif(length(yrmonday))
             )

tail and head now work as expected:

tail(df)
#          date fid dayorder     value
#823 2017-11-17  41      321 0.2477746
#824 2017-11-25  42      329 0.3980863
#825 2017-12-03  43      337 0.1112133
#826 2017-12-11  44      345 0.4216226
#827 2017-12-19  45      353 0.2391892
#828 2017-12-27  46      361 0.8505323


head(df)
#        date fid dayorder     value
#1 2000-01-01   1        1 0.3654198
#2 2000-01-09   2        9 0.4804265
#3 2000-01-17   3       17 0.6757607
#4 2000-01-25   4       25 0.7864473
#5 2000-02-02   5       33 0.8100581
#6 2000-02-10   6       41 0.0786775
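
As a quick check that ordinary row slicing also behaves once the column is a plain Date vector, the sketch below rebuilds the data frame from the do.call construction and repeats the slice from the question. The set.seed() call and the name sliced are additions for illustration and are not part of the original answer.

```r
# Reproducible random column; the seed is an illustrative assumption.
set.seed(1)

# Build each year's 8-day sequence as a list element, then combine with c(),
# which dispatches to c.Date and returns a flat Date vector.
yrmonday <- do.call(c, lapply(2000:2017, function(x) {
  seq(as.Date(paste0(x, "-01-01")), by = "8 day", length.out = 46)
}))

df <- data.frame(date = yrmonday,
                 fid = rep(1:46, times = 18),
                 dayorder = rep(seq(1, 365, 8), times = 18),
                 value = runif(length(yrmonday)))

# The slice that failed in the question.
sliced <- df[1:100, ]
nrow(sliced)
class(sliced$date)
```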

The issue seems to be related to the storage mode of the Date class changing from integer to numeric when the values pass through sapply (this is the OP's yrmonday):

dput(head(yrmonday))
#structure(c(10957, 10965, 10973, 10981, 10989, 10997), class = "Date")
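
One way to compare the two constructions side by side is sketched below. Besides the storage mode, it may be worth looking at the dim attribute, since sapply simplifies equal-length results into a matrix and as.Date() with an origin keeps that shape. The names yr_sapply and yr_list are hypothetical and used only for this comparison.

```r
# The OP's construction: sapply() simplifies the 18 sequences into a matrix
# of day counts, which is then converted back to Date.
yr_sapply <- as.Date(sapply(2000:2017, function(x) {
  seq(as.Date(paste0(x, "-01-01")), by = "8 day", length.out = 46)
}), origin = "1970-01-01")

# The answer's construction: a list of Date vectors combined with c().
yr_list <- do.call(c, lapply(2000:2017, function(x) {
  seq(as.Date(paste0(x, "-01-01")), by = "8 day", length.out = 46)
}))

# Storage mode of each object (may vary between R versions).
typeof(yr_sapply)
typeof(yr_list)

# Shape of each object: a non-NULL dim means the Date object is matrix-shaped.
dim(yr_sapply)
dim(yr_list)
```

If dim(yr_sapply) is not NULL, the column stored in the data frame is matrix-shaped while the data frame itself records 828 rows. That would be consistent with the "subscript out of bounds" error appearing only for rows beyond the 46th, and with head(df) still printing.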

A similar approach that creates a tbl_df with the tidyverse is:

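library(tidyverse)
# Build one Date sequence per year, combine them with c(), then pass the
# resulting Date vector into the tibble through the `.` placeholder.
# tibble() replaces data_frame(), which is deprecated in current tibble releases.
map(2000:2017, ~
    seq(as.Date(paste0(.x, '-01-01')), by = '8 day', length.out = 46)) %>%
  reduce(c) %>%
  tibble(date = ., fid = rep(1:46, times = 18),
         dayorder = rep(seq(1, 365, 8), times = 18), value = runif(length(.)))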