Error: 'origin' must be supplied

Time: 2017-03-14 20:07:18

Tags: r datetime

I have the following code to clean a dataset.

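library(dplyr)      # mutate, filter, group_by, summarize, arrange, lag
library(lubridate)  # year, month, day, hour, minute

data1 <- data1 %>% 
  mutate(YEAR = year(DATE), 
         MONTH = month(DATE), 
         DAY=day(DATE), 
         HOUR=hour(TIME), 
         MINUTE = minute(TIME), 
         RET= ((PRICE-lag(PRICE))/lag(PRICE))  # named RET so that sum(RET^2) below can find it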
  ) %>% 
  filter(HOUR >= 9, (HOUR <= 16 & MINUTE <=61)) %>%
  group_by(MINUTE, HOUR, DAY, MONTH, YEAR) %>% 
  summarize(AV.PRICE = mean(PRICE, na.rm=TRUE), 
            SUM.SIZE=sum(SIZE, na.rm=TRUE),
            RV=sum(RET^2)) %>%
  arrange(YEAR, MONTH, DAY, HOUR, MINUTE) %>%
  mutate(DATETIME = as.POSIXct(
    paste(YEAR,"/",MONTH,"/", DAY, " ", HOUR,":", MINUTE,":00",sep=""), 
    format="%Y/%m/%d %H:%M:%S", origin = "1970-01-01")
  )

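However, it sometimes gives me the error message: Error: 'origin' must be supplied

Strangely, the error does not appear the first time I run this code in a session, but it does appear on subsequent runs. If I restart the session, the problem goes away once and then comes back on later runs. As a result, I always have to restart to get it to work.

I checked the question How to solve: "Error in as.POSIXct.numeric(X[[2L]], ...) : 'origin' must be supplied", which suggests the cause may be a conversion from integer to a time. However, a glimpse of the data shows that DATE is of class <date>, not integer.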
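For context, here is a minimal sketch of the distinction the linked question is about. It assumes base R only; the epoch value 1199264400 is an illustrative number chosen to match 2008-01-02 09:00:00 UTC. Character input to as.POSIXct() is parsed with a format and needs no origin, while numeric input is counted in seconds from an origin. In R versions before 4.3.0, leaving out origin for numeric input raised the "'origin' must be supplied" error.

```r
# Character input: parsed with a format string, no origin involved.
from_text <- as.POSIXct("2008-01-02 09:00:00",
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Numeric input: interpreted as seconds since the given origin.
# (Illustrative value; before R 4.3.0 omitting `origin` here was an error.)
from_number <- as.POSIXct(1199264400, origin = "1970-01-01", tz = "UTC")

print(from_text)
print(from_number)
```

So if the error shows up, something in the pipeline is probably handing a numeric value, rather than a character string or a Date, to a date-time conversion.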

To be safe, I followed the error message's suggestion and added the argument origin = "1970-01-01" to every function that handles dates:

data1 <- data1 %>% 
  mutate(YEAR = year(DATE, origin = "1970-01-01"),
         MONTH = month(DATE, origin = "1970-01-01"), 
         DAY=day(DATE, origin = "1970-01-01"), 
         HOUR=hour(TIME, origin = "1970-01-01"),
         MINUTE = minute(TIME, origin = "1970-01-01"), 
         RET= ((PRICE-lag(PRICE))/lag(PRICE))
  ) %>% 
  filter(HOUR >= 9, (HOUR <= 16 & MINUTE <=61)) %>%
  group_by(MINUTE,HOUR,DAY,MONTH,YEAR) %>% 
  summarize(AV.PRICE = mean(PRICE, na.rm=TRUE), 
            SUM.SIZE=sum(SIZE, na.rm=TRUE),
            RV=sum(RET^2)
  ) %>% 
  arrange(YEAR, MONTH, DAY, HOUR, MINUTE) %>% 
  mutate(DATETIME = as.POSIXct(
    paste(YEAR,"/",MONTH,"/", DAY, " ", HOUR,":", MINUTE,":00",sep=""), 
    format="%Y/%m/%d %H:%M:%S", origin = "1970-01-01")
  )

This then returns Error: unused argument (origin = "1970-01-01")

In case it helps, here is a glimpse of my dataset:

Observations: 146,016,609
Variables: 4
$ DATE  <date> 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, ...
$ TIME  <S4: Period> 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S...
$ PRICE <dbl> 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.8...
$ SIZE  <int> 1000, 1000, 1000, 500, 2400, 1000, 1000, 1000, 2500, 1000, 1000, 400, 1000, 1000, 100...

I am looking for an answer that uses base R functions or, at most, lubridate / dplyr. Thanks!

2 Answers:

Answer 0: (score: 1)

Or just use anydate() from the anytime package:

R> anydate(20170314L)  # integer
[1] "2017-03-14"
R> anydate(20170314)   # numeric
[1] "2017-03-14"
R> anydate("20170314") # character 
[1] "2017-03-14"
R> anydate(as.factor("20170314")) 
[1] "2017-03-14"
R> 

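And more, including guessing most (sane) date formats (and datetimes, with anytime()), with no need for the (often superfluous) origin.

Edit: Given your data, you are overcomplicating what you are trying to achieve. Try this:

A minimal data.frame object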
R> df <- data.frame(DATE=rep(as.Date("2008-01-02"),4), TIME=rep(period(c(9,0,4), c("hour", "minute", "second")), 4))
R> df
        DATE     TIME
1 2008-01-02 9H 0M 4S
2 2008-01-02 9H 0M 4S
3 2008-01-02 9H 0M 4S
4 2008-01-02 9H 0M 4S
R>
Simply add the date and the time
R> df$DATE + df$TIME
[1] "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC"
R> class(df$DATE + df$TIME)
[1] "POSIXlt" "POSIXt" 
R> as.POSIXct(df$DATE + df$TIME)
[1] "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC"
R> 

And there is your answer.
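As a rough sketch of how the DATE + TIME idea could carry over to the per-minute aggregation in the question: the toy data below is hypothetical and only mimics the column layout from the glimpse, and the use of floor_date() to bucket by minute is an assumption, not something the question or this answer prescribes. Function calls are namespace-qualified so that they do not depend on which packages happen to be attached.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data with the same columns as the glimpse in the question.
data1 <- data.frame(
  DATE  = rep(as.Date("2008-01-02"), 4),
  TIME  = lubridate::hms(c("9:00:04", "9:00:30", "9:01:10", "17:05:00")),
  PRICE = c(100, 101, 102, 103),
  SIZE  = c(1000L, 500L, 2400L, 1000L)
)

# Build the timestamp once by adding the Period to the Date,
# then drop the Period column before handing the data to dplyr.
data1$DATETIME <- as.POSIXct(data1$DATE + data1$TIME)
data1$TIME <- NULL

minute_bars <- data1 %>%
  mutate(RET = (PRICE - dplyr::lag(PRICE)) / dplyr::lag(PRICE),
         DATETIME = lubridate::floor_date(DATETIME, "minute")) %>%
  filter(lubridate::hour(DATETIME) >= 9,
         lubridate::hour(DATETIME) <= 16) %>%
  group_by(DATETIME) %>%
  summarize(AV.PRICE = mean(PRICE, na.rm = TRUE),
            SUM.SIZE = sum(SIZE, na.rm = TRUE),
            RV       = sum(RET^2))

print(minute_bars)
```

Grouping on a single DATETIME column removes the need to split the timestamp into YEAR, MONTH, DAY, HOUR and MINUTE and paste it back together afterwards.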

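Answer 1: (score: 1)

I ran into the same error, Error: 'origin' must be supplied, when using the hms() function from the lubridate package. The culprit was that the code was actually calling the hms() function from the hms package. Once I qualified the call as lubridate::hms(), the error went away.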

air_reserve <- 
  air_reserve %>% 
  mutate( Reserve.time = lubridate::hms(Reserve.time)
          , Visit.time = lubridate::hms(Visit.time)
          , Hours = lubridate::hour(Visit.time) )
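One way to check whether this kind of masking is happening in a session is sketched below. It assumes lubridate is installed and uses hour() purely as an example name; the same check applies to hms(), minute(), year() and so on. If a package attached later exports a function with the same name, the bare call resolves to that package instead, which might also explain an error that only appears on later runs within a session.

```r
library(lubridate)

# Which namespace does a bare call to hour() resolve to right now?
resolved <- environmentName(environment(hour))
print(resolved)

# Which attached packages provide something called "hour"?
# More than one entry means the first one masks the others.
providers <- find("hour")
print(providers)

# conflicts(detail = TRUE) lists every masked name per attached package.
```

If `resolved` is not "lubridate", writing `lubridate::hour()` as in the code above sidesteps the conflict.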