I have the following code to clean a dataset.
library(dplyr)
library(lubridate)

data1 <- data1 %>%
  mutate(YEAR = year(DATE),
         MONTH = month(DATE),
         DAY = day(DATE),
         HOUR = hour(TIME),
         MINUTE = minute(TIME),
         RET = ((PRICE - lag(PRICE)) / lag(PRICE))
  ) %>%
  filter(HOUR >= 9, (HOUR <= 16 & MINUTE <= 61)) %>%
  group_by(MINUTE, HOUR, DAY, MONTH, YEAR) %>%
  summarize(AV.PRICE = mean(PRICE, na.rm = TRUE),
            SUM.SIZE = sum(SIZE, na.rm = TRUE),
            RV = sum(RET^2)) %>%
  arrange(YEAR, MONTH, DAY, HOUR, MINUTE) %>%
  mutate(DATETIME = as.POSIXct(
    paste(YEAR, "/", MONTH, "/", DAY, " ", HOUR, ":", MINUTE, ":00", sep = ""),
    format = "%Y/%m/%d %H:%M:%S", origin = "1970-01-01")
  )
However, it sometimes gives me the error message: Error: 'origin' must be supplied
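Strangely, the error does not appear the first time I run this code in a session, but it does on every later run. If I restart the session, the problem goes away for one run and then comes back. So I always have to restart to get it to work.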
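I checked the question "How to solve: "Error in as.POSIXct.numeric(X[[2L]], ...) : 'origin' must be supplied"", which suggests the error may occur because an integer is being converted to a time. However, glimpse() shows that DATE is of class <date>, not integer.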
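For context, a minimal base-R sketch of where this message normally comes from: it is raised when a bare number (days or seconds) is converted to a date-time without saying what the number counts from. A Date value already carries that information, so it converts without any origin. The example value is illustrative; the "before 4.3.0" remark reflects when R started defaulting origin to "1970-01-01" for numeric input.

```r
# A bare number has no calendar meaning on its own, so as.POSIXct() needs an
# origin. R versions before 4.3.0 stop with "'origin' must be supplied" if it
# is omitted here; newer versions default to "1970-01-01".
secs <- 1199264404  # 2008-01-02 09:00:04 UTC, in seconds since 1970-01-01
from_numeric <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(from_numeric, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# A Date already knows its origin, so no origin argument is needed.
day_value <- as.Date("2008-01-02")
as.numeric(day_value)               # days since 1970-01-01
from_date <- as.POSIXct(day_value)  # midnight UTC of that day
format(from_date, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

So if DATE really is a Date when the pipeline runs, the error is more likely raised by a function that quietly turns its input into a number first than by the DATE column itself.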
To be safe, I followed the error's suggestion and added an origin = "1970-01-01" argument to every function that handles dates:
data1 <- data1 %>%
  mutate(YEAR = year(DATE, origin = "1970-01-01"),
         MONTH = month(DATE, origin = "1970-01-01"),
         DAY = day(DATE, origin = "1970-01-01"),
         HOUR = hour(TIME, origin = "1970-01-01"),
         MINUTE = minute(TIME, origin = "1970-01-01"),
         RET = ((PRICE - lag(PRICE)) / lag(PRICE))
  ) %>%
  filter(HOUR >= 9, (HOUR <= 16 & MINUTE <= 61)) %>%
  group_by(MINUTE, HOUR, DAY, MONTH, YEAR) %>%
  summarize(AV.PRICE = mean(PRICE, na.rm = TRUE),
            SUM.SIZE = sum(SIZE, na.rm = TRUE),
            RV = sum(RET^2)
  ) %>%
  arrange(YEAR, MONTH, DAY, HOUR, MINUTE) %>%
  mutate(DATETIME = as.POSIXct(
    paste(YEAR, "/", MONTH, "/", DAY, " ", HOUR, ":", MINUTE, ":00", sep = ""),
    format = "%Y/%m/%d %H:%M:%S", origin = "1970-01-01")
  )
This then returns Error: unused argument (origin = "1970-01-01")
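The unused-argument error is expected: origin is an argument of the numeric conversion functions (as.Date(), as.POSIXct()), not of lubridate's year(), month(), day(), hour() or minute(). Since the question prefers base functions, here is a hedged base-R sketch of pulling the same parts out of a Date via POSIXlt fields (the example dates are illustrative). Note the offsets: year counts from 1900 and mon from 0.

```r
# Extract date parts from a Date in base R; no origin is involved because a
# Date is not a bare number.
DATE <- as.Date(c("2008-01-02", "2008-12-31"))
lt <- as.POSIXlt(DATE)
YEAR  <- lt$year + 1900  # POSIXlt years count from 1900
MONTH <- lt$mon + 1      # POSIXlt months are 0-based
DAY   <- lt$mday

# origin only belongs on a conversion from a plain number of days
from_days <- as.Date(13880, origin = "1970-01-01")
```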
In case it helps, here is a glimpse of my dataset:
Observations: 146,016,609
Variables: 4
$ DATE <date> 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, 2008-01-02, ...
$ TIME <S4: Period> 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S, 9H 0M 4S...
$ PRICE <dbl> 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.86, 146.8...
$ SIZE <int> 1000, 1000, 1000, 500, 2400, 1000, 1000, 1000, 2500, 1000, 1000, 400, 1000, 1000, 100...
I'm looking for an answer that uses base functions, or at most lubridate/dplyr. Thanks!
Answer 0 (score: 1)
Or feel free to use anydate() from the anytime package:
R> anydate(20170314L) # integer
[1] "2017-03-14"
R> anydate(20170314) # numeric
[1] "2017-03-14"
R> anydate("20170314") # character
[1] "2017-03-14"
R> anydate(as.factor("20170314"))
[1] "2017-03-14"
R>
and much more, including guessing most (sane) date formats (and, with anytime(), datetime formats), all without needing the (usually redundant) origin.
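Since the question asks for base functions, a hedged base-R counterpart for these particular inputs: unlike anydate(), as.Date() does not guess, so the "%Y%m%d" format string is an assumption that only fits compact YYYYMMDD values, and non-character input is turned into text first.

```r
# Base-R equivalents of the anydate() calls above, assuming YYYYMMDD input
from_int <- as.Date(as.character(20170314L), format = "%Y%m%d")            # integer
from_num <- as.Date(as.character(20170314), format = "%Y%m%d")             # numeric
from_chr <- as.Date("20170314", format = "%Y%m%d")                         # character
from_fct <- as.Date(as.character(as.factor("20170314")), format = "%Y%m%d") # factor
```

Here too, no origin is needed, because the input is parsed as text rather than counted from an epoch.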
Edit: Given your data, you are overcomplicating this. Try this:
A minimal data.frame object:
R> df <- data.frame(DATE=rep(as.Date("2008-01-02"),4), TIME=rep(period(c(9,0,4), c("hour", "minute", "second")), 4))
R> df
DATE TIME
1 2008-01-02 9H 0M 4S
2 2008-01-02 9H 0M 4S
3 2008-01-02 9H 0M 4S
4 2008-01-02 9H 0M 4S
R>
Just add the date and the time:
R> df$DATE + df$TIME
[1] "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC"
R> class(df$DATE + df$TIME)
[1] "POSIXlt" "POSIXt"
R> as.POSIXct(df$DATE + df$TIME)
[1] "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC" "2008-01-02 09:00:04 UTC"
R>
And there's your answer.
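Applied to the question's pipeline, a sketch of the same idea using only dplyr and lubridate: build a single DATETIME column first, then group by the minute it falls in with floor_date(), instead of splitting the timestamp into YEAR/MONTH/DAY/HOUR/MINUTE and pasting it back together. Assumptions: the toy trades data is hypothetical and only mimics the shape of the glimpse() output; RET and RV follow the question's definitions, except that na.rm = TRUE in RV (to skip the first, NA, return) is an addition. Calls are namespace-qualified (lubridate::, dplyr::lag) so a masking package cannot intercept them.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data shaped like the question's glimpse() output
DATE <- as.Date(rep("2008-01-02", 4))
TIME <- lubridate::hours(rep(9, 4)) +
  lubridate::minutes(c(0, 0, 1, 1)) +
  lubridate::seconds(c(4, 30, 10, 50))

trades <- data.frame(
  # midnight UTC of each trading day plus the time of day in seconds
  DATETIME = lubridate::as_datetime(DATE) + lubridate::period_to_seconds(TIME),
  PRICE = c(146.86, 146.90, 146.80, 147.00),
  SIZE = c(1000L, 500L, 2400L, 1000L)
)

minute_bars <- trades %>%
  mutate(RET = (PRICE - dplyr::lag(PRICE)) / dplyr::lag(PRICE)) %>%
  filter(lubridate::hour(DATETIME) >= 9, lubridate::hour(DATETIME) <= 16) %>%
  # one group per clock minute, e.g. 09:00:04 and 09:00:30 both map to 09:00
  group_by(MINUTE_START = lubridate::floor_date(DATETIME, unit = "minute")) %>%
  summarize(
    AV.PRICE = mean(PRICE, na.rm = TRUE),
    SUM.SIZE = sum(SIZE, na.rm = TRUE),
    RV = sum(RET^2, na.rm = TRUE),  # na.rm skips the first (NA) return
    .groups = "drop"
  )

minute_bars
```

Everything stays in UTC here, which avoids daylight-saving gaps when adding seconds to midnight, and the minute column is already a POSIXct, so no final as.POSIXct() call is needed.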
Answer 1 (score: 1)
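I got the same Error: 'origin' must be supplied when using the hms() function from the lubridate package. The culprit was that the code was actually calling hms() from the hms package. Once I referred to it explicitly as lubridate::hms(), the error went away.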
air_reserve <-
air_reserve %>%
mutate( Reserve.time = lubridate::hms(Reserve.time)
, Visit.time = lubridate::hms(Visit.time)
, Hours = lubridate::hour(Visit.time) )
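To check whether something similar is happening in your own session, base R's find() lists every attached environment that defines a name, in search order (the first one is what an unqualified call uses), and conflicts(detail = TRUE) lists the names defined in more than one place. Masking by a package that gets attached partway through a session could also be one possible reason an error only appears on later runs. A minimal self-contained demonstration, using a deliberately masked mean() as a stand-in for the hms/lubridate clash:

```r
# Simulate a mask: a global function shadows base::mean, as an attached
# package exporting the same name would.
mean <- function(x, ...) "not base::mean"

where <- find("mean")              # environments defining `mean`, in search order
clashes <- conflicts(detail = TRUE) # names defined in more than one environment
masked_result <- mean(1:3)         # unqualified call: the first match wins
qualified <- base::mean(1:3)       # pkg:: bypasses the search path

rm(mean)                           # remove the mask again
```

In a real session you would run find("hms") or find("hour") instead; if anything other than "package:lubridate" comes first, namespacing the call as in the answer above avoids the clash.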