I currently have several data frames in a list, each in the following format:
datetime precip code
1 2015-04-15 00:00:00 NA M
2 2015-04-15 01:00:00 NA M
3 2015-04-15 02:00:00 NA M
4 2015-04-15 03:00:00 NA M
5 2015-04-15 04:00:00 NA M
6 2015-04-15 05:00:00 NA M
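Each data frame has a different start and end date, but I want every one of them to run from 2015-04-01 0:00:00 to 2015-11-30 23:59:59. I want to generate rows for the missing datetime values in each data frame and fill the precip column with NA for those rows, so that every data frame in the list is a continuous time series with nrow=5856.
Ignore the code column. If precip values already exist, do not change them; just add the missing datetime rows with NAs.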
My attempt so far produces an error:
library(dplyr)
dates <- seq.POSIXt(as.POSIXlt("2015-04-01 0:00:00"), as.POSIXlt("2015-11-30 23:59:59"), by="hour",tz="GMT")
ts <- format.POSIXct(dates,"%Y/%m/%d %H:%M")
df <- data.frame(datetime=ts)
dat=mylist
final_list <- lapply(dat, function(x) full_join(df,dat$precip))
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"
link to sample file in case it is needed
Thanks for any suggestions.
Answer 0 (score: 1)
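As vitor pointed out above, you can only join two data.frames, not a data.frame and a vector. dplyr also works with POSIXct but not with POSIXlt (Hadley has preferences), so the join is easier if you store your datetime values as actual date-times (POSIXct) rather than as formatted strings.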
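To make the POSIXct/POSIXlt distinction concrete, here is a minimal base R sketch. The timestamp is taken from the sample data in the question; the variable names lt and ct are only for illustration.

```r
# Parse the same timestamp into both date-time classes.
lt <- as.POSIXlt("2015-04-15 03:00:00", tz = "GMT")
ct <- as.POSIXct("2015-04-15 03:00:00", tz = "GMT")

class(lt)  # POSIXlt: a list of components (sec, min, hour, mday, ...)
class(ct)  # POSIXct: a single number, seconds since 1970-01-01 UTC

# Convert POSIXlt to POSIXct before storing it in a data frame column
# that will be used as a join key.
ct2 <- as.POSIXct(lt)
as.numeric(ct2) == as.numeric(ct)
```

Because POSIXct is a plain numeric vector underneath, it behaves like any other column when matching keys, which is why it is the safer choice for a join column.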
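Also, inside lapply you need to use the argument of the function you define (here x); otherwise you just repeat the same operation on every iteration. And if you want to join data.frames, don't subset them either; both need a column with the same name and data type.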
Altogether, you need something like this:
library(dplyr)
# df is the sample data frame defined at the end of this answer
df$datetime <- as.POSIXct(df$datetime, tz = "GMT")
df <- tbl_df(df) # not necessary, but prints nicely
list_df <- list(df, df) # fake list of data.frames
# make a data.frame of sequence to join on
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = 'GMT'),
                                           as.POSIXct("2015-11-30 23:59:59", tz = 'GMT'),
                                           by="hour",tz="GMT"))
lapply(list_df, function(x){full_join(x, seq_df)})
# Joining by: "datetime"
# Joining by: "datetime"
# [[1]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
#
# [[2]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
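In the output above, full_join appends the unmatched rows of the hourly sequence after the original rows, so the result is not in chronological order. A possible variation, sketched here under assumptions, is to start from the hourly grid and left_join each data frame onto it. This swaps left_join in for the full_join used above; it keeps the grid's order and silently drops any observation outside the target window. The helper make_df and the toy precip values are hypothetical stand-ins for the real list.

```r
library(dplyr)

# Hourly grid for the target window: 244 days x 24 hours = 5856 rows.
seq_df <- data.frame(
  datetime = seq(as.POSIXct("2015-04-01 00:00:00", tz = "GMT"),
                 as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                 by = "hour")
)

# Hypothetical helper that builds a small data frame shaped like the
# ones in the question, with made-up precip values.
make_df <- function(start, n) {
  data.frame(
    datetime = seq(as.POSIXct(start, tz = "GMT"), by = "hour", length.out = n),
    precip = seq_len(n) / 10,
    code = "M",
    stringsAsFactors = FALSE
  )
}
mylist <- list(make_df("2015-04-15 00:00:00", 6),
               make_df("2015-06-01 12:00:00", 48))

# Join each element onto the grid, using the lapply argument x.
# Hours with no observation get NA in precip and code.
final_list <- lapply(mylist, function(x) left_join(seq_df, x, by = "datetime"))

sapply(final_list, nrow)
```

Passing by = "datetime" explicitly also suppresses the "Joining by" message and makes the join key obvious to the reader.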
# Sample data (df) used in the answer's code above
df <- structure(list(datetime = structure(c(1429056000, 1429059600, 1429063200, 1429066800,
1429070400, 1429074000), class = c("POSIXct", "POSIXt"), tzone = "GMT"), precip = c(NA,
NA, NA, NA, NA, NA), code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "M",
class = "factor")), .Names = c("datetime", "precip", "code"), row.names = c("1",
"2", "3", "4", "5", "6"), class = c("tbl_df", "tbl", "data.frame"))