Question

Currently, I have multiple data frames in a list, in the following format:

             datetime precip code
1 2015-04-15 00:00:00     NA    M
2 2015-04-15 01:00:00     NA    M
3 2015-04-15 02:00:00     NA    M
4 2015-04-15 03:00:00     NA    M
5 2015-04-15 04:00:00     NA    M
6 2015-04-15 05:00:00     NA    M

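Each data frame has a different start and end date, but I want every data frame to run from 2015-04-01 0:00:00 to 2015-11-30 23:59:59. I want to generate rows for the dates missing from datetime in each data frame and fill the precip column with NA for those rows, so that each data frame is a continuous time series with nrow=5856.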

Ignore the code column. If precip already has values, do not change them; just use datetime to fill in the other rows with NAs.

My attempt so far produces an error:

library(dplyr)
dates <- seq.POSIXt(as.POSIXlt("2015-04-01 0:00:00"), as.POSIXlt("2015-11-30 23:59:59"), by="hour",tz="GMT")
ts <- format.POSIXct(dates,"%Y/%m/%d %H:%M")
df <- data.frame(datetime=ts)
dat=mylist
final_list <- lapply(dat, function(x) full_join(df,dat$precip))

Error in UseMethod("tbl_vars") : 
  no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"

link to sample file in case it is needed

Thanks for any suggestions.

Answer 1

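As vitor pointed out above, you can only join two data.frames, not a data.frame and a vector. dplyr also works with POSIXct but not POSIXlt (Hadley has preferences), so joining will be easier if you store your data as actual date-times rather than formatted strings.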
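To make the POSIXct point concrete, here is a minimal base R sketch. The timestamps in x are hypothetical example values written in the same "%Y/%m/%d %H:%M" format the question produces; they are not taken from the asker's file.

```r
# Hypothetical example strings in the format used in the question
x <- c("2015/04/15 00:00", "2015/04/15 01:00")

# Parse the same strings into both date-time classes
lt <- as.POSIXlt(x, format = "%Y/%m/%d %H:%M", tz = "GMT")
ct <- as.POSIXct(x, format = "%Y/%m/%d %H:%M", tz = "GMT")

class(lt)  # "POSIXlt" "POSIXt"
class(ct)  # "POSIXct" "POSIXt"

# POSIXct is stored as seconds since 1970-01-01, one number per element,
# so it works as an ordinary data frame column and as a join key
df_ok <- data.frame(datetime = ct)
```

Converting the key column this way in both data frames gives the join a column of the same name and the same type on each side.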

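Also, inside lapply you need to use the argument of the function you created (here x); otherwise you are just repeating the same operation for every element. If you want to join data.frames, do not subset them either; each one needs a column with the same name and data type to join on.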
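The lapply point can be seen in a small sketch. The list dat below is a hypothetical stand-in with two numeric vectors, not the asker's data.

```r
# Hypothetical list with two elements
dat <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# Referring to the whole list inside the function ignores x,
# so every element yields the same result
wrong <- lapply(dat, function(x) length(dat))

# Using the argument x processes each element in turn
right <- lapply(dat, function(x) sum(x))
```

Here wrong holds the value 2 for both elements, while right holds a different sum for each element.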

Altogether, you need something like this:

library(dplyr)

df$datetime <- as.POSIXct(df$datetime, tz = "GMT")
df <- tbl_df(df)    # not necessary, but prints nicely

list_df <- list(df, df)    # fake list of data.frames
# make a data.frame of sequence to join on
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = 'GMT'), 
                                           as.POSIXct("2015-11-30 23:59:59", tz = 'GMT'), 
                                           by="hour",tz="GMT"))

lapply(list_df, function(x){full_join(x, seq_df)})
# Joining by: "datetime"
# Joining by: "datetime"
# [[1]]
# Source: local data frame [5,857 x 3]
# 
#               datetime precip   code
#                 (POSI)  (lgl) (fctr)
# 1  2015-04-15 00:00:00     NA      M
# 2  2015-04-15 01:00:00     NA      M
# 3  2015-04-15 02:00:00     NA      M
# 4  2015-04-15 03:00:00     NA      M
# 5  2015-04-15 04:00:00     NA      M
# 6  2015-04-15 05:00:00     NA      M
# 7  2015-04-01 04:00:00     NA     NA
# 8  2015-04-01 05:00:00     NA     NA
# 9  2015-04-01 06:00:00     NA     NA
# 10 2015-04-01 07:00:00     NA     NA
# ..                 ...    ...    ...
# 
# [[2]]
# Source: local data frame [5,857 x 3]
# 
#               datetime precip   code
#                 (POSI)  (lgl) (fctr)
# 1  2015-04-15 00:00:00     NA      M
# 2  2015-04-15 01:00:00     NA      M
# 3  2015-04-15 02:00:00     NA      M
# 4  2015-04-15 03:00:00     NA      M
# 5  2015-04-15 04:00:00     NA      M
# 6  2015-04-15 05:00:00     NA      M
# 7  2015-04-01 04:00:00     NA     NA
# 8  2015-04-01 05:00:00     NA     NA
# 9  2015-04-01 06:00:00     NA     NA
# 10 2015-04-01 07:00:00     NA     NA
# ..                 ...    ...    ...
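If the goal is exactly one row per hour, in chronological order, one possible variation is to start from the hourly grid and left-join each data frame onto it. This is a sketch under assumptions, not the method used above: the data frame x and its precip values are hypothetical, the grid ends at 23:00 on 2015-11-30 (the last whole hour of the period), and each data frame is assumed to have unique, on-the-hour timestamps.

```r
library(dplyr)

# Hourly grid for the target period: 244 days * 24 hours = 5856 rows
seq_df <- data.frame(
  datetime = seq(as.POSIXct("2015-04-01 00:00:00", tz = "GMT"),
                 as.POSIXct("2015-11-30 23:00:00", tz = "GMT"),
                 by = "hour")
)

# Hypothetical stand-in for one element of the list
x <- data.frame(
  datetime = as.POSIXct("2015-04-15 00:00:00", tz = "GMT") + 3600 * 0:5,
  precip   = c(NA, 0.2, NA, 1.5, NA, NA),
  code     = "M"
)
list_df <- list(x, x)

# Keep every grid row; existing precip values are carried over unchanged,
# and hours with no match get NA
filled <- lapply(list_df, function(x) left_join(seq_df, x, by = "datetime"))

nrow(filled[[1]])  # 5856
```

Unlike full_join, left_join drops any rows of x whose datetime is not on the grid, so it only suits data that is already hourly and inside the target period.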

Data:

df <- structure(list(datetime = structure(c(1429056000, 1429059600, 1429063200, 1429066800, 
    1429070400, 1429074000), class = c("POSIXct", "POSIXt"), tzone = "GMT"), precip = c(NA, 
    NA, NA, NA, NA, NA), code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "M", 
    class = "factor")), .Names = c("datetime", "precip", "code"), row.names = c("1", 
    "2", "3", "4", "5", "6"), class = c("tbl_df", "tbl", "data.frame"))

Insert and fill rows with missing dates in a list in R
