Add missing dates to a data frame

Time: 2014-01-08 22:43:02

Tags: r date dataframe rows

I have a data frame that looks like this:

    times                      values
1   2013-07-06 20:00:00        0.02
2   2013-07-07 20:00:00        0.03
3   2013-07-09 20:00:00        0.13
4   2013-07-10 20:00:00        0.12
5   2013-07-11 20:00:00        0.03
6   2013-07-14 20:00:00        0.06
7   2013-07-15 20:00:00        0.08
8   2013-07-16 20:00:00        0.07
9   2013-07-17 20:00:00        0.08

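Some dates are missing from the data. I would like to insert them and carry the previous day's value forward into the new rows, i.e. obtain: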

    times                      values
1   2013-07-06 20:00:00        0.02
2   2013-07-07 20:00:00        0.03
3   2013-07-08 20:00:00        0.03
4   2013-07-09 20:00:00        0.13
5   2013-07-10 20:00:00        0.12
6   2013-07-11 20:00:00        0.03
7   2013-07-12 20:00:00        0.03
8   2013-07-13 20:00:00        0.03
9   2013-07-14 20:00:00        0.06
10  2013-07-15 20:00:00        0.08
11  2013-07-16 20:00:00        0.07
12  2013-07-17 20:00:00        0.08
...

I have been trying to use a vector of all the dates:

dates <- as.Date(1:length(df),origin = df$times[1])

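I am stuck: I can't find a way to do this without a horrible loop that I get lost in... Thanks for your help

5 answers:

Answer 0 (score: 5)

Some test data (I am using Date; yours seems to be a different type, but that does not affect the algorithm):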

data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")), 
                  values = as.double(1:3))

# Generate **all** timestamps at which you want to have your result. 
# I use `seq`, but you may use any other method of generating those timestamps. 

alldates = seq(min(data$dates), max(data$dates), 1)

# Filter out timestamps that are already present in your `data.frame`:
# Construct a `data.frame` to append with missing values:
dates0 = alldates[!(alldates %in% data$dates)]
data0 = data.frame(dates = dates0, values = NA_real_)

# Append this `data.frame` and resort in time:
data = rbind(data, data0)
data = data[order(data$dates),]

# Forward-fill the values.
# I would recommend moving this code into a separate `ffill` function
# (it has proved to be very useful in general):
current = NA_real_
data$values = sapply(data$values, function(x) { 
           current <<- ifelse(is.na(x), current, x); current })
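
The answer suggests pulling the forward fill into a separate `ffill` function. A minimal base R sketch of such a helper could look like the following; the name `ffill` comes from the comment above, while the `cummax`-based implementation and the use of `merge` to build the full grid are illustrative choices, not the answerer's code.

```r
# Hypothetical helper: carry the last non-NA value forward.
# Leading NAs (with nothing before them) stay NA.
ffill <- function(x) {
  # For each position, the index of the most recent non-NA element (0 if none yet).
  idx <- cummax(seq_along(x) * !is.na(x))
  # Prepend NA so that index 0 maps to NA after shifting by one.
  c(NA, x)[idx + 1L]
}

# Same test data as in the answer above.
data <- data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")),
                   values = as.double(1:3))

# Left-join the observed values onto the full daily grid, then fill the gaps.
alldates <- seq(min(data$dates), max(data$dates), by = 1)
full <- merge(data.frame(dates = alldates), data, by = "dates", all.x = TRUE)
full$values <- ffill(full$values)
full$values  # 1 1 2 2 3
```

Unlike the `sapply` version, this does not rely on `<<-` modifying a variable outside the function, so it can be reused on any vector.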

Answer 1 (score: 4)

library(zoo)
g <- data.frame(dates=seq(min(data$dates),max(data$dates),1))
na.locf(merge(g,data,by="dates",all.x=TRUE))

Or entirely with zoo:

z <- read.zoo(data)
gz <- zoo(, seq(min(time(z)), max(time(z)), "day"))  # time grid in zoo
na.locf(merge(z, gz))

Answer 2 (score: 1)

Using tidyr's complete and fill, assuming the times column is already of class POSIXct:

library(tidyr)
df %>%
  complete(times = seq(min(times), max(times), by = 'day')) %>%
  fill(values)

# A tibble: 12 x 2
#   times               values
#   <dttm>               <dbl>
# 1 2013-07-06 20:00:00   0.02
# 2 2013-07-07 20:00:00   0.03
# 3 2013-07-08 20:00:00   0.03
# 4 2013-07-09 20:00:00   0.13
# 5 2013-07-10 20:00:00   0.12
# 6 2013-07-11 20:00:00   0.03
# 7 2013-07-12 20:00:00   0.03
# 8 2013-07-13 20:00:00   0.03
# 9 2013-07-14 20:00:00   0.06
#10 2013-07-15 20:00:00   0.08
#11 2013-07-16 20:00:00   0.07
#12 2013-07-17 20:00:00   0.08
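
To try the pipeline above, the question's table can be rebuilt roughly as follows. This is only a sketch: the values are copied from the question, and `tz = "UTC"` is an assumption (the question does not state a time zone), chosen so the example behaves the same on any machine.

```r
# Rebuild the question's data frame with a POSIXct `times` column.
df <- data.frame(
  times = as.POSIXct(
    c("2013-07-06 20:00:00", "2013-07-07 20:00:00", "2013-07-09 20:00:00",
      "2013-07-10 20:00:00", "2013-07-11 20:00:00", "2013-07-14 20:00:00",
      "2013-07-15 20:00:00", "2013-07-16 20:00:00", "2013-07-17 20:00:00"),
    tz = "UTC"  # assumed time zone
  ),
  values = c(0.02, 0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)
)

# The daily grid that complete() is asked to fill in.
all_times <- seq(min(df$times), max(df$times), by = "day")
length(all_times)  # 12
```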

Answer 3 (score: 0)

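# Build the full daily grid, then left-join the observed values onto it.
df2 <- data.frame(times=seq(min(df$times), max(df$times), by="day"))
df3 <- merge(x=df2, y=df, by="times", all.x=TRUE)
# Fill each gap from the row above (the first row is never NA here).
idx <- which(is.na(df3$values))
for (id in idx) 
  df3$values[id] <- df3$values[id-1]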
df3
#                  times values
# 1  2013-07-06 20:00:00   0.02
# 2  2013-07-07 20:00:00   0.03
# 3  2013-07-08 20:00:00   0.03
# 4  2013-07-09 20:00:00   0.13
# 5  2013-07-10 20:00:00   0.12
# 6  2013-07-11 20:00:00   0.03
# 7  2013-07-12 20:00:00   0.03
# 8  2013-07-13 20:00:00   0.03
# 9  2013-07-14 20:00:00   0.06
# 10 2013-07-15 20:00:00   0.08
# 11 2013-07-16 20:00:00   0.07
# 12 2013-07-17 20:00:00   0.08

Answer 4 (score: 0)

You can try this:

   library(data.table)

   # NADayWiseOrders is assumed to be a data.table with a Date column `date`.
   setkey(NADayWiseOrders, date)
   all_dates <- seq(from = as.Date("2013-01-01"),
                    to = as.Date("2013-01-07"),
                    by = "days")

   # Rolling join: each missing date takes the last observed row.
   NADayWiseOrders[J(all_dates), roll=Inf]
   #          date orders  amount guests
   # 1: 2013-01-01     50 2272.55    149
   # 2: 2013-01-02      3   64.04      4
   # 3: 2013-01-03      3   64.04      4
   # 4: 2013-01-04      1   18.81      0
   # 5: 2013-01-05      2   77.62      0
   # 6: 2013-01-06      2   77.62      0
   # 7: 2013-01-07      2   35.82      2
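
The table in this answer is not the one from the question. A minimal sketch of the same rolling join applied to the question's data might look like this; the names `dt`, `all_times` and `filled`, and the `tz = "UTC"` setting, are assumptions made for illustration.

```r
library(data.table)

# The question's data as a data.table (time zone assumed to be UTC).
dt <- data.table(
  times = as.POSIXct(
    c("2013-07-06 20:00:00", "2013-07-07 20:00:00", "2013-07-09 20:00:00",
      "2013-07-10 20:00:00", "2013-07-11 20:00:00", "2013-07-14 20:00:00",
      "2013-07-15 20:00:00", "2013-07-16 20:00:00", "2013-07-17 20:00:00"),
    tz = "UTC"
  ),
  values = c(0.02, 0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)
)
setkey(dt, times)

# Full daily grid between the first and last observation.
all_times <- seq(min(dt$times), max(dt$times), by = "day")

# roll = Inf carries the last observation forward onto the missing timestamps.
filled <- dt[J(all_times), roll = Inf]
filled
```

With `roll = Inf` the gap can be arbitrarily long; a finite value (for example `roll = 2 * 86400` for POSIXct seconds) would limit how far a value is carried.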