Question

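I have the following data (a sample) with some missing minutes (e.g. 6:32 and 6:33 are missing). For these minutes the count equals 0, but the database simply does not report them and skips the minute.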

count   time
47  15/12/2014 06:30
3   15/12/2014 06:31
431 15/12/2014 06:34
320 15/12/2014 06:35
42  15/12/2014 06:36
13  15/12/2014 06:37
383 15/12/2014 06:38
160 15/12/2014 06:39

I tried following other posts (I, II, III), but they use the xts package and that did not work for me. I also tried my own approach, but it did not work either:

sort.df <- df[order(df$time),]
time.min <- min(sort.df$time)
time.max <- max(sort.df$time)
all.dates <- seq(time.min, time.max, by="min") # I create a list of all the minutes. 
all.dates.frame <- data.frame(list(time=all.dates))
merged.data <- merge(all.dates.frame, sort.df, all=T)

What I get is every minute duplicated with NA values. Does anyone know what I am doing wrong? Any help or ideas are much appreciated!

Answer 1

How about this? It works for the small sample data.

Your input data:

df <- read.table(header=T, text='count   time
47  "15/12/2014 06:30"
3   "15/12/2014 06:31"
431 "15/12/2014 06:34"
320 "15/12/2014 06:35"
42  "15/12/2014 06:36"
13  "15/12/2014 06:37"
383 "15/12/2014 06:38"
160 "15/12/2014 06:39"')

Format the "time" column:

df$time <- as.POSIXct(df$time, format = "%d/%m/%Y %H:%M")

Create a new data.frame containing all the minutes:

newdf <- data.frame(time = seq(min(df$time), max(df$time), by = "mins"))

Then merge it with the original data:

merge(newdf, df, by = "time", all.x = TRUE)
#                  time count
#1  2014-12-15 06:30:00    47
#2  2014-12-15 06:31:00     3
#3  2014-12-15 06:32:00    NA
#4  2014-12-15 06:33:00    NA
#5  2014-12-15 06:34:00   431
#6  2014-12-15 06:35:00   320
#7  2014-12-15 06:36:00    42
#8  2014-12-15 06:37:00    13
#9  2014-12-15 06:38:00   383
#10 2014-12-15 06:39:00   160
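Since the question says the skipped minutes really have a count of 0, the NA values can be replaced after the merge. The following is a minimal, self-contained sketch of that extra step; the name `filled` and the choice of `tz = "UTC"` are assumptions made for the example and are not part of the original answer:

```r
# Sample data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = 'count   time
47  "15/12/2014 06:30"
3   "15/12/2014 06:31"
431 "15/12/2014 06:34"
320 "15/12/2014 06:35"
42  "15/12/2014 06:36"
13  "15/12/2014 06:37"
383 "15/12/2014 06:38"
160 "15/12/2014 06:39"')

# Parse the timestamps (a fixed time zone is assumed here for reproducibility)
df$time <- as.POSIXct(df$time, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Full minute-by-minute grid, then a left join onto the observed data
newdf <- data.frame(time = seq(min(df$time), max(df$time), by = "min"))
filled <- merge(newdf, df, by = "time", all.x = TRUE)

# Minutes absent from the database are treated as a count of zero
filled$count[is.na(filled$count)] <- 0L
```

Whether 0 is the right fill value depends on the data source; if a missing minute could also mean "not recorded", keeping NA may be safer.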

Answer 2

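If you use a time series representation (such as zoo or xts), much of this is done automatically. There are examples like this in the zoo vignettes, but here it is again. g is a grid of times; we build a zero-width series on that grid and merge it with z to get the result: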

# test data
Lines <- "count,time
47,15/12/2014 06:30
3,15/12/2014 06:31
431,15/12/2014 06:34
320,15/12/2014 06:35
42,15/12/2014 06:36
13,15/12/2014 06:37
383,15/12/2014 06:38
160,15/12/2014 06:39"

library(zoo)
df <- read.csv(text = Lines)

# convert to zoo
fmt <- "%d/%m/%Y %H:%M"
z <- read.zoo(df, index = 2, tz = "", format = fmt)

# create grid and merge 0-width series based on it with z
g <- seq(start(z), end(z), by = "min") # grid of times
merge(z, zoo(, g))

giving:

2014-12-15 06:30:00 2014-12-15 06:31:00 2014-12-15 06:32:00 2014-12-15 06:33:00 
                 47                   3                  NA                  NA 
2014-12-15 06:34:00 2014-12-15 06:35:00 2014-12-15 06:36:00 2014-12-15 06:37:00 
                431                 320                  42                  13 
2014-12-15 06:38:00 2014-12-15 06:39:00 
                383                 160

If we were starting from an input file rather than the data frame df, we could combine the read.csv and read.zoo statements into a single read.zoo statement:

z <- read.zoo(text = Lines, header = TRUE, sep = ",", index = 2, tz = "", format = fmt)
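
If zeros are wanted instead of NA for the skipped minutes (as the question describes), the `fill` argument of zoo's merge method can supply them during the merge. This is a small sketch along the same lines as the code above; the name `z0` and the use of `tz = "UTC"` are assumptions for the example:

```r
library(zoo)

Lines <- "count,time
47,15/12/2014 06:30
3,15/12/2014 06:31
431,15/12/2014 06:34
320,15/12/2014 06:35
42,15/12/2014 06:36
13,15/12/2014 06:37
383,15/12/2014 06:38
160,15/12/2014 06:39"

# Read straight into a zoo series indexed by POSIXct time
z <- read.zoo(text = Lines, header = TRUE, sep = ",", index = 2,
              tz = "UTC", format = "%d/%m/%Y %H:%M")

# Minute grid spanning the series
g <- seq(start(z), end(z), by = "min")

# Merge with a zero-width series on the grid; gaps are filled with 0 instead of NA
z0 <- merge(z, zoo(, g), fill = 0)
```

The same effect should be obtainable by merging as before and then calling `na.fill()` on the result.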

Answer 3

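This is now conveniently implemented in the padr package. If your data frame is prepared as in docendo's answer (datetime stored as POSIXct), then this is all you need: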

library(padr)
pad(df)

See vignette("padr").
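
For completeness, here is a self-contained sketch that also replaces the padded NA values with 0, using padr's `fill_by_value()`. It assumes the padr package is installed; the explicit `interval = "min"`, the `tz = "UTC"` setting, and the names `padded` and `filled` are choices made for this example:

```r
library(padr)

# Sample data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = 'count   time
47  "15/12/2014 06:30"
3   "15/12/2014 06:31"
431 "15/12/2014 06:34"
320 "15/12/2014 06:35"
42  "15/12/2014 06:36"
13  "15/12/2014 06:37"
383 "15/12/2014 06:38"
160 "15/12/2014 06:39"')
df$time <- as.POSIXct(df$time, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Insert a row for every missing minute (count is NA in the new rows)
padded <- pad(df, interval = "min")

# Replace the NA counts in the padded rows with zero
filled <- fill_by_value(padded, count, value = 0)
```

Calling `pad(df)` without `interval` lets padr infer the interval from the data, which is what the shorter call above relies on.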

Pad a time series with missing time units