How to increase time series granularity in an R data frame?

Time: 2018-02-17 22:50:34

Tags: r datetime

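I have a data frame containing hourly weather information. I want to increase the granularity of the time measurements (5-minute intervals instead of 60-minute intervals) while copying the other columns' data into the newly created rows: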

Current data frame structure:

Date                Temperature Humidity
2015-01-01 00:00:00 25          0.67
2015-01-01 01:00:00 26          0.69

Target data frame structure:

Date                Temperature Humidity 
2015-01-01 00:00:00 25          0.67
2015-01-01 00:05:00 25          0.67
2015-01-01 00:10:00 25          0.67
.
.
.
2015-01-01 00:55:00 25          0.67
2015-01-01 01:00:00 26          0.69
2015-01-01 01:05:00 26          0.69
2015-01-01 01:10:00 26          0.69
.
.
.

What I tried:

for(i in 1:nrow(df)) {


  five.minutes <- seq(df$date[i], length = 12, by = "5 mins")

  for(j in 1:length(five.minutes)) {

    df$date[i]<-rbind(five.minutes[j])

  }
}

The error I got:

Error in as.POSIXct.numeric(value) : 'origin' must be supplied

2 Answers:

Answer 0: (score: 1)

One possible solution is to use fill from tidyr and right_join from dplyr.

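The approach is to create a series of date/times running from the data frame's min to its max + 55 mins. Right-join the data frame to that series, which yields NA for Temperature and Humidity in all the newly added rows. Then use fill to replace those NA values with the previous valid value.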

library(dplyr)
library(tidyr)

# Data
df <- read.table(text = "Date                Temperature Humidity
'2015-01-01 00:00:00' 25          0.67
'2015-01-01 01:00:00' 26          0.69
'2015-01-01 02:00:00' 28          0.69
'2015-01-01 03:00:00' 25          0.69", header = TRUE, stringsAsFactors = FALSE)

df$Date <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M:%S")

# Create a data frame with every date/time at 5-minute intervals,
# from the first timestamp to 55 minutes past the last one
Dates <- data.frame(Date = seq(min(df$Date), max(df$Date) + 55 * 60, by = 5 * 60))

# Join, sort by time (join row order can differ between dplyr versions),
# then carry the last observed values forward into the NA rows
result <- df %>%
  right_join(Dates, by = "Date") %>%
  arrange(Date) %>%
  fill(Temperature, Humidity)

result
#                  Date Temperature Humidity
#1  2015-01-01 00:00:00          25     0.67
#2  2015-01-01 00:05:00          25     0.67
#3  2015-01-01 00:10:00          25     0.67
#4  2015-01-01 00:15:00          25     0.67
#5  2015-01-01 00:20:00          25     0.67
#6  2015-01-01 00:25:00          25     0.67
#7  2015-01-01 00:30:00          25     0.67
#8  2015-01-01 00:35:00          25     0.67
#9  2015-01-01 00:40:00          25     0.67
#10 2015-01-01 00:45:00          25     0.67
#11 2015-01-01 00:50:00          25     0.67
#12 2015-01-01 00:55:00          25     0.67
#13 2015-01-01 01:00:00          26     0.69
#14 2015-01-01 01:05:00          26     0.69
#.....
#.....
#44 2015-01-01 03:35:00          25     0.69
#45 2015-01-01 03:40:00          25     0.69
#46 2015-01-01 03:45:00          25     0.69
#47 2015-01-01 03:50:00          25     0.69
#48 2015-01-01 03:55:00          25     0.69
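
As a variation on the same join-then-fill idea, tidyr's complete can generate the missing timestamps without building a separate data frame. This is only a minimal sketch: the sample data and the names expanded and result are assumptions made for illustration, and the time zone is set to UTC to keep the example independent of the local setting.

```r
library(tidyr)

# Hypothetical sample data in the same shape as the example above
df <- data.frame(
  Date = as.POSIXct(c("2015-01-01 00:00:00", "2015-01-01 01:00:00",
                      "2015-01-01 02:00:00", "2015-01-01 03:00:00"), tz = "UTC"),
  Temperature = c(25, 26, 28, 25),
  Humidity = c(0.67, 0.69, 0.69, 0.69)
)

# complete() adds a row for every 5-minute timestamp missing from Date
expanded <- complete(df, Date = seq(min(Date), max(Date) + 55 * 60, by = "5 min"))

# fill() carries the last observed values down into the new NA rows
result <- fill(expanded, Temperature, Humidity)
```

With the four hourly rows above, result should hold one row per 5-minute step from 00:00 through 03:55.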

Answer 1: (score: 0)

I think this might do it:

library(tibble)
library(lubridate)
df <- tibble(DateTime = c("2015-01-01 00:00:00", "2015-01-01 01:00:00"), Temperature = c(25, 26), Humidity = c(.67, .69))
df$DateTime <- ymd_hms(df$DateTime)
n <- nrow(df)
# Each row except the last expands into 12 five-minute steps; the last timestamp is appended once
DateTime <- as.POSIXct(c(sapply(1:(n - 1), function(x) seq(from = df$DateTime[x], by = "5 min", length.out = 12)),
                         as.numeric(df$DateTime[n])),
                       origin = "1970-01-01", tz = "UTC")
Temperature <- c(sapply(1:(n - 1), function(x) rep(df$Temperature[x], 12)), df$Temperature[n])
Humidity <- c(sapply(1:(n - 1), function(x) rep(df$Humidity[x], 12)), df$Humidity[n])
tibble(DateTime = format(DateTime, "%Y-%m-%d %H:%M:%S"), Temperature, Humidity)

<chr>                          <dbl>    <dbl>
 1 2015-01-01 00:00:00             25.0    0.670
 2 2015-01-01 00:05:00             25.0    0.670
 3 2015-01-01 00:10:00             25.0    0.670
 4 2015-01-01 00:15:00             25.0    0.670
 5 2015-01-01 00:20:00             25.0    0.670
 6 2015-01-01 00:25:00             25.0    0.670
 7 2015-01-01 00:30:00             25.0    0.670
 8 2015-01-01 00:35:00             25.0    0.670
 9 2015-01-01 00:40:00             25.0    0.670
10 2015-01-01 00:45:00             25.0    0.670
11 2015-01-01 00:50:00             25.0    0.670
12 2015-01-01 00:55:00             25.0    0.670
13 2015-01-01 01:00:00             26.0    0.690
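
For what it is worth, the loop in the question most likely fails because rbind() drops the POSIXct class and returns a plain numeric matrix. Assigning that back into a date-time column then goes through as.POSIXct() on a number, which in R versions of that era requires an origin. The row-repetition idea can also be written without loops in base R. The following is a minimal sketch under assumptions: the sample data and the names steps_per_hour, idx, offsets and expanded are made up for illustration, and every hourly row, including the last, is expanded into 12 rows as in the question's target structure.

```r
# Hypothetical sample data mirroring the question's structure
df <- data.frame(
  Date = as.POSIXct(c("2015-01-01 00:00:00", "2015-01-01 01:00:00"), tz = "UTC"),
  Temperature = c(25, 26),
  Humidity = c(0.67, 0.69)
)

steps_per_hour <- 12  # 60 minutes / 5 minutes

# Repeat each row index 12 times: 1,1,...,1,2,2,...,2
idx <- rep(seq_len(nrow(df)), each = steps_per_hour)

# Offsets in seconds within each hour: 0, 300, ..., 3300, repeated per row
offsets <- rep(seq(0, by = 5 * 60, length.out = steps_per_hour), times = nrow(df))

expanded <- df[idx, ]                     # copies Temperature and Humidity
expanded$Date <- expanded$Date + offsets  # POSIXct plus seconds stays POSIXct
rownames(expanded) <- NULL
```

Because the timestamps are shifted by adding seconds to a POSIXct vector, the class is never lost and no origin argument is needed.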