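Merging irregular time series data sets

Time: 2017-02-14 18:41:48

Tags: r

I am trying to merge multiple data sets. However, each one has irregular hourly timestamps. My goal is to merge the data where the observations fall within the same hourly interval, and to populate a regular time series schedule. As an example, here are two of the data sets: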

 x <- structure(list(Date = structure(1:5, .Label = c("09.09.2011 21:54", 
"09.09.2011 22:59", "09.10.2011 00:04", "09.10.2011 01:09", "09.10.2011 02:14"
), class = "factor"), hexane = c(0, 0, 0, 0, 0), benzene = structure(c(1L, 
2L, 4L, 3L, 5L), .Label = c("0", "4.4", "4.7", "6.3", "7.7"), class = "factor"), 
    toluene = c(2.2, 2.6, 3.5, 2.7, 3.1)), .Names = c("Date", 
"hexane", "benzene", "toluene"), row.names = c(NA, 5L), class = "data.frame")

    y <- structure(list(Date = structure(1:5, .Label = c("09.09.2011 21:54", 
"09.09.2011 22:59", "09.10.2011 00:04", "09.10.2011 01:09", "09.10.2011 02:14"
), class = "factor"), ethane = c(14.4, 868.9, 547, 491.4, 56.1
), propane = c(6.4, 32.1, 23.7, 22.8, 7.2), isobutane = c(1.7, 
2, 1.8, 1.3, 1.1), n.butane = c(3.1, 3, 3.7, 4.3, 2.9), isopentane = c(5.6, 
3, 2.4, 3.4, 2.7), n.pentane = c(1.4, 2.4, 2.3, 2.4, 2.3)), .Names = c("Date", 
"ethane", "propane", "isobutane", "n.butane", "isopentane", "n.pentane"
), row.names = c(NA, 5L), class = "data.frame")

library(zoo)  # na.fill() comes from the zoo package

na.fill(x, NA)
na.fill(y, NA)

#identify "Date" column

x <- as.POSIXct(x$Date,format='%m.%d.%y %H:%M')
y <- as.POSIXct(y$Date,format='%m.%d.%y %H:%M')

#merge two data sets

merged_data <- merge.data.frame(x, y, by='Date', all=TRUE)

However, the Date column of the output, "merged_data", is full of NAs. I need hourly timestamps in the Date column.

The intended output file

1 Answer:

Answer 0: (score: 1)

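Your merged_data$Date is NA because the conversion to POSIXct failed. Getting the result takes two steps:

  1. Cast the Date column of both data frames to actual date-time objects
  2. Round (or truncate) to the hour and join the two data frames

Casting the dates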

    There are several ways to do this:

    as.POSIXct

    x$Date <- as.POSIXct(x$Date, format = '%m.%d.%Y %H:%M')
    

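    Note the uppercase %Y, which matches the 4-digit year. The lowercase %y used in the question expects a 2-digit year, so parsing fails and returns NA. Assigning to x$Date (rather than to x) also keeps the rest of the data frame intact.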
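To see the effect of the year specifier in isolation, here is a minimal sketch on a single timestamp taken from the question's data. The variable names `stamp`, `bad` and `good` are illustrative, and the UTC time zone is an assumption made only so the result is reproducible.

```r
# One timestamp from the question's data (month.day.year hour:minute).
stamp <- "09.09.2011 21:54"

# %y reads only two digits of the year, so the rest of the string no longer
# matches the format and the result is NA.
bad <- as.POSIXct(stamp, format = "%m.%d.%y %H:%M", tz = "UTC")

# %Y reads the full four-digit year and the parse succeeds.
good <- as.POSIXct(stamp, format = "%m.%d.%Y %H:%M", tz = "UTC")

is.na(bad)                        # TRUE
format(good, "%Y-%m-%d %H:%M")    # "2011-09-09 21:54"
```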

    strptime

    Almost identical to the above:

    x$Date <- strptime(x$Date, format = '%m.%d.%Y %H:%M')
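One difference between the two base R options is the class of the result: strptime returns a POSIXlt object, while as.POSIXct returns POSIXct, which is usually the more convenient class for a data frame column. A small check, assuming a made-up vector `stamps` and a UTC time zone:

```r
# Two timestamps in the question's format.
stamps <- c("09.09.2011 21:54", "09.10.2011 00:04")

# strptime() gives a POSIXlt object (a list of date-time components).
lt <- strptime(stamps, format = "%m.%d.%Y %H:%M", tz = "UTC")

# Converting it yields POSIXct (seconds since the epoch).
ct <- as.POSIXct(lt)

class(lt)   # "POSIXlt" "POSIXt"
class(ct)   # "POSIXct" "POSIXt"
```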
    

    anytime

    Using the excellent anytime package, which has saved me a lot of headaches:

    library(anytime)
    x$Date <- anytime(x$Date)
    

    Round and join

    x$Date <- anytime(x$Date)
    y$Date <- anytime(y$Date)
    
    x$Date <- format(x$Date, '%m/%d/%y %H')
    y$Date <- format(y$Date, '%m/%d/%y %H')
    
    merge(x, y, by = 'Date')
    
    # Date        hexane benzene toluene ethane propane isobutane n.butane isopentane n.pentane
    # 09/09/11 21      0       0     2.2   14.4     6.4       1.7      3.1        5.6       1.4
    # 09/09/11 22      0     4.4     2.6  868.9    32.1       2.0      3.0        3.0       2.4
    # 09/10/11 00      0     6.3     3.5  547.0    23.7       1.8      3.7        2.4       2.3
    # 09/10/11 01      0     4.7     2.7  491.4    22.8       1.3      4.3        3.4       2.4
    # 09/10/11 02      0     7.7     3.1   56.1     7.2       1.1      2.9        2.7       2.3
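The question also asks for a regular hourly schedule, which the join on its own does not produce: the data have no observation in the 23:00 hour, so that row is simply absent. The following is a sketch of one way to add it in base R, not the answer's own method. It uses as.POSIXct with an explicit format instead of anytime so that the month-first layout is stated rather than inferred. The shortened data frames, the helper name `to_hour`, the UTC time zone and the use of `all = TRUE` are all assumptions made for illustration.

```r
# Shortened stand-ins for the question's data frames (one compound each).
x <- data.frame(
  Date    = c("09.09.2011 21:54", "09.09.2011 22:59", "09.10.2011 00:04"),
  toluene = c(2.2, 2.6, 3.5),
  stringsAsFactors = FALSE
)
y <- data.frame(
  Date   = c("09.09.2011 21:54", "09.09.2011 22:59", "09.10.2011 00:04"),
  ethane = c(14.4, 868.9, 547),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse the strings, then truncate to the start of the hour.
to_hour <- function(d) {
  t <- as.POSIXct(as.character(d), format = "%m.%d.%Y %H:%M", tz = "UTC")
  as.POSIXct(trunc(t, units = "hours"), tz = "UTC")
}

x$Date <- to_hour(x$Date)
y$Date <- to_hour(y$Date)

# Join on the truncated hour, keeping hours present in either data set.
merged <- merge(x, y, by = "Date", all = TRUE)

# Build a regular hourly grid over the observed range and left-join onto it,
# so hours without observations appear as rows of NA.
grid    <- data.frame(Date = seq(min(merged$Date), max(merged$Date), by = "hour"))
regular <- merge(grid, merged, by = "Date", all.x = TRUE)

format(regular$Date, "%Y-%m-%d %H:%M")
```

Keeping Date as POSIXct, rather than a formatted string, means the merged rows sort chronologically and the hourly grid can be generated with seq. The column can still be formatted for display at the end.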
    

    Hope this helps.