Merging data frames with different sampling frequencies

Time: 2019-01-25 22:21:15

Tags: r dataframe merge

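I have two data frames: one is an hourly dataset (dfa), and the other (dfb) holds daily measurements taken at the same time each day (15:29:05). See the two-day example below.

I want to merge these data frames so that I keep all of the hourly data, with each daily value joined to the correct hour and NA filled in for the other hours of the day, which have no daily data.

Simply applying merge cuts the result down to the daily rows only, so I lose all of the hourly information: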

 dfc <- merge(dfa, dfb, by = "datetime")

Any help would be greatly appreciated.

Example covering two days:

#hourly
dfa <- structure(list(datetime = structure(c(1466231345, 1466234945, 
 1466238545, 1466242145, 1466245745, 1466249345, 1466252945, 1466256545, 
 1466260145, 1466263745, 1466267345, 1466270945, 1466274545, 1466278145, 
 1466281745, 1466285345, 1466288945, 1466292545, 1466296145, 1466299745, 
 1466303345, 1466306945, 1466310545, 1466314145, 1466317745, 1466321345, 
 1466324945, 1466328545, 1466332145, 1466335745, 1466339345, 1466342945, 
 1466346545, 1466350145, 1466353745, 1466357345, 1466360945, 1466364545, 
 1466368145, 1466371745, 1466375345, 1466378945, 1466382545, 1466386145, 
 1466389745, 1466393345, 1466396945, 1466400545), class = c("POSIXct", 
 "POSIXt"), tzone = "UTC"), DFQ1 = c(0.408025, 0.4355833335, 
  0.68485, 0.650875, 0.5307833335, 0.509775, 0.5273135595, 0.5763083335, 
  0.4954, 0.444308333, 0.4048083335, 0.419475, 0.35105, 0.2740416665, 
  0.3038666665, 0.351774317, 0.306025, 0.3183916665, 0.249175, 
  0.268133333, 0.3285083335, 0.2807666665, 0.351633333, 0.374516667, 
 0.3763, 0.3806583335, 0.366675, 0.411133333, 0.433291667, 0.408225, 
 0.3812, 0.380358333, 0.3557166665, 0.3701, 0.400788842, 0.396833333, 
 0.362991667, 0.3790083335, 0.3631666665, 0.367041667, 0.3899583335, 
  0.360658333, 0.359675, 0.356358333, 0.3864083335, 0.3965083335, 
  0.3901166665, 0.403976695)), class = "data.frame", row.names = c(NA, 
 -48L))

 #daily
 dfb <- structure(list(datetime = structure(c(1466263745, 1466350145), class 
= c("POSIXct", 
"POSIXt"), tzone = "UTC"), Tchl = c(0.1265, 0.1503), TCSE = structure(c(12L, 
 9L), .Label = c("", "#DIV/0!", "0.000", "0.001", "0.002", "0.003", 
"0.004", "0.005", "0.007", "0.008", "0.009", "0.010", "0.011", 
"0.012", "0.013", "0.015", "0.021", "0.026", "0.027", "CB2016", 
"Std error"), class = "factor")), class = "data.frame", row.names = c(NA, 
 -2L))

2 answers:

Answer 0 (score: 1)

You can use this:

dfc <- merge(dfa, dfb, by = "datetime", all.x = TRUE)

#               datetime      DFQ1   Tchl  TCSE 
# 1  2016-06-18 06:29:05 0.4080250     NA  <NA>
# 2  2016-06-18 07:29:05 0.4355833     NA  <NA>
# 3  2016-06-18 08:29:05 0.6848500     NA  <NA>
# 4  2016-06-18 09:29:05 0.6508750     NA  <NA>
# 5  2016-06-18 10:29:05 0.5307833     NA  <NA>
# 6  2016-06-18 11:29:05 0.5097750     NA  <NA>
# 7  2016-06-18 12:29:05 0.5273136     NA  <NA>
# 8  2016-06-18 13:29:05 0.5763083     NA  <NA>
# 9  2016-06-18 14:29:05 0.4954000     NA  <NA>
# 10 2016-06-18 15:29:05 0.4443083 0.1265 0.010
# ...
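
The reason this works: by default merge() does an inner join and keeps only rows whose key appears in both data frames, while all.x = TRUE keeps every row of the first data frame and fills the second one's columns with NA where nothing matches. A minimal sketch of the difference, using small made-up stand-ins (hourly and daily are hypothetical names and values, not the data from the question):

```r
# Hypothetical toy data: six hourly readings and a single daily reading
start <- as.POSIXct("2016-06-18 12:29:05", tz = "UTC")
hourly <- data.frame(datetime = start + 3600 * 0:5,
                     DFQ1 = c(0.53, 0.58, 0.50, 0.44, 0.40, 0.42))
daily <- data.frame(datetime = start + 3600 * 3,  # the 15:29:05 reading
                    Tchl = 0.1265)

# Default merge: only timestamps present in both data frames survive
inner <- merge(hourly, daily, by = "datetime")

# all.x = TRUE: every hourly row is kept, unmatched Tchl becomes NA
left <- merge(hourly, daily, by = "datetime", all.x = TRUE)

nrow(inner)             # 1
nrow(left)              # 6
sum(!is.na(left$Tchl))  # 1
```

Note that the match is on exact timestamps, so this relies on the daily reading sharing its timestamp with one of the hourly rows, as it does in the question's data.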

Answer 1 (score: 1)

A tidyverse solution:

library(tidyverse)
dfc <- left_join(dfa, dfb, by="datetime")

#> head(dfc,10)
#              datetime      DFQ1   Tchl  TCSE
#1  2016-06-18 06:29:05 0.4080250     NA  <NA>
#2  2016-06-18 07:29:05 0.4355833     NA  <NA>
#3  2016-06-18 08:29:05 0.6848500     NA  <NA>
#4  2016-06-18 09:29:05 0.6508750     NA  <NA>
#5  2016-06-18 10:29:05 0.5307833     NA  <NA>
#6  2016-06-18 11:29:05 0.5097750     NA  <NA>
#7  2016-06-18 12:29:05 0.5273136     NA  <NA>
#8  2016-06-18 13:29:05 0.5763083     NA  <NA>
#9  2016-06-18 14:29:05 0.4954000     NA  <NA>
#10 2016-06-18 15:29:05 0.4443083 0.1265 0.010
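
One thing to watch after either join: in the question's dput, TCSE is a factor rather than a numeric column (its levels include strings such as "#DIV/0!" and "Std error"). If numeric values are wanted, a common approach is to convert through character first. A small sketch with a hypothetical stand-in vector (tcse is made up for illustration, with a shortened level set):

```r
# Hypothetical stand-in for the joined TCSE column: a factor with numeric-looking labels
tcse <- factor(c("0.010", "0.007", NA),
               levels = c("", "#DIV/0!", "0.007", "0.010"))

as.numeric(tcse)                # 4 3 NA : the internal level codes, not the measurements
as.numeric(as.character(tcse))  # 0.010 0.007 NA : the actual values
```

Labels that are not numbers, such as "#DIV/0!", would become NA (with a warning) under this conversion.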