Based on a column

Time: 2017-08-05 03:31:03

Tags: r dataframe merge match

I have two data frames. One looks like this:

   D_Time                   Speed_BT       Speed_GT
2016-09-12 00:15:00          23            60  
2016-09-12 00:45:00          13            48  
2016-09-12 01:30:00          13            25  

The other looks like this:

D_Time                     Speed_AA        Speed_DD
2016-09-12 00:30:00          29            17  
2016-09-12 01:00:00          46            59  
2016-09-12 01:30:00          36            51

I want to combine the two data frames based on D_Time, so that the result looks like the following table:

D_Time                   Speed_BT       Speed_GT    Speed_AA   Speed_DD
2016-09-12 00:15:00          23            60          NA         NA
2016-09-12 00:30:00          NA            NA          29         17
2016-09-12 00:45:00          13            48          NA         NA
2016-09-12 01:00:00          NA            NA          46         59
2016-09-12 01:15:00          NA            NA          NA         NA
2016-09-12 01:30:00          13            25          36         51 

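It would be great if row 5 (a time step that appears in neither data frame) could also be added, as shown in the table above. If there is no way to do that, the result without it is fine too.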

I have already tried this command:

add <- merge(df1, df2,by = "D_Time", all=TRUE)

However, the problem is that it does not merge correctly. The Speed_AA and Speed_DD values end up in rows where the time is different.

The class of D_Time is "POSIXct" "POSIXt".

Can anyone help me?

Thanks in advance.

3 Answers:

Answer 0: (score: 2)

You first need to create a sequence of timestamps at 15-minute intervals and then merge it with the data frames, i.e.

ind <- c(df1$D_Time, df2$D_Time)

df4 <- data.frame(D_Time = seq.POSIXt(min(ind), max(ind), by = '15 mins'),
                  stringsAsFactors = FALSE)

Reduce(function(...)merge(..., all = TRUE), list(df1, df2, df4))

which gives,

            D_Time Speed_BT Speed_GT Speed_AA Speed_DD
1 2016-09-12 00:15:00       23       60       NA       NA
2 2016-09-12 00:30:00       NA       NA       29       17
3 2016-09-12 00:45:00       13       48       NA       NA
4 2016-09-12 01:00:00       NA       NA       46       59
5 2016-09-12 01:15:00       NA       NA       NA       NA
6 2016-09-12 01:30:00       13       25       36       51
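
For a self-contained check, the approach above can be sketched with the sample values from the question. The UTC time zone and the names grid and res are assumptions made for illustration; they are not part of the original answer.

```r
# Sample data mirroring the question; tz = "UTC" is an assumption.
df1 <- data.frame(
  D_Time   = as.POSIXct(c("2016-09-12 00:15:00", "2016-09-12 00:45:00",
                          "2016-09-12 01:30:00"), tz = "UTC"),
  Speed_BT = c(23, 13, 13),
  Speed_GT = c(60, 48, 25)
)
df2 <- data.frame(
  D_Time   = as.POSIXct(c("2016-09-12 00:30:00", "2016-09-12 01:00:00",
                          "2016-09-12 01:30:00"), tz = "UTC"),
  Speed_AA = c(29, 46, 36),
  Speed_DD = c(17, 59, 51)
)

# Full 15-minute grid spanning the earliest and latest timestamp.
grid <- data.frame(
  D_Time = seq(min(df1$D_Time, df2$D_Time),
               max(df1$D_Time, df2$D_Time), by = "15 mins")
)

# Outer-join the grid and both data frames on the shared D_Time column.
res <- Reduce(function(x, y) merge(x, y, by = "D_Time", all = TRUE),
              list(grid, df1, df2))
print(res)
```

Putting the grid first in the list means every 15-minute step is present before any data is joined, so time steps missing from both inputs show up as rows of NA.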

Answer 1: (score: 0)

Except for row 5, the desired output can be obtained as follows:

df <- read.table(text="D_Time,Speed_BT,Speed_GT
2016-09-12 00:15:00, 23,  60  
2016-09-12 00:45:00, 13,  48  
2016-09-12 01:30:00, 13,  25", header=TRUE, sep=",")

df2 <- read.table(text="D_Time, Speed_AA,        Speed_DD
2016-09-12 00:30:00,          29,            17  
2016-09-12 01:00:00,          46,            59  
2016-09-12 01:30:00,          36,            51
", header=TRUE, sep=",")

merge(df, df2, all=TRUE)

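If you want the fifth row to be included, it must be present in one of the data frames df or df2. If you initialize df as shown below and then call merge(df, df2, all=TRUE), you will get the fifth row as well.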

df <- read.table(text="D_Time,Speed_BT,Speed_GT
2016-09-12 00:15:00, 23,  60  
2016-09-12 00:45:00, 13,  48  
2016-09-12 01:30:00, 13,  25
2016-09-12 01:15:00, NA, NA", header=TRUE, sep=",")
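
Note that read.table() leaves D_Time as text here, whereas the question states that the column is POSIXct. A minimal sketch of converting the column before merging is given below; the timestamp format, the UTC time zone, and the name merged are assumptions for illustration.

```r
df <- read.table(text = "D_Time,Speed_BT,Speed_GT
2016-09-12 00:15:00,23,60
2016-09-12 00:45:00,13,48
2016-09-12 01:30:00,13,25", header = TRUE, sep = ",", stringsAsFactors = FALSE)

df2 <- read.table(text = "D_Time,Speed_AA,Speed_DD
2016-09-12 00:30:00,29,17
2016-09-12 01:00:00,46,59
2016-09-12 01:30:00,36,51", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Parse the text timestamps; format and tz are assumptions about the data.
df$D_Time  <- as.POSIXct(df$D_Time,  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
df2$D_Time <- as.POSIXct(df2$D_Time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Outer join on the parsed timestamps.
merged <- merge(df, df2, by = "D_Time", all = TRUE)
str(merged)
```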

Answer 2: (score: 0)

Here are two data.table approaches:

Multiple right joins

This is more or less the data.table version of Sotos' answer:

library(data.table)
setDT(df1, key = "D_Time")[setDT(df2, key = "D_Time")[
  .(D_Time = seq(min(df1$D_Time, df2$D_Time),
                 max(df1$D_Time, df2$D_Time), by = "15 mins"))]]
                D_Time Speed_BT Speed_GT Speed_AA Speed_DD
1: 2016-09-12 00:15:00       23       60       NA       NA
2: 2016-09-12 00:30:00       NA       NA       29       17
3: 2016-09-12 00:45:00       13       48       NA       NA
4: 2016-09-12 01:00:00       NA       NA       46       59
5: 2016-09-12 01:15:00       NA       NA       NA       NA
6: 2016-09-12 01:30:00       13       25       36       51

Using melt() and dcast()

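This approach can also be used to combine more than two data frames. The individual data frames are reshaped from wide to long format, combined into one large data.table, and then reshaped from long back to wide format. Finally, the result is right joined with the sequence of timestamps, which fills in the missing time steps.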

rbindlist(lapply(list(df1, df2), melt, id.vars = "D_Time"))[
  , dcast(.SD, D_Time ~ variable)][
    .(seq(min(D_Time), max(D_Time), by = "15 mins")), on = "D_Time"]
                D_Time Speed_BT Speed_GT Speed_AA Speed_DD
1: 2016-09-12 00:15:00       23       60       NA       NA
2: 2016-09-12 00:30:00       NA       NA       29       17
3: 2016-09-12 00:45:00       13       48       NA       NA
4: 2016-09-12 01:00:00       NA       NA       46       59
5: 2016-09-12 01:15:00       NA       NA       NA       NA
6: 2016-09-12 01:30:00       13       25       36       51
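
To illustrate the claim about more than two data frames, here is a sketch of the same steps written out one at a time with a third input. The table df3, its column Speed_XX and its values are hypothetical, as are the UTC time zone and the intermediate names long, wide and grid.

```r
library(data.table)

# Sample data from the question, plus a hypothetical third table (df3).
df1 <- data.table(
  D_Time   = as.POSIXct(c("2016-09-12 00:15:00", "2016-09-12 00:45:00",
                          "2016-09-12 01:30:00"), tz = "UTC"),
  Speed_BT = c(23, 13, 13), Speed_GT = c(60, 48, 25))
df2 <- data.table(
  D_Time   = as.POSIXct(c("2016-09-12 00:30:00", "2016-09-12 01:00:00",
                          "2016-09-12 01:30:00"), tz = "UTC"),
  Speed_AA = c(29, 46, 36), Speed_DD = c(17, 59, 51))
df3 <- data.table(
  D_Time   = as.POSIXct(c("2016-09-12 00:15:00", "2016-09-12 01:00:00"),
                        tz = "UTC"),
  Speed_XX = c(30, 40))  # hypothetical values

# 1. Wide to long for each table, then stack the pieces.
long <- rbindlist(lapply(list(df1, df2, df3), melt, id.vars = "D_Time"))

# 2. Long back to wide: one row per timestamp, one column per variable.
wide <- dcast(long, D_Time ~ variable, value.var = "value")

# 3. Right join on a complete 15-minute grid to add the missing time steps.
grid <- data.table(D_Time = seq(min(wide$D_Time), max(wide$D_Time),
                                by = "15 mins"))
res <- wide[grid, on = "D_Time"]
print(res)
```

Because the tables are stacked in long form before reshaping, adding another input only means adding it to the list passed to lapply().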

Data

df1 <- readr::read_table(
  "   D_Time                   Speed_BT       Speed_GT
2016-09-12 00:15:00          23            60  
  2016-09-12 00:45:00          13            48  
  2016-09-12 01:30:00          13            25  ")
df2 <- readr::read_table(
  "D_Time                     Speed_AA        Speed_DD
2016-09-12 00:30:00          29            17  
  2016-09-12 01:00:00          46            59  
  2016-09-12 01:30:00          36            51")