如何比较两个数据帧之间的值

时间:2018-04-17 04:48:25

标签: r dataframe dplyr tidyr

我在下面提到了两个数据帧:

我希望将DF_2的每一列与对应于唯一ID的DF_1进行比较,并在Dif列中,我希望得到DF_2和DF_1之间的精确时间差,以及ST,ST2需要输出中的ST3和ST3列我希望将DF_2的相应列与DF_1匹配。

DF_1

ID     Date                  Val1     Val2   Val3
AB-55  2017-01-04 18:25:14   adj.f@f  ad2er  25
AB-78  2017-01-08 08:48:14   arj.t@y  ar8ey  258
AB-48  2017-01-09 21:25:45   edR.u@t  wu5eu  75

DF_2

ID     Date                  Val1     Val2   Val3
AB-55  2017-01-04 18:25:14   adj.f@f  ad2er  25
AB-78  2017-01-08 08:58:14   arj.t@y  ar7ey  25
AB-48  2017-01-09 21:25:45   edR.u@t  wu5eu  75

必填数据框:

ID     Date                Dif   Val1    ST   Val2  ST2   Val3 ST3
AB-55  2017-01-04 18:25:14 0     adj.f@f T    ad2er T     25   T
AB-78  2017-01-08 08:48:14 10    arj.t@y T    ar8ey F     258  F
AB-48  2017-01-09 21:25:45 0     edR.u@t T    wu5eu T     75   T

1 个答案:

答案 0 :(得分:1)

以下是使用difftime的解决方案,假设您在每个数据框中都有相同的行,将起作用

DF_1$Dif <- as.numeric(difftime(DF_2$Date, DF_1$Date, units = "mins"))

......和数据框:

> DF_1
     ID                Date Dif
1 AB-55 2017-01-04 18:25:14   0
2 AB-78 2017-01-08 08:48:14  10

或者,如果行不匹配,请先首先实现merge

DF_3 <- merge(DF_1, DF_2, by = "ID", all.x = T, all.y = T)
DF_3$Dif <- as.numeric(difftime(DF_3$Date.y, DF_3$Date.x, units = "mins"))

结果......

> DF_3
     ID              Date.x              Date.y Dif
1 AB-55 2017-01-04 18:25:14 2017-01-04 18:25:14   0
2 AB-78 2017-01-08 08:48:14 2017-01-08 08:58:14  10
3 AB-89 2017-01-08 10:35:14                <NA>  NA
4 AD-87                <NA> 2017-01-08 06:15:12  NA

包含其他列,请按照您对问题所做的修改中的要求进行操作:

# Here I'm just making your dataframe, please provide code to make reproducible data where possible in the future
DF_1 <- data.frame("ID" = c("AB-55", "AB-78", "AB-89"), "Date" = c("2017-01-04 18:25:14", "2017-01-08 08:48:14", "2017-01-08 10:35:14")) 
DF_2 <- data.frame("ID" = c("AB-55", "AB-78", "AD-87"), "Date" = c("2017-01-04 18:25:14", "2017-01-08 08:58:14", "2017-01-08 06:15:12")) 

DF_1$Val1 <- c("adj.f@f", "adj.t@y", "edR.u@t")
DF_2$Val1 <- c("adj.f@f", "adj.t@y", "edR.u@t")

DF_1$Val2 <- c("ad2er", "ar7ey", "wu5eu")
DF_2$Val2 <- c("ad2er", "ar8ey", "wu5eu")

DF_1$Val3 <- c(25, 258, 75)
DF_2$Val3 <- c(25,  25, 75)

# The merge and difftime part remains
DF_3 <- merge(DF_1, DF_2, by = "ID", all.x = T, all.y = T)
DF_3$Dif <- as.numeric(difftime(DF_3$Date.y, DF_3$Date.x, units = "mins"))

# True/False part
DF_3$ST1 <- DF_3$Val1.x == DF_3$Val1.y
DF_3$ST2 <- DF_3$Val2.x == DF_3$Val2.y
DF_3$ST3 <- DF_3$Val3.x == DF_3$Val3.y

数据框

DF_3


> DF_3
     ID              Date.x  Val1.x Val2.x Val3.x              Date.y  Val1.y Val2.y Val3.y Dif  ST1   ST2   ST3
1 AB-55 2017-01-04 18:25:14 adj.f@f  ad2er     25 2017-01-04 18:25:14 adj.f@f  ad2er     25   0 TRUE  TRUE  TRUE
2 AB-78 2017-01-08 08:48:14 adj.t@y  ar7ey    258 2017-01-08 08:58:14 adj.t@y  ar8ey     25  10 TRUE FALSE FALSE
3 AB-89 2017-01-08 10:35:14 edR.u@t  wu5eu     75                <NA>    <NA>   <NA>     NA  NA   NA    NA    NA
4 AD-87                <NA>    <NA>   <NA>     NA 2017-01-08 06:15:12 edR.u@t  wu5eu     75  NA   NA    NA    NA