How to calculate the difference between dates in two data frames

Asked: 2017-07-17 14:41:18

Tags: r

Here is the first data frame:

Account  reference number  Amount   Date
A        1                 1583.51  16/05/2016
B        2                 4038.18  27/09/2016
C        3                 1161.36  20/05/2016
C        4                 732.39   24/10/2016
C        5                 747.69   24/11/2016

Here is the second data frame:

Account  reference number  Amount   Date
A        6                 3062.88  03/05/2016
A        7                 2619.09  03/05/2016
A        8                 4743.22  09/05/2016
B        9                 115.28   03/05/2016
B        10                993.14   03/05/2016
B        11                879.05   03/05/2016
C        12                50.93    03/05/2016
C        13                21.83    03/05/2016
C        14                14.55    03/05/2016

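I want to find the date difference for each account by comparing the two data frames. For example, for account 'A' the difference should be -13 days, because the start date is 16/05/2016 and the stop date is 03/05/2016.

I want each date in the first data frame to be checked against every date in the second data frame for the same account. For account A in my example, 16/05/2016 should be compared with both 03/05/2016 and 09/05/2016.

Can anyone help?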

4 Answers:

Answer 0: (score: 0)

I created my own sample data, because yours is hard to reproduce. A dplyr-based solution:

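library(dplyr)

# Sample data: four accounts, one date per account in each data frame
df1 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date()+2,Sys.Date()+5,by=1), value = c(2,2,2,2))

# Keep only the key and the date from df2, renaming the date to avoid a name clash
df2 = df2 %>% select(account,df2.date=date)
# Join on account and subtract the dates; diff is a number of days
df1 = df1 %>% left_join(df2, by="account") %>% mutate(diff = as.numeric(date-df2.date))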

INPUT

> df1
  account       date value
1       1 2017-07-17     1
2       2 2017-07-18     1
3       3 2017-07-19     1
4       4 2017-07-20     1
> df2
  account       date value
1       1 2017-07-19     2
2       2 2017-07-20     2
3       3 2017-07-21     2
4       4 2017-07-22     2

OUTPUT

> df1
  account       date value   df2.date diff
1       1 2017-07-17     1 2017-07-19   -2
2       2 2017-07-18     1 2017-07-20   -2
3       3 2017-07-19     1 2017-07-21   -2
4       4 2017-07-20     1 2017-07-22   -2

Hope this helps!

Answer 1: (score: 0)

For simplicity, I assume the first data frame is called a and the second is called b. I created them in abbreviated form:

a <- data.frame(Account = c("A","B"), reference_number = c(1,2), Amount = c(1583.51,4038.18),  Date = c("16/05/2016","27/09/2016"))
b <- data.frame(Account = c("A","A"), reference_number = c(6,7), Amount = c(3062.88,2619.09),  Date = c("03/05/2016","03/05/2016"))

You can find the difference between two dates like this:

#days
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
     strptime(a$Date[1], format = "%d/%m/%Y"),units="days")

#weeks
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
     strptime(a$Date[1], format = "%d/%m/%Y"),units="weeks")
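
The calls above compare one pair of dates at a time. To compare each date in the first data frame with every date in the second data frame for the same account, one option is to parse the strings with as.Date and pair the rows with merge. This is only a minimal base R sketch: the names first, second, pairs and diff_days are my own, and the data is retyped by hand from the question with only the Account and Date columns.

```r
# Data retyped from the question (Account and Date columns only)
first <- data.frame(
  Account = c("A", "B", "C", "C", "C"),
  Date = c("16/05/2016", "27/09/2016", "20/05/2016", "24/10/2016", "24/11/2016"),
  stringsAsFactors = FALSE
)
second <- data.frame(
  Account = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Date = c("03/05/2016", "03/05/2016", "09/05/2016", rep("03/05/2016", 6)),
  stringsAsFactors = FALSE
)

# Parse the dd/mm/yyyy strings into Date objects
first$Date <- as.Date(first$Date, format = "%d/%m/%Y")
second$Date <- as.Date(second$Date, format = "%d/%m/%Y")

# Merging on Account pairs every row of `first` with every row of
# `second` that belongs to the same account
pairs <- merge(first, second, by = "Account", suffixes = c(".first", ".second"))

# Second date minus first date, in days (negative when the second is earlier)
pairs$diff_days <- as.numeric(difftime(pairs$Date.second, pairs$Date.first, units = "days"))

print(pairs[pairs$Account == "A", ])
```

For account A this gives three rows, one per date in the second data frame, including the -13 days mentioned in the question.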

Answer 2: (score: 0)

Using sample data based on Florian's answer:

df1 = data.frame(account=c("A","A","B","B"),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c("A","A","A","B"),date=seq(Sys.Date()+2,Sys.Date()+5,by=1),value = c(2,2,2,2))

I added several instances of each account to each data frame. This matters for getting the right output on your own data:

library(dplyr)
library(lubridate)
# Pair every df1 row with every df2 row of the same account, then subtract the dates
full_join(df1,df2,by="account") %>%
  mutate(diff=date.x-date.y)

  account     date.x value.x     date.y value.y    diff
1       A 2017-07-17       1 2017-07-19       2 -2 days
2       A 2017-07-17       1 2017-07-20       2 -3 days
3       A 2017-07-17       1 2017-07-21       2 -4 days
4       A 2017-07-18       1 2017-07-19       2 -1 days
5       A 2017-07-18       1 2017-07-20       2 -2 days
6       A 2017-07-18       1 2017-07-21       2 -3 days
7       B 2017-07-19       1 2017-07-22       2 -3 days
8       B 2017-07-20       1 2017-07-22       2 -2 days 

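Answer 3: (score: 0)

You can use the plyr and dplyr packages to get the desired output. The code first sorts the combined data frame, then computes the time difference between each row's date and the first date within each group. After that, it finds the maximum for each group and finally drops the helper column.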

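library(plyr)   # load plyr before dplyr so that dplyr's verbs are not masked
library(dplyr)

# Stack both data frames and parse the dd/mm/yyyy strings as Date
df <- rbind(df1,df2)
df$Date <- as.Date(df$Date, "%d/%m/%Y")

# Sort so that the first row of each account holds its earliest date
df <- df %>%
         dplyr::arrange(Account, Date)

# Distance of every row from the account's first date, then keep the largest
plyr::ddply(df, .(Account), transform,
      Date_1 = Date[1],
      change = abs(Date - Date[1])) %>%
             dplyr::group_by(Account) %>%
             dplyr::slice(which.max(change)) %>%
             dplyr::select(-Date_1)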


# Source: local data frame [3 x 5] 
# Groups: Account [3] 
#  
# # A tibble: 3 x 5 
#   Account reference.number  Amount       Date   change 
#    <fctr>            <int>   <dbl>     <date>   <time> 
# 1       A                1 1583.51 2016-05-16  13 days 
# 2       B                2 4038.18 2016-09-27 147 days 
# 3       C                5  747.69 2016-11-24 205 days
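
For comparison, the same per-account figure can be sketched in base R without plyr or dplyr. This assumes that what is wanted is the gap between the earliest and the latest date of each account across both data frames, which is what the code above computes. The vectors accounts and dates and the result span are names I made up, with the values retyped by hand from the question.

```r
# Accounts and dates from both data frames, stacked in the same order
accounts <- c("A", "B", "C", "C", "C",
              "A", "A", "A", "B", "B", "B", "C", "C", "C")
dates <- as.Date(
  c("16/05/2016", "27/09/2016", "20/05/2016", "24/10/2016", "24/11/2016",
    "03/05/2016", "03/05/2016", "09/05/2016", rep("03/05/2016", 6)),
  format = "%d/%m/%Y"
)

# For each account: latest date minus earliest date, as a number of days
span <- tapply(dates, accounts, function(d) as.numeric(max(d) - min(d)))

print(span)
```

The values agree with the change column above: 13 days for A, 147 for B and 205 for C.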

Data

df1 <- structure(list(Account = structure(c(1L, 2L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), reference.number = 1:5, Amount = c(1583.51, 
4038.18, 1161.36, 732.39, 747.69), Date = structure(c(1L, 5L, 
2L, 3L, 4L), .Label = c("16/05/2016", "20/05/2016", "24/10/2016", 
"24/11/2016", "27/09/2016"), class = "factor")), .Names = c("Account", 
"reference.number", "Amount", "Date"), class = "data.frame", row.names = c(NA,-5L))

df2 <- structure(list(Account = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), reference.number = 6:14, 
Amount = c(3062.88, 2619.09, 4743.22, 115.28, 993.14, 879.05, 
50.93, 21.83, 14.55), Date = structure(c(1L, 1L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("03/05/2016", "09/05/2016"
), class = "factor")), .Names = c("Account", "reference.number", 
"Amount", "Date"), class = "data.frame", row.names = c(NA, -9L))