Here is the first data frame:
Account reference number Amount Date
A 1 1583.51 16/05/2016
B 2 4038.18 27/09/2016
C 3 1161.36 20/05/2016
C 4 732.39 24/10/2016
C 5 747.69 24/11/2016
Here is the second data frame:
Account reference number Amount Date
A 6 3062.88 03/05/2016
A 7 2619.09 03/05/2016
A 8 4743.22 09/05/2016
B 9 115.28 03/05/2016
B 10 993.14 03/05/2016
B 11 879.05 03/05/2016
C 12 50.93 03/05/2016
C 13 21.83 03/05/2016
C 14 14.55 03/05/2016
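I want to find the date difference for each account by comparing the two data frames. For example, comparing the dates for account 'A' should give -13 days, because the start date is 16/05/2016 and the stop date is 03/05/2016.
I want each date in the first data frame to be checked against every date in the second data frame for the same account. For account A in my example, 16/05/2016 should be compared with both 03/05/2016 and 09/05/2016.
Can anyone help?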
Answer 0 (score: 0)
I created my own sample data because yours is hard to reproduce. A dplyr-based solution:
df1 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date()+2,Sys.Date()+5,by=1), value = c(2,2,2,2))
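library(dplyr)
# keep only the join key and the date from df2, renaming the date to avoid a clash
df2 = df2 %>% select(account, df2.date = date)
# match rows by account, then take the difference in days
df1 = df1 %>% left_join(df2, by = "account") %>% mutate(diff = as.numeric(date - df2.date))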
INPUT
> df1
account date value
1 1 2017-07-17 1
2 2 2017-07-18 1
3 3 2017-07-19 1
4 4 2017-07-20 1
> df2
account date value
1 1 2017-07-19 2
2 2 2017-07-20 2
3 3 2017-07-21 2
4 4 2017-07-22 2
OUTPUT
> df1
account date value df2.date diff
1 1 2017-07-17 1 2017-07-19 -2
2 2 2017-07-18 1 2017-07-20 -2
3 3 2017-07-19 1 2017-07-21 -2
4 4 2017-07-20 1 2017-07-22 -2
Hope this helps!
Answer 1 (score: 0)
For simplicity, I assume the first data frame is called a and the second is called b. I created them in abbreviated form:
a <- data.frame(Account = c("A","B"), reference_number = c(1,2), Amount = c(1583.51,4038.18), Date = c("16/05/2016","27/09/2016"))
b <- data.frame(Account = c("A","A"), reference_number = c(6,7), Amount = c(3062.88,2619.09), Date = c("03/05/2016","03/05/2016"))
You can find the difference between two dates like this:
# difference in days (date from b minus date from a)
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
         strptime(a$Date[1], format = "%d/%m/%Y"), units = "days")
# difference in weeks
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
         strptime(a$Date[1], format = "%d/%m/%Y"), units = "weeks")
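The calls above compare a single pair of dates. As a rough sketch of how the same idea could cover every date pair within an account, base R's merge() can be combined with difftime(). The abbreviated data and the column name diff_days are my own assumptions, not part of the answer:

```r
# Abbreviated copies of the question's data (assumed layout).
a <- data.frame(Account = c("A", "B"),
                Date = c("16/05/2016", "27/09/2016"),
                stringsAsFactors = FALSE)
b <- data.frame(Account = c("A", "A", "A", "B"),
                Date = c("03/05/2016", "03/05/2016", "09/05/2016", "03/05/2016"),
                stringsAsFactors = FALSE)

# Parse the dd/mm/yyyy strings into Date objects.
a$Date <- as.Date(a$Date, format = "%d/%m/%Y")
b$Date <- as.Date(b$Date, format = "%d/%m/%Y")

# merge() pairs every row of a with every row of b that has the same Account.
pairs <- merge(a, b, by = "Account", suffixes = c(".a", ".b"))

# Difference in days, b minus a, matching the sign used in the question.
pairs$diff_days <- as.numeric(difftime(pairs$Date.b, pairs$Date.a, units = "days"))
pairs
```

For account A this gives -13 for the two 03/05/2016 rows and -7 for the 09/05/2016 row.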
Answer 2 (score: 0)
Using sample data based on Florian's answer:
df1 = data.frame(account=c("A","A","B","B"),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c("A","A","A","B"),date=seq(Sys.Date()+2,Sys.Date()+5,by=1),value = c(2,2,2,2))
I added several instances of each account to each data frame. This matters for getting the correct output on your own data:
library(dplyr)
library(lubridate)
full_join(df1,df2,by="account") %>%
  mutate(diff=date.x-date.y)
account date.x value.x date.y value.y diff
1 A 2017-07-17 1 2017-07-19 2 -2 days
2 A 2017-07-17 1 2017-07-20 2 -3 days
3 A 2017-07-17 1 2017-07-21 2 -4 days
4 A 2017-07-18 1 2017-07-19 2 -1 days
5 A 2017-07-18 1 2017-07-20 2 -2 days
6 A 2017-07-18 1 2017-07-21 2 -3 days
7 B 2017-07-19 1 2017-07-22 2 -3 days
8 B 2017-07-20 1 2017-07-22 2 -2 days
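If one figure per row of the first data frame is wanted instead of one row per pair, the joined result could be collapsed with group_by() and summarise(). This is only a sketch on the question's data. The names first, second, min_diff and max_diff are made up for illustration, the sign follows the question (second date minus first date), and whether the smallest or the largest gap is the right summary is an assumption:

```r
library(dplyr)

# The question's two data frames, reduced to the columns needed here.
first <- data.frame(
  Account = c("A", "B", "C", "C", "C"),
  Date = as.Date(c("16/05/2016", "27/09/2016", "20/05/2016",
                   "24/10/2016", "24/11/2016"), format = "%d/%m/%Y")
)
second <- data.frame(
  Account = rep(c("A", "B", "C"), each = 3),
  Date = as.Date(c("03/05/2016", "03/05/2016", "09/05/2016",
                   rep("03/05/2016", 6)), format = "%d/%m/%Y")
)

# Join every first-frame row to every second-frame row of the same account.
# dplyr 1.1.0 and later warns about a many-to-many relationship here;
# that pairing is exactly what is intended.
result <- full_join(first, second, by = "Account") %>%
  mutate(diff = as.numeric(Date.y - Date.x)) %>%
  # one row per first-frame date, with the range of differences in days
  group_by(Account, Date.x) %>%
  summarise(min_diff = min(diff), max_diff = max(diff), .groups = "drop")
result
```

For account A this leaves a single row with min_diff of -13 and max_diff of -7.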
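Answer 3 (score: 0)
You can use the plyr and dplyr packages to get the desired output. The code first sorts the combined data frame, then computes, within each account, the time difference between each row's date and the first date of that group. After that it picks the row with the largest difference in each group, and finally drops the helper column.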
df <- rbind(df1,df2)
df$Date <- as.Date(df$Date, "%d/%m/%Y")
library(dplyr)
df <- df %>%
arrange(Account, Date)
library(plyr)
plyr::ddply((df), .(Account), transform,
Date_1 = Date[1],
change = abs((Date - Date[1]))) %>%
dplyr::group_by(Account) %>%
dplyr::slice(which.max(change)) %>%
dplyr::select(-Date_1)
# Source: local data frame [3 x 5]
# Groups: Account [3]
#
# # A tibble: 3 x 5
# Account reference.number Amount Date change
# <fctr> <int> <dbl> <date> <time>
# 1 A 1 1583.51 2016-05-16 13 days
# 2 B 2 4038.18 2016-09-27 147 days
# 3 C 5 747.69 2016-11-24 205 days
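The same steps can probably be written with dplyr alone, which avoids loading plyr after dplyr. This is a minimal sketch, not the code that produced the output above. It uses min(Date) in place of sorting and taking Date[1], converts the difference to a plain number of days, and rebuilds the combined data inline under the made-up name combined:

```r
library(dplyr)

# Both data frames from the question stacked together (Amount omitted for brevity).
combined <- data.frame(
  Account = c("A", "B", "C", "C", "C", rep(c("A", "B", "C"), each = 3)),
  reference.number = 1:14,
  Date = c("16/05/2016", "27/09/2016", "20/05/2016", "24/10/2016", "24/11/2016",
           "03/05/2016", "03/05/2016", "09/05/2016", rep("03/05/2016", 6)),
  stringsAsFactors = FALSE
)

result <- combined %>%
  mutate(Date = as.Date(Date, "%d/%m/%Y")) %>%
  group_by(Account) %>%
  # days between each row's date and the earliest date in its account
  mutate(change = as.numeric(abs(Date - min(Date)))) %>%
  # keep the row with the largest gap in each account
  slice(which.max(change)) %>%
  ungroup()
result
```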
Data
df1 <- structure(list(Account = structure(c(1L, 2L, 3L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), reference.number = 1:5, Amount = c(1583.51,
4038.18, 1161.36, 732.39, 747.69), Date = structure(c(1L, 5L,
2L, 3L, 4L), .Label = c("16/05/2016", "20/05/2016", "24/10/2016",
"24/11/2016", "27/09/2016"), class = "factor")), .Names = c("Account",
"reference.number", "Amount", "Date"), class = "data.frame", row.names = c(NA,-5L))
df2 <- structure(list(Account = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), reference.number = 6:14,
Amount = c(3062.88, 2619.09, 4743.22, 115.28, 993.14, 879.05,
50.93, 21.83, 14.55), Date = structure(c(1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("03/05/2016", "09/05/2016"
), class = "factor")), .Names = c("Account", "reference.number",
"Amount", "Date"), class = "data.frame", row.names = c(NA, -9L))