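What I want to achieve is to compare data based on dates and, when a date falls within a range, take the lowest "PDF2" value.
Here are examples of the two data frames I am working with. For each value in the "R" column of "df", I want to check whether it also appears in the "R" column of "df2", and then check whether the date "D" lies between "DD" and "DF" in df2. If there are any conflicts or duplicates, I always want to keep the minimum "PDF2" value.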
df <- data.frame("D" = c("01/01/2019", "01/02/2019", "01/03/2019", "01/12/2019"),
"R" = c("ABC123", "ABC123", "ABC123", "ABC1"),
"PDF" = c(1.23, 1.23, 1.23, 1.23),
stringsAsFactors = FALSE)
df2 <- data.frame("DD" = c("01/01/2019", "01/02/2019", "01/01/2019"),
"DF" = c("01/02/2019", "01/03/2019", "01/11/2019"),
"R" = c("ABC123", "ABC123", "ABC1"),
"PDF2" = c(1.12, 1.11, 1.12),
stringsAsFactors = FALSE)
This is the result I expect.
result <- data.frame("R" = c("ABC123", "ABC123", "ABC123"),
"D" = c("01/01/2019", "01/02/2019", "01/03/2019"),
"DD" = c("01/01/2019", "01/02/2019", "01/02/2019"),
"DF" = c("01/02/2019", "01/03/2019", "01/03/2019"),
"PDF" = c(1.23, 1.23, 1.23),
"PDF2" = c(1.12, 1.11, 1.11),
stringsAsFactors = FALSE)
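You can see that "ABC1" is not in the result, because its date is not within the range.
My current problem is keeping only the minimum value when date ranges are duplicated or overlap.
Here is a sample of my current code: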
temp <- merge(df, df2, by = "R")
myd <- which(as.Date(temp$D, format = "%d/%m/%Y") <= as.Date(temp$DF, format = "%d/%m/%Y"))
myd2 <- which(as.Date(temp$D, format = "%d/%m/%Y") >= as.Date(temp$DD, format = "%d/%m/%Y"))
myd <- myd[myd %in% myd2]
if (length(myd)) {
  temp <- temp[myd, ]
}
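Also, how can I get the rows that do not meet the requirements in a separate data frame?
Answer 0: (score: 1)
I think the answers to this question may help you: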
How to find matches for a row in a dataframe conditional on many rows from another dataframe
Answer 1: (score: 0)
If you need an efficient tool, you can use the data.table package. The following code does what you asked for:
library(data.table)
setDT(df, key="R")
setDT(df2, key="R")
df[, D:=as.Date(D, format = "%d/%m/%Y")]
df2[, `:=`(
DD = as.Date(DD, format = "%d/%m/%Y"),
DF = as.Date(DF, format = "%d/%m/%Y")
)]
df[df2][D>=DD & D<=DF][, .(DD=max(DD), DF=max(DF), PDF2=PDF2[which.max(DD)]), .(D, R, PDF)]
## D R PDF DD DF PDF2
## 1: 2019-01-01 ABC123 1.23 2019-01-01 2019-02-01 1.12
## 2: 2019-02-01 ABC123 1.23 2019-02-01 2019-03-01 1.11
## 3: 2019-03-01 ABC123 1.23 2019-02-01 2019-03-01 1.11
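Note that the code above takes PDF2 from the range with the latest DD, which happens to coincide with the minimum PDF2 on the sample data. If the minimum is what should always be kept, a variant along the following lines might work. This is only a sketch under assumptions, not the approach from the answer above: the names `pairs`, `matched` and `unmatched` are made up for illustration, and it assumes each row of df is uniquely identified by (R, D, PDF). It also shows one possible way to collect the rows that do not meet the requirements.

```r
library(data.table)

# Sample data from the question, with dates parsed up front.
df <- data.table(
  D   = as.Date(c("01/01/2019", "01/02/2019", "01/03/2019", "01/12/2019"),
                format = "%d/%m/%Y"),
  R   = c("ABC123", "ABC123", "ABC123", "ABC1"),
  PDF = c(1.23, 1.23, 1.23, 1.23)
)
df2 <- data.table(
  DD   = as.Date(c("01/01/2019", "01/02/2019", "01/01/2019"), format = "%d/%m/%Y"),
  DF   = as.Date(c("01/02/2019", "01/03/2019", "01/11/2019"), format = "%d/%m/%Y"),
  R    = c("ABC123", "ABC123", "ABC1"),
  PDF2 = c(1.12, 1.11, 1.12)
)

# Pair every df row with every df2 row that shares the same R,
# then keep only the pairs where D lies inside [DD, DF].
pairs <- merge(df, df2, by = "R", allow.cartesian = TRUE)[D >= DD & D <= DF]

# For each original df row, keep the in-range pair with the smallest PDF2.
# Assumption: (R, D, PDF) identifies a df row; exact duplicates would collapse.
matched <- pairs[, .SD[which.min(PDF2)], by = .(R, D, PDF)]

# Anti-join: rows of df that have no in-range match at all.
unmatched <- df[!matched, on = .(R, D, PDF)]
```

On the sample data, `matched` should contain the three ABC123 rows and `unmatched` the single ABC1 row. `allow.cartesian = TRUE` is there because joining on R alone can produce many more rows than either input on larger data; for big tables a non-equi join would likely be the more memory-friendly route.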