I can remove the rows that match between two data frames, df1 and df2, using code provided by @Eric Fail:
df1[!(apply(df1[1:2], 1, toString) %in% apply(df2[1:2], 1, toString)), ]
or with @steveb's dplyr solution:
df1 %>% filter( ! ((date == df2$date) & (ticker == df2$ticker)) )
However, I realized that I need to remove not only the shared rows, like this:
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"), stringsAsFactors=F)
df1
ticker date
1 MSFT 2016-01-01
2 MSFT 2016-01-02
3 MSFT 2016-01-03
4 MSFT 2016-01-04
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03"), stringsAsFactors=F)
df2
ticker date
1 AAPL 2016-01-01
2 GOOG 2016-01-01
3 MSFT 2016-01-02
4 FB 2016-01-03
df3
ticker date
1 MSFT 2016-01-01
2 MSFT 2016-01-03
3 MSFT 2016-01-04
but also the rows for the day before and the day after each matched row. So my final df would be:
ticker date
1 MSFT 2016-01-04
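Note that 3 MSFT 2016-01-02 (row 3 of df2) is the match, so that row has to be removed from df1 together with the day before and the day after it: 1 MSFT 2016-01-01 and 3 MSFT 2016-01-03.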
Example with two matches:
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
stringsAsFactors=F)
Desired output:
ticker date
4 MSFT 2016-01-04
Answer 0 (score: 4)
You can convert the strings to dates with as.Date(), so that days can be added to and subtracted from them:
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03")),
stringsAsFactors=F)
(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
# ticker date
# 3 MSFT 2016-01-02
df1[!(df1$date %in% (m$date + c(-1,0,1))), ]
# ticker date
# 4 MSFT 2016-01-04
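One caveat about the matching step above: the two %in% tests are evaluated independently, so a df2 row is flagged whenever its date occurs anywhere in df1 and its ticker occurs anywhere in df1, even if that exact (ticker, date) pair does not. The sketch below compares a combined key instead. The extra AAPL row in df1 and the names loose, key1 and key2 are assumptions added only for illustration; they are not part of the original data.

```r
# df1 with one extra, made-up AAPL row to show the difference
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT", "AAPL"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03",
                                   "2016-01-04", "2016-01-10")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
                  date = as.Date(c("2016-01-01", "2016-01-01",
                                   "2016-01-02", "2016-01-03")),
                  stringsAsFactors = FALSE)

# Independent tests: AAPL 2016-01-01 is flagged although df1 has no such row
loose <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ]

# Pairwise test: compare a combined "ticker date" key instead
key1 <- paste(df1$ticker, df1$date)
key2 <- paste(df2$ticker, df2$date)
m <- df2[key2 %in% key1, ]
```

Here loose contains two rows (AAPL 2016-01-01 and MSFT 2016-01-02), while m contains only the genuine match, MSFT 2016-01-02.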
Edit: for multiple matches, just apply a function(x) to each matched date:
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
stringsAsFactors=F)
(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
# ticker date
# 3 MSFT 2016-01-01
# 4 MSFT 2016-01-02
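# lapply() plus c() keeps the Date class; sapply() would simplify the dates to plain numbers
df1[!(df1$date %in% do.call(c, lapply(m$date, function(x) x + c(-1, 0, 1)))), ]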
# ticker date
# 4 MSFT 2016-01-04
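The filter above looks only at the date, so with several tickers in df1 it would also drop other tickers' rows on those days. A minimal sketch of a per-ticker variant follows, assuming that is the intended behaviour. The helper name drop_near_matches and its window parameter are hypothetical and do not come from the question or the answer.

```r
# Hypothetical helper: drop df1 rows that lie within `window` days of a
# (ticker, date) pair that also occurs in df2, separately for each ticker
drop_near_matches <- function(df1, df2, window = 1) {
  # rows of df1 whose (ticker, date) pair also appears in df2
  hit <- paste(df1$ticker, df1$date) %in% paste(df2$ticker, df2$date)
  m <- df1[hit, , drop = FALSE]
  if (nrow(m) == 0) return(df1)

  offsets <- seq(-window, window)
  # every (ticker, date) pair to remove: each match shifted by each offset
  blocked <- paste(rep(m$ticker, each = length(offsets)),
                   rep(m$date, each = length(offsets)) + offsets)

  df1[!(paste(df1$ticker, df1$date) %in% blocked), , drop = FALSE]
}

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02",
                                   "2016-01-03", "2016-01-04")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-01",
                                   "2016-01-01", "2016-01-02")),
                  stringsAsFactors = FALSE)

res <- drop_near_matches(df1, df2)
res
#   ticker       date
# 4   MSFT 2016-01-04
```

Using rep(..., each = ) together with Date arithmetic keeps the Date class throughout, so both sides of the comparison are formatted the same way when the keys are pasted together.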