Remove matching rows together with the days before and after them

Date: 2016-01-23 03:05:05

Tags: r

I can remove the rows that match between two data frames, df1 and df2, using code provided by @Eric Fail:

df1[!(apply(df1[1:2], 1, toString) %in% apply(df2[1:2], 1, toString)), ]

or with @steveb's dplyr solution:

df1 %>% filter( ! ((date == df2$date) & (ticker == df2$ticker)) )
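
As a side note on the two snippets above: `==` compares two vectors position by position, whereas `%in%` tests membership in a set. The base-R sketch below, which uses the example data from this question and a hypothetical `key()` helper that is not part of either quoted solution, illustrates the difference:

```r
# Example data taken from the question.
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
                  date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03"),
                  stringsAsFactors = FALSE)

# Position-by-position comparison: row i of df1 is only compared with row i of df2.
elementwise <- (df1$date == df2$date) & (df1$ticker == df2$ticker)

# Hypothetical helper: build one "ticker|date" key per row.
key <- function(d) paste(d$ticker, d$date, sep = "|")

# Set membership: row i of df1 is compared with every row of df2.
by_key <- key(df1) %in% key(df2)

elementwise  # FALSE FALSE FALSE FALSE
by_key       # FALSE  TRUE FALSE FALSE

df1[!by_key, ]
```

With this data only the key-based test flags row 2 of df1 (MSFT 2016-01-02) as shared.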

However, I realized that I need to remove not only the shared rows, as in this example:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"), 
date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"), stringsAsFactors=F)
df1

  ticker       date
1   MSFT 2016-01-01
2   MSFT 2016-01-02
3   MSFT 2016-01-03
4   MSFT 2016-01-04

df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"), 
date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03"), stringsAsFactors=F)
df2

  ticker       date
1   AAPL 2016-01-01
2   GOOG 2016-01-01
3   MSFT 2016-01-02
4     FB 2016-01-03

df3 

  ticker       date
1   MSFT 2016-01-01
2   MSFT 2016-01-03
3   MSFT 2016-01-04

but also the rows for the day before and the day after each matched row. So my final df would be:

  ticker       date
1   MSFT 2016-01-04

Note that row 3 of df2 (MSFT 2016-01-02) is the match, so that row has to be removed from df1 along with the day before (MSFT 2016-01-01) and the day after (MSFT 2016-01-03).

An example with two matches:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
                  stringsAsFactors=F)

Desired output:

  ticker       date
4   MSFT 2016-01-04

1 answer:

Answer 0 (score: 4)

You can convert the strings to dates, which lets you add and subtract days:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03")),
                  stringsAsFactors=F)


(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
#   ticker       date
# 3   MSFT 2016-01-02

df1[!(df1$date %in% (m$date + c(-1,0,1))), ]

#   ticker       date
# 4   MSFT 2016-01-04
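
To illustrate why the conversion matters: a `Date` supports arithmetic in whole days, so the three-day window around a match can be built directly. The sketch below is only an illustration with made-up variable names. It also shows that adding `c(-1, 0, 1)` to more than one date recycles the offsets element by element instead of expanding every date into its own window.

```r
one_match <- as.Date("2016-01-02")
window <- one_match + c(-1, 0, 1)   # day before, the day itself, day after
format(window)
# "2016-01-01" "2016-01-02" "2016-01-03"

# Month and year boundaries are handled by the Date class.
year_start <- as.Date("2016-01-01") + c(-1, 0, 1)
format(year_start)
# "2015-12-31" "2016-01-01" "2016-01-02"

# With several dates the offsets are paired up one to one:
three_matches <- as.Date(c("2016-01-01", "2016-01-10", "2016-01-20"))
recycled <- three_matches + c(-1, 0, 1)
format(recycled)
# "2015-12-31" "2016-01-10" "2016-01-21"
```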

Edit: for multiple matches, just apply a function(x) to each date:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors=F)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")),
                  stringsAsFactors=F)

(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ])
#   ticker       date
# 3   MSFT 2016-01-01
# 4   MSFT 2016-01-02

# do.call(c, ...) keeps the Date class; sapply() would return a plain numeric matrix
df1[!(df1$date %in% do.call(c, lapply(m$date, function(x) x + c(-1, 0, 1)))), ]
#   ticker       date
# 4   MSFT 2016-01-04
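
Both snippets above test dates and tickers independently and then filter df1 by date alone, which is enough here because df1 holds a single ticker. One possible generalization, sketched with hypothetical names (`drop_around_matches`, `window`) that do not come from the original answer, matches on the ticker/date pair and removes the neighbouring days per ticker:

```r
# Hypothetical helper: drop rows of df1 that match df2 on (ticker, date),
# plus the rows up to `window` days before and after for the same ticker.
drop_around_matches <- function(df1, df2, window = 1) {
  key <- function(ticker, date) paste(ticker, format(date), sep = "|")
  # Rows of df2 whose (ticker, date) pair also occurs in df1.
  m <- df2[key(df2$ticker, df2$date) %in% key(df1$ticker, df1$date), ]
  if (nrow(m) == 0) return(df1)
  offsets <- seq(-window, window)
  # Expand every matched pair to its surrounding days, keeping the ticker.
  blocked <- key(rep(m$ticker, each = length(offsets)),
                 rep(m$date, each = length(offsets)) + offsets)
  df1[!(key(df1$ticker, df1$date) %in% blocked), ]
}

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02")),
                  stringsAsFactors = FALSE)

res <- drop_around_matches(df1, df2)
res
#   ticker       date
# 4   MSFT 2016-01-04
```

Because the keys include the ticker, a row for a different ticker on a blocked date would be kept; whether that is the desired behaviour depends on the data.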