How do I delete the rows in one data frame that correspond to the rows missing from another data frame?

Time: 2020-05-17 09:42:03

Tags: r dataframe

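I have two data frames, each with two columns (Date and Data). They have different numbers of rows. What I want to do is delete, by date, the rows of df1 whose dates are not in df2.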

An example will make this clearer. These are my data frames:

df1 = cbind(data.frame(Date = seq(as.Date("2018-11-1"), as.Date("2020-02-1"), by = "months"), stringsAsFactors = F), data.frame(Data = rnorm(16, 0, 1), stringsAsFactors = F))

         Date        Data
1  2018-11-01  1.09433662
2  2018-12-01 -0.27538189
3  2019-01-01 -0.19712728
4  2019-02-01  0.99852535
5  2019-03-01 -0.50760024
6  2019-04-01 -0.43127396
7  2019-05-01  0.90685965
8  2019-06-01  0.51510503
9  2019-07-01 -0.39070644
10 2019-08-01  1.27976428
11 2019-09-01 -0.63845519
12 2019-10-01 -0.05489751
13 2019-11-01 -0.87745923
14 2019-12-01  0.18082375
15 2020-01-01  0.08852416
16 2020-02-01  1.50827788

df2= cbind(data.frame(Date = df1$Date[c(1:5,7:9,11:13,15:16)]), data.frame(Data = c(1.09433662,-0.27538189, 0.99852535,-0.50760024,-0.43127396, 0.90685965,-0.39070644, 1.27976428,-0.63845519,-0.05489751,-0.87745923, 0.18082375, 1.50827788)))


         Date        Data
1  2018-11-01  1.09433662
2  2018-12-01 -0.27538189
3  2019-01-01  0.99852535
4  2019-02-01 -0.50760024
5  2019-03-01 -0.43127396
6  2019-05-01  0.90685965
7  2019-06-01 -0.39070644
8  2019-07-01  1.27976428
9  2019-09-01 -0.63845519
10 2019-10-01 -0.05489751
11 2019-11-01 -0.87745923
12 2020-01-01  0.18082375
13 2020-02-01  1.50827788

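What I want now is to reduce df1 to the same length as df2 by deleting the rows whose dates are not in df2. The rows to delete correspond to the months missing from df2.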

For df1, the result would look like this:

#df1 where the rows corresponding to the missing months in df2 have been deleted

         Date        Data
1  2018-11-01  1.09433662
2  2018-12-01 -0.27538189
3  2019-01-01 -0.19712728
4  2019-02-01  0.99852535
5  2019-03-01 -0.50760024
6  2019-05-01  0.90685965
7  2019-06-01  0.51510503
8  2019-07-01 -0.39070644
9  2019-09-01 -0.63845519
10 2019-10-01 -0.05489751
11 2019-11-01 -0.87745923
12 2020-01-01  0.08852416
13 2020-02-01  1.50827788

Can anyone help me?

Thank you very much!

1 Answer:

Answer 0 (score: 2)

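dplyr's semi_join does what you need: it keeps the rows of df1 that have a matching Date in df2 and returns only df1's columns. Note that you copied the data from df2 as your output example.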

library(dplyr)

semi_join(df1, df2, by = "Date")

         Date        Data
1  2018-11-01  0.38376758
2  2018-12-01 -0.28738352
3  2019-01-01  1.79556305
4  2019-02-01 -0.34680836
5  2019-03-01  0.57803280
6  2019-05-01  1.96801082
7  2019-06-01  0.38448708
8  2019-07-01  0.39829417
9  2019-09-01  0.94912096
10 2019-10-01 -0.04469681
11 2019-11-01  0.32008546
12 2020-01-01  1.09054839
13 2020-02-01 -1.45438502
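
If dplyr is not available, the same filter can be sketched in base R with %in%. This is an alternative to the join above rather than part of it, and it makes a few assumptions: the seed (42) is arbitrary and only there to make the run repeatable, df2's Data values are drawn fresh to show that only the Date column matters, and df1_kept is a name introduced here for illustration.

```r
# Rebuild the example data; the seed is an arbitrary assumption for repeatability
set.seed(42)
df1 <- data.frame(
  Date = seq(as.Date("2018-11-01"), as.Date("2020-02-01"), by = "month"),
  Data = rnorm(16)
)

# df2 lacks 2019-04, 2019-08 and 2019-12 (positions 6, 10 and 14 in df1)
df2 <- data.frame(
  Date = df1$Date[c(1:5, 7:9, 11:13, 15:16)],
  Data = rnorm(13)
)

# Keep only the rows of df1 whose Date also occurs in df2
df1_kept <- df1[df1$Date %in% df2$Date, ]
rownames(df1_kept) <- NULL  # renumber the rows 1..13

nrow(df1_kept)  # 13, the same as nrow(df2)
```

Negating the mask, df1[!(df1$Date %in% df2$Date), ], gives the three dropped rows instead.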

anti_join shows the records that would be removed.

anti_join(df1, df2, by = "Date")

        Date       Data
1 2019-04-01  2.1303783
2 2019-08-01  1.6907800
3 2019-12-01 -0.8593388
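
As a quick sanity check, the two joins should split df1 into complementary pieces. The sketch below assumes dplyr is installed and rebuilds the data with an arbitrary seed, so its values will not match the printouts above; kept, dropped and rebuilt are names introduced here for illustration.

```r
library(dplyr)

# Rebuild the example data; the seed is an arbitrary assumption for repeatability
set.seed(1)
df1 <- data.frame(
  Date = seq(as.Date("2018-11-01"), as.Date("2020-02-01"), by = "month"),
  Data = rnorm(16)
)
df2 <- data.frame(
  Date = df1$Date[c(1:5, 7:9, 11:13, 15:16)],
  Data = rnorm(13)
)

kept    <- semi_join(df1, df2, by = "Date")  # rows of df1 with a match in df2
dropped <- anti_join(df1, df2, by = "Date")  # rows of df1 without a match

# Together the two pieces account for every row of df1
stopifnot(nrow(kept) + nrow(dropped) == nrow(df1))

# Stacking them and sorting by Date recovers df1's original values
rebuilt <- arrange(bind_rows(kept, dropped), Date)
identical(rebuilt$Data, df1$Data)
```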