我有两个带有两列的数据框(日期和数据)。列的长度不同。我想做的是按日期删除df1中不在df2中的行。
将阐明一个例子。这些是我的数据框:
df1 = cbind(data.frame(Date = seq(as.Date("2018-11-1"), as.Date("2020-02-1"), by = "months"), stringsAsFactors = F), data.frame(Data = rnorm(16, 0, 1), stringsAsFactors = F))
Date Data
1 2018-11-01 1.09433662
2 2018-12-01 -0.27538189
3 2019-01-01 -0.19712728
4 2019-02-01 0.99852535
5 2019-03-01 -0.50760024
6 2019-04-01 -0.43127396
7 2019-05-01 0.90685965
8 2019-06-01 0.51510503
9 2019-07-01 -0.39070644
10 2019-08-01 1.27976428
11 2019-09-01 -0.63845519
12 2019-10-01 -0.05489751
13 2019-11-01 -0.87745923
14 2019-12-01 0.18082375
15 2020-01-01 0.08852416
16 2020-02-01 1.50827788
df2= cbind(data.frame(Date = df1$Date[c(1:5,7:9,11:13,15:16)]), data.frame(Data = c(1.09433662,-0.27538189, 0.99852535,-0.50760024,-0.43127396, 0.90685965,-0.39070644, 1.27976428,-0.63845519,-0.05489751,-0.87745923, 0.18082375, 1.50827788)))
Date Data
1 2018-11-01 1.09433662
2 2018-12-01 -0.27538189
3 2019-01-01 0.99852535
4 2019-02-01 -0.50760024
5 2019-03-01 -0.43127396
6 2019-05-01 0.90685965
7 2019-06-01 -0.39070644
8 2019-07-01 1.27976428
9 2019-09-01 -0.63845519
10 2019-10-01 -0.05489751
11 2019-11-01 -0.87745923
12 2020-01-01 0.18082375
13 2020-02-01 1.50827788
我现在想要的是通过删除不在df1
中的行来将df2
减少到与df2
相同的长度。要删除的行与df2
中缺少的月份相对应。
对于df1
,结果将是这样:
#df1 where the rows corresponding to the missing months in df2 have been deleted
Date Data
1 2018-11-01 1.09433662
2 2018-12-01 -0.27538189
3 2019-01-01 -0.19712728
4 2019-02-01 0.99852535
5 2019-03-01 -0.50760024
6 2019-05-01 0.90685965
7 2019-06-01 0.51510503
8 2019-07-01 -0.39070644
9 2019-09-01 -0.63845519
10 2019-10-01 -0.05489751
11 2019-11-01 -0.87745923
12 2020-01-01 0.08852416
13 2020-02-01 1.50827788
有人可以帮助我吗?
非常感谢!
答案 0 :(得分:2)
semi_join
可以满足您的需求。请注意,您从df2复制了数据作为输出示例。
library(dplyr)
semi_join(df1, df2, by = "Date")
Date Data
1 2018-11-01 0.38376758
2 2018-12-01 -0.28738352
3 2019-01-01 1.79556305
4 2019-02-01 -0.34680836
5 2019-03-01 0.57803280
6 2019-05-01 1.96801082
7 2019-06-01 0.38448708
8 2019-07-01 0.39829417
9 2019-09-01 0.94912096
10 2019-10-01 -0.04469681
11 2019-11-01 0.32008546
12 2020-01-01 1.09054839
13 2020-02-01 -1.45438502
和anti_join
显示应删除的记录。
anti_join(df1, df2, by = "Date")
Date Data
1 2019-04-01 2.1303783
2 2019-08-01 1.6907800
3 2019-12-01 -0.8593388