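I'm trying to select the values common to two data frames. I have a big_df and a small_df.
What I want is a data frame containing only the rows whose "ID" values appear in both data frames, and I only want to keep the columns of big_df, not those of small_df.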
library(dplyr)
df3 <- merge(big_df, small_df, by = "ID")
> df3
  ID Age Name Colour
1  1  21    a   blue
2  4  20    d  green
3  8  87    h    red
4  9   9    i  black
big_df <- data.frame("ID" = 1:10, "Age" = c(21,15,1,20,34,45,67,87,9,77), "Name" = c("a","b","c","d","e","f","g","h","i","l"))
> big_df
   ID Age Name
1   1  21    a
2   2  15    b
3   3   1    c
4   4  20    d
5   5  34    e
6   6  45    f
7   7  67    g
8   8  87    h
9   9   9    i
10 10  77    l
small_df <- data.frame("ID" = c(1,4,8,9), "Colour" = c("blue","green","red","black"))
> small_df
ID Colour
1 1 blue
2 4 green
3 8 red
4 9 black
What I want is this, without the colour information:
> df3
  ID Age Name
1  1  21    a
2  4  20    d
3  8  87    h
4  9   9    i
Answer 0 (score: 2)
dplyr's semi_join() is designed for exactly this purpose.
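A minimal sketch of how that might look with the data from the question. The explicit `by = "ID"` argument and the reuse of the name `df3` are choices made here for illustration, not something the answer spelled out:

```r
library(dplyr)

# Data frames as defined in the question
big_df <- data.frame(ID = 1:10,
                     Age = c(21, 15, 1, 20, 34, 45, 67, 87, 9, 77),
                     Name = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "l"))
small_df <- data.frame(ID = c(1, 4, 8, 9),
                       Colour = c("blue", "green", "red", "black"))

# semi_join() keeps the rows of big_df that have a match in small_df
# and returns only big_df's columns, so Colour is never added.
df3 <- semi_join(big_df, small_df, by = "ID")
df3
```

Unlike merge(), this should not require dropping the Colour column afterwards.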
Answer 1 (score: 1)
I think what you actually need is:
#check which big IDs exist in small IDs and subset
big_df[big_df$ID %in% unique(small_df$ID), ]
#  ID Age Name
#1  1  21    a
#4  4  20    d
#8  8  87    h
#9  9   9    i
So in this case, I don't think you need a join at all.
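One practical difference between the two approaches can be sketched as follows. The data frame `small_dup` is a hypothetical variant of small_df with ID 4 listed twice; the question's data has no such duplicates:

```r
big_df <- data.frame(ID = 1:10,
                     Age = c(21, 15, 1, 20, 34, 45, 67, 87, 9, 77),
                     Name = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "l"))

# Hypothetical variant of small_df in which ID 4 appears twice
small_dup <- data.frame(ID = c(1, 4, 4, 8, 9),
                        Colour = c("blue", "green", "teal", "red", "black"))

# %in% only tests membership, so each big_df row is kept at most once
filtered <- big_df[big_df$ID %in% small_dup$ID, ]
nrow(filtered)  # 4

# merge() pairs up every matching row, so ID 4 comes back twice
merged <- merge(big_df, small_dup, by = "ID")
nrow(merged)    # 5
```

Because `%in%` is a pure membership test, wrapping the right-hand side in unique() does not change the result; it is harmless but optional.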