Filter one data frame by the names in another data frame, while keeping multiple categories

Time: 2016-04-14 21:26:28

Tags: r dplyr

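I have two data frames and would like to filter table1 by the animal column in table2, while keeping the multiple entries per category that table1 has (two each for cat and dog). The end result should be the same as table1, but with "lion" removed. There should still be two "cat" rows and two "dog" rows.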

As a newbie, I'm not sure how to approach this. I feel like the answer involves a dplyr function or some type of join? If possible, I'd prefer a reshape2 or dplyr approach, especially if there's a way to do it with the join functions. I'm also not very familiar with the base merge function.

Here is the code for the two data frames:

table1 <- data.frame(id=c(1:7), animal=c("cat","cat","dog","dog","parakeet","lion","duck"))
table2 <- data.frame(id=c(1:4), animal=c("cat","dog","parakeet","duck"))

3 Answers:

Answer 0 (score: 4)

You can use %in% like this:

table1[table1$animal %in% table2$animal,]

  id   animal
1  1      cat
2  2      cat
3  3      dog
4  4      dog
5  5 parakeet
7  7     duck
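Why this works: %in% returns one logical value per row of table1, TRUE where that row's animal occurs anywhere in table2$animal, and using that vector as the row index keeps only the TRUE rows. Because the test is done per row of table1, both "cat" rows and both "dog" rows survive. A minimal base-R sketch, assuming the question's data rebuilt with stringsAsFactors = FALSE (per ?match, %in% also works on factors because they are compared as character):

```r
# Rebuild the question's data (character columns for clarity)
table1 <- data.frame(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                     stringsAsFactors = FALSE)
table2 <- data.frame(id = 1:4,
                     animal = c("cat", "dog", "parakeet", "duck"),
                     stringsAsFactors = FALSE)

# One TRUE/FALSE per row of table1: is this row's animal listed in table2?
keep <- table1$animal %in% table2$animal
print(keep)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE

# Logical row indexing keeps only the TRUE rows, so "lion" is dropped
result <- table1[keep, ]
print(result)
```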

Answer 1 (score: 1)

Using data.table:

library(data.table)
setDT(table1)[table2[-1], on = "animal"]
#   id   animal
#1:  1      cat
#2:  2      cat
#3:  3      dog
#4:  4      dog
#5:  5 parakeet
#6:  7     duck
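Note that X[i, on = ...] is a join driven by i (here table2[-1], i.e. table2 without its id column): for each row of i it returns the matching rows of X. That gives the desired result here because every animal in table2 occurs in table1 and none is repeated. If the lookup table contained an animal missing from table1, the join would add a row with NA in id; nomatch = 0L drops such rows. A small sketch, where the "zebra" row and the name lookup are hypothetical and added only for illustration:

```r
library(data.table)

table1 <- data.table(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"))
# Hypothetical lookup table: "zebra" does not occur in table1
lookup <- data.table(animal = c("cat", "dog", "parakeet", "duck", "zebra"))

# Default nomatch = NA: the unmatched "zebra" row comes back with id = NA
with_na <- table1[lookup, on = "animal"]
print(with_na)

# nomatch = 0L keeps only rows of lookup that found a match in table1
matched_only <- table1[lookup, on = "animal", nomatch = 0L]
print(matched_only)
```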

Answer 2 (score: 0)

You can do this with semi_join from dplyr, which will

      return all rows from ‘x’ where there are matching
      values in ‘y’, keeping just columns from ‘x’.

      A semi join differs from an inner join because an inner join
      will return one row of ‘x’ for each matching row of ‘y’,
      where a semi join will never duplicate rows of ‘x’.
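The difference described in that quote shows up as soon as the lookup table repeats a key. A minimal sketch with two small hypothetical data frames, x and y, where "cat" appears twice in y:

```r
library(dplyr)

x <- data.frame(id = 1:2, animal = c("cat", "dog"), stringsAsFactors = FALSE)
# Hypothetical lookup table with a repeated key
y <- data.frame(animal = c("cat", "cat"), stringsAsFactors = FALSE)

# inner_join: one output row per matching row of y, so the cat row appears twice
inner <- inner_join(x, y, by = "animal")
# semi_join: each row of x is kept at most once, and only x's columns are returned
semi <- semi_join(x, y, by = "animal")

print(nrow(inner))  # 2
print(nrow(semi))   # 1
```

This is why semi_join is the natural fit for "filter by membership": duplicates in the lookup table can never inflate the result.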

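But first, convert your data so that the columns that look like strings (but are actually factors) really are strings. You can do this with table1[] <- lapply(table1, as.character) and table2[] <- lapply(table2, as.character). Alternatively, set stringsAsFactors=FALSE when you create the data.frames:
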
table1 <- data.frame(id=c(1:7), animal=c("cat","cat","dog","dog","parakeet","lion","duck"),
                     stringsAsFactors=FALSE)
table2 <- data.frame(id=c(1:4), animal=c("cat","dog","parakeet","duck"),
                     stringsAsFactors=FALSE)
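One caveat with the lapply(..., as.character) route: it converts every column, so id becomes character as well. If you only want to change factor columns, a base-R sketch (the helper name factors_to_character is made up for this example) is:

```r
# Hypothetical helper: convert only factor columns to character, leave others alone
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)
  df
}

# Build the data with factors on purpose, as R versions before 4.0.0 did by default
table1 <- data.frame(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                     stringsAsFactors = TRUE)

converted <- factors_to_character(table1)
print(sapply(converted, class))
```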

Then you can do:

library(dplyr)
semi_join(table1, table2, by = "animal")

  id   animal
1  1      cat
2  2      cat
3  3      dog
4  4      dog
5  5 parakeet
6  7     duck

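If you don't do this (i.e., if you join on factors), the code will give a warning, because the animal columns in table1 and table2 are factors with different levels rather than strings. Don't ignore this warning: in some versions of dplyr the coercion to character has been inconsistent. Convert data.frame factor columns to character before using the dplyr *_join functions.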

PS:

You can also use %in% inside filter to get the same result:

table1 %>% filter(animal %in% table2$animal)
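To check that the filter() version and the semi_join() version select the same rows, a quick comparison on the question's data (ids are compared rather than whole objects, to sidestep attribute differences):

```r
library(dplyr)

table1 <- data.frame(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                     stringsAsFactors = FALSE)
table2 <- data.frame(id = 1:4,
                     animal = c("cat", "dog", "parakeet", "duck"),
                     stringsAsFactors = FALSE)

via_filter <- table1 %>% filter(animal %in% table2$animal)
via_semi   <- semi_join(table1, table2, by = "animal")

# Both keep the same rows of table1 in the same order; only "lion" (id 6) is dropped
print(identical(via_filter$id, via_semi$id))  # TRUE
```

The filter() form only needs the lookup column as a vector, while semi_join() needs a data frame with a shared key column; otherwise they behave the same for this task.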