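I have two data frames and want to filter the animal column of table1 by the animal column of table2, while keeping the repeated animals in table1 (the duplicate rows for cat and dog). The end result should be the same as table1 but with "lion" removed; there should still be two "cat" rows and two "dog" rows.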
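As a beginner, I'm not sure how to approach this. I feel the answer involves dplyr functions or some kind of join? If possible, I'd prefer a reshape2 or dplyr approach, especially if there is a way to use a join function. I'm also not very familiar with base functions such as merge.
Here is the code for the two data frames:
table1 <- data.frame(id=c(1:7), animal=c("cat","cat","dog","dog","parakeet","lion","duck"))
table2 <- data.frame(id=c(1:4), animal=c("cat","dog","parakeet","duck"))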
Answer 0 (score: 4)
You can use %in% like this:
table1[table1$animal %in% table2$animal,]
id animal
1 1 cat
2 2 cat
3 3 dog
4 4 dog
5 5 parakeet
7 7 duck
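Since the question mentions match(), it may help to see that base R documents x %in% table as match(x, table, nomatch = 0L) > 0L. The minimal sketch below (assuming the question's data, built with character columns) shows the logical mask both forms produce and that indexing with it drops only the "lion" row.

```r
table1 <- data.frame(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                     stringsAsFactors = FALSE)
table2 <- data.frame(id = 1:4,
                     animal = c("cat", "dog", "parakeet", "duck"),
                     stringsAsFactors = FALSE)

# TRUE for every row of table1 whose animal appears anywhere in table2
keep <- table1$animal %in% table2$animal

# The same mask written with match(): non-matches get 0, matches get a positive index
keep_via_match <- match(table1$animal, table2$animal, nomatch = 0L) > 0L

# Logical row indexing keeps duplicates (both cats, both dogs) and drops "lion"
result <- table1[keep, ]
```

Because the mask is computed per row of table1, repeated animals in table1 are never collapsed or duplicated.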
Answer 1 (score: 1)
Using data.table:
library(data.table)
setDT(table1)[table2[-1], on = "animal"]
# id animal
#1: 1 cat
#2: 2 cat
#3: 3 dog
#4: 4 dog
#5: 5 parakeet
#6: 7 duck
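The join above returns one row per row of table2 (with the id column dropped), so it works here because every animal in table2 also occurs in table1. A minimal sketch, assuming the data.table package is installed and using a hypothetical extra "zebra" row, shows what changes when a key has no match: the default join keeps it with id = NA, nomatch = 0L turns it into an inner join, and an %in% filter inside [ ] gives a true semi join.

```r
library(data.table)

t1 <- data.table(id = 1:7,
                 animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"))
# "zebra" is a hypothetical row with no counterpart in t1
t2 <- data.table(animal = c("cat", "dog", "parakeet", "duck", "zebra"))

# Default nomatch = NA: "zebra" comes back as a row with id = NA
right_rows <- t1[t2, on = "animal"]

# nomatch = 0L drops rows of t2 that have no match (inner join)
inner_rows <- t1[t2, on = "animal", nomatch = 0L]

# Semi join: rows of t1 whose animal appears in t2, in t1's original order
semi_rows <- t1[animal %in% t2$animal]
```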
Answer 2 (score: 0)
You can do this with semi_join from dplyr, which will
return all rows from ‘x’ where there are matching
values in ‘y’, keeping just columns from ‘x’.
A semi join differs from an inner join because an inner join
will return one row of ‘x’ for each matching row of ‘y’,
where a semi join will never duplicate rows of ‘x’.
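To make the inner-join versus semi-join difference from the quoted documentation concrete, here is a minimal sketch assuming dplyr is installed. The lookup table t2_dup is hypothetical: it lists "duck" twice, so inner_join() returns the duck row of t1 twice, while semi_join() keeps each row of t1 at most once and adds no columns.

```r
library(dplyr)

t1 <- data.frame(id = 1:7,
                 animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                 stringsAsFactors = FALSE)
# Hypothetical lookup table in which "duck" appears twice
t2_dup <- data.frame(animal = c("cat", "dog", "parakeet", "duck", "duck"),
                     stringsAsFactors = FALSE)

# One output row per matching pair: the duck row of t1 is duplicated
inner <- inner_join(t1, t2_dup, by = "animal")

# Filtering join: each row of t1 appears at most once
semi <- semi_join(t1, t2_dup, by = "animal")
```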
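But first, convert your data so that the columns that look like strings (but are actually factors) really are strings. You can do this with table1[] <- lapply(table1, as.character)
and table2[] <- lapply(table2, as.character)
(note that this also turns the id columns into character). Alternatively, create the data frames with stringsAsFactors=FALSE (the default since R 4.0.0):
table1 <- data.frame(id=c(1:7), animal=c("cat","cat","dog","dog","parakeet","lion","duck"),
stringsAsFactors=FALSE)
table2 <- data.frame(id=c(1:4), animal=c("cat","dog","parakeet","duck"),
stringsAsFactors=FALSE)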
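If you would rather keep id numeric, one option is to convert only the factor columns. A minimal base-R sketch (the helper name is_fac is illustrative, not from the answer) contrasts that with converting every column:

```r
# Build the data the way pre-4.0.0 R did by default: strings become factors
table1 <- data.frame(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                     stringsAsFactors = TRUE)

# Converting every column also turns the integer id into character
all_chr <- table1
all_chr[] <- lapply(all_chr, as.character)

# Converting only the factor columns leaves id as an integer
is_fac <- vapply(table1, is.factor, logical(1))
table1[is_fac] <- lapply(table1[is_fac], as.character)
```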
Then you can run:
library(dplyr)
semi_join(table1, table2, by = "animal")
which gives:
id animal
1 1 cat
2 2 cat
3 3 dog
4 4 dog
5 5 parakeet
6 7 duck
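If you don't do this (i.e., if you join on a factor), the code will give a warning, because the animal columns of table1 and table2 are factors with different levels rather than strings. Don't ignore this warning: in some versions of dplyr the coercion to character is inconsistent. Convert data.frame factor columns to character before using dplyr's *_join functions.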
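The mismatch behind that warning can be checked directly in base R before joining. This minimal sketch builds the question's data with factor columns and compares the level sets; table1 carries an extra level, "lion", that table2 lacks.

```r
# Factor columns, as data.frame() created them by default before R 4.0.0
table1 <- data.frame(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                     stringsAsFactors = TRUE)
table2 <- data.frame(id = 1:4,
                     animal = c("cat", "dog", "parakeet", "duck"),
                     stringsAsFactors = TRUE)

# Levels that exist in table1 but not in table2
extra_levels <- setdiff(levels(table1$animal), levels(table2$animal))

# FALSE: the two factors do not share the same level set
same_levels <- identical(levels(table1$animal), levels(table2$animal))
```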
PS
You can also use %in% inside filter() to get the same result:
table1 %>% filter(animal %in% table2$animal)
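Since the question also asks about base merge(), a minimal sketch under the assumption of character columns follows. merge() performs an inner join, so table2 is reduced to its unique animal values first; otherwise a repeated animal in table2 would duplicate rows of table1. Keeping only the animal column also avoids the id.x/id.y pair that merging on animal alone would otherwise produce.

```r
table1 <- data.frame(id = 1:7,
                     animal = c("cat", "cat", "dog", "dog", "parakeet", "lion", "duck"),
                     stringsAsFactors = FALSE)
table2 <- data.frame(id = 1:4,
                     animal = c("cat", "dog", "parakeet", "duck"),
                     stringsAsFactors = FALSE)

# Inner join on animal against the distinct animals of table2;
# rows of table1 with no match ("lion") are dropped
result <- merge(table1, unique(table2["animal"]), by = "animal")
```

By default merge() sorts the result by the key column and puts the key first, so the rows come out in alphabetical order of animal rather than table1's order; result[order(result$id), ] restores the original ordering if that matters.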