I have two data frames, as shown below:
df1 <- data.frame(fruit=c("apple", "blackberry", "orange", "pear", "grape"),
color=c("black", "purple", "blue", "green", "red"),
quantity1=c(1120, 7600, 21409, 120498, 25345),
quantity2=c(1200, 7898, 21500, 140985, 27098),
taste=c("sweet", "bitter", "sour", "salty", "spicy"))
df2 <- data.frame(fruit=c("apple", "orange", "pear"),
color=c("black", "yellow", "green"),
quantity=c(1145, 65094, 120500))
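I want to remove rows from df1 based on the rows in df2. A row of df1 is removed only if all 3 conditions hold for some row of df2:
1. df1$fruit equals df2$fruit
2. df1$color equals df2$color
3. df2$quantity lies between df1$quantity1 and df1$quantity2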
The output for my example should look like this:
df3 <- data.frame(fruit=c("blackberry", "orange", "grape"),
color=c("purple", "blue", "red"),
quantity1=c(7600, 21409, 25345),
quantity2=c(7898, 21500, 27098),
taste=c("bitter", "sour", "spicy"))
Answer 0 (score: 0)
With data.table, we can use a non-equi join (here as an anti-join, via the ! in front of df2):
library(data.table)
setDT(df1)[!df2, on = .(fruit, color, quantity1 <= quantity,
quantity2 >= quantity)]
# fruit color quantity1 quantity2 taste
#1: blackberry purple 7600 7898 bitter
#2: orange blue 21409 21500 sour
#3: grape red 25345 27098 spicy
Alternatively, use fuzzy_anti_join with the same approach as shown in this post.
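To see what the anti-join is doing without any extra packages, the three conditions can also be checked row by row in base R. This is only a sketch, not part of the original answer: the name `has_match` is made up for illustration, and `stringsAsFactors = FALSE` is an added assumption so that the string comparisons also work on R versions before 4.0.

```r
# Example data from the question (strings kept as character, not factor)
df1 <- data.frame(fruit = c("apple", "blackberry", "orange", "pear", "grape"),
                  color = c("black", "purple", "blue", "green", "red"),
                  quantity1 = c(1120, 7600, 21409, 120498, 25345),
                  quantity2 = c(1200, 7898, 21500, 140985, 27098),
                  taste = c("sweet", "bitter", "sour", "salty", "spicy"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c("apple", "orange", "pear"),
                  color = c("black", "yellow", "green"),
                  quantity = c(1145, 65094, 120500),
                  stringsAsFactors = FALSE)

# has_match (hypothetical name): for each row of df1, is there at least one
# row of df2 with the same fruit, the same color, and a quantity inside
# the closed interval [quantity1, quantity2]?
has_match <- vapply(seq_len(nrow(df1)), function(i) {
  any(df2$fruit == df1$fruit[i] &
        df2$color == df1$color[i] &
        df2$quantity >= df1$quantity1[i] &
        df2$quantity <= df1$quantity2[i])
}, logical(1))

# Keep only the rows of df1 without a match
df3 <- df1[!has_match, ]
df3
```

The loop compares every row of df1 with every row of df2, so it is fine for small data but the join-based approaches should scale better.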
Answer 1 (score: 0)
You can use fuzzy_anti_join from the fuzzyjoin package:
fuzzyjoin::fuzzy_anti_join(df1, df2,
by = c('fruit', 'color','quantity1' = 'quantity', 'quantity2' = 'quantity'),
match_fun = list(`==`, `==`, `<=`, `>=`))
# A tibble: 3 x 5
# fruit color quantity1 quantity2 taste
# <chr> <chr> <dbl> <dbl> <chr>
#1 blackberry purple 7600 7898 bitter
#2 orange blue 21409 21500 sour
#3 grape red 25345 27098 spicy
Answer 2 (score: 0)
I wonder whether this can also be done with the tidyverse:
library(tidyverse)
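df1 %>%
  left_join(df2, by = c("fruit", "color")) %>%
  # keep rows with no partner in df2, or whose quantity falls outside
  # [quantity1, quantity2]; strict comparisons treat the bounds as matches,
  # consistent with the <= / >= joins in the other answers
  filter(is.na(quantity) | quantity < quantity1 | quantity > quantity2)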
#> fruit color quantity1 quantity2 taste quantity
#> 1 blackberry purple 7600 7898 bitter NA
#> 2 orange blue 21409 21500 sour NA
#> 3 grape red 25345 27098 spicy NA
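Another possible tidyverse route, assuming dplyr 1.1.0 or later is installed, is an anti_join whose inequality conditions are written with join_by(). This is a sketch rather than part of the answer above; the `stringsAsFactors = FALSE` arguments are an added assumption so that the key columns are plain character vectors on older R versions.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which introduced join_by()

# Example data from the question
df1 <- data.frame(fruit = c("apple", "blackberry", "orange", "pear", "grape"),
                  color = c("black", "purple", "blue", "green", "red"),
                  quantity1 = c(1120, 7600, 21409, 120498, 25345),
                  quantity2 = c(1200, 7898, 21500, 140985, 27098),
                  taste = c("sweet", "bitter", "sour", "salty", "spicy"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c("apple", "orange", "pear"),
                  color = c("black", "yellow", "green"),
                  quantity = c(1145, 65094, 120500),
                  stringsAsFactors = FALSE)

# Drop every row of df1 that has at least one row in df2 with the same fruit,
# the same color, and quantity1 <= quantity <= quantity2.
# In join_by(), the left side of each condition refers to df1 and the right
# side to df2.
df3 <- anti_join(
  df1, df2,
  by = join_by(fruit, color, quantity1 <= quantity, quantity2 >= quantity)
)
df3
```

Compared with left_join followed by filter, this does not carry the extra quantity column into the result, and a row of df1 cannot be duplicated when df2 contains several rows for the same fruit and color.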