Check whether a column value falls within the range given by two other columns

Time: 2018-10-19 03:19:10

Tags: r

I have a data frame that looks like this (data frame X):

id  number  found
1   5225    NA
2   2222    NA
3   3121    NA

I also have another data frame that looks like this (data frame Y):

id  number1  number2    
1   4000     6000
3   2500     3300
3   7000     8000

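What I want to do is: for each value in the "number" column of data frame X, check whether it is equal to or lies between any of the "number1"/"number2" pairs in data frame Y. In addition, the "id" of that "number1"/"number2" pair must match the "id" in data frame X. If both conditions hold, I want to insert "YES" into the "found" column of the corresponding row in data frame X: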

id  number  found
1   5225    YES
2   2222    NA
3   3121    YES

How would I go about doing this? Thanks for the help.

4 answers:

Answer 0 (score: 6)

Here is an option using fuzzyjoin:

library(fuzzyjoin)
library(dplyr)
fuzzy_left_join(X, Y[-1], by = c("number" = "number1", "number" = "number2"), 
     match_fun  =list(`>=`, `<=`)) %>% 
    mutate(found = c(NA, "YES")[(!is.na(number1)) + 1]) %>% 
    select(names(X))
#    id number found
#1  1   5225   YES
#2  2   2222  <NA>
#3  3   3121   YES

Another option is a non-equi join using data.table:

library(data.table)
setDT(X)[, found := NULL]
X[Y, found := "YES", on = .(number >= number1, number <= number2)]
X
#   id number found
#1:  1   5225   YES
#2:  2   2222  <NA>
#3:  3   3121   YES
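
The join above compares only the number ranges. The question also asks for the ids to match, so here is a minimal sketch of the same non-equi join with id added as an equality condition. It assumes Y carries the ids shown in the question (1, 3, 3) rather than the 1:3 used in the Data section; the literal data below is only for illustration.

```r
library(data.table)

# Illustrative data: X as in the question, Y with the question's ids (1, 3, 3)
X <- data.table(id = 1:3, number = c(5225L, 2222L, 3121L))
Y <- data.table(id      = c(1L, 3L, 3L),
                number1 = c(4000L, 2500L, 7000L),
                number2 = c(6000L, 3300L, 8000L))

# Update join: rows of X that match some row of Y on id AND on the
# inclusive range get found = "YES"; all other rows stay NA
X[Y, found := "YES", on = .(id, number >= number1, number <= number2)]
X
```

With these inputs, rows 1 and 3 of X should be marked and row 2 left as NA, which is the result requested in the question.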

Data

X <- structure(list(id = 1:3, number = c(5225L, 2222L, 3121L), found = c(NA, 
  NA, NA)), class = "data.frame", row.names = c(NA, -3L))

Y <- structure(list(id = 1:3, number1 = c(4000L, 2500L, 7000L), number2 = c(6000L, 
    3300L, 8000L)), class = "data.frame", row.names = c(NA, -3L))

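Answer 1 (score: 4)

We can loop over each x$number using sapply, check whether it lies in the range of any pair of y$number1 and y$number2, and assign the value accordingly.

x$found <- ifelse(sapply(x$number, function(p) 
     any(y$number1 <= p & y$number2 >= p)), "YES", NA)
x
#  id number found
#1  1   5225   YES
#2  2   2222  <NA>
#3  3   3121   YES

Using the same logic but with replace

x$found <- replace(x$found, 
     sapply(x$number, function(p) any(y$number1 <= p & y$number2 >= p)), "YES")

EDIT

If we also want to compare the values of id, we can additionally require that y$id equals the id of the current row of x.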
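
One way the id comparison could be written in base R is sketched below. This is not the answer's original code; it assumes lowercase data frames x and y with the ids from the question (1, 3, 3 in y), and in_range is a hypothetical helper name.

```r
# Illustrative data: x as in the question, y with the question's ids (1, 3, 3)
x <- data.frame(id = 1:3, number = c(5225L, 2222L, 3121L), found = NA_character_)
y <- data.frame(id      = c(1L, 3L, 3L),
                number1 = c(4000L, 2500L, 7000L),
                number2 = c(6000L, 3300L, 8000L))

# For each (id, number) pair of x, test whether any row of y has the
# same id and an inclusive range containing the number
in_range <- mapply(function(i, p) {
  any(y$id == i & y$number1 <= p & y$number2 >= p)
}, x$id, x$number)

x$found <- ifelse(in_range, "YES", NA)
x
```

mapply walks x$id and x$number in parallel, so each number is only compared against the ranges that belong to its own id.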

Answer 2 (score: 4)

Using tidyverse functions, in particular map_chr to iterate over each number:

library(tidyverse)
tbl1 <- read_table2(
"id   number  found
1    5225     NA
2    2222     NA
3    3121     NA"
)
tbl2 <- read_table2(
"id  number1  number2
1    4000   6000
2    2500   3300
3    7000   8000"
)

tbl1 %>%
  mutate(found = map_chr(
    .x = number,
    .f = ~ if_else(
      # inclusive bounds, since the question asks for "equal to or between"
      condition = any(.x >= tbl2$number1 & .x <= tbl2$number2),
      true = "YES",
      false = NA_character_
    )
  ))
#> # A tibble: 3 x 3
#>      id number found
#>   <int>  <int> <chr>
#> 1     1   5225 YES  
#> 2     2   2222 <NA> 
#> 3     3   3121 YES

Created on 2018-10-18 by the reprex package (v0.2.0).

Answer 3 (score: 3)

Using sqldf

library(sqldf)
sql <- "SELECT DISTINCT x.id, x.number, "
sql <- paste0(sql, "CASE WHEN y.id IS NOT NULL THEN 'YES' END AS found ")
sql <- paste0(sql, "FROM X x LEFT JOIN Y y ON x.number BETWEEN y.number1 AND y.number2")
X <- sqldf(sql)
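
The query above joins on the number range only. If the ids must match as well, the same LEFT JOIN can carry an extra equality condition. The following is a sketch under the assumption that Y uses the ids from the question (1, 3, 3); the aliases a and b and the result name res are illustrative choices.

```r
library(sqldf)

# Illustrative data: X as in the question, Y with the question's ids (1, 3, 3)
X <- data.frame(id = 1:3, number = c(5225L, 2222L, 3121L))
Y <- data.frame(id      = c(1L, 3L, 3L),
                number1 = c(4000L, 2500L, 7000L),
                number2 = c(6000L, 3300L, 8000L))

# LEFT JOIN keeps every row of X; unmatched rows get NULL (NA in R) for found
res <- sqldf("
  SELECT DISTINCT a.id, a.number,
         CASE WHEN b.id IS NOT NULL THEN 'YES' END AS found
  FROM X AS a
  LEFT JOIN Y AS b
    ON a.id = b.id AND a.number BETWEEN b.number1 AND b.number2
  ORDER BY a.id
")
res
```

BETWEEN is inclusive on both ends in SQLite, so values equal to number1 or number2 also count as found.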
