I have the following two data frames, dat1 and dat2:
library(tidyverse)
dat1 <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 1,
"A", 1, 2,
"A", 1, 3,
"A", 1, 4,
"A", 1, 5,
"A", 2, 1,
"A", 2, 2,
"A", 2, 3,
"A", 2, 4,
"A", 2, 5,
"B", 1, 1,
"B", 1, 2,
"B", 1, 3,
"B", 1, 4,
"B", 1, 5,
"B", 2, 1,
"B", 2, 2,
"B", 2, 3,
"B", 2, 4,
"B", 2, 5,
)
dat2 <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 3,
"B", 2, 4
)
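I am trying to filter the records in dat1 based on the following conditions: the subj and drive columns of dat1 should match the subj and drive columns of dat2, and the measure values in dat1 should fall within a range around the measure values in dat2. In this example, the range extends one unit on either side. My resulting data frame would therefore look like this:
result <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 2,
"A", 1, 3,
"A", 1, 4,
"B", 2, 3,
"B", 2, 4,
"B", 2, 5
)
I know about dplyr::semi_join(), but it does not let me filter based on a range. Any ideas on how to solve this? A tidyverse-based solution would be great!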
Answer 0 (score: 4)
Edited to use the native sqldf string substitution mentioned in GG's comment, rather than sprintf.
library(sqldf)
check_range <- 1
fn$sqldf('
select one.*
from dat1 one
join dat2 two
on one.subj = two.subj
and one.drive = two.drive
and one.measure - two.measure between -`check_range` and `check_range`
')
# subj drive measure
# 1 A 1 2
# 2 A 1 3
# 3 A 1 4
# 4 B 2 3
# 5 B 2 4
# 6 B 2 5
Answer 1 (score: 3)
One option is to do an inner_join first and then filter with between:
library(dplyr)
inner_join(dat1, dat2, by = c('subj', 'drive')) %>%
group_by(subj, drive) %>%
filter(between(measure.x, first(measure.y)-1, first(measure.y) + 1)) %>%
select(measure = measure.x)
# A tibble: 6 x 3
# Groups: subj, drive [2]
# subj drive measure
# <chr> <dbl> <dbl>
#1 A 1 2
#2 A 1 3
#3 A 1 4
#4 B 2 3
#5 B 2 4
#6 B 2 5
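The join-then-filter idea above can also be sketched as a single join, assuming dplyr 1.1.0 or later, where join_by() accepts a between() condition. The names half_width and dat2_ranges are made up for this illustration, and the two data frames are rebuilt with tibble() so the snippet runs on its own. Treat it as a sketch under those assumptions rather than part of the original answer.

```r
library(dplyr)  # join_by() is assumed to be available (dplyr >= 1.1.0)

# Same data as in the question, rebuilt compactly
dat1 <- tibble(
  subj    = rep(c("A", "B"), each = 10),
  drive   = rep(rep(c(1, 2), each = 5), times = 2),
  measure = rep(as.numeric(1:5), times = 4)
)
dat2 <- tibble(subj = c("A", "B"), drive = c(1, 2), measure = c(3, 4))

# Hypothetical helper value: how far either side of dat2$measure to accept
half_width <- 1

# Turn each dat2 row into an explicit [lower, upper] interval
dat2_ranges <- dat2 %>%
  transmute(subj, drive,
            lower = measure - half_width,
            upper = measure + half_width)

# Equality on subj and drive, range condition on measure
result <- dat1 %>%
  inner_join(dat2_ranges,
             by = join_by(subj, drive, between(measure, lower, upper))) %>%
  select(subj, drive, measure)

result
```

If dat2 could contain overlapping intervals for the same subj and drive, an inner join would repeat the matching dat1 rows, so a distinct() at the end may be needed in that case.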
Or with data.table:
library(data.table)
setDT(dat1)[setDT(dat2), .SD[between(measure, i.measure -1,
i.measure + 1)], on = .(subj, drive), by = .EACHI]
# subj drive measure
#1: A 1 2
#2: A 1 3
#3: A 1 4
#4: B 2 3
#5: B 2 4
#6: B 2 5
Answer 2 (score: 1)
For the sake of completeness, here is also a solution that uses a non-equi join:
library(data.table)
range <- 1
idx <- setDT(dat1)[
setDT(dat2)[, .(subj, drive, lower = measure - range, upper = measure + range)],
on = .(subj, drive, measure >= lower, measure <= upper), which = TRUE]
dat1[idx]
   subj drive measure
1:    A     1       2
2:    A     1       3
3:    A     1       4
4:    B     2       3
5:    B     2       4
6:    B     2       5