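I'm trying to collect weight values from one table (mychickwts) that were recorded in the week before each blood sample in another table (chickblood). I want a list of each blood date and the associated weights from the preceding week. I've tried several approaches, but the results keep including weight dates that fall after the blood sample date.
In this example, the match returns dates both before (1/9, 1/11, 1/13) and after (1/15) the blood date. How do I match these two tables? I also tried difference_join, but it returned results from both the 7 days before and the 7 days after, which again is not what I want.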
Chick  Date.x (blood)  Date.y (weight)  Chick.y  Weight.y
10     2019-01-14      2019-01-09       10       74
10     2019-01-14      2019-01-11       10       81
10     2019-01-14      2019-01-13       10       89
10     2019-01-14      2019-01-15       10       96
library(tidyverse)
library(lubridate)
library(fuzzyjoin)
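# Import data (sample data for the reprex)
mychickwts <- datasets::ChickWeight %>%
  mutate(Date = date("2019-01-01") + Time) %>%
  select(Date, Chick, weight) %>%
  # Chick is an ordered factor whose level order is not numeric, so compare
  # the chick labels as numbers to keep chicks 1 to 10
  filter(as.numeric(as.character(Chick)) <= 10)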
chickblood <- data.frame(
  Chick = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7,
            8, 8, 8, 9, 9, 9, 10, 10, 10),
  Date = date(c("2019-01-01", "2019-01-12", "2019-01-22", "2019-01-06",
                "2019-01-15", "2019-01-22", "2019-01-05", "2019-01-07",
                "2019-01-14", "2019-01-03", "2019-01-08", "2019-01-11", "2019-01-02",
                "2019-01-20", "2019-01-23", "2019-01-12", "2019-01-16",
                "2019-01-18", "2019-01-10", "2019-01-10", "2019-01-22", "2019-01-03",
                "2019-01-04", "2019-01-08", "2019-01-06", "2019-01-14",
                "2019-01-17", "2019-01-02", "2019-01-14", "2019-01-21")))
# Determine whether the weight date falls in the week before the blood date.
compare <- function(a, b) {
  (a - b) <= 7
}
# Get a table of each blood date and all matching weights from the previous 7 days. This does not work.
chickblood %>%
  fuzzy_left_join(
    mychickwts,
    by = c(
      "Chick" = "Chick",
      "Date" = "Date"
    ),
    match_fun = list(`==`, `compare`)
  )
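I also tried difference_join, but in this case I can't figure out how to make it match on Chick as well, and it returns dates both before and after the blood date.
chickblood %>%
  difference_join(mychickwts, by = "Date",
    max_dist = 7
  )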
I tried using lubridate's %within%, but had no luck. It returns an error, and I'm not sure why.
chickblood %>%
  fuzzy_left_join(
    mychickwts,
    by = c("Chick" = "Chick",
           "Date" = "Date"),
    match_fun = list("==", "%within%")
  ) %>%
  arrange(Date.x)
Error in which(m) : argument to 'which' is not logical
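Answer 0 (score: 0)
Since the dataset isn't very large, you can do a regular left join on Chick and then determine whether each weight date falls in the week before the blood date. From there, you can keep only the rows you need.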
library(tidyverse)
library(lubridate)
library(fuzzyjoin)
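# Chick is a factor: convert via character so the chick labels, not the
# internal level codes, become the numeric values
mychickwts$Chick <- as.numeric(as.character(mychickwts$Chick))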
chickblood %>%
  left_join(mychickwts, by = "Chick", suffix = c(".blood", ".wt")) %>%
  mutate(wt_days_prior = Date.blood - Date.wt) %>%
  mutate(wt_in_week_prior = wt_days_prior <= 7 & wt_days_prior >= 0) %>%
  filter(wt_in_week_prior)
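As a quick sanity check of the join-then-filter idea, here is a minimal base R sketch on a few made-up rows. The data frames `blood` and `wts` and their values are hypothetical (not taken from ChickWeight), and `merge()` stands in for `left_join()`. `merge()` performs an inner join by default, which gives the same rows here because unmatched rows would be dropped by the filter anyway.

```r
# Hypothetical toy data, for illustration only
blood <- data.frame(
  Chick = c(1, 1, 2),
  Date  = as.Date(c("2019-01-12", "2019-01-22", "2019-01-06"))
)
wts <- data.frame(
  Chick  = c(1, 1, 1, 2, 2),
  Date   = as.Date(c("2019-01-07", "2019-01-13", "2019-01-21",
                     "2019-01-05", "2019-01-07")),
  weight = c(59, 76, 171, 58, 72)
)

# Pair every blood sample with every weight of the same chick
joined <- merge(blood, wts, by = "Chick", suffixes = c(".blood", ".wt"))

# Days from the weight date to the blood date (negative = weight came later)
days_prior <- as.numeric(joined$Date.blood - joined$Date.wt)

# Keep only weights taken 0 to 7 days before the blood sample
result <- joined[days_prior >= 0 & days_prior <= 7, ]
result
```

Only three rows survive: each one pairs a blood date with a weight recorded no more than a week earlier.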
Alternatively, if you want to do this in a single join, you could do something like this.
chickblood %>%
  fuzzy_left_join(mychickwts, by = c("Chick", "Date"),
    match_fun = list(`==`, function(x, y) x - y >= 0 & x - y <= 7)
  )
Replace fuzzy_left_join with fuzzy_inner_join to keep only the blood samples that have a weight date in the preceding week.
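To see why the lower bound matters, the following sketch compares a one-sided test like the question's `compare` with a bounded test on a handful of dates. The function names `one_sided` and `two_sided` are made up for illustration. A weight date after the blood date gives a negative difference, which still satisfies `<= 7`.

```r
blood_date <- as.Date("2019-01-14")
wt_dates   <- as.Date(c("2019-01-05", "2019-01-09", "2019-01-13", "2019-01-15"))

# One-sided test: only an upper bound on (blood date - weight date)
one_sided <- function(a, b) (a - b) <= 7

# Two-sided test: the difference must lie between 0 and 7 days
two_sided <- function(a, b) {
  d <- as.numeric(a - b)  # difference in days
  d >= 0 & d <= 7
}

one_sided(blood_date, wt_dates)  # FALSE TRUE TRUE TRUE  (2019-01-15 slips through)
two_sided(blood_date, wt_dates)  # FALSE TRUE TRUE FALSE
```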