Starting with data that contains multiple observations for each group, like this:
set.seed(1)
my.df <- data.frame(
timepoint = rep(c(0, 1, 2), each= 3),
counts = round(rnorm(9, 50, 10), 0)
)
> my.df
  timepoint counts
1         0     44
2         0     52
3         0     42
4         1     66
5         1     53
6         1     42
7         2     55
8         2     57
9         2     56
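To perform a summary calculation at each timepoint relative to timepoint == 0, I need to pass, for each group, the vector of counts for timepoint == 0 and the vector of counts for that group (e.g. timepoint == 0) to an arbitrary function, for example: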
NonsenseFunction <- function(x, y){
(mean(x) - mean(y)) / (1 - mean(y))
}
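As a quick illustration of what a single call looks like, here is a minimal sketch. The two literal vectors are copied from the printed my.df above for timepoint == 0 and timepoint == 1; the names baseline and group1 are only illustrative and are not part of the original post.

```r
NonsenseFunction <- function(x, y) {
  (mean(x) - mean(y)) / (1 - mean(y))
}

baseline <- c(44, 52, 42)  # counts at timepoint == 0 (mean 46)
group1   <- c(66, 53, 42)  # counts at timepoint == 1 (mean 53.67)

NonsenseFunction(group1, baseline)
# approximately -0.1704, i.e. (53.67 - 46) / (1 - 46)
```

Because this particular function only uses the means of its arguments, the two vectors do not need to have the same length.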
I can get the required output with dplyr:
library(dplyr)
my.df %>%
group_by(timepoint) %>%
mutate(rep = paste0("r", 1:n())) %>%
left_join(x = ., y = filter(., timepoint == 0), by = "rep") %>%
group_by(timepoint.x) %>%
summarise(result = NonsenseFunction(counts.x, counts.y))
or with data.table:
library(data.table)
my.dt <- data.table(my.df)
my.dt[, rep := paste0("r", 1:length(counts)), by = timepoint]
merge(my.dt, my.dt[timepoint == 0], by = "rep", all = TRUE)[
, NonsenseFunction(counts.x, counts.y), by = timepoint.x]
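This only works when every group has the same number of observations. In any case, the observations are not matched across groups, so relying on the temporary rep variable feels like a hack.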
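For the more general case, where I need to pass the vector of baseline values and the vector of each group's values to an arbitrary (more complicated) function, is there an idiomatic data.table or dplyr way of doing this as a grouped operation over all groups?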
Answer 0 (score: 3)
Here is the straightforward data.table approach:
my.dt[, f(counts, my.dt[timepoint==0, counts]), by=timepoint]
This may fetch my.dt[timepoint==0, counts] again and again, once for each group. You can save that value ahead of time:
v = my.dt[timepoint==0, counts]
my.dt[, f(counts, v), by=timepoint]
...or, if you don't want to add v to the environment, perhaps:
with(list(v = my.dt[timepoint==0, counts]),
my.dt[, f(counts, v), by=timepoint]
)
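To show that this pattern does not depend on equal group sizes, here is a self-contained sketch. The table uneven.dt and its values are hypothetical, made up for illustration, and NonsenseFunction from the question stands in for f.

```r
library(data.table)

NonsenseFunction <- function(x, y) {
  (mean(x) - mean(y)) / (1 - mean(y))
}

# Hypothetical data: the groups deliberately have 3, 2 and 4 observations
uneven.dt <- data.table(
  timepoint = c(0, 0, 0, 1, 1, 2, 2, 2, 2),
  counts    = c(44, 52, 42, 66, 53, 55, 57, 56, 60)
)

# Extract the baseline vector once, then reuse it inside the grouped call
baseline <- uneven.dt[timepoint == 0, counts]
res <- uneven.dt[, .(result = NonsenseFunction(counts, baseline)), by = timepoint]
res
# result is 0 for timepoint 0, -0.3 for timepoint 1,
# and about -0.2444 for timepoint 2
```

No rep column or self-join is needed, because the baseline vector is passed whole to every group.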
Answer 1 (score: 1)
You can pass the vector from the group you are interested in as a constant in the second argument.
my.df %>%
group_by(timepoint) %>%
mutate(response = NonsenseFunction(counts, my.df$counts[my.df$timepoint == 0]))
Or, if you want to create it beforehand:
constant <- my.df$counts[my.df$timepoint == 0]
my.df %>%
group_by(timepoint) %>%
mutate(response = NonsenseFunction(counts, constant))
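If one row per group is wanted rather than the value repeated on every row, the same idea should work with summarise() in place of mutate(). This is a sketch that rebuilds the question's data; the names baseline and res are illustrative only.

```r
library(dplyr)

NonsenseFunction <- function(x, y) {
  (mean(x) - mean(y)) / (1 - mean(y))
}

# Same data as in the question
set.seed(1)
my.df <- data.frame(
  timepoint = rep(c(0, 1, 2), each = 3),
  counts = round(rnorm(9, 50, 10), 0)
)

# Baseline vector extracted once, outside the grouped pipeline
baseline <- my.df$counts[my.df$timepoint == 0]

res <- my.df %>%
  group_by(timepoint) %>%
  summarise(result = NonsenseFunction(counts, baseline))
res
```

Here res has one row per timepoint, and the timepoint == 0 row is 0 because the baseline is compared with itself.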
Answer 2 (score: 0)
You can try:
library(dplyr)
my.df %>%
mutate(new = mean(counts[timepoint == 0])) %>%
group_by(timepoint) %>%
summarise(result = NonsenseFunction(counts, new))
# A tibble: 3 × 2
# timepoint result
# <dbl> <dbl>
#1 0 0.0000000
#2 1 0.1398601
#3 2 0.2097902