Question

Starting with data that contains multiple observations for each group, like this:

set.seed(1)
my.df <- data.frame(
  timepoint = rep(c(0, 1, 2), each= 3),
  counts = round(rnorm(9, 50, 10), 0)
)
> my.df
  timepoint counts
1         0     44
2         0     52
3         0     42
4         1     66
5         1     53
6         1     42
7         2     55
8         2     57
9         2     56

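To perform a summary calculation at each timepoint relative to timepoint == 0, for each group I need to pass the vector of counts for timepoint == 0 and the vector of counts for the group (e.g. timepoint == 0) to an arbitrary function, e.g.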

NonsenseFunction <- function(x, y){
  (mean(x) - mean(y)) / (1 - mean(y))
}
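As a quick check of what the function is expected to return, here is a minimal sketch that calls it by hand for one group against the baseline. The variable names baseline and group1 are only illustrative; the values are copied from the table above.

```r
NonsenseFunction <- function(x, y) {
  (mean(x) - mean(y)) / (1 - mean(y))
}

# Values copied from the printed my.df above
baseline <- c(44, 52, 42)  # counts at timepoint == 0
group1   <- c(66, 53, 42)  # counts at timepoint == 1

# x is the group of interest, y is the baseline
res <- NonsenseFunction(group1, baseline)
print(res)
```

The two vectors are passed whole, so nothing requires them to have the same length.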

I can get the required output from this table using dplyr:

library(dplyr)
my.df %>%
  group_by(timepoint) %>%
  mutate(rep = paste0("r", 1:n())) %>%
  left_join(x = ., y = filter(., timepoint == 0), by = "rep") %>%
  group_by(timepoint.x) %>%
  summarise(result = NonsenseFunction(counts.x, counts.y))

Or data.table:

library(data.table)
my.dt <- data.table(my.df)
my.dt[, rep := paste0("r", 1:length(counts)), by = timepoint]
merge(my.dt, my.dt[timepoint == 0], by = "rep", all = TRUE)[
  , NonsenseFunction(counts.x, counts.y), by = timepoint.x]

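This only works if the number of observations is the same in every group. In any case, the observations are not matched, so using the temporary rep variable seems hacky.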

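For the more general case, where I need to pass the vector of baseline values and the vector of the group's values to an arbitrary (more complex) function, is there an idiomatic data.table or dplyr way of doing a grouped operation across all groups?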

Answer 1

Here is the straightforward data.table approach (f stands for your own function, e.g. NonsenseFunction):

my.dt[, f(counts, my.dt[timepoint==0, counts]), by=timepoint]

This will probably grab my.dt[timepoint==0, counts] again and again, once for each group. You can save the value ahead of time:

v = my.dt[timepoint==0, counts]
my.dt[, f(counts, v), by=timepoint]

...or, if you don't want to add v to the environment, perhaps

with(list(v = my.dt[timepoint==0, counts]), 
  my.dt[, f(counts, v), by=timepoint]
)
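To illustrate that this approach does not depend on equal group sizes, here is a self-contained sketch. The data in dt is hypothetical (it is not the my.df from the question) and is deliberately unbalanced; NonsenseFunction is the function from the question.

```r
library(data.table)

NonsenseFunction <- function(x, y) {
  (mean(x) - mean(y)) / (1 - mean(y))
}

# Hypothetical data: 3, 2 and 4 observations per timepoint
dt <- data.table(
  timepoint = c(0, 0, 0, 1, 1, 2, 2, 2, 2),
  counts    = c(44, 52, 42, 66, 53, 55, 57, 56, 60)
)

# Extract the baseline vector once, then reuse it in every group
v <- dt[timepoint == 0, counts]
res <- dt[, .(result = NonsenseFunction(counts, v)), by = timepoint]
print(res)
```

Because the baseline vector is looked up outside the grouped call, no join and no temporary rep column is needed.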

Answer 2

You can pass the vector from the group you are interested in as a constant in the second argument.

my.df %>%
    group_by(timepoint) %>%
    mutate(response = NonsenseFunction(counts, my.df$counts[my.df$timepoint == 0]))

Or, if you want to create it beforehand:

constant = my.df$counts[my.df$timepoint == 0]
my.df %>%
    group_by(timepoint) %>%
    mutate(response = NonsenseFunction(counts, constant))
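Note that mutate() keeps every row and repeats the group's result on each of them. If one row per timepoint is wanted, as in the question, a variant with summarise() might look like the sketch below. The data frame is rebuilt from the values printed in the question so the example is self-contained; the name baseline is only illustrative.

```r
library(dplyr)

NonsenseFunction <- function(x, y) {
  (mean(x) - mean(y)) / (1 - mean(y))
}

# Same values as the my.df printed in the question
my.df <- data.frame(
  timepoint = rep(c(0, 1, 2), each = 3),
  counts    = c(44, 52, 42, 66, 53, 42, 55, 57, 56)
)

# Baseline vector taken from outside the grouped pipeline
baseline <- my.df$counts[my.df$timepoint == 0]

res <- my.df %>%
  group_by(timepoint) %>%
  summarise(result = NonsenseFunction(counts, baseline))
print(res)
```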

Answer 3

You can try:

library(dplyr)
my.df %>% 
    mutate(new = mean(counts[timepoint == 0])) %>% 
    group_by(timepoint) %>% 
    summarise(result = NonsenseFunction(counts, new))

# A tibble: 3 × 2
#  timepoint     result
#      <dbl>      <dbl>
#1         0  0.0000000
#2         1 -0.1703704
#3         2 -0.2222222
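This shortcut carries only the baseline mean into each group, so it is sufficient only when the function uses nothing but mean(y). For a function that needs the whole baseline vector, a base R sketch with split() and sapply() could look like the following. ScaledShift is a hypothetical function invented here for illustration (it divides by the baseline's standard deviation), and the data frame is rebuilt from the values printed in the question.

```r
# Hypothetical function that depends on the spread of the baseline,
# not only on its mean
ScaledShift <- function(x, y) {
  (mean(x) - mean(y)) / sd(y)
}

# Same values as the my.df printed in the question
my.df <- data.frame(
  timepoint = rep(c(0, 1, 2), each = 3),
  counts    = c(44, 52, 42, 66, 53, 42, 55, 57, 56)
)

baseline <- my.df$counts[my.df$timepoint == 0]

# Split counts by timepoint, then apply the function to each group
# with the full baseline vector as the second argument
res <- sapply(split(my.df$counts, my.df$timepoint), ScaledShift, y = baseline)
print(res)
```

Replacing the baseline with a column of its mean would give sd(y) equal to zero here, which is why the full vector is passed instead.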

Grouped operation on all groups relative to a "baseline" group, with multiple observations
