dplyr: subtract a value within groups (based on pre-filtered rows)

Time: 2018-04-11 11:51:25

Tags: r filter group-by dplyr mutate

I'm stuck on something that is (probably) very obvious, but I can't figure out what the actual problem is.

DF <- data.frame(Gene = c(rep("A",8), rep("X",8)),
             Genotype = c(rep("WT",4),rep("mut",4),rep("WT",4),rep("mut",4)),
             TimePoint = c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
             Value = c(12.5,12.33,11,10,23.22,22,21.2,45.3,22,12,23,21.2,23.2,45.3,21,22))

What I want to do: subtract the value at TimePoint == 1 from every value in its group (here: group = Gene, Genotype).

The output should look like what this code produces:

DF %>% group_by(Gene, Genotype) %>% mutate(Diff = Value - first(Value))

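However, I want to pick the value by a given TimePoint instead of using first(), because that row need not be the first one in each group.

My idea was to do something like this, but it does not actually use the grouped data as expected: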

DF %>% group_by(Gene, Genotype) %>% mutate(Diff = Value - filter(.,TimePoint == 1)$Value)

I really don't understand why the grouped data isn't passed on correctly to the filter statement?

1 Answer:

Answer 0: (score: 1)

After the grouping step, subset 'Value' with a logical vector built from 'TimePoint', i.e. TimePoint == 1, and subtract the result from 'Value':

DF %>%
   group_by(Gene, Genotype) %>%
   mutate(Diff = Value - Value[TimePoint == 1])

Another option is to use match to get the index:

DF %>%
   group_by(Gene, Genotype) %>% 
   mutate(Diff = Value - Value[match(1, TimePoint)])
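
To make the difference from first() concrete, here is a minimal sketch that runs both lookups on a reordered copy of the data. The name `shuffled` and the three `Diff_*` columns are made up for illustration; the row reversal is just an assumption to simulate data where TimePoint == 1 is not the first row of a group.

```r
library(dplyr)

DF <- data.frame(Gene = c(rep("A", 8), rep("X", 8)),
                 Genotype = c(rep("WT", 4), rep("mut", 4), rep("WT", 4), rep("mut", 4)),
                 TimePoint = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
                 Value = c(12.5, 12.33, 11, 10, 23.22, 22, 21.2, 45.3,
                           22, 12, 23, 21.2, 23.2, 45.3, 21, 22))

# Hypothetical variant of the data: reverse the rows so that TimePoint == 1
# is now the last row of every group instead of the first.
shuffled <- DF[rev(seq_len(nrow(DF))), ]

res <- shuffled %>%
  group_by(Gene, Genotype) %>%
  mutate(Diff_logical = Value - Value[TimePoint == 1],        # logical subsetting
         Diff_match   = Value - Value[match(1, TimePoint)],   # index of first match
         Diff_first   = Value - first(Value)) %>%             # position-based, for contrast
  ungroup()

# The two TimePoint-based lookups agree regardless of row order ...
isTRUE(all.equal(res$Diff_logical, res$Diff_match))
# ... and the baseline rows get a difference of exactly zero.
all(res$Diff_logical[res$TimePoint == 1] == 0)
# first() subtracts the TimePoint == 4 value here, so its baseline rows are not zero.
any(res$Diff_first[res$TimePoint == 1] != 0)
```

The two lookups can behave differently on messier data: match returns only the first matching index and gives NA (hence NA differences) for a group without TimePoint == 1, whereas the logical version would return several values or none in those cases, which mutate should reject when the length is neither 1 nor the group size.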

If we really need to use filter, filter the dataset first, then do a right_join and compute the difference:

DF %>%
   filter(TimePoint == 1) %>% 
   select(Gene, Genotype, Value1 = Value)  %>% 
   right_join(DF, by = c("Gene", "Genotype")) %>% 
   mutate(Diff = Value - Value1) %>%
   select(-Value1)

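In the OP's code, the 'Value' extracted with filter does not follow the group_by constraints: the . placeholder refers to the whole piped data frame, so filter(., TimePoint == 1) returns the TimePoint == 1 rows of every group, and the subtraction only goes through by recycling that vector within each group.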
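
A small sketch of that behaviour, for anyone who wants to see it. The names `grouped`, `baseline`, `wrong` and `right` are made up for illustration, and `baseline` is computed outside the pipe on the assumption that it is the same vector the . expression evaluates to inside mutate.

```r
library(dplyr)

DF <- data.frame(Gene = c(rep("A", 8), rep("X", 8)),
                 Genotype = c(rep("WT", 4), rep("mut", 4), rep("WT", 4), rep("mut", 4)),
                 TimePoint = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
                 Value = c(12.5, 12.33, 11, 10, 23.22, 22, 21.2, 45.3,
                           22, 12, 23, 21.2, 23.2, 45.3, 21, 22))

grouped <- DF %>% group_by(Gene, Genotype)

# filter() on the whole grouped data frame keeps one row per group,
# so this is a vector with one baseline per group, not a single value.
baseline <- filter(grouped, TimePoint == 1)$Value
length(baseline)

# Subtracting that vector inside mutate() lines it up with the rows of each
# group by position, mixing baselines from different groups.
wrong <- grouped %>% mutate(Diff = Value - baseline) %>% ungroup()

# Subsetting the column itself is evaluated per group.
right <- grouped %>% mutate(Diff = Value - Value[TimePoint == 1]) %>% ungroup()

# The two results differ, and only `right` has zeros on the baseline rows.
isTRUE(all.equal(wrong$Diff, right$Diff))
all(right$Diff[right$TimePoint == 1] == 0)
```

It only runs without an error because every group here happens to have as many rows as there are groups; with other group sizes the length mismatch should surface as an error instead of a silently wrong result.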