dplyr / data.table: subtracting a value based on a condition

Time: 2017-11-28 15:50:11

Tags: r dplyr data.table

My data frame looks like this:

quant_final_means <- data.frame( exposure_time_factor = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("200ms", "500ms"), class = "factor"),
protein_factor = c("background", "background", "EpQ_11_prot_0.25", "EpQ_11_prot_0.25", "EpQ_11_prot_0.5", "EpQ_11_prot_0.5", "EpQ_11_prot_1", "EpQ_11_prot_1", "rK39_prot_0.01", "rK39_prot_0.01", "rK39_prot_0.1", "rK39_prot_0.1", "serum", "serum", "background", "background", "EpQ_11_prot_0.25", "EpQ_11_prot_0.25", "EpQ_11_prot_0.5", "EpQ_11_prot_0.5", "EpQ_11_prot_1", "EpQ_11_prot_1", "rK39_prot_0.01", "rK39_prot_0.01", "rK39_prot_0.1", "rK39_prot_0.1", "serum", "serum"),
serum_factor = c("NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL", "NEHC", "VL"),
avg_fluorescence = c(24139.615, 25796.83875, 24242.2557142857, 26019.7985714286, 25369.1971428571, 30682.4342857143, 26148.9542857143, 29101.9914285714, 24121.2328571429, 32350.1428571429, 24142.0014285714, 62122.6628571429, 57192.968, 53372.702, 40067.6985714286, 38922.4814285714, 40243.0528571429, 38932.78, 42290.35, 48867.015, 43334.3925, 46181.4542857143, 40383.8257142857, 57257.7614285714, 40378.8071428571, 65535, 65535, 65524.968) ) 

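Basically, what I want to do is create another column (called avg_fluorescence_minus_background) in which the background value (which depends on exposure_time_factor and serum_factor) is subtracted from the avg_fluorescence of each row.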

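For example, consider the third row (exposure_time_factor=="200ms" and serum_factor=="NEHC"): I would get 24242.26 - 24139.62 = 102.64. For the fourth row (exposure_time_factor=="200ms" and serum_factor=="VL") I would have 26019.80 - 25796.84 = 222.96, and so on for all rows of the table.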

Starting with exposure_time_factor=="200ms", I tried the following code:

quant_final_means %>% filter(exposure_time_factor=="200ms") %>% mutate(avg_fluorescence_minus_background = ifelse(test = serum_factor=="NEHC", yes = avg_fluorescence - (filter(protein_factor=="background", serum_factor=="NEHC")) %>% select(avg_fluorescence)), no = avg_fluorescence - (filter(protein_factor=="background", serum_factor=="VL")) %>% select(avg_fluorescence))

But when I try to run this code, I get the following error message:

Error in mutate_impl(.data, dots) : 
  no applicable method for 'filter_' applied to an object of class "logical"

Any solution using dplyr or data.table would be welcome.

1 answer:

Answer 0: (score: 2)

We can group by serum_factor and then create the column:

library(dplyr)
quant_final_means %>% 
    filter(exposure_time_factor=="200ms") %>% 
    group_by(serum_factor) %>% 
    mutate(avg_fluorescence_minus_background = avg_fluorescence -
                                         avg_fluorescence[protein_factor=='background'])
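
The question asks for the subtraction on every row, not only the 200ms ones. A minimal sketch of one way to extend the grouped approach, assuming each combination of exposure_time_factor and serum_factor has exactly one background row (which holds for the data shown): drop the filter and group by both columns. The data frame `toy` is a hypothetical, rounded subset of the question's data, used only to keep the example self-contained.

```r
library(dplyr)

# Hypothetical subset of quant_final_means (values rounded), for illustration only
toy <- data.frame(
  exposure_time_factor = rep(c("200ms", "500ms"), each = 4),
  protein_factor = rep(c("background", "background",
                         "EpQ_11_prot_0.25", "EpQ_11_prot_0.25"), times = 2),
  serum_factor = rep(c("NEHC", "VL"), times = 4),
  avg_fluorescence = c(24139.62, 25796.84, 24242.26, 26019.80,
                       40067.70, 38922.48, 40243.05, 38932.78)
)

result <- toy %>%
  # One group per exposure time and serum, so each group has its own background
  group_by(exposure_time_factor, serum_factor) %>%
  mutate(avg_fluorescence_minus_background =
           avg_fluorescence - avg_fluorescence[protein_factor == "background"]) %>%
  ungroup()

print(result)
```

If a group had no background row the mutate would fail, and if it had several the subtraction could recycle values silently, so the one-background-per-group assumption is worth checking before relying on the result.
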

Or spread to 'wide' format, where the subtraction is easy to do, and finally change back to 'long' format with gather:

library(dplyr)
library(tidyr)
quant_final_means %>% 
     filter(exposure_time_factor=="200ms")  %>%
     spread(serum_factor, avg_fluorescence) %>%
     mutate_at(vars('NEHC', 'VL'), funs(. - .[protein_factor=='background'])) %>%
     gather(serum_factor, avg_fluorescence, NEHC:VL)
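
The question also asks about data.table, which the answer above does not cover. The following is a sketch under the same assumption of exactly one background row per exposure time and serum combination, not part of the original answer. It uses a grouped assignment by reference with `:=`. The table `toy` is again a hypothetical, rounded subset of the question's data.

```r
library(data.table)

# Hypothetical subset of quant_final_means (values rounded), for illustration only.
# With the real data, as.data.table(quant_final_means) would give the starting table.
toy <- data.table(
  exposure_time_factor = rep(c("200ms", "500ms"), each = 4),
  protein_factor = rep(c("background", "background",
                         "EpQ_11_prot_0.25", "EpQ_11_prot_0.25"), times = 2),
  serum_factor = rep(c("NEHC", "VL"), times = 4),
  avg_fluorescence = c(24139.62, 25796.84, 24242.26, 26019.80,
                       40067.70, 38922.48, 40243.05, 38932.78)
)

# Within each group, subtract that group's background value from every row.
# := adds the column in place and keeps the original row order.
toy[, avg_fluorescence_minus_background :=
      avg_fluorescence - avg_fluorescence[protein_factor == "background"],
    by = .(exposure_time_factor, serum_factor)]

print(toy)
```

Because `:=` modifies the table by reference, no reassignment is needed. Work on a `copy()` of the table if the original must stay unchanged.
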