Create a new column from a subset of another column's values that match a condition

Asked: 2021-07-07 22:49:36

Tags: r data-wrangling

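I don't know the right terminology, but I need to put the control values that correspond to my TREATMENT column into a new column, with each control value matched on the same POSITION and GENE. The control values will be repeated for every treatment, but that's fine. The goal is to simplify some z-score operations later on.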

This is what I have:

GENE    TREATMENT   POSITION  VALUE
gene1   treatmenta  1         112
gene1   treatmenta  2         134
gene1   treatmentb  1         124
gene1   treatmentb  2         115
gene1   control     1         205
gene1   control     2         223
gene2   treatmenta  1         123
gene2   treatmenta  2         149
gene2   treatmentb  1         132
gene2   treatmentb  2         116
gene2   control     1         258
gene2   control     2         235

This is what I want:

GENE    TREATMENT   POSITION  VALUE  CTRL_VALUE
gene1   treatmenta  1         112    205
gene1   treatmenta  2         134    223
gene1   treatmentb  1         124    205
gene1   treatmentb  2         115    223
gene2   treatmenta  1         123    258
gene2   treatmenta  2         149    235
gene2   treatmentb  1         132    258
gene2   treatmentb  2         116    235

I tried fiddling with dplyr's inner_join and left_join, but the values didn't match up. How can I solve this?

1 answer:

Answer 0 (score: 0)

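We group by 'GENE' and 'POSITION', then create the new column 'CTRL_VALUE' by subsetting 'VALUE' with a logical expression on the TREATMENT column, so that every row in a group receives the 'VALUE' of that group's control row. Finally, we filter out the rows where 'TREATMENT' is 'control'.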

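library(dplyr)

df1 %>%
     # one group per gene/position pair; each group holds one control row
     group_by(GENE, POSITION) %>%
     # copy the control's VALUE onto every row of its group
     mutate(CTRL_VALUE = VALUE[TREATMENT == "control"]) %>%
     ungroup() %>%
     # drop the control rows themselves
     filter(TREATMENT != "control")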

Output

# A tibble: 8 x 5
  GENE  TREATMENT  POSITION VALUE CTRL_VALUE
  <chr> <chr>         <int> <int>      <int>
1 gene1 treatmenta        1   112        205
2 gene1 treatmenta        2   134        223
3 gene1 treatmentb        1   124        205
4 gene1 treatmentb        2   115        223
5 gene2 treatmenta        1   123        258
6 gene2 treatmenta        2   149        235
7 gene2 treatmentb        1   132        258
8 gene2 treatmentb        2   116        235
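
The grouped approach above relies on every GENE/POSITION group containing exactly one control row. If a group has no control, the subset is empty and mutate would typically stop with a size error. A minimal sketch of a more tolerant variant, assuming dplyr's first() (which returns a missing value for an empty vector) and a small hypothetical data frame df_gap invented for illustration:

```r
library(dplyr)

# Hypothetical data: gene1 at position 2 has no control row
df_gap <- data.frame(
  GENE      = c("gene1", "gene1", "gene1"),
  TREATMENT = c("treatmenta", "control", "treatmenta"),
  POSITION  = c(1L, 1L, 2L),
  VALUE     = c(112L, 205L, 134L)
)

result_gap <- df_gap %>%
  group_by(GENE, POSITION) %>%
  # first() yields NA when the group has no control row,
  # and picks the first one if there happen to be several
  mutate(CTRL_VALUE = first(VALUE[TREATMENT == "control"])) %>%
  ungroup() %>%
  filter(TREATMENT != "control")
```

Rows without a matching control end up with NA in CTRL_VALUE instead of aborting the pipeline. Whether silently taking the first of several controls is acceptable depends on the data, so this is a choice to make deliberately.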

Data

df1 <- structure(list(GENE = c("gene1", "gene1", "gene1", "gene1", "gene1", 
"gene1", "gene2", "gene2", "gene2", "gene2", "gene2", "gene2"
), TREATMENT = c("treatmenta", "treatmenta", "treatmentb", "treatmentb", 
"control", "control", "treatmenta", "treatmenta", "treatmentb", 
"treatmentb", "control", "control"), POSITION = c(1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), VALUE = c(112L, 134L, 124L, 
115L, 205L, 223L, 123L, 149L, 132L, 116L, 258L, 235L)), 
class = "data.frame", row.names = c(NA, 
-12L))
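
Since the question mentions inner_join and left_join, here is a sketch of how a join-based version might look. The idea is to split the control rows into their own lookup table and join it back on both keys. Joining on only one of GENE or POSITION is one possible reason the values would not line up. The names controls and result are illustrative, and the data frame is rebuilt here only so the block runs on its own:

```r
library(dplyr)

# Same example data as in the question, rebuilt compactly
df1 <- data.frame(
  GENE      = rep(c("gene1", "gene2"), each = 6),
  TREATMENT = rep(rep(c("treatmenta", "treatmentb", "control"), each = 2),
                  times = 2),
  POSITION  = rep(1:2, times = 6),
  VALUE     = c(112L, 134L, 124L, 115L, 205L, 223L,
                123L, 149L, 132L, 116L, 258L, 235L)
)

# Lookup table: one control value per GENE/POSITION pair
controls <- df1 %>%
  filter(TREATMENT == "control") %>%
  select(GENE, POSITION, CTRL_VALUE = VALUE)

# Attach the control value to every non-control row, matching on both keys
result <- df1 %>%
  filter(TREATMENT != "control") %>%
  left_join(controls, by = c("GENE", "POSITION"))
```

With left_join, a treatment row that has no control gets NA in CTRL_VALUE, while duplicate control rows for the same GENE/POSITION would duplicate the treatment rows, so it is worth checking that controls has one row per key pair.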