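I don't know the right terminology, but I need to put the control values that correspond to my treatment rows into a new column, where each control value is matched on the same position and gene name. The control values will be repeated for every treatment, but that's fine. The goal is to simplify some z-score operations later on.
This is what I have: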
GENE TREATMENT POSITION VALUE
gene1 treatmenta 1 112
gene1 treatmenta 2 134
gene1 treatmentb 1 124
gene1 treatmentb 2 115
gene1 control 1 205
gene1 control 2 223
gene2 treatmenta 1 123
gene2 treatmenta 2 149
gene2 treatmentb 1 132
gene2 treatmentb 2 116
gene2 control 1 258
gene2 control 2 235
This is what I want:
GENE TREATMENT POSITION VALUE CTRL_VALUE
gene1 treatmenta 1 112 205
gene1 treatmenta 2 134 223
gene1 treatmentb 1 124 205
gene1 treatmentb 2 115 223
gene2 treatmenta 1 123 258
gene2 treatmenta 2 149 235
gene2 treatmentb 1 132 258
gene2 treatmentb 2 116 235
I tried fiddling with dplyr's inner_join and left_join, but the values don't match up. How can I solve this?
Answer 0 (score: 0)
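We group by 'GENE' and 'POSITION', create the new column 'CTRL_VALUE' by subsetting 'VALUE' with a logical expression on the 'TREATMENT' column (so each group gets its control's 'VALUE'), and then filter out the rows where 'TREATMENT' is 'control'.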
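library(dplyr)
df1 %>%
  # one group per gene/position pair
  group_by(GENE, POSITION) %>%
  # within each group, take the VALUE of the control row
  mutate(CTRL_VALUE = VALUE[TREATMENT == "control"]) %>%
  ungroup() %>%
  # drop the control rows themselves
  filter(TREATMENT != "control")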
-output
# A tibble: 8 x 5
GENE TREATMENT POSITION VALUE CTRL_VALUE
<chr> <chr> <int> <int> <int>
1 gene1 treatmenta 1 112 205
2 gene1 treatmenta 2 134 223
3 gene1 treatmentb 1 124 205
4 gene1 treatmentb 2 115 223
5 gene2 treatmenta 1 123 258
6 gene2 treatmenta 2 149 235
7 gene2 treatmentb 1 132 258
8 gene2 treatmentb 2 116 235
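The subsetting above relies on every GENE/POSITION group containing exactly one control row. If a group had no control, `VALUE[TREATMENT == "control"]` would have length zero and `mutate()` would typically stop with a size error. A minimal sketch of a more forgiving variant, assuming you would rather get `NA` in that case, uses `match()`. The data frame `df_missing` is hypothetical and exists only to show the behaviour.

```r
library(dplyr)

# Hypothetical data: gene1 at position 2 has no control row
df_missing <- data.frame(
  GENE      = c("gene1", "gene1", "gene1"),
  TREATMENT = c("treatmenta", "treatmenta", "control"),
  POSITION  = c(1L, 2L, 1L),
  VALUE     = c(112L, 134L, 205L)
)

res <- df_missing %>%
  group_by(GENE, POSITION) %>%
  # match() returns the index of the first control row, or NA if there is none,
  # so the subset always has length 1
  mutate(CTRL_VALUE = VALUE[match("control", TREATMENT)]) %>%
  ungroup() %>%
  filter(TREATMENT != "control")
```

Note that `match()` silently takes the first control when a group has several, so this is only appropriate if duplicates are not expected.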
data
df1 <- structure(list(GENE = c("gene1", "gene1", "gene1", "gene1", "gene1",
"gene1", "gene2", "gene2", "gene2", "gene2", "gene2", "gene2"
), TREATMENT = c("treatmenta", "treatmenta", "treatmentb", "treatmentb",
"control", "control", "treatmenta", "treatmenta", "treatmentb",
"treatmentb", "control", "control"), POSITION = c(1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), VALUE = c(112L, 134L, 124L,
115L, 205L, 223L, 123L, 149L, 132L, 116L, 258L, 235L)),
class = "data.frame", row.names = c(NA,
-12L))
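Since the question mentions trying `left_join`, here is a sketch of how the same result could be reached with a join. The question does not show the join that was attempted, so this is only a guess at what was missing: the control rows are split off first, their `VALUE` is renamed, and the join uses both `GENE` and `POSITION` as keys. The names `controls` and `joined` are illustrative, and the data is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same example data as in the question, rebuilt compactly
df1 <- data.frame(
  GENE      = rep(c("gene1", "gene2"), each = 6),
  TREATMENT = rep(rep(c("treatmenta", "treatmentb", "control"), each = 2), times = 2),
  POSITION  = rep(1:2, times = 6),
  VALUE     = c(112L, 134L, 124L, 115L, 205L, 223L,
                123L, 149L, 132L, 116L, 258L, 235L)
)

# Control rows only, with VALUE renamed so it does not clash after the join
controls <- df1 %>%
  filter(TREATMENT == "control") %>%
  select(GENE, POSITION, CTRL_VALUE = VALUE)

# Treatment rows, each matched to its control on both keys
joined <- df1 %>%
  filter(TREATMENT != "control") %>%
  left_join(controls, by = c("GENE", "POSITION"))
```

With `left_join`, a treatment row without a matching control is kept and gets `NA` in `CTRL_VALUE`, whereas `inner_join` would drop it.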