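I have a data frame like the one below, and I want to add a column gene_richness_relative. In this column, the gene_richness value at days == 0 should be set to 100% as the basis for the calculation. The relative values for the other days should reflect the change from that baseline.
I start with the following data.frame: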
str(df)
'data.frame': 584 obs. of 5 variables:
$ gene : Factor w/ 64 levels "araD","arfA",..: 1 2 3 4 8 9 10 11 12 13 ...
$ sample : Factor w/ 11 levels "","A1","A2","A3",..: 10 10 10 10 10 10 10 10 10 10 ...
$ days : num 0 0 0 0 0 0 0 0 0 0 ...
$ treatment : Factor w/ 2 levels "control","glyph": 1 1 1 1 1 1 1 1 1 1 ...
$ gene_richness: int 6 11 9 3 20 7 2 28 38 9 ...
It looks like this:
gene sample days treatment gene_richness
1 araD B8 0 control 6
2 arfA B8 0 control 11
3 artI B8 0 control 9
4 bcsZ B8 0 control 3
5 czcD B8 0 control 20
6 fdhA B8 0 control 7
7 fdm B8 0 control 2
8 gyrA B8 0 control 28
9 gyrB B8 0 control 38
10 katE B8 0 control 9
11 merA B8 0 control 15
12 mlhB B8 0 control 6
13 mntB B8 0 control 11
14 nirS B8 0 control 10
15 norB B8 0 control 9
16 nosZ B8 0 control 7
17 nuoF B8 0 control 16
18 phnA B8 0 control 2
19 phnC B8 0 control 13
20 phnD B8 0 control 19
21 phnE B8 0 control 36
22 phnF B8 0 control 8
23 phnG B8 0 control 11
24 phnH B8 0 control 13
25 phnI B8 0 control 17
26 phnJ B8 0 control 15
27 phnK B8 0 control 13
28 phnL B8 0 control 13
29 phnM B8 0 control 19
30 phnN B8 0 control 8
Applying:
df2 <- df[with(df, order(gene)), ]
I get this output:
'data.frame': 584 obs. of 5 variables:
$ gene : Factor w/ 64 levels "araD","arfA",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sample : Factor w/ 11 levels "","A1","A2","A3",..: 10 11 9 2 3 4 5 6 7 8 ...
$ days : num 0 22 71 0 3 7 14 22 43 71 ...
$ treatment : Factor w/ 2 levels "control","glyph": 1 1 1 2 2 2 2 2 2 2 ...
$ gene_richness: int 6 5 5 7 7 7 8 8 6 7 ...
It looks like this:
gene sample days treatment gene_richness
1 araD B8 0 control 6
59 araD B9 22 control 5
117 araD B10 71 control 5
174 araD A1 0 glyph 7
230 araD A2 3 glyph 7
289 araD A3 7 glyph 7
347 araD A4 14 glyph 8
407 araD A5 22 glyph 8
466 araD A6 43 glyph 6
526 araD A7 71 glyph 7
2 arfA B8 0 control 11
60 arfA B9 22 control 4
118 arfA B10 71 control 4
175 arfA A1 0 glyph 6
231 arfA A2 3 glyph 8
290 arfA A3 7 glyph 10
348 arfA A4 14 glyph 11
408 arfA A5 22 glyph 9
467 arfA A6 43 glyph 6
527 arfA A7 71 glyph 5
3 artI B8 0 control 9
61 artI B9 22 control 8
119 artI B10 71 control 9
176 artI A1 0 glyph 4
232 artI A2 3 glyph 5
291 artI A3 7 glyph 5
349 artI A4 14 glyph 9
409 artI A5 22 glyph 7
468 artI A6 43 glyph 10
528 artI A7 71 glyph 15
The desired output looks like the table further down, and the following works perfectly:
library(data.table)
df2 <- setDT(df2)
df2[,gene_richness_relative := gene_richness/gene_richness[days == 0]*100, by = .(gene,treatment)]
(taken from Dennis's answer), which gives:
gene sample days treatment gene_richness gene_richness_relative
1: araD B8 0 control 6 100.00000
2: araD B9 22 control 5 83.33333
3: araD B10 71 control 5 83.33333
4: araD A1 0 glyph 7 100.00000
5: araD A2 3 glyph 7 100.00000
---
580: ydiF A3 7 glyph 3 100.00000
581: ydiF A4 14 glyph 2 66.66667
582: ydiF A5 22 glyph 5 166.66667
583: ydiF A6 43 glyph 4 133.33333
584: ydiF A7 71 glyph 4 133.33333
But
library(dplyr)
df %>%
group_by(gene,treatment) %>%
mutate(gene_richness_relative = gene_richness/gene_richness[days == 0]*100)
returns
Fehler in mutate_impl(.data, dots) :
Column `gene_richness_relative` must be length 2 (the group size) or one, not 0
I'm really happy that the data.table approach works, but do you know what the problem with dplyr is?
Answer 0 (score: 2)
library(dplyr)
df %>%
group_by(gene,treatment) %>%
mutate(gene_richness_relative = gene_richness/gene_richness[days == 0]*100)
# A tibble: 20 x 6
# Groups: gene, treatment [4]
gene sample days treatment gene_richness gene_richness_relative
<fctr> <fctr> <int> <fctr> <int> <dbl>
1 araD B8 0 control 6 100.00000
2 araD B9 22 control 5 83.33333
3 araD B10 71 control 5 83.33333
4 araD A1 0 treated 7 100.00000
5 araD A2 3 treated 7 100.00000
6 araD A3 7 treated 7 100.00000
7 araD A4 14 treated 8 114.28571
8 araD A5 22 treated 8 114.28571
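As a quick check of the figures above, here is a minimal sketch that rebuilds only the araD rows shown in the question and applies the same grouped mutate. The small data frame `araD` and the result name `araD_rel` are made up for illustration; the values are typed in by hand from the question's output, with the question's "glyph" label.

```r
library(dplyr)

# Illustrative subset: the araD rows from the question, typed in by hand
araD <- data.frame(
  gene = "araD",
  sample = c("B8", "B9", "B10", "A1", "A2", "A3", "A4", "A5", "A6", "A7"),
  days = c(0, 22, 71, 0, 3, 7, 14, 22, 43, 71),
  treatment = rep(c("control", "glyph"), times = c(3, 7)),
  gene_richness = c(6, 5, 5, 7, 7, 7, 8, 8, 6, 7)
)

# Within each gene/treatment group, divide by the value at days == 0
araD_rel <- araD %>%
  group_by(gene, treatment) %>%
  mutate(gene_richness_relative = gene_richness / gene_richness[days == 0] * 100) %>%
  ungroup()

print(araD_rel)
```

This works because every gene/treatment group in the subset has exactly one row with days == 0, so the divisor is a single value that is recycled across the group.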
Or using data.table:
library(data.table)
df <- setDT(df)
df[,gene_richness_relative := gene_richness/gene_richness[days == 0]*100, by = .(gene,treatment)]
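As for the dplyr error in the question: the message says a group of 2 rows received a result of length 0, which is what `gene_richness[days == 0]` produces when a gene/treatment group contains no row with days == 0. The question's full data are not shown, so which group is affected is something to check rather than a known fact. Below is a sketch of how one might find such groups and guard the calculation. The data frame `toy` and the names `missing_baseline` and `toy_rel` are invented for illustration, and the guard takes the first baseline row, which assumes at most one days == 0 row per group.

```r
library(dplyr)

# Made-up data: gene "x" has a day-0 baseline, gene "y" does not
toy <- data.frame(
  gene = c("x", "x", "y", "y"),
  treatment = "control",
  days = c(0, 22, 3, 7),
  gene_richness = c(4, 2, 5, 6)
)

# Groups with no days == 0 row; the unguarded mutate fails on these
missing_baseline <- toy %>%
  group_by(gene, treatment) %>%
  summarise(n_baseline = sum(days == 0)) %>%
  ungroup() %>%
  filter(n_baseline == 0)

# Guarded version: indexing an empty vector with [1] gives NA,
# so groups without a baseline get NA instead of raising an error
toy_rel <- toy %>%
  group_by(gene, treatment) %>%
  mutate(gene_richness_relative = gene_richness / gene_richness[days == 0][1] * 100) %>%
  ungroup()

print(missing_baseline)
print(toy_rel)
```

Running the first step on the real df should list the gene/treatment combinations that lack a baseline; whether to drop them or keep them as NA depends on the analysis.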