I have a data frame df, as shown below.
df <- read.table(text="name id_final id1 id2 id3
sample1 10.96311 4.767571 3.692556 2.966773
sample2 10.83782 11.61998 11.402257 10.301068
sample3 13.98669 12.123346 10.299306 8.85533
sample4 13.97313 12.200774 11.874366 11.013115
sample5 13.89532 10.712515 9.102278 9.832699
sample6 13.86255 11.808834 9.180613 8.813621", header = TRUE)
head(df)
> head(df)
name id_final id1 id2 id3
1 sample1 10.96311 4.767571 3.692556 2.966773
2 sample2 10.83782 11.619980 11.402257 10.301068
3 sample3 13.98669 12.123346 10.299306 8.855330
4 sample4 13.97313 12.200774 11.874366 11.013115
5 sample5 13.89532 10.712515 9.102278 9.832699
6 sample6 13.86255 11.808834 9.180613 8.813621
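I need to do some basic math: divide each id column by the id_final column, take log2 of the ratio, and store the result in new columns with the suffix _log. This can be done with a simple mutate, as shown below.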
library(dplyr)
df <- df %>%
  mutate(id1_log = log2(id1 / id_final),
         id2_log = log2(id2 / id_final),
         id3_log = log2(id3 / id_final))
head(df)
> head(df)
name id_final id1 id2 id3 id1_log id2_log id3_log
1 sample1 10.96311 4.767571 3.692556 2.966773 -1.2013308 -1.56996541 -1.88569067
2 sample2 10.83782 11.619980 11.402257 10.301068 0.1005330 0.07324483 -0.07328067
3 sample3 13.98669 12.123346 10.299306 8.855330 -0.2062667 -0.44150746 -0.65943661
4 sample4 13.97313 12.200774 11.874366 11.013115 -0.1956825 -0.23480474 -0.34343264
5 sample5 13.89532 10.712515 9.102278 9.832699 -0.3753018 -0.61029950 -0.49893967
6 sample6 13.86255 11.808834 9.180613 8.813621 -0.2313261 -0.59453027 -0.65338590
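With only 3 columns, as in this example, that is easy. But how would I automate it if I had more than 3 columns? Typing out this command every time is not very elegant.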
mutate(id1_log = log2(id1/id_final),
id2_log = log2(id2/id_final),
id3_log = log2(id3/id_final))
To give the bigger picture, I am trying to write a function that can be used on multiple files, each with multiple id1 ... n columns.
Answer 0 (score: 2)
You can do:
library(dplyr)
df %>% mutate_at(vars(matches("id\\d+$")), list(log = ~ log2(. / id_final)))
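We mutate the desired columns in one go with mutate_at. They are selected by the regular expression id\\d+$, which matches column names that end in id followed by one or more digits (to avoid capturing, e.g., id_final or any other id_.. columns). Note that the pattern is not anchored at the start; write ^id\\d+$ if the name must also begin with id.
Then we supply a list containing the desired transformation. You can give the transformation a name, which is then automatically appended to the column name. We say log, so the new columns automatically get the ending _log; you can write anything else there.
If you don't provide a name, the existing columns are modified in place; if you do, you get additional columns, as we did here.
Output:
 name id_final id1 id2 id3 id1_log id2_log id3_log
1 sample1 10.96311 4.767571 3.692556 2.966773 -1.2013308 -1.56996541 -1.88569067
2 sample2 10.83782 11.619980 11.402257 10.301068 0.1005330 0.07324483 -0.07328067
3 sample3 13.98669 12.123346 10.299306 8.855330 -0.2062667 -0.44150746 -0.65943661
4 sample4 13.97313 12.200774 11.874366 11.013115 -0.1956825 -0.23480474 -0.34343264
5 sample5 13.89532 10.712515 9.102278 9.832699 -0.3753018 -0.61029950 -0.49893967
6 sample6 13.86255 11.808834 9.180613 8.813621 -0.2313261 -0.59453027 -0.65338590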
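In dplyr 1.0 and later, mutate_at() is superseded by across(), so the same idea can be written with mutate() alone. The following is only a sketch, assuming a reasonably recent dplyr release; the two-row data frame is a cut-down copy of the question's data, and the anchored pattern ^id\\d+$ is a slightly stricter variant of the one used in the answer.

```r
library(dplyr)

# Cut-down copy of the question's data (first two samples only).
df <- data.frame(
  name     = c("sample1", "sample2"),
  id_final = c(10.96311, 10.83782),
  id1      = c(4.767571, 11.61998),
  id2      = c(3.692556, 11.402257),
  id3      = c(2.966773, 10.301068)
)

# For every column named id<digits>, add a <column>_log column holding
# log2(column / id_final). The .names glue template builds the new names.
res <- df %>%
  mutate(across(
    matches("^id\\d+$"),
    ~ log2(.x / id_final),
    .names = "{.col}_log"
  ))

names(res)
```

Because .names is given, the original id columns are left untouched and the log ratios are appended as new columns, which mirrors what the named list does in the mutate_at() call.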
Answer 1 (score: 1)
Here is a data.table option:
library(data.table)
cols <- names(df)[3:5] # first, select columns you are interested in (or names(df)[grepl("id\\d+$", names(df))])
setDT(df)[, paste(cols, "log", sep = "_") := lapply(.SD, function(x) log2(x/id_final)),
.SDcols = cols][] # apply { function(x) log2(x/id_final) } to selected columns
# output
name id_final id1 id2 id3 id1_log id2_log id3_log
1: sample1 10.96311 4.767571 3.692556 2.966773 -1.2013308 -1.56996541 -1.88569067
2: sample2 10.83782 11.619980 11.402257 10.301068 0.1005330 0.07324483 -0.07328067
3: sample3 13.98669 12.123346 10.299306 8.855330 -0.2062667 -0.44150746 -0.65943661
4: sample4 13.97313 12.200774 11.874366 11.013115 -0.1956825 -0.23480474 -0.34343264
5: sample5 13.89532 10.712515 9.102278 9.832699 -0.3753018 -0.61029950 -0.49893967
6: sample6 13.86255 11.808834 9.180613 8.813621 -0.2313261 -0.59453027 -0.65338590
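To tie this back to the question's goal of a reusable function for several files, here is a minimal base R sketch. The helper name add_log_cols, its denom and pattern arguments, and the temporary demo files are all illustrative assumptions, not part of either answer. It also assumes the input is a plain data.frame (not one already converted with setDT).

```r
# Hypothetical helper: adds a <col>_log column for every column whose name
# matches `pattern`, computed as log2(col / data[[denom]]).
add_log_cols <- function(data, denom = "id_final", pattern = "^id\\d+$") {
  cols <- grep(pattern, names(data), value = TRUE)
  if (length(cols) == 0) return(data)  # nothing to transform
  data[paste0(cols, "_log")] <- lapply(data[cols], function(x) log2(x / data[[denom]]))
  data
}

# Demo only: two temporary tab-separated files stand in for real input files.
demo <- data.frame(
  name     = c("sample1", "sample2"),
  id_final = c(10.96311, 10.83782),
  id1      = c(4.767571, 11.61998),
  id2      = c(3.692556, 11.402257)
)
files <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
for (f in files) {
  write.table(demo, f, sep = "\t", row.names = FALSE, quote = FALSE)
}

# Read each file and apply the same transformation.
results <- lapply(files, function(f) {
  add_log_cols(read.table(f, header = TRUE, sep = "\t"))
})
names(results[[1]])
```

Selecting columns by pattern rather than by position (such as names(df)[3:5]) means the function does not need to know in advance how many id columns each file contains.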