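I recently started learning R, so please forgive me if this is a beginner question. I want to extract the rows where the value in column "Bladder" is more than 5 times higher than the values in the other columns.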
gene Adrenal Amygdala Bladder BoneMarrow
1007_s_at 10.46973369 11.26483864 100.43303178 9.907426976
1053_at 6.446570421 6.462840464 6.570665594 7.068326351
117_at 8.018137441 7.738652705 7.604989675 8.38937883
121_at 10.78168853 10.3223056 10.38043102 10.73936285
1255_g_at 5.625038847 6.132930765 5.526885199 5.448521716
1294_at 8.37142904 8.1019947 8.549260758 8.697436419
1316_at 6.237386633 6.429011484 6.083330287 6.295933456
1320_at 6.206410651 6.139873183 6.328348899 6.251521738
1405_i_at 6.588370219 5.949622255 7.420451672 8.823058974
Expected result:
gene Adrenal Amygdala Bladder BoneMarrow
1007_s_at 10.46973369 11.26483864 100.43303178 9.907426976
I found this answer helpful, but I don't know how to apply it to multiple columns: "select only rows if its value in a particular column is less than its value in the other column".
Thanks.
Answer 0 (score: 2)
You want to subset the data based on a condition. Here, I assume your data is in a data frame named df:
df[df$Bladder > apply(5 * subset(df, select=-c(gene, Bladder)), 1, max),]
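This selects the rows of df whose Bladder column is more than 5 times the maximum of the other columns. We use the subset command to select every column except gene and Bladder, and then use apply with MARGIN set to 1 (i.e., the first margin, the rows) to compute the max of each row.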
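As a variation on the one-liner above, the same filter can be wrapped in a small function so that any column can be the target, not only Bladder. This is a sketch under stated assumptions. `rows_dominated_by()` is a hypothetical helper name. It replaces `apply()` with `pmax()`, which takes an element-wise maximum across the columns, to get each row's maximum. It also treats every numeric column other than the target as "the other columns".

```r
# Hypothetical helper (not part of the answer above): keep the rows whose
# `target` column is more than `ratio` times every other numeric column.
rows_dominated_by <- function(df, target, ratio = 5) {
  # Numeric columns only (this drops `gene`), minus the target itself
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  others <- df[setdiff(numeric_cols, target)]
  # Row-wise maximum without apply(): pmax() compares the columns element by element
  row_max <- do.call(pmax, unname(as.list(others)))
  # which() drops rows where the comparison is NA instead of returning NA rows
  df[which(df[[target]] > ratio * row_max), , drop = FALSE]
}

# Small example: the first three rows of the question's data
df <- data.frame(
  gene       = c("1007_s_at", "1053_at", "117_at"),
  Adrenal    = c(10.46973369, 6.446570421, 8.018137441),
  Amygdala   = c(11.26483864, 6.462840464, 7.738652705),
  Bladder    = c(100.43303178, 6.570665594, 7.604989675),
  BoneMarrow = c(9.907426976, 7.068326351, 8.38937883)
)

rows_dominated_by(df, "Bladder")
```

Because the target is passed as a string, the same call works for any tissue, for example `rows_dominated_by(df, "Adrenal")`.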
Using the updated data from the post, we get:
##        gene  Adrenal Amygdala  Bladder BoneMarrow
## 1 1007_s_at 10.43303 11.26484 100.4697   9.907427
The data:
df <- structure(list(gene = structure(1:9, .Label = c("1007_s_at",
"1053_at", "117_at", "121_at", "1255_g_at", "1294_at", "1316_at",
"1320_at", "1405_i_at"), class = "factor"), Adrenal = c(10.43303178,
6.446570421, 8.018137441, 10.78168853, 5.625038847, 8.37142904,
6.237386633, 6.206410651, 6.588370219), Amygdala = c(11.26483864,
6.462840464, 7.738652705, 10.3223056, 6.132930765, 8.1019947,
6.429011484, 6.139873183, 5.949622255), Bladder = c(100.46973369,
6.570665594, 7.604989675, 10.38043102, 5.526885199, 8.549260758,
6.083330287, 6.328348899, 7.420451672), BoneMarrow = c(9.907426976,
7.068326351, 8.38937883, 10.73936285, 5.448521716, 8.697436419,
6.295933456, 6.251521738, 8.823058974)), .Names = c("gene", "Adrenal",
"Amygdala", "Bladder", "BoneMarrow"), class = "data.frame", row.names = c(NA,
-9L))
Answer 1 (score: 0)
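The question isn't very clearly specified, so my answer may not be exactly what you're looking for, but I think what you want to achieve is important and fairly simple. My example uses the dplyr library, which simplifies filtering and selecting values in data frames. Note that I changed the value of BoneMarrow in the first row so that Bladder is more than five times larger.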
Most of the code just sets up a reproducible example; the first and last lines are the actual answer to the question.
library(dplyr)
txt=
"gene,Adrenal,Amygdala,Bladder,BoneMarrow
1007_s_at,100.46973369,11.26483864,10.43303178,1.907426976
1053_at,6.446570421,6.462840464,6.570665594,7.068326351
117_at,8.018137441,7.738652705,7.604989675,8.38937883
121_at,10.78168853,10.3223056,10.38043102,10.73936285
1255_g_at,5.625038847,6.132930765,5.526885199,5.448521716
1294_at,8.37142904,8.1019947,8.549260758,8.697436419
1316_at,6.237386633,6.429011484,6.083330287,6.295933456
1320_at,6.206410651,6.139873183,6.328348899,6.251521738
1405_i_at,6.588370219,5.949622255,7.420451672,8.823058974"
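# Parse the CSV text above into a data frame
df = read.table(textConnection(txt), header=TRUE, sep=',')
# Keep rows where Bladder is at least 5 times BoneMarrow, then keep only the Bladder column
filter(df, Bladder >= BoneMarrow * 5) %>% select(Bladder)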
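The filter above compares Bladder only with BoneMarrow. The question asks for a comparison against every other column, and one way to do that in dplyr is `if_all()`. This is a sketch, not part of the original answer. It assumes dplyr 1.0.4 or later (the release that introduced `if_all()`) and reuses the first three rows of the question's data.

```r
library(dplyr)

# First three rows of the question's data
df <- data.frame(
  gene       = c("1007_s_at", "1053_at", "117_at"),
  Adrenal    = c(10.46973369, 6.446570421, 8.018137441),
  Amygdala   = c(11.26483864, 6.462840464, 7.738652705),
  Bladder    = c(100.43303178, 6.570665594, 7.604989675),
  BoneMarrow = c(9.907426976, 7.068326351, 8.38937883)
)

# Keep a row only if Bladder is more than 5 times *every* other column.
# if_all() applies the lambda to each selected column (all except gene and
# Bladder) and keeps the row only when all results are TRUE; `.x` is the
# current column, while `Bladder` is looked up from the data frame.
bladder_hits <- df %>%
  filter(if_all(-c(gene, Bladder), ~ Bladder > 5 * .x))

bladder_hits
```

Unlike `select(Bladder)`, this keeps the whole row, which matches the expected result in the question.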