Question

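I recently started learning R, so please forgive me if this is a newbie question. I want to extract the rows in which the value in the "Bladder" column is more than 5 times higher than the values in the other columns.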

gene     Adrenal    Amygdala    Bladder BoneMarrow
1007_s_at   10.46973369 11.26483864 100.43303178    9.907426976
1053_at 6.446570421 6.462840464 6.570665594 7.068326351
117_at  8.018137441 7.738652705 7.604989675 8.38937883
121_at  10.78168853 10.3223056  10.38043102 10.73936285
1255_g_at   5.625038847 6.132930765 5.526885199 5.448521716
1294_at 8.37142904  8.1019947   8.549260758 8.697436419
1316_at 6.237386633 6.429011484 6.083330287 6.295933456
1320_at 6.206410651 6.139873183 6.328348899 6.251521738
1405_i_at   6.588370219 5.949622255 7.420451672 8.823058974

Expected result

gene     Adrenal    Amygdala    Bladder BoneMarrow
1007_s_at   10.46973369 11.26483864 100.43303178    9.907426976

I found this answer useful, but I don't know how to apply it to multiple columns: "select only rows if its value in a particular column is less than its value in the other column"

Thanks.

Answer 1

You want to subset your data based on a condition. Here, I assume your data is in a data frame called df:

df[df$Bladder > apply(5 * subset(df, select=-c(gene, Bladder)), 1, max),]

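This selects the rows of df in which the Bladder value exceeds 5 times the maximum of the other columns. We use the subset command to select all columns other than gene and Bladder, then compute the row-wise max using apply with MARGIN set to 1 (i.e., the first margin, which is the rows).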

Using the updated data from the post, we get:

##       gene  Adrenal Amygdala  Bladder BoneMarrow
##1 1007_s_at 10.43303 11.26484 100.4697   9.907427

The data is:

df <- structure(list(gene = structure(1:9, .Label = c("1007_s_at", 
"1053_at", "117_at", "121_at", "1255_g_at", "1294_at", "1316_at", 
"1320_at", "1405_i_at"), class = "factor"), Adrenal = c(10.43303178, 
6.446570421, 8.018137441, 10.78168853, 5.625038847, 8.37142904, 
6.237386633, 6.206410651, 6.588370219), Amygdala = c(11.26483864, 
6.462840464, 7.738652705, 10.3223056, 6.132930765, 8.1019947, 
6.429011484, 6.139873183, 5.949622255), Bladder = c(100.46973369, 
6.570665594, 7.604989675, 10.38043102, 5.526885199, 8.549260758, 
6.083330287, 6.328348899, 7.420451672), BoneMarrow = c(9.907426976, 
7.068326351, 8.38937883, 10.73936285, 5.448521716, 8.697436419, 
6.295933456, 6.251521738, 8.823058974)), .Names = c("gene", "Adrenal", 
"Amygdala", "Bladder", "BoneMarrow"), class = "data.frame", row.names = c(NA, 
-9L))
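
As a rough sketch of what the one-liner does, the steps can be separated out. The snippet below rebuilds only the first three rows of the data above so that it runs on its own. The names others, row_max and keep are illustrative, and do.call(pmax, ...) is swapped in for apply(..., 1, max) as a vectorised alternative, assuming all compared columns are numeric.

```r
# First three rows of the data above, rebuilt so the snippet runs alone
df <- data.frame(
  gene       = c("1007_s_at", "1053_at", "117_at"),
  Adrenal    = c(10.43303178, 6.446570421, 8.018137441),
  Amygdala   = c(11.26483864, 6.462840464, 7.738652705),
  Bladder    = c(100.46973369, 6.570665594, 7.604989675),
  BoneMarrow = c(9.907426976, 7.068326351, 8.38937883)
)

# Step 1: keep only the columns that Bladder is compared against
others <- df[, setdiff(names(df), c("gene", "Bladder")), drop = FALSE]

# Step 2: row-wise maximum of those columns (pmax works element-wise across columns)
row_max <- do.call(pmax, others)

# Step 3: TRUE where Bladder exceeds 5 times the largest other value in that row
keep <- df$Bladder > 5 * row_max

# Step 4: keep the matching rows, with all columns
result <- df[keep, ]
print(result)
```

Comparing against the row-wise maximum is enough: if Bladder is more than 5 times the largest of the other values, it is more than 5 times each of them.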

Answer 2

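The question isn't posed very clearly, so my answer may not be what you're expecting, but I think what you want to achieve is both important and simple. My example uses the dplyr library, which simplifies filtering and selecting values in data frames. Note that I changed the value of BoneMarrow in the first row so that Bladder is more than five times larger than it.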

Most of the code just sets up the example so that it is reproducible; the first and last lines are the actual answer to the question.

library(dplyr)

txt=
"gene,Adrenal,Amygdala,Bladder,BoneMarrow
1007_s_at,100.46973369,11.26483864,10.43303178,1.907426976
1053_at,6.446570421,6.462840464,6.570665594,7.068326351
117_at,8.018137441,7.738652705,7.604989675,8.38937883
121_at,10.78168853,10.3223056,10.38043102,10.73936285
1255_g_at,5.625038847,6.132930765,5.526885199,5.448521716
1294_at,8.37142904,8.1019947,8.549260758,8.697436419
1316_at,6.237386633,6.429011484,6.083330287,6.295933456
1320_at,6.206410651,6.139873183,6.328348899,6.251521738
1405_i_at,6.588370219,5.949622255,7.420451672,8.823058974"

df = read.table(textConnection(txt), header=TRUE, sep=',')

filter(df, Bladder >= BoneMarrow * 5) %>% select(Bladder)
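
The filter above compares Bladder with a single column. One possible way to extend it to several columns, offered here only as a sketch, is to compare Bladder with the row-wise maximum from base R's pmax inside filter. The snippet uses the first three rows of the table from the question rather than the modified data above, and it names the compared columns explicitly, which assumes you know them in advance.

```r
library(dplyr)

# First three rows of the table from the question
df <- data.frame(
  gene       = c("1007_s_at", "1053_at", "117_at"),
  Adrenal    = c(10.46973369, 6.446570421, 8.018137441),
  Amygdala   = c(11.26483864, 6.462840464, 7.738652705),
  Bladder    = c(100.43303178, 6.570665594, 7.604989675),
  BoneMarrow = c(9.907426976, 7.068326351, 8.38937883)
)

# Keep rows where Bladder is more than 5 times the largest of the other columns
result <- filter(df, Bladder > 5 * pmax(Adrenal, Amygdala, BoneMarrow))
print(result)
```

Dropping the trailing select(Bladder) keeps every column of the matching rows, which is the shape shown in the question's expected result.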

Select only rows if the value in a particular column is more than 5 times the value in the other columns
