Extract columns from a data frame and sort them

Asked: 2013-05-06 12:08:01

Tags: r dataframe subset

I have a data frame that looks like this:

structure(list(Mash_pear = c(0.192474082559755, 0.679726904159742, 
0.778564545349054, 0.573745352397321, 0.56633658385284, 0.472559997318901, 
0.462635414367878, 0.562128414492567, 0.354624921832056, 0.64532681437697
), tRap_pear = c(0.0350096175177328, 0.234255507711743, 0.23714999195134, 
0.185536020521134, 0.191585098617356, 0.201402054387186, 0.220911538536031, 
0.216072802572045, 0.132247101763063, 0.172753098431029), Beeml_pear = c(0.179209909971615, 
0.79129167285928, 0.856908302056589, 0.729078080521886, 0.709346164378725, 
0.669599784720647, 0.585348196746785, 0.639355942917055, 0.544909349368496, 
0.794652394149651), Mash_pear2080 = c(0.823944540480775, 0.816630852343513, 
0.81134728399675, 0.801065036203532, 0.799630945085954, 0.799195606444727, 
0.798637867344115, 0.798478922129054, 0.798090734787886, 0.797673368802285
)), .Names = c("Mash_pear", "tRap_pear", "Beeml_pear", "Mash_pear2080"
), row.names = c("Aft1", "Alx3_3418.2", "Alx4_1744.1", "Arid3a_3875.1_v1_primary", 
"Arid3a_3875.1_v2_primary", "Arid3a_3875.2_v1_primary", "Arid3a_3875.2_v2_primary", 
"Arid5a_3770.2_v1_primary", "Arid5a_3770.2_v2_primary", "Aro80"
), class = "data.frame")

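I want to rank these scores, but each column should be ranked separately while the row names stay attached to their values. So I tried extracting the columns one at a time and ordering each of them. The problem shows up as soon as I try to order a single column: my data frame disappears and becomes a numeric vector. As I said, I need the data frame (with its rownames) to stay intact, just ordered. The code I am working with right now is: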

rowname<-rownames(pearframe)
col1<-subset(pearframe, select=1)[order(pearframe),]
col2<-subset(pearframe, select=2)[order(pearframe),]
col3<-subset(pearframe, select=3)[order(pearframe),]
col4<-subset(pearframe, select=4)[order(pearframe),]

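This strips my rownames and the original data frame structure, which makes it impossible to rank my data. So the actual question is: how can I sort the data frame by each column and create 4 new frames, each with 1 ordered column? Ultimately I would like a table containing the rownames and scores of each ranked frame.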

2 Answers:

Answer 0 (score: 4)

I think you need drop=FALSE in two places:

subset(pearframe, select=1,drop=FALSE)[order(pearframe[,1]),,drop=FALSE]

The other columns work the same way, with the 1 incremented in both places.
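To illustrate why drop=FALSE matters, here is a minimal sketch on a small made-up data frame. The name toy and its values are assumptions for illustration only, not the data from the question. Selecting a single column with [ collapses the result to a bare vector unless drop=FALSE is given:

```r
# Toy data frame (hypothetical values) with row names, standing in for pearframe
toy <- data.frame(a = c(0.3, 0.1, 0.2), b = c(7, 9, 8),
                  row.names = c("x", "y", "z"))

# Without drop = FALSE, a single-column selection collapses to a numeric vector
v <- toy[order(toy[, 1]), 1]
is.data.frame(v)    # FALSE: the row names are gone

# With drop = FALSE, the result stays a data frame and keeps its row names
d <- toy[order(toy[, 1]), 1, drop = FALSE]
is.data.frame(d)    # TRUE
rownames(d)         # "y" "z" "x"
```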

Edit: Also, this is more concise:

pearframe[order(pearframe[,1]),1,drop=FALSE]

EDIT2: Here is how to build the final data.frame with this approach:

col_list <- list()
for (i in 1:4){
    col_list[[i]] <- pearframe[order(pearframe[,i]),i,drop=FALSE]
    col_rnname <- paste(names(pearframe)[[i]],"rn",sep=".")
    col_list[[i]][[col_rnname]] <- rownames(col_list[[i]])
    rownames(col_list[[i]]) <- NULL
}
col_mat <- do.call(cbind,col_list)
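
As a quick check of what this loop produces, here is a sketch of the same steps on a small made-up data frame. The name toy and its values are assumptions for illustration, and seq_along stands in for the hard-coded 1:4 so the loop works for any number of columns:

```r
# Toy data frame (hypothetical values) standing in for pearframe
toy <- data.frame(a = c(0.3, 0.1, 0.2), b = c(7, 9, 8),
                  row.names = c("x", "y", "z"))

col_list <- list()
for (i in seq_along(toy)) {
    # Sort column i on its own, keeping it as a one-column data frame
    col_list[[i]] <- toy[order(toy[, i]), i, drop = FALSE]
    # Store the sorted row names in a companion column, e.g. "a.rn"
    col_rnname <- paste(names(toy)[[i]], "rn", sep = ".")
    col_list[[i]][[col_rnname]] <- rownames(col_list[[i]])
    # Drop the row names so the pieces can sit side by side
    rownames(col_list[[i]]) <- NULL
}
col_mat <- do.call(cbind, col_list)
names(col_mat)   # "a" "a.rn" "b" "b.rn"
```

Each score column is followed by a column holding the row names in that column's own sort order, so the orderings are independent of each other.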

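Answer 1 (score: 4)

Another approach is to take advantage of the fact that a data.frame is just a list of columns. You can use lapply over it, which gives you a list of data.frames, one per column. You can access each one by column name and assign it to a new df if you need to (df below is the data frame from the question):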

ranks <- lapply( df , function(x) data.frame( rank = rownames(df)[ order( x ) ] , score = x[ order(x) ] ) )
names(ranks) <- names(df)

head(ranks[["Mash_pear"]])
#                     rank     score
#1                     Aft1 0.1924741
#2 Arid5a_3770.2_v2_primary 0.3546249
#3 Arid3a_3875.2_v2_primary 0.4626354
#4 Arid3a_3875.2_v1_primary 0.4725600
#5 Arid5a_3770.2_v1_primary 0.5621284
#6 Arid3a_3875.1_v2_primary 0.5663366

head(ranks[["tRap_pear"]])
#                     rank      score
#1                     Aft1 0.03500962
#2 Arid5a_3770.2_v2_primary 0.13224710
#3                    Aro80 0.17275310
#4 Arid3a_3875.1_v1_primary 0.18553602
#5 Arid3a_3875.1_v2_primary 0.19158510
#6 Arid3a_3875.2_v1_primary 0.20140205
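
If a single table is wanted at the end, one possible follow-up is to bind the list elements side by side. The sketch below uses a small made-up data frame (toy and its values are assumptions, not the question's data), and stringsAsFactors = FALSE is added only so the row names stay character on older R versions:

```r
# Toy data frame (hypothetical values) standing in for df
toy <- data.frame(a = c(0.3, 0.1, 0.2), b = c(7, 9, 8),
                  row.names = c("x", "y", "z"))

# One two-column data frame (row name + score) per original column
ranks <- lapply(toy, function(x) {
    o <- order(x)
    data.frame(rank = rownames(toy)[o], score = x[o],
               stringsAsFactors = FALSE)
})

# The list names become column-name prefixes in the combined table
wide <- do.call(cbind, ranks)
names(wide)   # "a.rank" "a.score" "b.rank" "b.score"
```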