I have a vector of chromosome names:
q<-c("1","10","11","12","13","14","15","16","17",
"18","19","20","21","22","2","3","4","5","6",
"7","8","9","X","Y","M")
I want to sort them into this order:
q<-c("1","2","3","4","5","6","7","8","9","10","11",
"12","13","14","15","16","17","18","19","20",
"21","22","X","Y","M")
I tried defining the order myself:
chrOrder <-c((1:22),"X","Y","M")
and using it like this:
factor(cbind(q),levels=chrOrder)
but I still can't get the order I want.
Edit: I have a similar scenario, but slightly different. I have a data frame with three columns: name, chromosome, and start.
df <-data.frame(name =c("a","a","a","b","b","b"), chrom = c(1,2,10,1,3,"X"), start=c(100,200,300,500,300,200))
I need to sort first by name, then by chromosome, and then by start. The result should look like this:
name chrom start
a 1 100
a 2 200
a 10 300
b 1 500
b 3 300
b X 200
I don't know how to use chrOrder in the following:
indata <- df[do.call(order, df[, c("name", "chrom", "start")]), ]
Answer 0 (score: 3)
Your approach is fine; you just need to sort the resulting factor. You should also set ordered=TRUE:
sort(factor(q,levels=chrOrder, ordered=TRUE))
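No, as has been pointed out, you don't have to use an ordered factor, but it certainly isn't wrong, and it is arguably better. A factor suits this situation because you have well-defined levels. See this previous question on factor vs character.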
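A minimal sketch of why the factor matters, assuming the q and chrOrder from the question. A plain sort() on a character vector compares strings, so "10" lands before "2", whereas sorting the factor follows the order of its levels. The as.character() step at the end is an addition for illustration, not part of the answer above, for when a plain character vector is wanted back.

```r
q <- c("1","10","11","12","13","14","15","16","17",
       "18","19","20","21","22","2","3","4","5","6",
       "7","8","9","X","Y","M")
chrOrder <- c(1:22, "X", "Y", "M")   # coerced to a character vector

# Plain sort() compares strings, not chromosome numbers
lexical <- sort(q)

# Sorting the factor follows the order of its levels
sorted_factor <- sort(factor(q, levels = chrOrder, ordered = TRUE))

# Optional: convert back to character (added here for illustration)
sorted_q <- as.character(sorted_factor)
print(sorted_q)
```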
Now that you have edited your question, the case for a factor is even stronger, because it makes the sorting simple:
df <- data.frame(name=c("a","a","a","b","b","b"),
chrom = c(1,2,10,1,3,"X"),
start=c(100,200,300,500,300,200))
chrOrder <-c((1:22),"X","Y","M")
df$chrom <- factor(df$chrom, chrOrder, ordered=TRUE)
df[do.call(order, df[, c("name", "chrom", "start")]), ]
Given the levels of the factor, R knows exactly how to order the elements.
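As a hedged sketch of the same idea, the order() call can also be written with the columns spelled out instead of going through do.call. The rows below are the question's data, deliberately shuffled (an assumption made for illustration) so that the sort visibly does something. One caveat worth checking: any chromosome name that is not listed in chrOrder becomes NA when the factor is built.

```r
# The question's rows, shuffled for illustration
df <- data.frame(name  = c("b", "a", "b", "a", "b", "a"),
                 chrom = c("X", "10", "1", "2", "3", "1"),
                 start = c(200, 300, 500, 200, 300, 100))
chrOrder <- c(1:22, "X", "Y", "M")

# Chromosome names missing from chrOrder would turn into NA here
df$chrom <- factor(df$chrom, levels = chrOrder, ordered = TRUE)
stopifnot(!anyNA(df$chrom))

# Same result as do.call(order, df[, c("name", "chrom", "start")])
sorted <- df[order(df$name, df$chrom, df$start), ]
print(sorted)
```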
I followed your approach to sorting, but you may want to know that there are nicer ways, for example:
library(plyr)
df <- arrange(df, name, chrom, start)
Answer 1 (score: 3)
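factor and cbind aren't doing anything here (well, factor does do something, but it isn't immediately useful).
In your specific case, simply saying q <- chrOrder solves the problem, doesn't it?
More generally, you can use match to get the indices of the items in a vector y, ordered by the items in another vector x: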
> match(chrOrder, q)
[1] 1 15 16 17 18 19 20 21 22 2 3 4 5 6 7 8 9 10 11 12 13 14 23 24 25
Now you can use these indices to index into q and get the desired order:
> q[match(chrOrder, q)]
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "X" "Y" "M"
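...so that is the general approach. As a more useful example, suppose you actually have a data.frame of genes with a chr column; you can sort the rows of the data frame as follows: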
> # Some test data
> df <- data.frame(chr = q, value = rnbinom(length(q), 1, 0.01),
+ row.names = paste('gene', seq_along(q)))
> df <- df[match(chrOrder, df$chr), ]
> head(df)
chr value
gene 1 1 270
gene 15 2 51
gene 16 3 115
gene 17 4 15
gene 18 5 196
gene 19 6 34
...the rows of the data frame are now sorted by their chr column, in the desired order.
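Note that match(chrOrder, df$chr) returns one index per entry of chrOrder, so it relies on every chromosome occurring exactly once. A hedged sketch for the more common case of several genes per chromosome: match each row's chromosome against chrOrder to get its rank, then order by that rank. The genes data frame and its values are made up for illustration.

```r
chrOrder <- c(1:22, "X", "Y", "M")

# Hypothetical test data: several genes on the same chromosome
genes <- data.frame(chr   = c("X", "10", "2", "10", "1", "2"),
                    start = c(50, 300, 200, 100, 400, 150),
                    stringsAsFactors = FALSE)

# Rank of each row's chromosome within chrOrder (NA if not listed)
chrRank <- match(genes$chr, chrOrder)

# Sort by chromosome rank, then by start position
genes <- genes[order(chrRank, genes$start), ]
print(genes)
```

Here the arguments to match are swapped relative to the example above: the data come first and chrOrder acts as the lookup table, which is what lets repeated chromosomes keep all of their rows.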