R: Sort a character vector of strings containing letters and numbers alphabetically and numerically

Time: 2014-01-14 06:29:14

Tags: r sorting vector

I have a character vector of strings that contain both letters and numeric values. For example:

a=c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480","ILLUMINA:420:C2D7UACXX:1:1102:14592:3881","ILLUMINA:420:C2D7UACXX:1:1102:14592:37103","ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")

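I want to order the vector so that letters are sorted alphabetically and numbers are sorted numerically. The strings always have the following format: "ILLUMINA:420:C2D7UACXX:1:<number>:<number>:<number>", so in practice the ordering only depends on the last three colon-separated numbers.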

I did try mixedsort {gtools}, but the result is the same as with sort or sort.int, i.e.:

> mixedsort(a)
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"

Clearly, the correct order should be:

[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881" 
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"

Is there a straightforward solution?

3 Answers:

Answer 0: (score: 3)

Edit: solution completely changed after the OP's clarification.

You can extract the last 3 fields into a data.frame and order by them:

dat = read.table(text=sub('.*:1:([0-9]+):([0-9]+):([0-9]+)','\\1|\\2|\\3',a),sep='|')
 dat
    V1    V2    V3
1 1102 14591 91480
2 1102 14592  3881
3 1102 14592 37103
4 1102 14592 37356

Then you order using the 3 columns:

 a[with(dat,order(V1,V2,V3))]
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881" 
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
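The same idea can be sketched without a regular expression, by splitting on ":" and ordering on the trailing fields. This is only an illustration, not part of the original answer: the helper name sort_by_trailing_numbers and its parameters n and sep are made up here, and it assumes every string has at least n fields whose last n values are numeric.

```r
# Hypothetical helper (not from the original answer): order strings by
# their last `n` separator-delimited fields, compared as numbers.
sort_by_trailing_numbers <- function(x, n = 3, sep = ":") {
  parts <- strsplit(x, sep, fixed = TRUE)
  # Build one numeric key vector per trailing field, left to right.
  keys <- lapply(seq_len(n), function(i) {
    vapply(parts, function(p) as.numeric(p[length(p) - n + i]), numeric(1))
  })
  # order() accepts several keys; later keys break ties in earlier ones.
  x[do.call(order, keys)]
}

a <- c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")
sorted_a <- sort_by_trailing_numbers(a)
```

Passing the keys through do.call(order, keys) keeps the number of sort columns configurable instead of hard-coding V1, V2, V3.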

Answer 1: (score: 1)

gtools::mixedsort does in fact work in your case:

> a=c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
      "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
      "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
      "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")
> 
> mixedsort(a)
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480"
[2] "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881" 
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[4] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"

I am using gtools_3.4.2 and R-3.2.0.

Answer 2: (score: 0)

Here is a faster solution:

library(data.table)  # provides data.table()
fields.list = strsplit(a,split=":")
# fields 5 to 7 hold the three numeric sort keys
sort.dt = data.table(t(sapply(fields.list,function(x) as.numeric(c(x[5],x[6],x[7])))))
sorted.a = a[with(sort.dt,order(V1,V2,V3))]
> sorted.a
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"  "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[4] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
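For comparison, a base R sketch of the same split-and-order approach that avoids both data.table and the per-element sapply call. It assumes every string has exactly 7 colon-separated fields, as in the question; no timing claim is made for it.

```r
a <- c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")

# Assumption: exactly 7 fields per string, so the flattened split
# can be reshaped into one row per string.
fields <- matrix(unlist(strsplit(a, ":", fixed = TRUE)), ncol = 7, byrow = TRUE)

# Columns 5 to 7 are the numeric sort keys; convert them in one call.
keys <- matrix(as.numeric(fields[, 5:7]), ncol = 3)

sorted_a <- a[order(keys[, 1], keys[, 2], keys[, 3])]
```

If some strings could have a different number of fields, the matrix reshape would misalign the rows, so the field count should be checked first (for example with lengths(strsplit(a, ":", fixed = TRUE))).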