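How can I move all lines that share the same first 10 characters into a common column?

Asked: 2015-09-30 15:28:22

Tags: unix awk sed grep

I have a file with a single column that looks like this: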

chr1 106623419
chr1 106623434
chr1 106623436
chr1 110611528
chr1 110611536
chr1 110611550
chr1 110611552
chr1 111216608
chr1 111216621
chr1 111216624
chr1 111216627
chr1 111216628

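I want to rearrange it so that all lines sharing the same first 10 characters are grouped together and each group is placed in its own column: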

chr1 106623419  chr1 110611528  chr1 111216608
chr1 106623434  chr1 110611536  chr1 111216621
chr1 106623436  chr1 110611550  chr1 111216624
                chr1 110611552  chr1 111216627
                                chr1 111216628

3 Answers:

Answer 0 (score: 1)

A Perl solution:

perl -ne 'chomp;
          push @{ $h{ substr $_, 0, 10 } }, substr $_, 10;
          }{
          while (grep @$_, values %h) {
              for my $p (sort keys %h) {
                  $s = shift @{ $h{$p} };
                  print $s ? "$p$s" : "\t", "\t";
              }
              print "\n";
          }' input.file

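How it works: it builds a hash that maps each 10-character prefix to an array of the suffixes that follow it. Once the input is exhausted (the `}{` marks that point), it repeatedly shifts one value off each array and prints the values side by side as columns. If an array has no values left, a tab is printed in its place.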
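
For readers who do not use Perl, the same idea (group by prefix, then print the groups column by column) can be sketched in Python 3. This is only an illustrative sketch, not a port of the one-liner above: the function name `columns_by_prefix` and its `width` parameter are made up for this example, whole lines are stored rather than suffixes, and an empty cell is rendered as an empty string between tabs.

```python
from collections import defaultdict
from itertools import zip_longest


def columns_by_prefix(lines, width=10):
    """Group lines by their first `width` characters and lay the groups out as columns.

    Hypothetical helper for illustration; returns tab-joined output rows.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line:  # skip blank lines
            groups[line[:width]].append(line)
    # Sort the prefixes so the column order is deterministic.
    ordered = [groups[prefix] for prefix in sorted(groups)]
    # zip_longest transposes the groups; shorter columns are padded with "".
    return ["\t".join(row) for row in zip_longest(*ordered, fillvalue="")]


sample = [
    "chr1 106623419", "chr1 106623434", "chr1 106623436",
    "chr1 110611528", "chr1 110611536", "chr1 110611550", "chr1 110611552",
    "chr1 111216608", "chr1 111216621", "chr1 111216624", "chr1 111216627",
    "chr1 111216628",
]

for row in columns_by_prefix(sample):
    print(row)
```

To run it on a real file, pass the open file object instead of `sample`, since iterating over a file yields its lines.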

Answer 1 (score: 0)

Another solution, using gawk version 4:

gawk 'BEGIN{max=0; OFS="\t"}
{
    key = int($2/1000000);
    d[key][length(d[key])+1] = $0;
    if(length(d[key])>max) 
        max = length(d[key]);
}
END{
    PROCINFO["sorted_in"] = "@ind_num_asc";
    for(i=1; i<=max; ++i){ 
        line = "";
        flag = 0;
        for(j in d){
            line = line (flag?OFS:"") d[j][i];
            flag = 1;
        } 
        print line;
    }
}' file

This gives:

chr1 106623419\tchr1 110611528\tchr1 111216608
chr1 106623434\tchr1 110611536\tchr1 111216621
chr1 106623436\tchr1 110611550\tchr1 111216624
\tchr1 110611552\tchr1 111216627
\t\tchr1 111216628

In other words, with the tabs expanded:

chr1 106623419  chr1 110611528  chr1 111216608
chr1 106623434  chr1 110611536  chr1 111216621
chr1 106623436  chr1 110611550  chr1 111216624
                chr1 110611552  chr1 111216627
                                chr1 111216628
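
Note that the gawk script groups on `int($2/1000000)`, the position in millions, rather than on the literal first 10 characters of each line. For the sample data the two keys happen to split the lines into the same groups, but on other data they can differ. A small Python 3 check of that claim follows; the helper names `prefix_key`, `million_key` and `partition` are hypothetical and exist only for this sketch.

```python
lines = [
    "chr1 106623419", "chr1 106623434", "chr1 106623436",
    "chr1 110611528", "chr1 110611536", "chr1 110611550", "chr1 110611552",
    "chr1 111216608", "chr1 111216621", "chr1 111216624", "chr1 111216627",
    "chr1 111216628",
]


def prefix_key(line):
    # Key used in the question: the first 10 characters of the line.
    return line[:10]


def million_key(line):
    # Key used in the gawk answer: the position divided by one million.
    return int(line.split()[-1]) // 1_000_000


def partition(items, key):
    # Collect items into groups, kept in order of first appearance.
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())


# Same grouping for the sample data.
print(partition(lines, prefix_key) == partition(lines, million_key))

# Different grouping here: one million-bucket, but two distinct prefixes.
other = ["chr1 106600000", "chr1 106699999"]
print(partition(other, prefix_key) == partition(other, million_key))
```

The first comparison prints `True` and the second prints `False`, so the gawk key is a reasonable shortcut for this input but not a general replacement for the 10-character prefix.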

Bonus

Just for fun, a one-liner in Python:

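# Python 3 version: izip_longest was renamed zip_longest, and string.strip no longer exists
from itertools import zip_longest, groupby

input = "file"

# Group consecutive lines by position in millions, then transpose the groups into tab-separated columns
print("\n".join("\t".join(grp) for grp in zip_longest(*[[s.strip() for s in v] for k, v in groupby(open(input), key=lambda x: int(x.split()[-1]) // 1000000)], fillvalue="")))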

Answer 2 (score: 0)

Just for fun, in Ruby:

ruby -e '
  groups = File.readlines(ARGV.shift)
               .map(&:chomp)
               .group_by {|item| item[0..9]}
               .values
               .sort
  max = groups.map(&:size).max
  # to transpose, the lists must all be the same length
  groups.collect {|list| list.fill("", list.length, max - list.length)}
        .transpose
        # note: the format string below assumes exactly three groups
        .each {|list| puts "%14s  %14s  %14s" % list}
' file