I have a file with a column that looks like this:
chr1 106623419
chr1 106623434
chr1 106623436
chr1 110611528
chr1 110611536
chr1 110611550
chr1 110611552
chr1 111216608
chr1 111216621
chr1 111216624
chr1 111216627
chr1 111216628
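I want to rearrange it so that all lines sharing the same first 10 characters are grouped together, with each group placed in its own column: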
chr1 106623419 chr1 110611528 chr1 111216608
chr1 106623434 chr1 110611536 chr1 111216621
chr1 106623436 chr1 110611550 chr1 111216624
chr1 110611552 chr1 111216627
chr1 111216628
Answer 0 (score: 1)
Perl solution:
perl -ne 'chomp;
    push @{ $h{ substr $_, 0, 10 } }, substr $_, 10;
    }{
    while (grep @$_, values %h) {
        # sort the prefixes so the columns come out in a stable order
        for my $p (sort keys %h) {
            $s = shift @{ $h{$p} };
            print $s ? "$p$s" : "\t", "\t";
        }
        print "\n";
    }' input.file
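How it works: it builds a hash that maps each 10-character prefix to an array of suffixes. Once the input ends (marked by }{), it shifts one value at a time off each of these arrays and prints the values in columns. If an array has no values left, tabs are printed instead.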
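For readers less familiar with the Perl idiom, the same idea can be sketched in Python. This is only an illustration, not the answer's code: the function name columnize and its width parameter are hypothetical, and the sketch assumes the input lines are already available as a list of strings.

```python
from collections import defaultdict
from itertools import zip_longest


def columnize(lines, width=10):
    """Group lines by their first `width` characters and lay the groups out as columns.

    Hypothetical helper: the name and the `width` parameter are assumptions.
    """
    groups = defaultdict(list)  # prefix -> lines sharing that prefix, in input order
    for line in lines:
        line = line.rstrip("\n")
        if line:  # skip blank lines
            groups[line[:width]].append(line)
    # One column per prefix, ordered by prefix; shorter columns are padded with "".
    columns = [groups[prefix] for prefix in sorted(groups)]
    return ["\t".join(row) for row in zip_longest(*columns, fillvalue="")]


if __name__ == "__main__":
    sample = [
        "chr1 106623419", "chr1 106623434",
        "chr1 110611528",
        "chr1 111216608", "chr1 111216621", "chr1 111216624",
    ]
    for row in columnize(sample):
        print(row)
```

Unlike the Perl version, this keeps whole lines in each group instead of splitting them into prefix and suffix, which makes the transposition a single zip_longest call.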
Answer 1 (score: 0)
Another solution, in gawk version 4:
gawk 'BEGIN{max=0; OFS="\t"}
{
    key = int($2/1000000);
    d[key][length(d[key])+1] = $0;
    if(length(d[key])>max)
        max = length(d[key]);
}
END{
    PROCINFO["sorted_in"] = "@ind_num_asc";
    for(i=1; i<=max; ++i){
        line = "";
        flag = 0;
        for(j in d){
            line = line (flag?OFS:"") d[j][i];
            flag = 1;
        }
        print line;
    }
}' file
You get:
chr1 106623419\tchr1 110611528\tchr1 111216608
chr1 106623434\tchr1 110611536\tchr1 111216621
chr1 106623436\tchr1 110611550\tchr1 111216624
\tchr1 110611552\tchr1 111216627
\t\tchr1 111216628
In other words:
chr1 106623419  chr1 110611528  chr1 111216608
chr1 106623434  chr1 110611536  chr1 111216621
chr1 106623436  chr1 110611550  chr1 111216624
                chr1 110611552  chr1 111216627
                                chr1 111216628
Bonus
Just for fun, a one-liner in Python:
from itertools import zip_longest, groupby
path = "file"  # name of the input file
# Python 3: group consecutive lines by megabase bucket, then transpose the groups into tab-separated columns
print("\n".join("\t".join(row) for row in zip_longest(*[[line.strip() for line in grp] for _, grp in groupby(open(path), key=lambda x: int(x.split()[-1]) // 1000000)], fillvalue="")))
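Unpacked into steps, the one-liner does roughly the following. This is a sketch assuming Python 3; the names bucket and columns_by_megabase are hypothetical. Two caveats apply: groupby merges only adjacent lines, so the input must already be sorted by position, and the grouping key is the megabase bucket of the position (ignoring the chromosome), not the first 10 characters of the line.

```python
from itertools import groupby, zip_longest


def bucket(line):
    """Megabase bucket of the position in the last field (hypothetical helper)."""
    return int(line.split()[-1]) // 1_000_000


def columns_by_megabase(lines):
    """Turn runs of lines in the same megabase bucket into tab-separated columns."""
    # groupby only merges adjacent items, so lines must already be sorted by position.
    groups = [[line.strip() for line in grp] for _, grp in groupby(lines, key=bucket)]
    # Transpose the groups into rows, padding exhausted groups with "".
    return ["\t".join(row) for row in zip_longest(*groups, fillvalue="")]


if __name__ == "__main__":
    with open("file") as handle:  # "file" is the placeholder name used in this answer
        print("\n".join(columns_by_megabase(handle)))
```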
Answer 2 (score: 0)
Just for fun, in Ruby:
ruby -e '
  groups = File.readlines(ARGV.shift)
    .map(&:chomp)
    .group_by {|item| item[0..9]}
    .values
    .sort
  max = groups.map(&:size).max
  # to transpose, the lists must all be the same length
  groups.collect {|list| list.fill("", list.length, max - list.length)}
    .transpose
    .each {|list| puts "%14s %14s %14s" % list}
' file