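I'm looking for a solution to this problem: I have a tab-delimited file like the one shown below. As you can see, there are lines whose first part (all fields except the last one) is identical.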
chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84
chr1 220871675 220962596 G2 P2369-152-116
chr1 220871675 220962596 G2 P2371-180-82
chr1 220871675 220962596 G2 P2372-223-129
chr1 220871675 220962596 G2 P2373-153-96
chr1 220871675 220962596 G2 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146
chr5 126198405 126416440 G3 P9334-151-116
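How can I get the following output using AWK or Perl, keeping the tab-delimited format? The general idea is to merge the lines that share the same first part and append their last fields.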
chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84 P2369-152-116 P2371-180-82 P2372-223-129 P2373-153-96 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146 P9334-151-116
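Answer 0 (score: 2)
use strict;
use warnings;

my (%hash, @order);
while (<DATA>) {
    # Split the line into the key (everything before the last field) and the last field.
    my ($x, $y) = /^(.*)\s([-\w]+)$/ or next;
    # Remember the order in which keys first appear; hash order is not stable.
    push @order, $x unless exists $hash{$x};
    push @{ $hash{$x} }, $y;
}
# Print each key followed by its collected values, all tab-separated.
for my $k (@order) {
    print join("\t", $k, @{ $hash{$k} }), "\n";
}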
__DATA__
chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84
chr1 220871675 220962596 G2 P2369-152-116
chr1 220871675 220962596 G2 P2371-180-82
chr1 220871675 220962596 G2 P2372-223-129
chr1 220871675 220962596 G2 P2373-153-96
chr1 220871675 220962596 G2 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146
chr5 126198405 126416440 G3 P9334-151-116
Answer 1 (score: 1)
One way using perl:
perl -ane '
    ## Save all fields but the last one as the key to compare between rows.
    $key = join qq|\t|, @F[ 0 .. $#F - 1 ];
    ## In first line or when current key is equal to previous key, save last
    ## field in an array and stop processing current row.
    if ( $. == 1 || $key eq $pkey ) {
        $pkey = $key;
        push @value, $F[ $#F ];
        next unless eof;
    }
    ## At this point, keys between rows are different, so print previous
    ## key with its values and begin to save the new one.
    printf qq|%s\n|, join qq|\t|, $pkey, @value;
    @value = ();
    push @value, $F[ $#F ];
    ## Exception: Last line with a new key, print it.
    if ( eof && $pkey ne $key ) {
        printf qq|%s\n|, join qq|\t|, $key, @value;
    }
    ## Save previous key.
    $pkey = $key;
' infile
Assuming infile contains the data from your question, the output will be:
chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84 P2369-152-116 P2371-180-82 P2372-223-129 P2373-153-96 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146 P9334-151-116
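Since the question also mentions AWK, here is a minimal sketch of the same grouping idea in awk. It is not taken from either answer above. It assumes the input really is tab-delimited and small enough to hold in memory, and the wrapper function name merge_rows is made up for illustration. Keys are printed in the order they first appear, and rows with the same key are merged even if they are not adjacent.

```shell
# merge_rows is a hypothetical helper name; it reads the files given as
# arguments, or standard input when called without arguments.
merge_rows() {
    awk 'BEGIN { FS = OFS = "\t" }
    NF < 2 { next }                      # skip empty or single-field lines
    {
        last = $NF
        # Rebuild the key from every field except the last one.
        key = $1
        for (i = 2; i < NF; i++) key = key OFS $i
        if (!(key in vals)) {
            order[++n] = key             # remember first-seen order
            vals[key] = last
        } else {
            vals[key] = vals[key] OFS last
        }
    }
    END {
        for (i = 1; i <= n; i++) print order[i], vals[order[i]]
    }' "$@"
}
```

It could be called as `merge_rows infile`. Unlike the streaming perl one-liner, this version keeps every group in memory until the end, which is the price for not requiring matching rows to be consecutive.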