I have a large 7-column text file with sorted rows, like this:
gi|352964122|gb|JH286168.1| 00884 C C 14 1.00 u
gi|352964122|gb|JH286168.1| 00884 C C 26 0.76 p
gi|352964122|gb|JH286168.1| 00884 C C 33 0.89 f
gi|352964122|gb|JH286168.1| 00885 G G 14 1.00 u
gi|352964122|gb|JH286168.1| 00885 A A 30 0.84 f
gi|352964122|gb|JH286168.1| 00886 T T 31 0.81 f
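What I need to do is: whenever the first two columns are identical on consecutive lines, append the remaining columns to the first of those lines. There can be 1, 2, or 3 such "similar" lines, and when there are fewer than 3 I need placeholders to keep the columns intact. So the above should end up looking like this: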
gi|352964122|gb|JH286168.1| 00884 C C 14 1.00 u C C 26 0.76 p C C 33 0.89 f
gi|352964122|gb|JH286168.1| 00885 G G 14 1.00 u - - - - - G G 33 0.89 f
gi|352964122|gb|JH286168.1| 00886 T T 31 0.81 f - - - - - - - - - -
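I have tried many approaches with AWK, but I can't quite get my head around it. How can this be done?
Answer 0: (score: 1)
This should do it: (Edit: I did not notice that you need placeholders. I will look into it...)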
awk '
$1 == last1 && $2 == last2 {
    printf " %s %s %s %s %s",$3,$4,$5,$6,$7;
    last1 = $1; last2 = $2;
    next;
}
{
    $1 = $1; # normalize spacing
    printf "%s%s", (NR==1 ? "" : "\n"), $0;
    last1 = $1; last2 = $2;
}
END { print ""; }
' file
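The edit note above leaves the placeholders open. One possible way to add them to the same approach is sketched below; this is not the answerer's follow-up, just an illustration. It assumes a group never has more than 3 lines and that each appended record has 5 fields. The names `max`, `pad`, `n`, `pad_line` and the scratch file `sample.txt` are made up for the example.

```shell
# Sample input from the question, written to a scratch file (the name is arbitrary).
cat > sample.txt <<'EOF'
gi|352964122|gb|JH286168.1| 00884 C C 14 1.00 u
gi|352964122|gb|JH286168.1| 00884 C C 26 0.76 p
gi|352964122|gb|JH286168.1| 00884 C C 33 0.89 f
gi|352964122|gb|JH286168.1| 00885 G G 14 1.00 u
gi|352964122|gb|JH286168.1| 00885 A A 30 0.84 f
gi|352964122|gb|JH286168.1| 00886 T T 31 0.81 f
EOF

out=$(awk '
# Pad the current output line up to max records, then terminate it.
function pad_line(   i) {
    for (i = n; i < max; i++) printf " %s", pad
    print ""
}
BEGIN { max = 3; pad = "- - - - -" }   # assumed: 3 records per line, 5 fields each
$1 == last1 && $2 == last2 {
    # same key as the previous line: append columns 3-7 to the open output line
    printf " %s %s %s %s %s", $3, $4, $5, $6, $7
    n++
    next
}
{
    if (NR > 1) pad_line()   # close the previous group before starting a new one
    $1 = $1                  # normalize spacing
    printf "%s", $0
    last1 = $1; last2 = $2
    n = 1
}
END { if (NR > 0) pad_line() }
' sample.txt)
printf '%s\n' "$out"
```

The counter `n` tracks how many records are already on the open output line, so `pad_line` only has to emit `max - n` placeholders when the key changes or the input ends.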
Answer 1: (score: 1)
I'm not sure how you got your second row, but this at least matches how I understand the goal:
awk '
{
    head = $1 " " $2
    tail = $3 " " $4 " " $5 " " $6 " " $7
    if (previous != head) {
        # new key: print the finished group, then start collecting the next one
        if (previous != "") printf("%s %s %s %s\n", previous, p[1], p[2], p[3])
        previous = head
        i = 1
        p[i] = tail
        p[2] = p[3] = "- - - - -"   # five-field placeholder, one "-" per appended column
    } else {
        i = i + 1
        p[i] = tail
    }
}
END { printf("%s %s %s %s\n", previous, p[1], p[2], p[3]) }' file
Output:
gi|352964122|gb|JH286168.1| 00884 C C 14 1.00 u C C 26 0.76 p C C 33 0.89 f
gi|352964122|gb|JH286168.1| 00885 G G 14 1.00 u A A 30 0.84 f - - - - -
gi|352964122|gb|JH286168.1| 00886 T T 31 0.81 f - - - - - - - - - -
Answer 2: (score: 1)
$ cat tst.awk
BEGIN { maxRecs = 3 }
function prta(   i, dflt) {
    dflt = a[1]
    gsub(/[^[:space:]]+/,"-",dflt)
    printf "%s ", prev    # prev, not key: key already holds the next group
    for (i=1; i<=maxRecs; i++) {
        printf "%s%s", (i in a ? a[i] : dflt), (i<maxRecs ? OFS : ORS)
        delete a[i]
    }
    numRecs = 0
}
{ key = $1 FS $2 }
prev && (key != prev) { prta() }
{
    $1 = $1
    sub(/([^[:space:]]+[[:space:]]+){2}/,"")
    a[++numRecs] = $0
    prev = key
}
END { prta() }
$
$ awk -f tst.awk file
gi|352964122|gb|JH286168.1| 00884 C C 14 1.00 u C C 26 0.76 p C C 33 0.89 f
gi|352964122|gb|JH286168.1| 00885 G G 14 1.00 u A A 30 0.84 f - - - - -
gi|352964122|gb|JH286168.1| 00886 T T 31 0.81 f - - - - - - - - - -
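Since the point of the placeholders is to keep the columns intact, a quick sanity check on the merged result is to confirm that every line has the same number of fields: 2 key columns plus 3 records of 5 columns each, so 17. A minimal sketch, assuming the merged output has been saved to a file (the name `merged.txt`, the variable `want`, and the `status` variable are made up for the example):

```shell
# Three lines in the merged layout, saved to a scratch file (the name is arbitrary).
cat > merged.txt <<'EOF'
gi|352964122|gb|JH286168.1| 00884 C C 14 1.00 u C C 26 0.76 p C C 33 0.89 f
gi|352964122|gb|JH286168.1| 00885 G G 14 1.00 u A A 30 0.84 f - - - - -
gi|352964122|gb|JH286168.1| 00886 T T 31 0.81 f - - - - - - - - - -
EOF

# Report each line whose field count differs from 2 + 3*5 = 17;
# awk exits non-zero if at least one such line was found.
if awk -v want=17 '
    NF != want { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit bad + 0 }
' merged.txt
then status=ok
else status=bad
fi
echo "$status"
```

For the three lines above this prints `ok`. A line with a short placeholder (for example 4 dashes instead of 5) would be reported with its line number and field count.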