awk to check the order of the header line in text files

Time: 2017-04-22 13:24:47

Tags: bash awk

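In the bash below, I am trying to use awk to verify that the order of the headers is exactly the same between tab-delimited text files (key has the expected order of the fields, and there are usually 3 text files in the directory).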

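If the order is correct, that is, a match is found between the files, then "FILENAME has expected order of fields" is printed. If the order does not match between the files, then "FILENAME order of $i is not correct" is printed, where $i is the field that is out of order, using key (Index Chr Start End Ref Alt Inheritance Score) as the order. Thank you :)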

file1.txt

Index   Chr Start   End Ref Alt Inheritance Score
1   1   10  100 A   -   .   2

file2.txt

Index   Chr Start   End Ref Alt Inheritance
1   1   10  100 A   -   .   2
2   1   20  100 A   -   .   5

file3.txt

Index   Chr Start   End Ref Alt Inheritance
1   1   10  100 A   -   .   2
2   1   20  100 A   -   .   5
3   1   75  100 A   -   .   2
4   1   25  100 A   -   .   5

awk

for f in /home/cmccabe/Desktop/validate/*.txt ; do
bname=`basename $f`
 awk '
  FNR==NR {
   order=(awk '!seen[$0]++ {lines[i++]=$0}
    END {for (i in lines) if (seen[lines[i]]==1) print lines[i]})'
       k=(awk '!seen[$0]++ {lines[i++]=$0}
    END {for (i in lines) if (seen[lines[i]]==1) print lines[i]})'
        if($order==$k) print FILENAME " has expected order of fields"
        else
        print FILENAME " order of $i is not correct"
}' key $f
done

Desired output

/home/cmccabe/Desktop/validate/file1.txt has expected order of fields
/home/cmccabe/Desktop/validate/file2.txt order of Score is not correct
/home/cmccabe/Desktop/validate/file3.txt order of Score is not correct

2 answers:

Answer 0: (score: 1)

Given those inputs, you can do:

awk 'FNR==NR {hn=split($0,header); next}   # key: store the expected header names
     FNR==1 {
         n=split($0,fh)                    # split the header line of the current file
         for (i=1; i<=hn; i++)
             if (fh[i]!=header[i]) {       # first mismatch: report it and skip the rest
                 printf "%s: order of %s is not correct\n", FILENAME, header[i]
                 next
             }
         if (hn==n)
             print FILENAME, "has expected order of fields"
         else
             print FILENAME, "has extra fields"
         next
     }' key f{1..3}

Prints:

f1 has expected order of fields
f2: order of Score is not correct
f3: order of Score is not correct

Answer 1: (score: 1)

$ cat tst.awk
NR==FNR { split($0,keys); next }
FNR==1 {
    allmatched = 1
    for (i=1; i in keys; i++) {
        if ($i != keys[i] ) {
            printf "%s order of %s is not correct\n", FILENAME, keys[i]
            allmatched = 0
        }
    }
    if ( allmatched ) {
        printf "%s has expected order of fields\n", FILENAME
    }
    nextfile
}

$ awk -f tst.awk key file1 file2 file3
file1 has expected order of fields
file2 order of Score is not correct
file3 order of Score is not correct

The above uses GNU awk for nextfile for efficiency. With other awks, just delete that statement and accept that the whole of each file will be read.
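For awks without nextfile, one alternative to reading every file in full is to run awk once per file and exit after the header line. The following is a minimal sketch under some assumptions: a POSIX shell, mktemp being available, and a key file that holds a single header line. The function name check_headers and the demo files are made up for illustration.

```shell
#!/bin/sh
# Sketch only: check_headers and the demo files are illustrative, not from the answer.

# check_headers KEYFILE FILE...
# Runs awk once per file so that "exit" can stand in for nextfile.
check_headers() {
    keyfile=$1
    shift
    for f in "$@"; do
        awk '
        NR==FNR { split($0, keys); next }   # key file: remember the expected names
        {                                   # first line of the file being checked
            allmatched = 1
            for (i=1; i in keys; i++) {
                if ($i != keys[i]) {
                    printf "%s order of %s is not correct\n", FILENAME, keys[i]
                    allmatched = 0
                }
            }
            if (allmatched) {
                printf "%s has expected order of fields\n", FILENAME
            }
            exit                            # stop reading after the header line
        }' "$keyfile" "$f"
    done
}

# Demo with throwaway tab-delimited files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'Index\tChr\tStart\tEnd\tRef\tAlt\tInheritance\tScore\n' > "$demo/key"
printf 'Index\tChr\tStart\tEnd\tRef\tAlt\tInheritance\tScore\n1\t1\t10\t100\tA\t-\t.\t2\n' > "$demo/file1"
printf 'Index\tChr\tStart\tEnd\tRef\tAlt\tInheritance\n1\t1\t10\t100\tA\t-\t.\t2\n' > "$demo/file2"

check_headers "$demo/key" "$demo/file1" "$demo/file2"
```

The trade-off is one awk process per file, and the key is re-read each time. That is cheap for a handful of files with a one-line key.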

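You didn't include in your example a case where a header appears in a file but not in the key, so I assume that can't happen and you don't need the script to handle it.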
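If that case did need handling, one possible extension is to also flag any header name that the key does not list. This is only a sketch, not part of the answer above. The function name check_with_extras and the "has unexpected field" message are assumptions, and the key is assumed to be a single header line. It omits nextfile so that it does not depend on GNU awk.

```shell
#!/bin/sh
# Sketch only: check_with_extras and the "has unexpected field" message are
# illustrative assumptions, not part of the original answer.

# check_with_extras KEYFILE FILE...
check_with_extras() {
    awk '
    NR==FNR {                               # key file: expected names, in order
        nkeys = split($0, keys)
        for (i=1; i<=nkeys; i++) known[keys[i]] = 1
        next
    }
    FNR==1 {                                # header line of each checked file
        allmatched = 1
        for (i=1; i<=nkeys; i++) {
            if ($i != keys[i]) {
                printf "%s order of %s is not correct\n", FILENAME, keys[i]
                allmatched = 0
            }
        }
        for (i=1; i<=NF; i++) {             # names that the key does not list
            if (!($i in known)) {
                printf "%s has unexpected field %s\n", FILENAME, $i
                allmatched = 0
            }
        }
        if (allmatched) {
            printf "%s has expected order of fields\n", FILENAME
        }
    }' "$@"
}

# Demo with throwaway tab-delimited files; "Extra" is a made-up column name.
demo2=$(mktemp -d)
trap 'rm -rf "$demo2"' EXIT
printf 'Index\tChr\tStart\tEnd\tRef\tAlt\tInheritance\tScore\n' > "$demo2/key"
printf 'Index\tChr\tStart\tEnd\tRef\tAlt\tInheritance\tScore\tExtra\n' > "$demo2/file4"

check_with_extras "$demo2/key" "$demo2/file4"
```

The positional check is unchanged, so an unlisted name in the middle of a header still triggers the "order of ... is not correct" messages for the key fields it displaces, in addition to the new message.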