Compare files and print the class

Time: 2012-07-20 09:47:31

Tags: awk

I have file1:

id position 
a1 21
a1 39
a1 77
b1 88
b1 122
c1 22

file2:

id  class  position1 position2
a1  Xfact   1           40
a1  Xred    41          66
a1  xbreak  69          89
b1  Xbreak  77          133
b1  Xred    140         199
c1  Xfact   1           15
c1  Xbreak  19          35

I want output like this:

id  position  class
a1   21        Xfact
a1   39        Xfact
a1   77        Xbreak
b1   88        Xbreak
b1   122       Xbreak
c1   22        Xbreak

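I need a simple awk script that prints the id and position from file1, then takes each position from file1 and compares it with the positions in file2. If the position from file1 lies within the range position1 to position2 in file2, it should print the corresponding class.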

1 answer:

Answer 0 (score: 0)

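One way using awk. It is not a simple script. In short: the key is the variable 'all_ranges'. While it is unset (0), the script reads lines from the ranges file and saves them in an array. Once it is set (1), the script stops reading ranges, goes back to the 'id-position' file, checks each position against the saved ranges, and prints the line when the position falls inside one of them. I tried to avoid processing the ranges file more than once by reading it in chunks, one id at a time, which makes the script more complex.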

EDIT: I assume the id field is sorted in both files. Otherwise this script will fail and you will need another approach.
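If the ids cannot be assumed to be sorted, one possible alternative (a sketch, separate from script.awk) is to load every range from file2 into memory first and then scan file1 once. The array names count, cls, lo and hi and the temporary directory are made up for this illustration. It assumes file2 fits in memory and is not empty, and it reports only the first matching range for each position.

```shell
#!/bin/sh
# Sketch of an alternative: hold all of file2 in memory, then stream file1.
# The sample data is copied from the question; the temp directory is only for the demo.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
id position
a1 21
a1 39
a1 77
b1 88
b1 122
c1 22
EOF

cat > "$tmp/file2" <<'EOF'
id  class  position1 position2
a1  Xfact   1           40
a1  Xred    41          66
a1  xbreak  69          89
b1  Xbreak  77          133
b1  Xred    140         199
c1  Xfact   1           15
c1  Xbreak  19          35
EOF

# file2 is listed first so that NR == FNR is true only while reading it.
result=$(awk '
NR == FNR {
    if (FNR == 1) { class_header = $2; next }   # remember the "class" column name
    k = ++count[$1]                             # k-th range seen for this id
    cls[$1, k] = $2
    lo[$1, k]  = $3 + 0                         # + 0 forces numeric comparison later
    hi[$1, k]  = $4 + 0
    next
}
FNR == 1 { print $1, $2, class_header; next }   # header line of file1
{
    # Report the first range of the same id that contains the position.
    for (k = 1; k <= count[$1]; k++)
        if ($2 + 0 >= lo[$1, k] && $2 + 0 <= hi[$1, k]) {
            print $1, $2, cls[$1, k]
            break
        }
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the ranges are keyed by id, the order of lines in either file does not matter here. The trade-off is memory: every range is stored, whereas script.awk keeps only the ranges of one id at a time.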

Contents of script.awk:

BEGIN {
    ## Arguments:
    ## ARGV[0] = awk
    ## ARGV[1] = <first_input_argument>
    ## ARGV[2] = <second_input_argument>
    ## ARGC = 3
    f2 = ARGV[ --ARGC ];

    all_ranges = 0

    ## Read first line from file with ranges to get 'class' header.
    getline line <f2
    split( line, fields )
    class_header = fields[2];
}

## Special case for the header.
FNR == 1 {
    printf "%s\t%s\n", $0, class_header;
    next;
}

## Data.
FNR > 1 {

    while ( 1 ) {

        if ( ! all_ranges ) {

            ## Read line from file with range positions.
            ret = getline line <f2

            ## Check error.
            if ( ret == -1 ) {
                printf "%s\n", "ERROR: " ERRNO
                close( f2 );
                exit 1;
            }

            ## Check end of file. No ranges are left to read, so stop
            ## reading and keep matching against the ranges already saved.
            ## SUBSEP is used as an id that no line of the first file has.
            if ( ret == 0 ) {
                all_ranges = 1;
                range_id = SUBSEP;
                continue;
            }

            ## Split line in spaces.
            num = split( line, fields )
            if ( num != 4 ) {
                printf "%s\n", "ERROR: Bad format of file " f2;
                exit 2;
            }

            ## Same id as the current line: save the range and keep reading.
            range_id = fields[1];
            if ( $1 == fields[1] ) {
                ranges[ fields[3], fields[4] ] = fields[2];
                continue;
            }
            else {
                all_ranges = 1
            }
        }

        ## The pending range belongs to the current id: start a new chunk.
        if ( range_id == $1 ) {
            delete ranges;
            ranges[ fields[3], fields[4] ] = fields[2];
            all_ranges = 0;
            continue;
        }

        ## Print the class of the first saved range that holds the position.
        for ( range in ranges ) {
            split( range, pos, SUBSEP )
            if ( $2 >= pos[1] && $2 <= pos[2] ) {
                printf "%s\t%s\n", $0, ranges[ range ];
                break;
            }
        }
        break;
    }
}
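
The script leans on two awk idioms: the return value of 'getline line <file' and composite array keys joined with SUBSEP. The following is a minimal sketch of both in isolation. The file name ranges.txt, its two lines, and the position 39 are made up for this demo and are not part of script.awk.

```shell
#!/bin/sh
# Illustrative only: the file name and its contents are made up for this demo.
tmp=$(mktemp -d)
printf '%s\n' 'a1 Xfact 1 40' 'a1 Xred 41 66' > "$tmp/ranges.txt"

demo=$(awk -v f="$tmp/ranges.txt" '
BEGIN {
    # getline var < file returns 1 per line read, 0 at end of file, -1 on error.
    while ((ret = (getline line < f)) > 0) {
        split(line, fields)
        # A comma in the subscript joins the parts with SUBSEP into one key.
        ranges[fields[3], fields[4]] = fields[2]
    }
    if (ret < 0) { print "cannot read " f; exit 1 }
    close(f)

    pos = 39
    for (key in ranges) {
        split(key, bounds, SUBSEP)      # recover the two halves of the key
        if (pos >= bounds[1] + 0 && pos <= bounds[2] + 0)
            print pos, "is in", bounds[1] "-" bounds[2], ranges[key]
    }
}')
printf '%s\n' "$demo"
rm -rf "$tmp"
```

The order in which 'for (key in ranges)' visits keys is unspecified, so this pattern is only predictable when at most one saved range can contain a given position.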

Run it like this:

awk -f script.awk file1 file2 | column -t

With the following result:

id  position  class
a1  21        Xfact
a1  39        Xfact
a1  77        xbreak
b1  88        Xbreak
b1  122       Xbreak
c1  22        Xbreak