awk | Update a field after comparing it with fields in another file

Time: 2013-02-24 06:57:27

Tags: awk

  • Input file 1: file1.txt
    MH=919767,918975
    DL=919922
    HR=919891,919394,919812
    KR=919999,918888

  • Input file 2: file2.txt
    AEC,919922783456,A5,B3,,,ASF
    ABC,918975583456,A1,B1,,,ABF
    AECI,919998546783,A2,B4,,,WSF

  • Output file
    AEC,919922783456,A5,B3,DL,,ASF
    ABC,918975583456,A1,B1,MH,,ABF
    AECI,919998546783,A2,B4,NOMATCH,,WSF

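  • Notes

    • Compare the first six digits of the phone number in the second field of file2.txt with the phone numbers in file1.txt (the values after the "=" separator). If the first six digits match, the output should contain the two-character code from file1.txt in its fifth field; otherwise the fifth field should be NOMATCH.
    • A single code in file1.txt (e.g. MH) can apply to several phone-number prefixes.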

2 answers:

Answer 0 (score: 1)

Try something like:

awk '
  NR==FNR{
    for(i=2; i<=NF; i++) A[$i]=$1
    next
  } 
  {
    $5="NOMATCH"
    for(i in A) if ($2~"^" i) $5=A[i]
  } 
  1
' FS='[=,]' file1 FS=, OFS=, file2

Answer 1 (score: 1)

If you have GNU awk, try the following. Run it like this:

awk -f script.awk file1.txt file2.txt

Contents of script.awk:

BEGIN {
     FS="[=,]"
     OFS=","
}

FNR==NR {
    for(i=2;i<=NF;i++) {
        a[$1][$i]
    }
    next
}

{
    $5 = "NOMATCH"
    for(j in a) {
        for (k in a[j]) {
            # awk string positions start at 1; a start of 0 is not portable
            if (substr($2,1,6) == k) {
                $5 = j
            }
        }
    }
}1

Alternatively, here is the one-liner:

awk -F "[=,]" 'FNR==NR { for(i=2;i<=NF;i++) a[$1][$i]; next } { $5 = "NOMATCH"; for(j in a) for (k in a[j]) if (substr($2,1,6) == k) $5 = j }1' OFS=, file1.txt file2.txt

Results:

AEC,919922783456,A5,B3,DL,,ASF
ABC,918975583456,A1,B1,MH,,ABF
AECI,919998546783,A2,B4,NOMATCH,,WSF

If you have an "old" awk, try the following. Run it like this:

awk -f script.awk file1.txt file2.txt

Contents of script.awk:

BEGIN {
     # set the field separator to either an equals sign or a comma
     FS="[=,]"
     # set the output field separator to a comma
     OFS=","
}

# for the first file in the arguments list
FNR==NR {
    # loop through all the fields, starting at field two
    for(i=2;i<=NF;i++) {

        # add field one and each field to a pseudo-multidimensional array
        a[$1,$i]
    }

    # skip processing the rest of the code
    next
}


# for the second file in the arguments list
{
    # set the default value for field 5
    $5 = "NOMATCH"

    # loop through the array
    for(j in a) {

        # split the array keys into another array
        split(j,b,SUBSEP)

        # if the first six digits of field two equal the value stored in this array
        # (awk string positions start at 1; a start of 0 is not portable)
        if (substr($2,1,6) == b[2]) {

            # assign field five 
            $5 = b[1]
        }
    }

# return true, therefore print by default
}1

Alternatively, here is the one-liner:

awk -F "[=,]" 'FNR==NR { for(i=2;i<=NF;i++) a[$1,$i]; next } { $5 = "NOMATCH"; for(j in a) { split(j,b,SUBSEP); if (substr($2,1,6) == b[2]) $5 = b[1] } }1' OFS=, file1.txt file2.txt

Results:

AEC,919922783456,A5,B3,DL,,ASF
ABC,918975583456,A1,B1,MH,,ABF
AECI,919998546783,A2,B4,NOMATCH,,WSF
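
As a side note, because the question fixes the prefix length at six digits, the inner loop over every stored key may not be needed. The following is only a sketch of that idea, not part of either answer: it maps each prefix to its code and then looks up the first six digits of field two directly. The function name lookup_codes and the array name code are made up for illustration, and the sketch assumes that file1.txt contains no spaces around "=" or "," and that every prefix in it is exactly six digits long.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration): lookup_codes FILE1 FILE2
# FILE1 holds lines such as "MH=919767,918975"; FILE2 is the comma-separated data file.
lookup_codes() {
    awk -F'[=,]' -v OFS=',' '
        # First file: map every six-digit prefix to its code.
        FNR == NR {
            for (i = 2; i <= NF; i++) code[$i] = $1
            next
        }
        # Second file: look up the first six digits of field 2 directly.
        {
            prefix = substr($2, 1, 6)
            $5 = (prefix in code) ? code[prefix] : "NOMATCH"
            print
        }
    ' "$1" "$2"
}
```

It would be called as lookup_codes file1.txt file2.txt. Compared with the looping versions, this trades flexibility for a single array lookup per record: prefixes of varying length would still need the regex or loop approach shown in the answers.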