How can I join two files using awk?

Date: 2018-11-01 10:11:17

Tags: awk

There are two files: 1.txt and 2.txt

1.txt holds items and their order values in the following format:

item-code|order-value|label

2.txt holds items and their properties in the following format:

item-code|property-A|property-B| ... |property-Z

For example, 1.txt looks like this:

ITEM-CODE|_o_o_|prefLabel-EN-ANSI
6|8719|disparlure
7|3300|acids,-bases,-and-salts
8|3299|chemical-compounds

2.txt looks like this:

ITEM-CODE|TERM|AV-FTC|DB-PEDIA-IRI|LCSH-1|LCSH-2|LCSH-3|LCSH-4|LCSH-5|LCSH-6|LCSH-7|GACS-IRI
2|positive-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C4028
4|negative-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C3806
6|disparlure|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_
7|acids,-bases,-and-salts|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_
8|chemical-compounds|c_49870|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C29686

A sample 3.txt (the result, see below) looks like this:

ITEM-CODE|TERM|AV-FTC|DB-PEDIA-IRI|LCSH-1|LCSH-2|LCSH-3|LCSH-4|LCSH-5|LCSH-6|LCSH-7|GACS-IRI|_o_o_
2|positive-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C4028|NULL
4|negative-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C3806|NULL
6|disparlure|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|8719

The awk script:

BEGIN { FS=OFS="|" }
NR==FNR{
    a[$1]=$2
    next
}
{
    if ($1 in a)
        $(NF+1)=a[$1]
    else 
        $(NF+1)="NULL"
    print
}

produces:

item-code|label|property-A|property-B| ... |property-Z|order-value

If no item code in 1.txt matches an item code in 2.txt, NULL is substituted for the missing order value.
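To make the current behaviour concrete, here is a small sketch that runs the same script on cut-down data. The scratch directory and the trimmed column set (only TERM and GACS-IRI are kept from 2.txt) are assumptions made for brevity; the script body is the one shown above.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Trimmed copies of the sample files (fewer rows and columns than the originals).
printf '%s\n' 'ITEM-CODE|_o_o_|prefLabel-EN-ANSI' \
              '6|8719|disparlure' > 1.txt
printf '%s\n' 'ITEM-CODE|TERM|GACS-IRI' \
              '2|positive-sense,-single-stranded-RNA-viruses|http://id.agrisemantics.org/gacs/C4028' \
              '6|disparlure|_0_' > 2.txt

# The script from the question, saved as a.awk.
cat > a.awk <<'EOF'
BEGIN { FS=OFS="|" }
NR==FNR{
    a[$1]=$2
    next
}
{
    if ($1 in a)
        $(NF+1)=a[$1]
    else
        $(NF+1)="NULL"
    print
}
EOF

awk -f a.awk 1.txt 2.txt > 3.txt
cat 3.txt
```

With this input the header row gains `_o_o_` (both headers share the key ITEM-CODE), item 2 gains NULL, and item 6 gains 8719.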

How can I modify the awk script to keep 1.txt on the left (the "constant") and 2.txt on the right (the "variable"), and produce a result like this:

item-code|order-value|label|property-A|property-B| ... |property-Z

or, if no property values are available for an item code,

item-code|order-value|label|NULL

The command is:

C:\gnu\GnuWin32\bin\awk.exe -f a.awk 1.txt 2.txt > 3.txt

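where a.awk is the awk script above.

I am running awk on Win10, where command-line arguments are quoted with double quotes.

2 answers:

Answer 0: (score: 1)

You can do this with join.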

1.txt

1|48000|first
2|67500|second
3|81990|third
4|55000|fourth

2.txt

1|fred|sara|anthony
3|steve|jane|mike
4|tim

Then run (join expects both files to be sorted on the join field, as they are here):

join -a 1 -e "NULL"  -t '|' -o 1.1,1.2,1.3,2.2,2.3,2.4 1.txt 2.txt

Sample result:

1|48000|first|fred|sara|anthony
2|67500|second|NULL|NULL|NULL
3|81990|third|steve|jane|mike
4|55000|fourth|tim|NULL|NULL
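If the inputs are not already sorted on the item code, join may skip or mis-pair rows. The following is a minimal sketch of sorting first; the file names left.txt, right.txt, and the .sorted outputs are hypothetical, and the rows are the sample rows above in shuffled order. A header row would be sorted along with the data, so this sketch assumes header-less files.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Use one collation order for both sort and join.
export LC_ALL=C

# Hypothetical unsorted inputs, same shape as the sample files above.
printf '%s\n' '3|81990|third' '1|48000|first' '2|67500|second' > left.txt
printf '%s\n' '3|steve|jane|mike' '1|fred|sara|anthony' > right.txt

# join compares keys as strings, so sort lexically on field 1 (not numerically).
sort -t '|' -k1,1 left.txt  > left.sorted
sort -t '|' -k1,1 right.txt > right.sorted

# -a 1 keeps unmatched rows of the left file; -e fills the fields they lack.
join -a 1 -e NULL -t '|' -o 1.1,1.2,1.3,2.2,2.3,2.4 left.sorted right.sorted > joined.txt
cat joined.txt
```

Item 2 has no row in right.txt, so it appears as `2|67500|second|NULL|NULL|NULL`.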

Answer 1: (score: 1)

Could you please try the following.

awk '
BEGIN{
  FS=OFS="|"
}
FNR==1{
  if(++count==1){
    val=$2
  }
  else{
    print $0,val
  }
  next
}
FNR==NR{
  a[$1]=$2
  next
}
{
  print $0,(($1 in a)?a[$1]:"NULL")
}
' 1.txt 2.txt

Explanation: adding an explanation for the above code.

awk '                           ##Starting awk program here.
BEGIN{                          ##Starting BEGIN section for awk program here.
  FS=OFS="|"                    ##Setting field separator and output field separator to pipe.
}                               ##Closing BEGIN section here.
FNR==1{                         ##FNR==1 is TRUE on the first (header) line of each Input_file.
  if(++count==1){               ##count is 1 while the header of 1.txt is being read.
    val=$2                      ##Saving the header of the 2nd column of 1.txt in variable val.
  }
  else{                         ##Otherwise the header of 2.txt is being read.
    print $0,val                ##Printing the header of 2.txt followed by val.
  }
  next                          ##next skips all further statements for this line.
}
FNR==NR{                        ##FNR==NR is TRUE only while 1.txt is being read.
  a[$1]=$2                      ##Creating array a with index $1 and value $2.
  next                          ##next skips all further statements for this line.
}
{
  print $0,(($1 in a)?a[$1]:"NULL")   ##Printing current line of 2.txt, then a[$1] if $1 is an index of a, else NULL.
}
' 1.txt 2.txt                   ##Mentioning Input_file names here.
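For the layout the question asks for (1.txt on the left, the properties from 2.txt appended, NULL when there are none), one possible awk-only sketch is to read 2.txt first and store its properties, then print each line of 1.txt with the stored value. The script name leftjoin.awk, the array name props, the trimmed columns, and item 9 are hypothetical and used only for illustration. Keeping the program in a file and running it with -f also sidesteps the quoting differences on Windows.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Trimmed sample data; item 9 is a made-up row with no match in 2.txt.
printf '%s\n' 'ITEM-CODE|_o_o_|prefLabel-EN-ANSI' \
              '6|8719|disparlure' \
              '9|1234|hypothetical-item' > 1.txt
printf '%s\n' 'ITEM-CODE|TERM|AV-FTC' \
              '6|disparlure|_0_' \
              '8|chemical-compounds|c_49870' > 2.txt

cat > leftjoin.awk <<'EOF'
BEGIN { FS = OFS = "|" }
# First file named on the command line (2.txt):
# remember everything after the item code and its separator.
NR == FNR {
    props[$1] = substr($0, length($1) + 2)
    next
}
# Second file (1.txt): print the line, then the stored properties or NULL.
{
    print $0, (($1 in props) ? props[$1] : "NULL")
}
EOF

# Note the argument order: the lookup file (2.txt) comes first.
awk -f leftjoin.awk 2.txt 1.txt > 3.txt
cat 3.txt
```

Because both header rows start with ITEM-CODE, the headers are joined like any other row. Item 6 is printed as `6|8719|disparlure|disparlure|_0_` and the unmatched item 9 as `9|1234|hypothetical-item|NULL`. Rows that exist only in 2.txt (item 8 here) are dropped, which is what keeping 1.txt as the "constant" side implies.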