There are two files: 1.txt and 2.txt.
1.txt lists items and their order values in the following format:
item-code|order-value|label
2.txt lists items and their properties in the following format:
item-code|property-A|property-B| ... |property-Z
For example, 1.txt looks like this:
ITEM-CODE|_o_o_|prefLabel-EN-ANSI
6|8719|disparlure
7|3300|acids,-bases,-and-salts
8|3299|chemical-compounds
2.txt looks like this:
ITEM-CODE|TERM|AV-FTC|DB-PEDIA-IRI|LCSH-1|LCSH-2|LCSH-3|LCSH-4|LCSH-5|LCSH-6|LCSH-7|GACS-IRI
2|positive-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C4028
4|negative-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C3806
6|disparlure|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_
7|acids,-bases,-and-salts|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_
8|chemical-compounds|c_49870|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C29686
A sample 3.txt (the result, see below) looks like this:
ITEM-CODE|TERM|AV-FTC|DB-PEDIA-IRI|LCSH-1|LCSH-2|LCSH-3|LCSH-4|LCSH-5|LCSH-6|LCSH-7|GACS-IRI|_o_o_
2|positive-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C4028|NULL
4|negative-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C3806|NULL
6|disparlure|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|8719
This awk script:
BEGIN { FS=OFS="|" }
NR==FNR{
    a[$1]=$2
    next
}
{
    if ($1 in a)
        $(NF+1)=a[$1]
    else
        $(NF+1)="NULL"
    print
}
produces:
item-code|label|property-A|property-B| ... |property-Z|order-value
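If no item code in 1.txt matches an item code in 2.txt, NULL is substituted for the missing order value.
How can the awk script be modified to keep 1.txt on the left (the "constant") and 2.txt on the right (the "variable"), so that it produces a result like this: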
item-code|order-value|label|property-A|property-B| ... |property-Z
or, if no property values are available for an item code:
item-code|order-value|label|NULL
The command is:
C:\gnu\GnuWin32\bin\awk.exe -f a.awk 1.txt 2.txt > 3.txt
where a.awk is the awk script above.
awk is run on Win10, using double quotes.
Answer 0: (score: 1)
You can do this with join.
1.txt
1|48000|first
2|67500|second
3|81990|third
4|55000|fourth
2.txt
1|fred|sara|anthony
3|steve|jane|mike
4|tim
Then run:
join -a 1 -e "NULL" -t '|' -o 1.1,1.2,1.3,2.2,2.3,2.4 1.txt 2.txt
Sample result:
1|48000|first|fred|sara|anthony
2|67500|second|NULL|NULL|NULL
3|81990|third|steve|jane|mike
4|55000|fourth|tim|NULL|NULL
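One caveat: join expects both inputs to be sorted on the join field, and the sample files above happen to be in order already. A minimal sketch for unsorted input, assuming a POSIX shell with sort and join available; the scratch directory and the .sorted file names are only illustrative, and the rows are the sample rows from this answer in shuffled order:

```shell
# Use one collation for both sort and join so they agree on the ordering.
export LC_ALL=C

# Scratch directory so the sketch does not touch existing files (illustrative).
work=$(mktemp -d)

# Sample rows from this answer, deliberately out of order.
printf '%s\n' '3|81990|third' '1|48000|first' '4|55000|fourth' '2|67500|second' > "$work/1.txt"
printf '%s\n' '4|tim' '1|fred|sara|anthony' '3|steve|jane|mike' > "$work/2.txt"

# Sort each file on the first pipe-delimited field (the item code).
sort -t '|' -k 1,1 "$work/1.txt" > "$work/1.sorted"
sort -t '|' -k 1,1 "$work/2.txt" > "$work/2.sorted"

# Same join command as above, run on the sorted copies.
join -a 1 -e "NULL" -t '|' -o 1.1,1.2,1.3,2.2,2.3,2.4 "$work/1.sorted" "$work/2.sorted" > "$work/3.txt"
cat "$work/3.txt"
```

The sort here is lexical, so the rows come out in string order of the item code (for example 10 would sort before 2), which is the order join needs.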
Answer 1: (score: 1)
Could you please try the following.
awk '
BEGIN{
    FS=OFS="|"
}
FNR==1 && NR==1{
    val=$2
    next
}
FNR==1{
    print $0,val
    next
}
FNR==NR{
    a[$1]=$2
    next
}
{
    print $0,(($1 in a)?a[$1]:"NULL")
}
' 1.txt 2.txt
Explanation: an explanation of the above code has now been added as well.
awk '                    ##Start of the awk program.
BEGIN{                   ##BEGIN section, run before any input is read.
    FS=OFS="|"           ##Set the input and output field separators to a pipe.
}                        ##End of the BEGIN section.
FNR==1 && NR==1{         ##True only for the header line of the first file (1.txt).
    val=$2               ##Save its second field (the order-value column name) in val.
    next                 ##Skip the remaining rules for this line.
}
FNR==1{                  ##True for the header line of the second file (2.txt).
    print $0,val         ##Print that header with val appended.
    next                 ##Skip the remaining rules for this line.
}
FNR==NR{                 ##True while 1.txt is being read.
    a[$1]=$2             ##Store the order value in array a, indexed by item code.
    next                 ##Skip the remaining rules for this line.
}
{
    print $0,(($1 in a)?a[$1]:"NULL")  ##Print the 2.txt line followed by its order value, or NULL if the item code is not in a.
}
' 1.txt 2.txt            ##Input file names.
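Note that this script still prints each 2.txt line first and appends the order value at the end. For the layout the question asks for (1.txt on the left, properties on the right), one option is to read 2.txt first and print while reading 1.txt. A minimal sketch, assuming each item code appears at most once in 2.txt; the names key and props and the scratch directory are only illustrative, and the rows are the sample rows from Answer 0:

```shell
# Scratch directory so the sketch does not touch existing files (illustrative).
work=$(mktemp -d)

# Sample rows borrowed from Answer 0.
printf '%s\n' '1|48000|first' '2|67500|second' '3|81990|third' '4|55000|fourth' > "$work/1.txt"
printf '%s\n' '1|fred|sara|anthony' '3|steve|jane|mike' '4|tim' > "$work/2.txt"

# 2.txt is passed first so its properties are stored before 1.txt is read.
awk '
BEGIN { FS = OFS = "|" }
NR == FNR {                   # first file on the command line: 2.txt
    key = $1                  # remember the item code
    sub(/^[^|]*[|]/, "")      # strip the item code and its pipe from the line
    props[key] = $0           # keep the rest (all properties) as one string
    next
}
{                             # second file: 1.txt stays on the left
    print $0, (($1 in props) ? props[$1] : "NULL")
}
' "$work/2.txt" "$work/1.txt" > "$work/3.txt"
cat "$work/3.txt"
```

With header lines like those in the question, the header of 2.txt is stored under the key ITEM-CODE and joined to the header of 1.txt in the same way, so no special header handling should be needed. On Windows the awk program can be saved as a.awk and run with -f, as in the question, with 2.txt listed before 1.txt.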