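I am trying to use awk to add the $4, $5, and $6 fields, along with the header, from the tab-delimited file2 to the lines of file1 where $2 in file2 matches a $3 value in file1. I added a comment to each line along with my understanding of what is going on. Thank you :).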
file1 (tab-delimited)
ID Name Number
0-0 A,A 123456
2-2 B,B 789123
4-4 C,C 456789
file2 (tab-delimited)
ID Number Name Info1 Info2 Info3 Info4
0-0 123456 A,A aaaaa bbbbb ccccc eeeee
1-1 111111 Z,Z aaa bbb ccc eee
2-2 789123 B,B aaaaa bb,bbb ccccc eeeee
3-3 222222 Y,Y aaa bb,bb cc e
4-4 456789 C,C aaa bb ccc eeee
Desired output (tab-delimited)
ID Name Number Info1 Info2 Info3
0-0 A,A 123456 aaaaa bbbbb ccccc
2-2 B,B 789123 aaaaa bb,bbb ccccc
4-4 C,C 456789 aaa bb ccc
awk
awk -F"\t" '$3 in a{ # read $3 value of file1 into array a
a[$3]=a[$2]; # match $3 array a from file1 with $2 value in file2
next # process next line
} # close block
{ print $1,$2,a[$2],$4,$5,$6 # print desired output
} # close block
END { # start block
for ( i in a) { # create for loop i to print
print a[i] # print for each matching line in i
} # close block
}' file1 file2
Answer 0 (score: 2)
$ awk -v OFS="\t" 'NR==FNR{a[$3]=$0;next}$2 in a{print a[$2],$4,$5,$6}' file1 file2
ID Name Number Info1 Info2 Info3
0-0 A,A 123456 aaaaa bbbbb ccccc
2-2 B,B 789123 aaaaa bb,bbb ccccc
4-4 C,C 456789 aaa bb ccc
Explanation:
$ awk -v OFS="\t" ' # tab as OFS also
NR==FNR{ # for file1
a[$3]=$0 # hash $0 to a using $3 as key
next # no further processing for this record
}
$2 in a { # if $2 found in a
print a[$2],$4,$5,$6 # output as requested
}' file1 file2 # mind the file order
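For anyone who wants to reproduce this, here is a minimal self-contained sketch that recreates the sample files from the question in a scratch directory and runs the same two rules. The `tmpdir` and `result` names and the `mktemp -d` scratch directory are just illustrative choices. The `-F'\t'` option is an addition of mine, not part of the command above; I am assuming you want it so that a field containing spaces would still count as one field.

```shell
#!/bin/sh
# Recreate the sample inputs from the question in a scratch directory.
tmpdir=$(mktemp -d)

# printf reuses its format for each group of arguments, giving one tab-separated row per group.
printf '%s\t%s\t%s\n' \
    ID Name Number \
    0-0 A,A 123456 \
    2-2 B,B 789123 \
    4-4 C,C 456789 > "$tmpdir/file1"

printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    ID Number Name Info1 Info2 Info3 Info4 \
    0-0 123456 A,A aaaaa bbbbb ccccc eeeee \
    1-1 111111 Z,Z aaa bbb ccc eee \
    2-2 789123 B,B aaaaa bb,bbb ccccc eeeee \
    3-3 222222 Y,Y aaa bb,bb cc e \
    4-4 456789 C,C aaa bb ccc eeee > "$tmpdir/file2"

# Assumption: -F'\t' is added here; the command above relies on default whitespace splitting.
result=$(awk -F'\t' -v OFS='\t' '
NR==FNR { a[$3] = $0; next }         # file1: remember the whole row, keyed by Number
$2 in a { print a[$2], $4, $5, $6 }  # file2: append Info1..Info3 when Number is known
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The header line needs no special handling because the literal string `Number` is both $3 of the file1 header and $2 of the file2 header, so it matches like any other key.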
Answer 1 (score: 1)
Try this: another approach, which reads file2 first and then file1.
awk -F"\t" 'FNR==NR{a[$1,$3,$2]=$4 OFS $5 OFS $6;next} (($1,$2,$3) in a){print $1,$2,$3,a[$1,$2,$3]}' OFS="\t" file2 file1
Will add an explanation in a few minutes.
EDIT: Adding a non-one-liner form of the solution, with an explanation.
awk -F"\t" 'FNR==NR{ #### True only while the first file on the command line (file2) is being read: FNR resets to 1 for each new input file, while NR keeps counting across all of them.
a[$1,$3,$2]=$4 OFS $5 OFS $6; #### Store $4, $5 and $6 joined by OFS under the key (ID, Name, Number), i.e. $1, $3, $2 of file2. OFS defaults to a space but is set to a tab on the command line below.
next #### Skip the remaining rules for this record and move on to the next one.
}
(($1,$2,$3) in a){ #### Reached only while file1 is being read: check whether its $1, $2 and $3 exist together as a key in array a.
print $1,$2,$3,a[$1,$2,$3] #### Print $1, $2, $3 of file1 followed by the value stored in a for that key.
}
' OFS="\t" file2 file1 #### OFS is assigned before file2 is read; mind the order of the input files.
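In case the `a[$1,$3,$2]` subscript looks odd: awk has no true multi-dimensional arrays, so it joins the comma-separated parts with the built-in `SUBSEP` variable into one string key. The sketch below only illustrates that mechanism; the `seen`, `part` and `keyinfo` names and the sample values are made up for the demonstration.

```shell
#!/bin/sh
# awk joins the parts of a[x,y,z] with SUBSEP (default "\034") into one string key.
keyinfo=$(awk 'BEGIN {
    seen["0-0", "A,A", "123456"] = "aaaaa"

    for (k in seen) {
        n = split(k, part, SUBSEP)          # recover the individual key parts
        print n, part[1], part[2], part[3]
    }

    # Membership tests use the same parenthesised form as the solution above.
    if (("0-0", "A,A", "123456") in seen)
        print "match"
    if (!(("0-0", "A,A", "999999") in seen))
        print "no match for a different Number"
}')

printf '%s\n' "$keyinfo"
```

Because all three parts go into the key, a file1 row is joined only when its ID, Name and Number all agree with a file2 row, which is stricter than matching on Number alone.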