I have two tab-separated files, shown below:
fileA
123,789 aa b c d
123 aa b c d
234 a b c d
345 aa b c d
456 a b c d
....
fileB
123 add c d e
345 add e f g
789 sub e f g
...
I want to add column 2 of fileB to fileA, matching on column 1, so that the output looks like this:
123,789 add,sub aa b c d
123 add aa b c d
234 a b c d
345 add aa b c d
456 a b c d
....
I tried:
awk 'NR==FNR{a[$1]=$2;next}{$2=a[$1]FS$2;print}' OFS='\t' fileB fileA
which gives this output:
123,789 aa b c d
123 add aa b c d
234 a b c d
345 add aa b c d
456 a b c d
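The problem is that column 1 of fileA sometimes contains several strings separated by commas. awk treats such a field as one single string, so it does not match any key from fileB. Could someone edit the awk code or suggest another fix? Thanks.
Answer 0 (score: 1)
I'd use: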
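A minimal awk sketch of the mismatch, using the keys from the sample data (hard-coded here instead of being read from fileB, purely for illustration): the whole field 123,789 is not a key in the lookup array, but after split() at the commas both parts are.

```shell
#!/bin/sh
# Keys as they would be stored from fileB: 123 -> add, 789 -> sub (hard-coded for the demo).
awk 'BEGIN {
    saved["123"] = "add"; saved["789"] = "sub"
    print ("123,789" in saved)                          # whole field as a key: prints 0 (no match)
    n = split("123,789", parts, ",")                    # split the field at commas instead
    print n, (parts[1] in saved), (parts[2] in saved)   # prints 2 1 1 (both parts match)
}'
```

This is why the fix has to split column 1 before looking anything up.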
awk -F '\t' 'BEGIN { OFS = FS } NR == FNR { saved[$1] = $2; next } { n = split($1, a, ","); sep = ""; field = ""; for(i = 1; i <= n; ++i) { if(a[i] in saved) { field = field sep saved[a[i]]; sep = "," } } $1 = $1 OFS field } 1' fileB fileA
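To try the one-liner end to end, here is a minimal sketch that first recreates the sample rows from the question (assuming the columns really are tab-separated and using only the rows shown) and then runs the command unchanged. The output file name out.tsv is just an illustrative choice.

```shell
#!/bin/sh
# Recreate the sample rows from the question; printf turns each \t into a real tab.
printf '123,789\taa\tb\tc\td\n123\taa\tb\tc\td\n234\ta\tb\tc\td\n345\taa\tb\tc\td\n456\ta\tb\tc\td\n' > fileA
printf '123\tadd\tc\td\te\n345\tadd\te\tf\tg\n789\tsub\te\tf\tg\n' > fileB

# The answer's one-liner, unchanged apart from saving the result to out.tsv.
awk -F '\t' 'BEGIN { OFS = FS } NR == FNR { saved[$1] = $2; next } { n = split($1, a, ","); sep = ""; field = ""; for(i = 1; i <= n; ++i) { if(a[i] in saved) { field = field sep saved[a[i]]; sep = "," } } $1 = $1 OFS field } 1' fileB fileA > out.tsv
cat out.tsv
```

The first line comes out as 123,789, add,sub, aa, b, c, d (tab-separated). Rows 234 and 456 have no match in fileB, so they get an empty second column (two tabs in a row), which keeps the same number of columns on every line.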
Spelled out with comments, that is:
BEGIN { OFS = FS }                   # Output separated like input
NR == FNR {                          # while processing fileB:
  saved[$1] = $2                     # just remember stuff
  next
}
{                                    # while processing fileA:
  n = split($1, a, ",")              # split first field at commas
  sep = ""                           # reset temps
  field = ""
  for(i = 1; i <= n; ++i) {          # wade through comma-separated parts of $1
    if(a[i] in saved) {              # if a corresponding line existed in fileB
      field = field sep saved[a[i]]  # append it to the new field
      sep = ","                      # from the second forward, separate by ","
    }
  }
  $1 = $1 OFS field                  # insert the new field into the line
}
1                                    # then print.
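The sample output in the question shows rows without a match (234, 456) with no extra column at all. If that is what you want, one possible tweak (my own variant, not part of the answer above) is to modify $1 only when something matched. Unmodified lines are then printed exactly as read. The sample data and the file name out.tsv are assumptions for the demo.

```shell
#!/bin/sh
# Sample rows from the question, tab-separated.
printf '123,789\taa\tb\tc\td\n123\taa\tb\tc\td\n234\ta\tb\tc\td\n345\taa\tb\tc\td\n456\ta\tb\tc\td\n' > fileA
printf '123\tadd\tc\td\te\n345\tadd\te\tf\tg\n789\tsub\te\tf\tg\n' > fileB

awk -F '\t' '
BEGIN { OFS = FS }                        # output separated like input
NR == FNR { saved[$1] = $2; next }        # fileB: remember column 2 by key
{                                         # fileA:
    n = split($1, a, ",")                 # split first field at commas
    field = ""; sep = ""
    for (i = 1; i <= n; ++i)
        if (a[i] in saved) { field = field sep saved[a[i]]; sep = "," }
    if (field != "") $1 = $1 OFS field    # only add a column when something matched
}
1                                         # print every line of fileA
' fileB fileA > out.tsv
cat out.tsv
```

The trade-off is that lines no longer all have the same number of columns, which matters if something downstream reads the file by column position.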