I have a tab-delimited file like this, joj001.txt:
C00299 map01
C00125 map65
C00299 map13
And a csv file, dora.csv:
V1 V2 V3
D12 C00299 4
E10 C01832 5
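When column V2 contains a key from the first file, I want to add a column listing all of the matches (or generate a new csv file), like this: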
V1 V2 V3 V4
D12 C00299 4 map01,map13
E10 C01835 5
But so far I have only got this:
$ awk -F'\t' -vOFS="\t" 'FNR==NR{a[$1]=$2; next}{print $0,a[$2]}' joj001.txt mia.csv
V1 V2 V3 V4
D12 C00299 4 map13
E10 C01835 5
How can I get all of the occurrences, separated by commas?
Thanks
Answer 0 (score: 0)
Your script overwrites the value in a[$1] instead of appending to it.
There are many ways to append instead. For example:
if ( a[$1] ) a[$1] = a[$1] "," $2; else a[$1] = $2
a[$1] = a[$1] ( a[$1] ? "," : "" ) $2
a[$1] = ( a[$1] ? a[$1] "," : "" ) $2
a[$1] = a[$1] ? a[$1] "," $2 : $2
a[$1] = a[$1] "," $2;
# then once at the end:
sub(/^,/,"",a[$1])
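The last variant (always prepend a comma, then strip it once) can be sketched in full as below. This is only one possible reading of "once at the end": here the leading comma is removed from a copy of the value just before printing. The temporary directory, the `tmp` and `result` variables, and the awk variable `v` are assumptions made for illustration; the sample rows are the ones from the question, and the header line is passed through unchanged.

```shell
# Recreate the question's sample data in a temporary directory (illustrative setup)
tmp=$(mktemp -d)
printf 'C00299\tmap01\nC00125\tmap65\nC00299\tmap13\n' > "$tmp/joj001.txt"
printf 'V1\tV2\tV3\nD12\tC00299\t4\nE10\tC01832\t5\n' > "$tmp/mia.csv"

result=$(awk -F '\t' -v OFS='\t' '
    # First file: always prepend a comma, so a value looks like ",map01,map13"
    FNR==NR { a[$1] = a[$1] "," $2; next }
    # Second file, header line: passed through unchanged in this sketch
    FNR==1  { print; next }
    # Second file, data lines: strip the leading comma from a copy, then print
    { v = a[$2]; sub(/^,/, "", v); print $0, v }
' "$tmp/joj001.txt" "$tmp/mia.csv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Rows whose key has no match still get the extra field separator, followed by an empty value.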
You also need to insert the new column header.
So:
awk -F '\t' -v OFS='\t' '
FNR==NR { a[$1] = a[$1] ? a[$1] "," $2 : $2; next }
FNR==1 { print $0, "V4"; next }
{ print $0, a[$2] }
' joj001.txt mia.csv