I'm new to awk and have the following question.
input.csv
"11111","TRUE","aa"
"456789","TRUE","aa;bb;cc"
"2345","TRUE","bb"
"434566","","cc"
I'm trying to write an awk command that should produce the following output:
output.csv
"11111","TRUE","aa,,"
"456789","TRUE","aa,bb,cc"
"2345","TRUE",",bb,"
"434566","",",,cc"
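I need to print the first two fields every time, but the third field is the one I need to check. It can contain any of aa, bb, cc, a combination of them such as aa;bb, aa;cc, or aa;bb;cc, or none of them. So I need to find out which of them are present and print them separated by commas, each in its own slot: if the input has aa;bb I need aa,bb, and if none of them are present I need ,, (just two commas).
In each if I use a regular expression to check for aa, bb, or cc; if one is not present, a , (comma) is appended to the variable value instead.
I wrote the following command.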
awk 'BEGIN{FS=",";OFS=","} { if( $3 ~ /aa/ ) { value="aa" } else { value="," }; if( $3 ~ /bb/ ) { value="$value,bb" } else { value="$value," };
if( $3 ~ /cc/ ) { value="$value,cc" } else { value="$value,"}; print $1 , $2 , $value}' input.csv > output.csv
But it gives me the following output.
"11111","TRUE","11111","TRUE","aa"
"456789","TRUE","456789","TRUE","aa;bb;cc"
"2345","TRUE","2345","TRUE","bb"
"434566","","434566","","cc"
I'm not sure why it prints the first two fields twice and then the third value. I was able to do this in a shell script, but I need to do it in awk.
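A likely explanation, based on standard awk semantics: awk does not expand variables inside double-quoted strings, so "$value,bb" is the literal text $value,bb. Outside a string, $value is a field reference: the string in value is converted to a number (0 here, since it starts with $), so print $1, $2, $value prints $1, $2 and $0, the whole record. That matches the duplicated output above. The sketch below first demonstrates both pitfalls, then shows the question's approach using string concatenation instead. It assumes three comma-separated fields per line with no commas inside the quoted values, as in input.csv; fill_slots is a hypothetical helper name.

```shell
#!/bin/sh
# Pitfall demo: "$v,x" is a literal string (no interpolation in awk),
# and $v is a field reference; "$v,x" converts to 0, so $v is $0.
echo 'a b c' | awk '{ v = "$v,x"; print v; print $v }'
# prints:
# $v,x
# a b c

# The question's approach with both pitfalls fixed (a sketch).
# fill_slots is a hypothetical helper name. Assumes exactly three
# comma-separated fields and no commas inside the quoted values.
fill_slots() {
  awk 'BEGIN { FS = OFS = "," }
  {
    # Concatenate each slot: the token if $3 contains it, otherwise empty.
    value = ($3 ~ /aa/ ? "aa" : "")
    value = value "," ($3 ~ /bb/ ? "bb" : "")
    value = value "," ($3 ~ /cc/ ? "cc" : "")
    # $3 still carries its surrounding quotes, so add them back.
    print $1, $2, "\"" value "\""
  }' "$@"
}

# Sample rows from the question; on the real file run:
#   fill_slots input.csv > output.csv
printf '%s\n' '"11111","TRUE","aa"' '"456789","TRUE","aa;bb;cc"' \
  '"2345","TRUE","bb"' '"434566","","cc"' | fill_slots
```

Starting value at "" rather than "," keeps every token in a fixed slot, so a missing aa yields a leading empty slot (,bb,) instead of an extra comma.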
Answer (score: 1)
$ cat tst.awk
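BEGIN {
    FS=OFS="\""                    # split on double quotes; the last CSV value is $(NF-1)
    split("aa,bb,cc",dflts,/,/)    # dflts[1]="aa", dflts[2]="bb", dflts[3]="cc"
}
{
    delete vals
    for (i in dflts) {
        # keep the token in its own slot if the last value contains it, else leave the slot empty
        vals[i] = ($(NF-1) ~ dflts[i] ? dflts[i] : "")
    }
    $(NF-1) = vals[1] "," vals[2] "," vals[3]
    print
}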
$ awk -f tst.awk file
"11111","TRUE","aa,,"
"456789","TRUE","aa,bb,cc"
"2345","TRUE",",bb,"
"434566","",",,cc"
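Why $(NF-1) works: with FS set to a double quote, every quote character is a separator, so a line such as "434566","","cc" splits into seven fields. The first and last are empty because the line starts and ends with a quote, and the last CSV value lands in field NF-1 even when an earlier value is empty. The sketch below prints each field in brackets; show_fields is a hypothetical helper name, and it assumes no value contains an embedded (escaped) quote.

```shell
#!/bin/sh
# show_fields is a hypothetical helper: it prints NF and each field in
# brackets when the field separator is a double quote, as in tst.awk.
show_fields() {
  awk 'BEGIN { FS = "\"" }
  {
    printf "NF=%d\n", NF
    for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i
  }' "$@"
}

# A row whose second value is empty still puts the last value in $(NF-1).
echo '"434566","","cc"' | show_fields
```

For that row it reports NF=7 with cc in $6, that is $(NF-1), while $3 and $5 hold the separating commas. Because OFS is also a quote, assigning to $(NF-1) and printing rebuilds the line with its quotes intact.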
Read Effective Awk Programming, 4th Edition, by Arnold Robbins.