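I have to create an awk script that performs the following transformation: the FLAG column is split into FLAG1 and FLAG2 columns. FLAG1 and FLAG2 are populated under the following conditions: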
if the VAL is ":" then NUM is null
if the VAL is ":" and FLAG "c" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "u" then NUM is null and FLAG2 is "u"
if the VAL is "14,385" and FLAG "d" then NUM is "14385" and FLAG(both) is null
if the VAL is "14,385" and FLAG "du" then NUM is "14385" and FLAG2 is "u"
if the VAL is ":" and FLAG "cd" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "bc" then NUM is null and FLAG1 is "c" and FLAG2 is "b"
if the VAL is ":" and FLAG "z" then NUM is 0 and FLAG2 is "z"
The CSV input file is:
"PRIM", "TRD", "GTR", "VAL", "FLAG"
"TPP", "T5-78", "HT", ":", c
"TCP", "T5-78", "HT", "12,385", c
"TZP", "T5-78", "HT", ":", z
"TNP", "T5-78", "HT", ":", z
"TNP", "T5-78", "HT", ":", cd
"TNP", "T5-78", "HT", ":", du
"TNP", "T5-78", "HT", "12,524,652", dfg
The output .dat file should look like this:
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT 12385 c null
TZP T5-78 HT 0 null z
TNP T5-78 HT 0 null z
TNP T5-78 HT null c null
TNP T5-78 HT null null u
TNP T5-78 HT 12524652 null dfg
The code I tried does not work correctly: only the first three requirements are met, and the fourth one does not work.
BEGIN {
FS=","; OFS="\t";
a["PRIM"]=1;a["TRD"]=1;a["GTR"]=1;a["VAL"]=1;a["FLAG"]=1;
}
NR==1 {
{ $a["VAL"] = "NUMB" ; $a["FLAG"] = "FLAG1" ; $5 = "FLAG2" ; print ; next }
$a["VAL"]=="12,385" && $a["FLAG"] == "d" { $a["VAL"] = "14385" ; $a["FLAG"] = $5 = "" }
$a["VAL"]=="12,385" && $a["FLAG"] == "du" { $a["VAL"] = "14385" ; $a["FLAG"] = "" ; $9 = "u" }
$a["VAL"] != ":" { print ; next }
$a["FLAG"] == "z" { $a["VAL"] = "0" ; $a["FLAG"] = "" ; $5 = "z" }
$a["FLAG"] != "z" { $a["VAL"] = "" }
$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
}
{ print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
NR==1?"FLAG1"OFS"FLAG2":($a["FLAG"]?""OFS$a["FLAG"]:$a["FLAG"]);
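Below is what I believe is the latest code. The problem I cannot solve now is that the last value (FLAG2) is printed on a second line. I tried adding OFS, but that did not fix it. Can you tell me what is wrong in this case?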
BEGIN {
FS=",";
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}
NR==1 {
$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;
a["FLAG1"] = i;
a["FLAG2"]=i;
a["FLAG1"] = a["FLAG"]; # just for testing and it is ok
a["FLAG2"] = a["FLAG"]; # just for testing and it is ok
}
{
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}
The output looks like this:
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c
null
TCP T5-78 HT 12385 c
null
TZP T5-78 HT 0 null
z
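After so many suggestions, this is my last version, but it still does not work. Now, when I add if statements to meet the requirements above, nothing happens. I think the if statements are either incorrect or not in the right place. Printing the values when NR > 1 is a disaster. Can you tell me what is wrong with my script? I have to admit that I started using awk three days ago and so far it has been painful. The problem is that I was supposed to finish this script last week.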
BEGIN {
FS=",";
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}
NR==1 {
$NF=substr($NF,1,length($NF)-1);
for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;
#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];
a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
{
#initialisation of the new flags
a["FLAG1"]=="";
a["FLAG2"]=="";
}
#MY IF STATEMENTS GO HERE - TEST MODE
a["FLAG"] == "cd" {a["FLAG1"]= "c"}
a["FLAG"] == "du" {a["FLAG2"]= "u"}
{
#print header
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"], NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}
#print content
NR>1
{
for(j=1;j<=NF;j++)
#if($i in a)
a[$j]=j;
#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];
a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
#MY IF STATEMENTS GO HERE - TEST MODE
a["FLAG"] == "cd" {a["FLAG1"]= "c"}
a["FLAG"] == "du" {a["FLAG2"]= "u"}
{
print $a["PRIM"],$a["TRD"],$a["GTR"],$a["VAL"], $a["FLAG1"], $a["FLAG2"]
}
Answer 0 (score: 2)
This requires all input fields to be double-quoted.
$ echo '"PRIM", "TRD", "GTR", "VAL", "FLAG"
"TPP", "T5-78", "HT", ":", "c"
"TCP", "T5-78", "HT", "12,385", "c"
"TZP", "T5-78", "HT", ":", "z"
"TNP", "T5-78", "HT", ":", "z"
"TNP", "T5-78", "HT", ":", "cd"
"TNP", "T5-78", "HT", ":", "du"
"TNP", "T5-78", "HT", "12,524,652", "dfg"' |
awk -F '",[ \t]*"' '
{ sub(/^"/, "", $1); sub(/"$/, "", $NF)}
NR == 1 {
for (i=1; i<=NF; i++) col[$i] = i
print "PRIM TRD GTR NUM FLAG1 FLAG2"
next
}
{
f = $col["FLAG"]
v = $col["VAL"]; gsub(/,/, "", v)
num = "null"; flag1 = "null"; flag2 = "null"
}
v == ":" && f == "c" {flag1 = "c"}
v == ":" && f == "u" {flag2 = "u"}
v == "14385" && f == "d" {num = $4}
v == "14385" && f == "du" {num = $4; flag2 = "u"}
v == ":" && f == "cd" {flag1 = "c"}
v == ":" && f == "bc" {flag1 = "c"; flag2 = "b"}
v == ":" && f == "z" {num = 0; flag2 = "z"}
{print $col["PRIM"],$col["TRD"],$col["GTR"],num,flag1,flag2}
'
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT null null null
TZP T5-78 HT 0 null z
TNP T5-78 HT 0 null z
TNP T5-78 HT null c null
TNP T5-78 HT null null null
TNP T5-78 HT null null null
My output does not look like yours. Check your specification and make sure the sample input is sufficient to cover it.
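If the intended rules are the ones implied by the expected output in the question rather than the literal conditions, a sketch along the following lines reproduces that table for the sample input. Several things here are assumptions and not part of the original posts: the helper name `split_flags`, the rule that any flag value not listed in the conditions is passed through unchanged to FLAG2 (which is what the `dfg` row suggests), and the field separator `",[ \t]*"?`, which tolerates an unquoted last field as in the question's input.

```shell
#!/bin/sh
# split_flags is a hypothetical helper name: CSV on stdin, table on stdout.
split_flags() {
    awk -F '",[ \t]*"?' '
    {
        sub(/\r$/, "")            # drop a Windows carriage return, if present
        sub(/^"/, "", $1)         # strip the quote that opens the record
        sub(/"$/, "", $NF)        # strip the quote that closes it, if present
    }
    NR == 1 {
        for (i = 1; i <= NF; i++) col[$i] = i    # map header names to positions
        print "PRIM", "TRD", "GTR", "NUM", "FLAG1", "FLAG2"
        next
    }
    {
        f = $col["FLAG"]
        v = $col["VAL"]
        gsub(/,/, "", v)          # "12,385" becomes 12385
        num = (v == ":") ? "null" : v
        flag1 = "null"; flag2 = "null"
        if      (f == "c" || f == "cd") flag1 = "c"
        else if (f == "u" || f == "du") flag2 = "u"
        else if (f == "bc")           { flag1 = "c"; flag2 = "b" }
        else if (f == "z")            { flag2 = "z"; if (v == ":") num = 0 }
        else if (f != "" && f != "d")   flag2 = f   # assumption: unlisted flags pass through
        print $col["PRIM"], $col["TRD"], $col["GTR"], num, flag1, flag2
    }'
}

# Demo on the sample input from the question.
split_flags <<'EOF'
"PRIM", "TRD", "GTR", "VAL", "FLAG"
"TPP", "T5-78", "HT", ":", c
"TCP", "T5-78", "HT", "12,385", c
"TZP", "T5-78", "HT", ":", z
"TNP", "T5-78", "HT", ":", cd
"TNP", "T5-78", "HT", ":", du
"TNP", "T5-78", "HT", "12,524,652", dfg
EOF
```

Keeping each flag combination as an explicit branch mirrors the list of conditions in the question, so a rule that turns out to be wrong can be changed in one place. On a real file the helper would be used as `split_flags < input.csv > output.dat`, where both file names are placeholders.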
Answer 1 (score: 0)
The final working solution is exactly this:
BEGIN {
FS=",";
OFS="\t";
}
{
# delete the carriage return character from windows. the magic part
sub(/\015$/,"")
sub(/^"/, "", $1)
sub(/"$/, "", $NF)
}
NR==1{
#$NF=substr($NF,1,length($NF)-1);
for (i=1;i<=NF;i++) col[$i]=i
print "PRIM TRD GTR NUM FLAG1 FLAG2"
next
}
{
f=$col["FLAG"];
v=$col["VAL"];
gsub(/,/, "", v)
gsub(/,/, "", f)
flag1 = "";
flag2 = "";
if(substr(v,1,1) == ":" && substr(f,1,1) == "c")
{
flag1 = "c";
flag2 ="";
}
if ( ! ( substr(f,1,1) == "dz" || substr(f,1,1) =="du" || substr(f,1,1) =="cd" || substr(f,1,1) =="c" || substr(f,1,1) =="z" || substr(f,1,1) =="u" || substr(f,1,1) =="d" ) )
{
flag1 = "";
flag2 =f;
}
print $col["PRIM"],$col["TRD"],$col["GTR"],v,flag1,flag2 ;
}
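The `sub(/\015$/,"")` line in the script above is the part that matters for the stray line break. Assuming the input file has Windows CRLF line endings, awk splits records on the LF only, so a carriage return stays attached to the last field and moves the terminal cursor when that field is printed. The small sketch below makes the hidden character visible by printing the length of the last field with and without the `sub()`. The helper name `show_last_field` and its `yes`/`no` argument are made up for this illustration.

```shell
#!/bin/sh
# show_last_field is a hypothetical helper: prints the length of the last
# comma-separated field of every record read from stdin.
show_last_field() {
    awk -F, -v strip="$1" '
    strip == "yes" { sub(/\r$/, "") }   # same idea as sub(/\015$/,"") above
    { print length($NF) }'
}

# Two records with CRLF endings; the visible last fields are "c" and "du".
printf 'x,c\r\ny,du\r\n' | show_last_field no    # lengths include the hidden CR
printf 'x,c\r\ny,du\r\n' | show_last_field yes   # lengths of the visible text only
```

With `no`, each length is one larger than the visible text, which is the carriage return. Stripping it at the top of the script, before any field is read, keeps every later rule free of that special case.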