How to split a column into other columns by multiple conditions?

Date: 2014-04-24 20:01:03

Tags: bash shell unix awk

I have to create an awk script that performs the following transformation:

  1. The column order is random.
  2. There is no fixed structure. This is actually the big problem.
  3. The FLAG column must be split into FLAG1 and FLAG2.
  4. FLAG1 and FLAG2 are filled in under the following conditions:

    if the VAL is ":" then NUM is null
    if the VAL is ":" and FLAG "c" then NUM is null and FLAG1 is "c"
    if the VAL is ":" and FLAG "u" then NUM is null and FLAG2 is "u"
    if the VAL is "14,385" and FLAG "d" then NUM is "14385" and FLAG(both) is null
    if the VAL is "14,385" and FLAG "du" then NUM is "14385" and FLAG2 is "u"
    if the VAL is ":" and FLAG "cd" then NUM is null and FLAG1 is "c"
    if the VAL is ":" and FLAG "bc" then NUM is null and FLAG1 is "c" and FLAG2 is "b"
    if the VAL is ":" and FLAG "z" then NUM is 0 and FLAG2 is "z"

  5. The CSV input file is:

    "PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
    "TPP",   "T5-78", "HT",    ":",   c
    "TCP",   "T5-78", "HT",    "12,385",  c
    "TZP",   "T5-78", "HT",    ":",   z
    "TNP",   "T5-78", "HT",    ":",   z
    "TNP",   "T5-78", "HT",    ":",   cd
    "TNP",   "T5-78", "HT",    ":",   du
    "TNP",   "T5-78", "HT",    "12,524,652",  dfg
    

    The output .dat file should look like this:

    PRIM    TRD     GTR     NUM         FLAG1   FLAG2
    TPP     T5-78   HT      null        c       null
    TCP     T5-78   HT      12385       c       null
    TZP     T5-78   HT      0           null    z
    TNP     T5-78   HT      0           null    z
    TNP     T5-78   HT      null        c       null
    TNP     T5-78   HT      null        null    u
    TNP     T5-78   HT      12524652    null    dfg
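Requirements 1 and 2 (columns in random order, no fixed structure) are commonly handled by reading the header once, recording the position of each column name, and then looking fields up by name instead of by number. A minimal sketch, assuming a hypothetical helper `pick_columns`, comma-separated fields with no commas inside the values, and optional double quotes around fields; a requested column that is missing from the header is printed as `null`:

```shell
#!/bin/sh
# pick_columns (hypothetical helper): print the named columns, in the
# requested order, from a CSV whose column order may vary.
# Assumes: the first line is a header and no field contains a comma.
pick_columns() {
  awk -v want="$1" '
    BEGIN { FS = ","; OFS = "\t"; n = split(want, names, " ") }
    {
      sub(/\r$/, "")                                 # tolerate CRLF endings
      for (i = 1; i <= NF; i++) {
        sub(/^[ \t]+/, "", $i); sub(/[ \t]+$/, "", $i)   # trim blanks
        sub(/^"/, "", $i);      sub(/"$/, "", $i)        # drop quotes
      }
    }
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i          # header name -> position
      next
    }
    {
      line = ""
      for (k = 1; k <= n; k++) {
        v = (names[k] in col) ? $(col[names[k]]) : "null"
        line = line (k > 1 ? OFS : "") v
      }
      print line
    }
  '
}

# Usage: FLAG comes first in this file, but the argument fixes the output order.
printf '%s\n' '"FLAG", "PRIM", "VAL"' 'c, "TPP", ":"' 'z, "TZP", ":"' |
  pick_columns "PRIM VAL FLAG"
```

Because every lookup goes through `col[...]`, the same call works unchanged when the input file's columns are reordered.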
    

    The code I tried does not work properly: only the first 3 requirements are met, and the 4th one does not work.

    BEGIN {
          FS=","; OFS="\t";
          a["PRIM"]=1;a["TRD"]=1;a["GTR"]=1;a["VAL"]=1;a["FLAG"]=1;
        }
        NR==1 {
    
       { $a["VAL"] = "NUMB" ; $a["FLAG"] = "FLAG1" ; $5 = "FLAG2" ; print ; next }
        $a["VAL"]=="12,385" && $a["FLAG"] == "d"  { $a["VAL"] = "14385" ; $a["FLAG"] = $5 = "" }
        $a["VAL"]=="12,385" && $a["FLAG"] == "du" { $a["VAL"] = "14385" ; $a["FLAG"] = "" ; $9 = "u" }
        $a["VAL"] != ":" { print ; next }
        $a["FLAG"] == "z" { $a["VAL"] = "0" ; $a["FLAG"] = "" ; $5 = "z" }
         $a["FLAG"] != "z" { $a["VAL"] = "" }
    
            $NF=substr($NF,1,length($NF)-1);
            for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
        }
        {   print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
            NR==1?"FLAG1"OFS"FLAG2":($a["FLAG"]?""OFS$a["FLAG"]:$a["FLAG"]);
    

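    Below is what I think is my latest code. The problem I cannot solve now is that the last value (FLAG2) is printed on a second line. I tried adding OFS, but that did not fix it. Can you tell me what is wrong in this case?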

    BEGIN {
    FS=","; 
    OFS="\t";
    a["PRIM"]=1;
    a["TRD"]=1;
    a["GTR"]=1;
    a["VAL"]=1;
    a["FLAG"]=1;
    a["FLAG1"]=1;
    a["FLAG2"]=1;
    }
    
    NR==1 {
        $NF=substr($NF,1,length($NF)-1);
        for(i=1;i<=NF;i++) 
    #if($i in a) 
    a[$i]=i;
    
    a["FLAG1"] = i;
    a["FLAG2"]=i;
    a["FLAG1"] = a["FLAG"];  # just for testing and it is ok
    a["FLAG2"] = a["FLAG"];  # just for testing and it is ok
    
    }
    
    {   
    
    print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
        NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
    

    }

    The output looks like this:

    PRIM    TRD GTR NUM FLAG1   FLAG2
    TPP T5-78   HT  null    c
       null
    TCP T5-78   HT  12385   c
       null
    TZP T5-78   HT  0   null
        z
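The stray line breaks above are consistent with Windows (CRLF) line endings, which is also what the final working solution at the end of this page removes with `sub(/\015$/,"")`. In the script above, `$NF=substr($NF,1,length($NF)-1)` sits inside the `NR==1` block, so only the header loses its trailing `\r`; each data row keeps it at the end of the FLAG field, and whatever is printed after that field follows a carriage return (depending on the viewer, it appears on a new line or overwrites the start of the line). `od -c file.csv | head` shows the `\r  \n` pair at each line end. The sketch below, with hypothetical function names, makes the extra character visible:

```shell
#!/bin/sh
# Length of the last comma-separated field, exactly as awk receives it.
last_len()       { awk -F, '{ print length($NF) }'; }
# The same, after deleting a trailing carriage return from the record.
last_len_no_cr() { awk -F, '{ sub(/\r$/, ""); print length($NF) }'; }

# Simulate one CRLF-terminated data line from a CSV saved on Windows.
printf 'TPP,T5-78,c\r\n' | last_len         # 2: "c" followed by "\r"
printf 'TPP,T5-78,c\r\n' | last_len_no_cr   # 1: just "c"
```

Running `sub(/\r$/, "")` on every record, not just the header, is the usual fix.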
    

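    After all these suggestions, this is my latest version, but it still does not work... Now, when I add the if statements to meet the requirements above, nothing happens. I think the if statements are either incorrect or not in the right place. Printing the values with NR>1 is a disaster. Can you tell me what is wrong with my script? I have to admit that I started using awk 3 days ago, and so far it has been painful... The problem is that I was supposed to finish this script last week.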

    BEGIN {
    FS=",";
    OFS="\t";
    
    a["PRIM"]=1;
    a["TRD"]=1;
    a["GTR"]=1;
    a["VAL"]=1;
    a["FLAG"]=1;
    a["FLAG1"]=1;
    a["FLAG2"]=1;
    }
    
    NR==1 {
    
    $NF=substr($NF,1,length($NF)-1);
        for(i=1;i<=NF;i++)
    #if($i in a)
    a[$i]=i;
    
    #a["FLAG1"] = a[i];
    #a["FLAG2"]=a[i];
    
    a["FLAG1"] = a["FLAG"];
    a["FLAG2"] = a["FLAG"];
    }
    
    {
    #initialisation of the new flags
    a["FLAG1"]=="";
    a["FLAG2"]=="";
    }
    
    #MY IF STATEMENTS GO HERE   - TEST MODE   
    
    a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
    a["FLAG"] == "du"   {a["FLAG2"]= "u"}
    
    {  
    #print header
    print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"], NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
    }
    
    #print content
    NR>1
    {
        for(j=1;j<=NF;j++)
    #if($i in a)
    a[$j]=j;
    
    #a["FLAG1"] = a[i];
    #a["FLAG2"]=a[i];
    
    a["FLAG1"] = a["FLAG"];
    a["FLAG2"] = a["FLAG"];
    }
    #MY IF STATEMENTS GO HERE   - TEST MODE   
    
    a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
    a["FLAG"] == "du"   {a["FLAG2"]= "u"}
    
    {
    print $a["PRIM"],$a["TRD"],$a["GTR"],$a["VAL"], $a["FLAG1"], $a["FLAG2"]
    }
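Three awk details in the script above may explain why "nothing happens": the patterns test `a["FLAG"]`, which holds the column number rather than the field; `a["FLAG1"]=="";` compares instead of assigning; and `NR>1` on a line by itself, with `{` on the next line, forms two separate rules. The sketch below isolates each one on a tiny hypothetical input (the function names are illustrative only):

```shell
#!/bin/sh
# Hypothetical sample: a header plus two data rows.
data() { printf 'PRIM,FLAG\nTPP,cd\nTNP,du\n'; }

# 1. a["FLAG"] holds the column NUMBER (2); $a["FLAG"] is that column's VALUE.
field_vs_index() {
  data | awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) a[$i] = i; next }
    a["FLAG"]  == "cd" { print "never printed" }       # compares 2 with "cd"
    $a["FLAG"] == "cd" { print $a["PRIM"], "has cd" }  # compares the field
  '
}

# 2. A pattern and its "{" must share a line, otherwise they form two rules:
#    "NR > 1" alone prints rows 2-3, and "{ n++ }" runs for all 3 rows.
brace_on_next_line() {
  data | awk '
    NR > 1
    { n++ }
    END { print n }
  '
}

# 3. "==" only compares; the result is discarded and x keeps its value.
compare_not_assign() { awk 'BEGIN { x = "old"; x == ""; print x }'; }

field_vs_index       # TPP has cd
brace_on_next_line   # TPP,cd  TNP,du  3 (one per line)
compare_not_assign   # old
```

In short, test `$a["FLAG"]`, assign with a single `=`, and keep each pattern on the same line as its opening brace.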
    

2 Answers:

Answer 0 (score: 2)

This requires every input field to be enclosed in double quotes.

$ echo '"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   "c"
"TCP",   "T5-78", "HT",    "12,385",  "c"
"TZP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "cd"
"TNP",   "T5-78", "HT",    ":",   "du"
"TNP",   "T5-78", "HT",    "12,524,652",  "dfg"' | 
awk -F '",[ \t]*"' '
    { sub(/^"/, "", $1); sub(/"$/, "", $NF)}
    NR == 1 {
        for (i=1; i<=NF; i++) col[$i] = i
        print "PRIM TRD GTR NUM FLAG1 FLAG2"
        next
    } 
    {
        f = $col["FLAG"] 
        v = $col["VAL"]; gsub(/,/, "", v) 
        num = "null"; flag1 = "null"; flag2 = "null"
    }
    v == ":"      &&  f == "c"   {flag1 = "c"}
    v == ":"      &&  f == "u"   {flag2 = "u"} 
    v == "14385"  &&  f == "d"   {num = v}
    v == "14385"  &&  f == "du"  {num = v; flag2 = "u"}
    v == ":"      &&  f == "cd"  {flag1 = "c"}
    v == ":"      &&  f == "bc"  {flag1 = "c"; flag2 = "b"}
    v == ":"      &&  f == "z"   {num = 0; flag2 = "z"}
    {print $col["PRIM"],$col["TRD"],$col["GTR"],num,flag1,flag2}
'
PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT null null null
TZP T5-78 HT null null z
TNP T5-78 HT null null z
TNP T5-78 HT null c null
TNP T5-78 HT null null null
TNP T5-78 HT null null null

My output does not look like yours. Check your specification and make sure your sample input covers all of its cases.
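The field separator `'",[ \t]*"'` above only splits correctly when every field is quoted, which the question's FLAG column is not, while a plain `FS=","` breaks a value such as "12,385" into two fields. One hedged alternative in portable awk is to walk each line character by character and treat a comma as a separator only outside double quotes. The helper name `csv_to_tsv` is hypothetical, and the sketch assumes no escaped quotes (`""`) and no line breaks inside fields; GNU awk users could look at its `FPAT` variable instead.

```shell
#!/bin/sh
# csv_to_tsv (hypothetical helper): convert quoted/unquoted CSV to
# tab-separated text. Commas inside double quotes stay part of the field.
# Assumes no escaped quotes ("") and no line breaks inside fields.
csv_to_tsv() {
  awk '
    # Split line s into arr[1..n]; return n.
    function csv_split(s, arr,    n, i, c, inq, field) {
      n = 0; inq = 0; field = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\"") {
          inq = !inq                      # entering or leaving quotes
        } else if (c == "," && !inq) {
          arr[++n] = field; field = ""    # separator outside quotes
        } else {
          field = field c
        }
      }
      arr[++n] = field
      for (i = 1; i <= n; i++) {          # trim blanks around each field
        sub(/^[ \t]+/, "", arr[i]); sub(/[ \t]+$/, "", arr[i])
      }
      return n
    }
    BEGIN { split("", f) }                # make f an (empty) array
    {
      sub(/\r$/, "")                      # tolerate CRLF line endings
      n = csv_split($0, f)
      out = f[1]
      for (i = 2; i <= n; i++) out = out "\t" f[i]
      print out
    }
  '
}

printf '%s\n' '"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"' \
  '"TCP",   "T5-78", "HT",    "12,385",  c' | csv_to_tsv
```

The tab-separated result keeps "12,385" in one field, so a second awk pass with `-F '\t'` can apply the VAL/FLAG rules without caring about quoting.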

Answer 1 (score: 0)

The final working solution is as follows:

BEGIN {
    FS = ",";      # assumes no commas inside field values
    OFS = "\t";
}
{
    # delete the Windows carriage return (\r) at the end of the line: the magic part
    sub(/\015$/, "")
    # trim blanks and surrounding double quotes from every field
    for (i = 1; i <= NF; i++) {
        sub(/^[ \t]+/, "", $i); sub(/[ \t]+$/, "", $i)
        sub(/^"/, "", $i);      sub(/"$/, "", $i)
    }
}

NR == 1 {
    for (i = 1; i <= NF; i++) col[$i] = i
    print "PRIM", "TRD", "GTR", "NUM", "FLAG1", "FLAG2"
    next
}

{
    f = $col["FLAG"]
    v = $col["VAL"]     # the sample header names this column VAL
    gsub(/,/, "", v)
    gsub(/,/, "", f)
    flag1 = ""
    flag2 = ""

    # ":" with a flag starting with "c" (c, cd): FLAG1 = "c"
    if (substr(v, 1, 1) == ":" && substr(f, 1, 1) == "c") {
        flag1 = "c"
        flag2 = ""
    }

    # a flag that is none of the known ones (e.g. dfg) goes to FLAG2 unchanged
    if (!(f == "dz" || f == "du" || f == "cd" || f == "c" || f == "z" || f == "u" || f == "d")) {
        flag1 = ""
        flag2 = f
    }

    print $col["PRIM"], $col["TRD"], $col["GTR"], v, flag1, flag2
}