awk: merge rows based on a column

Date: 2017-10-05 17:56:21

Tags: awk

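I would like to merge the rows that share the same first column ($1) into a single row and format the output. The header has to be repeated according to the maximum number of occurrences of any single value in the first field. For example, Angola appears 3 times, Brazil appears 5 times, and Zambia appears once. The maximum count for field $1 is 5, so the header needs to be printed 5 times to give every field a proper heading.

When printing the output, I want to keep the line order of the original input file. The number of fields in my actual input file varies, e.g. 10 fields, 12 fields, and so on.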

Input.csv

Country,Network,Details,Amount
Angola,voda,xxx,10
Angola,at&t,xxx,20
Angola,mtn,xxx,30
Brazil,voda,yyy,40
Brazil,voda,yyy,50
Brazil,at&t,yyy,60
Brazil,mtn,yyy,70
Brazil,voda,yyy,80
Zambia,tcl,zzz,90

Desired Output.csv

Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount
Angola,voda,xxx,10,Angola,at&t,xxx,20,Angola,mtn,xxx,30
Brazil,voda,yyy,40,Brazil,voda,yyy,50,Brazil,at&t,yyy,60,Brazil,mtn,yyy,70,Brazil,voda,yyy,80
Zambia,tcl,zzz,90

Currently I use the following two commands to get the desired output, and each time I manually change the count based on the number of fields in the actual input file.

Step #1:

awk 'BEGIN { while (count++<5) header=header "Country,Network,Details,Amount,"; print header }' > output.csv

Step #2:

[the second command is missing from this copy of the question]

Looking for your suggestions...

1 Answer:

Answer 0 (score: 4)

awk One-liner:

awk 'BEGIN{FS=OFS=","}FNR==1{n=$0;next}{a[$1]=($1 in a ? a[$1] OFS:"")$0; if(!($1 in b)){o[++i]=$1}; b[$1]++; mx=mx>b[$1]?mx:b[$1] }END{for(i=1; i<=mx; i++)printf("%s%s",n,i==mx?RS:OFS); for(i=1; i in o; i++)print a[o[i]]}' infile

Input:

$ cat infile
Country,Network,Details,Amount
Angola,voda,xxx,10
Angola,at&t,xxx,20
Angola,mtn,xxx,30
Brazil,voda,yyy,40
Brazil,voda,yyy,50
Brazil,at&t,yyy,60
Brazil,mtn,yyy,70
Brazil,voda,yyy,80
Zambia,tcl,zzz,90

Output:

$ awk 'BEGIN{FS=OFS=","}FNR==1{n=$0;next}{a[$1]=($1 in a ? a[$1] OFS:"")$0; if(!($1 in b)){o[++i]=$1}; b[$1]++; mx=mx>b[$1]?mx:b[$1] }END{for(i=1; i<=mx; i++)printf("%s%s",n,i==mx?RS:OFS); for(i=1; i in o; i++)print a[o[i]]}' infile
Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount
Angola,voda,xxx,10,Angola,at&t,xxx,20,Angola,mtn,xxx,30
Brazil,voda,yyy,40,Brazil,voda,yyy,50,Brazil,at&t,yyy,60,Brazil,mtn,yyy,70,Brazil,voda,yyy,80
Zambia,tcl,zzz,90

For better readability:

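awk 'BEGIN{
            FS=OFS=","                 # read and write comma-separated fields
     }
     FNR==1{
            n=$0;                      # remember the header line
            next
     }
     {
           # append the whole line to the row collected for this key ($1)
           a[$1]=($1 in a ? a[$1] OFS:"")$0;
           # remember the order in which the keys first appear
           if(!($1 in b)){ o[++i]=$1 };
           b[$1]++;                    # count the lines seen for this key
           mx=mx>b[$1]?mx:b[$1]        # track the largest count so far
     }
     END{
           # print the header mx times; the last copy ends with a newline
           for(i=1; i<=mx; i++)
               printf("%s%s",n,i==mx?RS:OFS);

           # print the merged rows in first-seen key order
           for(i=1; i in o; i++)
               print a[o[i]]
     }' infile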
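Since the question mentions that the real input has a varying number of fields, here is a minimal sketch of the same idea run against a 5-field file. Nothing in it depends on the field count, because whole lines ($0) are concatenated. The file name sample5.csv, its contents, and the variable names hdr, row, order, cnt are made up for illustration. It is a restructured variant, not the exact code above: the first occurrence of a key is tested with an explicit if/else.

```shell
# Hypothetical 5-field sample; file name and contents are assumptions for illustration.
cat > sample5.csv <<'EOF'
Country,Network,Details,Amount,Currency
Angola,voda,xxx,10,USD
Brazil,voda,yyy,40,USD
Angola,mtn,xxx,30,EUR
EOF

merged=$(awk 'BEGIN { FS = OFS = "," }
FNR == 1 { hdr = $0; next }              # keep the header, whatever its width
{
    if ($1 in row) {
        row[$1] = row[$1] OFS $0         # key seen before: append the line
    } else {
        row[$1] = $0                     # first line for this key
        order[++n] = $1                  # remember first-seen order
    }
    if (++cnt[$1] > mx) mx = cnt[$1]     # largest number of lines per key
}
END {
    # repeat the header mx times, then print the merged rows in input order
    for (i = 1; i <= mx; i++) printf "%s%s", hdr, (i == mx ? ORS : OFS)
    for (i = 1; i <= n; i++) print row[order[i]]
}' sample5.csv)

printf '%s\n' "$merged"
```

With this sample, the header is printed twice (Angola occurs twice), the two Angola lines end up on one row even though they are not adjacent in the input, and Brazil follows on its own row.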

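As requested in the comment:

  I would like to know where to change the code so that the country name is printed only the first time in the output, i.e. the same country name is not printed a second or third time.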

$ awk 'BEGIN{FS=OFS=","}FNR==1{n=$0;next}{a[$1]=($1 in a ? a[$1] OFS substr($0,index($0,",")+1) : $0); if(!($1 in b)){o[++i]=$1}; b[$1]++; mx=mx>b[$1]?mx:b[$1] }END{for(i=1; i<=mx; i++)printf("%s%s",i==1?n:substr(n,index(n,",")+1),i==mx?RS:OFS); for(i=1; i in o; i++)print a[o[i]]}' infile
Country,Network,Details,Amount,Network,Details,Amount,Network,Details,Amount,Network,Details,Amount,Network,Details,Amount
Angola,voda,xxx,10,at&t,xxx,20,mtn,xxx,30
Brazil,voda,yyy,40,voda,yyy,50,at&t,yyy,60,mtn,yyy,70,voda,yyy,80
Zambia,tcl,zzz,90

Modified code:

awk 'BEGIN{
            FS=OFS=","
     }
     FNR==1{
            n=$0;
            next
     }
     {
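           # this line modified:
           # index($0,",") gives the position of the first comma, and substr()
           # keeps everything after it, so the country name is not repeated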
           a[$1]=($1 in a ? a[$1] OFS substr($0,index($0,",")+1) : $0);

           if(!($1 in b)){ o[++i]=$1 }; 

           b[$1]++; 
           mx=mx>b[$1]?mx:b[$1] 
     }
    END{
           for(i=1; i<=mx; i++)
              # this line modified
              printf("%s%s",i==1?n:substr(n,index(n,",")+1),i==mx?RS:OFS);

            for(i=1; i in o; i++)
                print a[o[i]]
     }' infile

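Explanation of the functions used in the modification:

  • index(in, find)

Search the string in for the first occurrence of the string find, and return the position in characters where that occurrence begins in the string in.

  • substr(string, start [, length ])

Return a length-character-long substring of string, starting at character number start.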
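To see the two functions in isolation, here is a small sketch that applies them to one line of the sample data. The shell variable names line, pos, rest and first are only for illustration.

```shell
line='Brazil,voda,yyy,40'

# Position of the first comma (awk string positions start at 1).
pos=$(printf '%s\n' "$line" | awk '{ print index($0, ",") }')

# Everything after the first comma, i.e. the line without its first field.
rest=$(printf '%s\n' "$line" | awk '{ print substr($0, index($0, ",") + 1) }')

# With the optional length argument: only the first field.
first=$(printf '%s\n' "$line" | awk '{ print substr($0, 1, index($0, ",") - 1) }')

printf '%s\n%s\n%s\n' "$pos" "$rest" "$first"
```

This prints 7, then voda,yyy,40, then Brazil. The second form is the one the modified code uses to drop the repeated country name from both the data lines and the header.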