Break lines at a specified point with awk

Date: 2017-02-19 22:36:20

Tags: awk split

I have a file with several lines, and I need to break them after the letters repeat (i.e. after every a, b, c) while keeping the original name (field 1) on each resulting line. The input looks like this:

name1    a1    b3    c6    a3    b4    c9
name2    a7    b8    c7    a9    b10   c13
name3    a12   b9    c8
name4    a4    b34   c19   a7    b2    c10    a3    b5    c67

The output I need:

name1    a1    b3    c6    
name1    a3    b4    c9
name2    a7    b8    c7    
name2    a9    b10   c13
name3    a12   b9    c8
name4    a4    b34   c19   
name4    a7    b2    c10    
name4    a3    b5    c67

I tried awk -F"\t" '{ for (i=2;i<=NF;i++) print $1"\t"$i }' file, but that puts every field on its own line. Is there a way to group them?

Thanks.

3 answers:

Answer 0 (score: 0)

@starter5: try:

awk 'BEGIN{V["a"];V["b"];V["c"]} /name/{R=$0;next} {Q=$0;gsub(/[[:digit:]]/,"",Q)} (Q in V){if(!W[Q]++){A++}} $0{if(A==1 && $0 && R){$0=R OFS $0};printf("%s %s",$0,(A==3?"\n":OFS));;if(A==3){A="";delete W}}' RS='[ +|\n]'  Input_file

Here is the same solution in non-one-liner form.

awk 'BEGIN{
                V["a"];
                V["b"];
                V["c"]
          }
                /name/{
                        R=$0;
                        next
                      }
          {
                Q=$0;
                gsub(/[[:digit:]]/,"",Q)
          }
                (Q in V){
                                if(!W[Q]++){
                                                A++
                                           }
                        }
                $0      {
                                if(A==1 && $0 && R){
                                                        $0=R OFS $0
                                                   };
                                printf("%s %s",$0,(A==3?"\n":OFS));;
                                if(A==3)           {
                                                        A="";
                                                        delete W
                                                   }
                        }
    ' RS='[ +|\n]'    Input_file

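Suppose we have the following Input_file (I changed the last line) to test the case where a, b, c do not come in sequence: the line is not broken until all three of them have been found. Take a look and let me know.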

cat  Input_file
name1    a1    b3    c6    a3    b4    c9
name2    a7    b8    c7    a9    b10   c13
name3    a12   b9    c8
name4    a4    b34   a19   a7    b2    c10    a3    b5    c67

The output is as follows.

name1 a1  b3  c6
name1 a3  b4  c9
name2 a7  b8  c7
name2 a9  b10  c13
name3 a12  b9  c8
name4 a4  b34  a19  a7  b2  c10
name4 a3  b5  c67
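
For comparison, the same rule (start a new line only once three distinct leading letters have been collected) can be sketched with plain field processing, without a regular-expression RS, which not every awk supports. This is only a minimal sketch, not the answerer's code: the function name break_after_three is hypothetical, and it assumes that any three distinct first letters complete a group instead of checking specifically for a, b and c.

```shell
# Hypothetical helper: break each record once three distinct leading
# letters have been seen, repeating the name (field 1) on every line.
break_after_three() {
    awk '{
        line = $1; n = 0; split("", seen)        # start fresh for this record
        for (i = 2; i <= NF; i++) {
            line = line OFS $i
            c = substr($i, 1, 1)                 # leading letter of the field
            if (!(c in seen)) { seen[c]; n++ }   # count each letter once per group
            if (n == 3) {                        # group complete: print and reset
                print line
                line = $1; n = 0; split("", seen)
            }
        }
        if (line != $1) print line               # print a trailing, incomplete group
    }' "$@"
}

printf 'name4 a4 b34 a19 a7 b2 c10 a3 b5 c67\n' | break_after_three
# name4 a4 b34 a19 a7 b2 c10
# name4 a3 b5 c67
```

Unlike the output shown above, this sketch separates fields with a single OFS.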

Answer 1 (score: 0)

"I need to break the lines after the letters repeat (i.e. after every a, b, c), but keeping the original name (field 1):"

Input

$ cat file
name1    a1    b3    c6    a3    b4    c9
name2    a7    b8    c7    a9    b10   c13
name3    a12   b9    c8
name4    a4    b34   c19   a7    b2    c10    a3    b5    c67

Output

$ awk 'function _p(){print $1,s; s=""; split("",p)}{for(i=2; i<=NF; i++){ c=substr($i,1,1);if(c in p)_p(); s = (s?s OFS:"") $i; p[c] }_p()}' file
name1 a1 b3 c6
name1 a3 b4 c9
name2 a7 b8 c7
name2 a9 b10 c13
name3 a12 b9 c8
name4 a4 b34 c19
name4 a7 b2 c10
name4 a3 b5 c67

More readable version:

awk '
   function _p()
   {
              print $1,s;
              s=""; 
              split("",p)
   }
   {
      for(i=2; i<=NF; i++)
      { 
              c=substr($i,1,1); 
              if(c in p)_p();
              s = (s?s OFS:"") $i; 
              p[c] 
      }
      _p()
   }
    ' file

$ awk 'function _p(){print $1,s; s=p=""}{for(i=2; i<=NF; i++){ c=substr($i,1,1); if(c==p)_p(); s = (s?s OFS:"") $i; if(!p)p=c }_p()}' file
name1 a1 b3 c6
name1 a3 b4 c9
name2 a7 b8 c7
name2 a9 b10 c13
name3 a12 b9 c8
name4 a4 b34 c19
name4 a7 b2 c10
name4 a3 b5 c67

More readable version:

awk '
     function _p()
     {
        print $1,s; 
        s=p=""
     }
     {
        for(i=2; i<=NF; i++)
        { 
            c=substr($i,1,1); 
            if(c==p)_p(); 
            s = (s?s OFS:"") $i; 
            if(!p)p=c 
        }
          _p()
     }' file
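
The two variants in this answer give the same result on the sample file, but they are not equivalent: the first starts a new line when any letter already in the current group repeats, the second only when the first letter of the group repeats. The sketch below makes that visible on a made-up input line; the function names split_on_seen and split_on_first, and the variable name first, are hypothetical and chosen only for this illustration.

```shell
# Variant 1: break when any letter already seen in the current group repeats.
split_on_seen() {
    awk 'function emit() { print $1, s; s = ""; split("", p) }
    {
        for (i = 2; i <= NF; i++) {
            c = substr($i, 1, 1)
            if (c in p) emit()             # letter seen before: close the group
            s = (s ? s OFS : "") $i
            p[c]                           # remember this letter
        }
        emit()
    }' "$@"
}

# Variant 2: break only when the first letter of the current group repeats.
split_on_first() {
    awk 'function emit() { print $1, s; s = first = "" }
    {
        for (i = 2; i <= NF; i++) {
            c = substr($i, 1, 1)
            if (c == first) emit()         # group opener seen again: close the group
            s = (s ? s OFS : "") $i
            if (first == "") first = c     # remember the letter that opened the group
        }
        emit()
    }' "$@"
}

printf 'n a1 b2 b3 c4 a5\n' | split_on_seen
# n a1 b2
# n b3 c4 a5
printf 'n a1 b2 b3 c4 a5\n' | split_on_first
# n a1 b2 b3 c4
# n a5
```

Which behaviour is wanted depends on whether a letter may legitimately repeat inside one group.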

Answer 2 (score: 0)

{                                               # for every record
    printf "%s", $1                             # print name
    c = substr($2, 1, 1)                        # first letter of group
    printf "%s%s", OFS, $2                      # first part of first group
    for (i = 3; i <= NF; i++) {                 # for all the remaining fields
        if (index($i, c) != 1)                  # if next group has not started
            printf "%s%s", OFS, $i              # print this part on same line
        else                                    # otherwise
            printf "%s%s%s%s", ORS, $1, OFS, $i # print name and this part on next line
    }                                           # done for all fields
    printf "%s", ORS                            # move to next line
}                                               # done for this record

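The script can be saved to a file and run with awk -f.

It does not work if a letter repeats within a group. For example, it will not work for a3 b5 a4 c6 a5 b6 a0 b9 with groups of a b a c.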
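
To make the limitation concrete, here is a small sketch that applies the same first-letter logic to the example line from this answer. The wrapper function split_on_first_letter and the name name5 are hypothetical, added only so the snippet can be run on its own.

```shell
# Same logic as the script above, wrapped in a shell function for the demo.
split_on_first_letter() {
    awk '{
        printf "%s%s%s", $1, OFS, $2            # name and first part of first group
        c = substr($2, 1, 1)                    # first letter of group
        for (i = 3; i <= NF; i++) {
            if (index($i, c) != 1) {
                printf "%s%s", OFS, $i          # same group: stay on this line
            } else {
                printf "%s%s%s%s", ORS, $1, OFS, $i   # first letter again: new line
            }
        }
        printf "%s", ORS
    }' "$@"
}

printf 'name5 a3 b5 a4 c6 a5 b6 a0 b9\n' | split_on_first_letter
# name5 a3 b5
# name5 a4 c6
# name5 a5 b6
# name5 a0 b9
```

Every field starting with a opens a new line, so a group that is meant to contain a twice is cut in half.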