awk: count the occurrences of a specific pattern between two patterns

Time: 2017-05-30 18:04:24

Tags: awk

I have a file with the following contents:

pattern1
pattern2
pattern3
blah
blah
pattern3
pattern3
blah
pattern3
pattern1
pattern2
blah
pattern3
pattern1
pattern2
blah
pattern3
blah
pattern3

I want to print pattern1 and pattern2, together with the total number of occurrences of pattern3, for each set of pattern1, pattern2, pattern3. In other words, the desired result:

pattern1
pattern2
pattern3: 4
pattern1
pattern2
pattern3: 1
pattern1
pattern2
pattern3: 2

How can I achieve this with awk?

5 answers:

Answer 0: (score: 1)

Assuming that the pattern2 line always comes right after the pattern1 line (pattern3 lines that are not preceded by a pattern1/pattern2 pair should be ignored):

awk '/pattern2/{ if(p){ print "pattern3: "a[p] } p++; print "pattern1" ORS $0 } /pattern3/ && p{a[p]++} END{ if(p){ print "pattern3: "a[p] } }' file

Output:

pattern1
pattern2
pattern3: 4
pattern1
pattern2
pattern3: 1
pattern1
pattern2
pattern3: 2

  • /pattern2/ - performs the action on each line that matches pattern2 (which follows pattern1)

    - p++ - increments the marker for the pattern1/pattern2 pair (it points to the current subset)

  • /pattern3/ && p - performs the action on each line that matches pattern3 while the "valid" flag p is set

    - a[p]++ - counts the pattern3 occurrences in the current subset

  • END - prints the count for the last subset, which no later pattern2 line would otherwise flush
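To see how the p flag keeps stray pattern3 lines out of the counts, here is a minimal sketch of the same approach. The sample input is hypothetical (it is not the file from the question): it starts with one pattern3 line that belongs to no group. The END block prints the count for the final group.

```shell
# Hypothetical sample input: one stray pattern3 line before the first group.
input='pattern3
pattern1
pattern2
pattern3
blah
pattern3
pattern1
pattern2
pattern3'

# p stays 0 until the first pattern2 line, so the stray pattern3 is not counted.
result=$(printf '%s\n' "$input" | awk '
  /pattern2/      { if (p) print "pattern3: " a[p]; p++; print "pattern1" ORS $0 }
  /pattern3/ && p { a[p]++ }
  END             { if (p) print "pattern3: " a[p] }
')

printf '%s\n' "$result"
```

On this input the first group reports 2 and the second reports 1; the leading pattern3 line is dropped because p is still 0 when it is read.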

Answer 1: (score: 1)

If you want to check for both pattern1 and pattern2 ...

$ awk 'function pr()   {if(c) print p3":",c}
           /^pattern1/ {p1=$0; pr(); s=c=0; next}
     p1 && /^pattern2/ {print p1 ORS $0; s=1; next}
      s && /^pattern3/ {c++; p3=$0; next}
     END               {pr()}' file

pattern1
pattern2
pattern3: 4
pattern1
pattern2
pattern3: 1
pattern1
pattern2
pattern3: 2

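Answer 2: (score: 1)

Your question is unclear, so I'm making a few assumptions here, but @RomanPerekherst asked me to post a solution, so here is one for a possible interpretation of your requirements:

Assuming you want regexp comparisons for all of the "patterns", and want to print the lines from the input that match "pattern1" and "pattern2" but exactly the string "pattern3":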

$ cat tst.awk
prev~/pattern1/ && /pattern2/ { prt(); hdr=prev ORS $0 }
/pattern3/ { cnt++ }
{ prev=$0 }
END { prt() }
function prt() { if (hdr!="") print hdr ORS "pattern3:", cnt+0; cnt=0 }

$ awk -f tst.awk file
pattern1
pattern2
pattern3: 4
pattern1
pattern2
pattern3: 1
pattern1
pattern2
pattern3: 2

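The above also assumes that, although you didn't show it in your example, pattern2 can occur in the input apart from pattern1, and that it should be ignored when it occurs on its own. If that isn't true and they always appear together, then of course the solution can be simpler, since you then wouldn't need to test for both.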
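As a rough sketch of what that simpler variant might look like, assuming every pattern1 line is always immediately followed by a pattern2 line (this is one possible reading, not a script from the original answer; the sample input and the names seen, cnt, input and result are made up for illustration):

```shell
# Hypothetical sample input; the second group contains no pattern3 line.
input='pattern1
pattern2
pattern3
blah
pattern3
pattern1
pattern2
blah
pattern1
pattern2
pattern3'

# Assumption: pattern1 and pattern2 always appear together, so a pattern1
# line alone marks the start of a new group and no pairing test is needed.
result=$(printf '%s\n' "$input" | awk '
  function prt() { if (seen) print "pattern3:", cnt+0; cnt = 0 }
  /pattern1/          { prt(); seen = 1 }   # flush the previous group, if any
  /pattern1|pattern2/ { print; next }       # echo the header lines
  /pattern3/          { cnt++ }             # count within the current group
  END                 { prt() }             # flush the last group
')

printf '%s\n' "$result"
```

Because the count is printed as cnt+0, a group without any pattern3 line reports 0 instead of an empty value.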

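Answer 3: (score: 1)

Try: providing 2 solutions here.

Solution 1: If you don't want to check whether pattern1 and pattern2 are both present, and only need the count of the string pattern3 in each batch from one occurrence of pattern1 to the next occurrence of pattern1, then the following may help you.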

awk '/^pattern2/{
                        print;
                        next
                }
     /^pattern3/{
                        y++;
                        next
                }
     /^pattern1/ && A{
                        print "pattern3: "y;
                        y=A=""
                     }
     /^pattern1/{
                        print;
                        A++;
                }
     END{
                if(A){
                        print "pattern3: "y;
                     }
        }
    '    Input_file

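Solution 2: If you want to check that pattern1 is present and that pattern2 also appears between each occurrence of pattern1 and the next occurrence of pattern1, then the following may help you.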

awk '/^pattern2/ && A{
                        VAL=VAL ORS $0;
                        B++;
                        next
                     }
     /^pattern3/ && B{
                        y++;
                        next
                     }
     /^pattern1/ && A && B{
                                print VAL ORS "pattern3:",y=y?y:0;
                                y=A=B=VAL=""
                          }
     /^pattern1/{
                        VAL=$0;
                        A++;
                }
     END{
                if(A && B){
                        print VAL ORS "pattern3:",y=y?y:0;
                     }
        }
    '  Input_file

Will add an explanation soon.

EDIT1: Adding the explanation of solution 1 here too.

awk '/^pattern2/{                             ##### If the current line starts with pattern2,
                        print;                ##### print it,
                        next                  ##### and skip the remaining rules for this line, moving on to the next line.
                }
     /^pattern3/{                             ##### If the current line starts with pattern3,
                        y++;                  ##### increment y, the running count of pattern3 lines,
                        next                  ##### and skip the remaining rules for this line.
                }
     /^pattern1/ && A{                        ##### If the current line starts with pattern1 and A is set (a previous group exists),
                        print "pattern3: "y;  ##### print the pattern3 count of the previous group,
                        y=A=""                ##### and reset y and A.
                     }
     /^pattern1/{                             ##### If the current line starts with pattern1,
                        print;                ##### print it,
                        A++;                  ##### and increment A to mark that a group has started.
                }
     END{                                     ##### After the last line has been read,
                if(A){                        ##### if a group is still open,
                        print "pattern3: "y;  ##### print its pattern3 count.
                     }
        }
    '    Input_file                           ##### The input file.

EDIT2: Now adding the explanation of solution 2 here too.

awk '/^pattern2/ && A{                                                ##### If the current line starts with pattern2 and A is set (a pattern1 line was seen),
                        VAL=VAL ORS $0;                               ##### append the current line to VAL, separated by ORS,
                        B++;                                          ##### increment B to mark that pattern2 was seen,
                        next                                          ##### and skip the remaining rules for this line.
                     }
     /^pattern3/ && B{                                                ##### If the current line starts with pattern3 and B is set,
                        y++;                                          ##### increment y, the running count of pattern3 lines,
                        next                                          ##### and skip the remaining rules for this line.
                     }
     /^pattern1/ && A && B{                                           ##### If the current line starts with pattern1 and both A and B are set (a complete previous group exists),
                                print VAL ORS "pattern3:",y=y?y:0;    ##### print VAL, then ORS (the output record separator, a newline by default), then the string pattern3: followed by y, or 0 if y is empty,
                                y=A=B=VAL=""                          ##### and reset y, A, B and VAL.
                          }
     /^pattern1/{                                                     ##### If the current line starts with pattern1,
                        VAL=$0;                                       ##### store the current line in VAL,
                        A++;                                          ##### and increment A to mark that pattern1 was seen.
                }
     END{                                                             ##### After the last line has been read,
                if(A && B){                                           ##### if a complete group is still open,
                        print VAL ORS "pattern3:",y=y?y:0;            ##### print VAL, ORS, and the pattern3 count (0 if y is empty).
                     }
        }
    '  Input_file                                                     ##### The input file.

Answer 4: (score: 1)

You can achieve your goal by adding a few if statements to the awk command. See:

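awk 'BEGIN{n=0}
{
  # A pattern1 or pattern2 line closes the previous group: print its count, if any.
  if ($1 == "pattern1" || $1 == "pattern2")
  {
    if (n != 0)
    {
      printf "pattern3: %d\n",n;
      n=0;
    }
    print $1
  }
  # Count the pattern3 lines of the current group.
  if ($1 == "pattern3") n++
}
END{
  # Print the count of the last group.
  if (n != 0)
  {
    printf "pattern3: %d\n",n;
  }
}' file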