Question

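I have a large file whose sections are delimited by ### markers. The section I care about contains a blank line, some random lines, another blank line, then the lines I want to count, followed by another blank line. I can get the data I want, but I'm using several pipes, and I think an awk or sed one-liner could do it better. Can you help?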

Sample file:

...  
sdf  
asdf  
asdf  
asdf  


###################### Usage ###########

a  
asdf    

asdf    
asdf    
70  
80  
90  
100  

################



sfad

asdf  
asff  
...

My command:

awk '/Usage/{flag=1; next}/####/{flag=0} flag' *|
  sed  -n '0,/^$/! p'|
    awk '/^$/{flag++; next} flag==1'|
      wc -l

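The first awk extracts the part of the file I want to work with. The sed skips everything up to and including the first blank line. The second awk pulls out the lines between the second and third blank lines of the section. Finally, wc -l gives me the count.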
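For comparison, here is a minimal sketch of the same four stages folded into a single awk program, counting lines as `wc -l` does. It assumes the layout of the sample (a header line containing `Usage`, a closing line starting with `#`, and blank lines that are truly empty); the helper name `count_section_lines` and the file `sample.txt` are hypothetical.

```shell
# Hypothetical sample input, rebuilt from the question's example (the "..." lines dropped)
printf '%s\n' 'sdf' '' '###################### Usage ###########' '' 'a' 'asdf' '' \
  'asdf' 'asdf' '70' '80' '90' '100' '' '################' '' 'sfad' > sample.txt

# Hypothetical helper: count the lines between the 2nd and 3rd blank line
# of each Usage section, like the awk | sed | awk | wc -l pipeline
count_section_lines() {
  awk '
    /Usage/     { inSection = 1; blanks = 0; next }  # header: start a new section
    !inSection  { next }                             # skip text outside the section
    /^#/        { inSection = 0; next }              # closing ### line ends the section
    /^$/        { blanks++; next }                   # blank line inside the section
    blanks == 2 { lines++ }                          # after the 2nd blank, before the 3rd
    END         { print lines + 0 }                  # one total across all input
  ' "$@"
}

count_section_lines sample.txt   # prints 6
```

Because `blanks` is reset at every `Usage` header and `lines` is printed only in `END`, the result is a single total over all sections in all input files, just as piping everything into one `wc -l` would give.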

Answer 1

Try the following (collapse it into a one-liner if you like):

awk '
   FNR==1 { blankCount = inSection = count = 0 } # initialize vars. for every file
   /^#+ Usage #/ { inSection=1; next }           # section start, set flag
   inSection && /^#+/ { print count; nextfile }  # section end, print result, proceed to next file
   inSection {                            # a line inside the section of interest
     if (NF == 0)  { ++blankCount; next } # a blank line, count it and skip
     if (blankCount == 2) { count+=NF }   # a line after the 2nd blank one, count its words
   }
' file                                    # supports multiple input files

Notes:

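The code assumes that each section contains at least 2 blank lines, and it counts the words on the non-blank lines after the 2nd blank line, up to the 3rd blank line or the end of the section, whichever comes first. With the sample input, the result is 6.
The code counts words (whitespace-separated fields, as reflected in NF, the number of fields) rather than lines. Since each line of your input holds only one word, however, the line count (which is what wc -l reports) equals the word count.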
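To see where words and lines diverge, here is a sketch that reuses the first answer's logic but tracks both counts. The input file `words.txt` and the helper name `count_words_and_lines` are hypothetical, with one line holding two words. The sketch also replaces `nextfile` with clearing the flag and resets the counters at each section start, so it handles several sections per file and runs on older awk implementations that lack `nextfile`.

```shell
# Hypothetical input: the line "70 80" holds two words
printf '%s\n' '### Usage ###' '' 'a' '' 'asdf' '70 80' '' '###' > words.txt

# Hypothetical helper: print "<words> <lines>" for each Usage section
count_words_and_lines() {
  awk '
    FNR == 1           { inSection = 0 }                                    # new file: not in a section yet
    /^#+ Usage #/      { inSection = 1; blankCount = words = lines = 0; next } # section start: reset counters
    inSection && /^#+/ { print words, lines; inSection = 0; next }          # section end: report
    inSection {
      if (NF == 0)         { ++blankCount; next }    # blank line: count it and skip
      if (blankCount == 2) { words += NF; ++lines }  # line after the 2nd blank line
    }
  ' "$@"
}

count_words_and_lines words.txt   # prints "3 2"
```

With `wc -l` semantics the answer here would be 2, while summing NF (as `count+=NF` does) gives 3; on the question's one-word-per-line input the two agree.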

Answer 2

$ cat tst.awk
/Usage/ { inBlock=1 }
inBlock {
    if (NF) {
        if (numEmptyLines == 2) {
            numWords += NF
        }
    }
    else {
        if (++numEmptyLines == 3) {
            print numWords+0
            inBlock = numEmptyLines = numWords = 0
        }
    }
}

$ awk -f tst.awk file
6
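Since the question asked for a one-liner, the same program can be collapsed onto a single line. A sketch, with the variables shortened (`b` for `inBlock`, `e` for `numEmptyLines`, `w` for `numWords`) and a hypothetical `sample.txt` rebuilt from the question's example:

```shell
# Hypothetical sample input, rebuilt from the question's example
printf '%s\n' 'sdf' '' '###################### Usage ###########' '' 'a' 'asdf' '' \
  'asdf' 'asdf' '70' '80' '90' '100' '' '################' '' 'sfad' > sample.txt

# tst.awk on one line: b = inBlock, e = numEmptyLines, w = numWords
awk '/Usage/{b=1} b{if(NF){if(e==2)w+=NF}else if(++e==3){print w+0; b=e=w=0}}' sample.txt   # prints 6
```

Like tst.awk, it relies on the third blank line, not the closing `###` line, to end the count.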

Optimizing a multi-utility command into a single-utility command
