GETLINE not working properly in an awk script

Time: 2017-11-19 19:24:21

Tags: linux bash awk scripting

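I am having trouble processing a file with awk, because I cannot get the parser to step to the next line correctly. The GETLINE instruction of the awk utility does not seem to take effect in my awk processing session.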

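The input file has 7 kinds of lines:

  • Section identifier,
  • Section title,
  • Chapter identifier,
  • Chapter title,
  • Delivery timestamp,
  • Age of the record,
  • Label of the record

Here is a sample: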

Section 2                                                              Line 0001
Setup The Java IDE                                                     Line 0002
9.                                                                     Line 0003
How To Install The Java Development Kit (JDK) For Ubuntu Linux         Line 0004
0:12                                                                   Line 0005
2 months ago                                                           Line 0006
Installation of the JDK on Linux                                       Line 0007
0:49                                                                   Line 0008
2 months ago                                                           Line 0009
APT-GET update - first run                                             Line 0010
1:12                                                                   Line 0011
2 months ago                                                           Line 0012
Adding the WEBUP8TEAM repository                                       Line 0013
1:37                                                                   Line 0014
2 months ago                                                           Line 0015
APT-GET update - second run                                            Line 0016
2:01                                                                   Line 0017
2 months ago                                                           Line 0018

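The input file contains several sections, and each section contains several chapters. Each chapter in turn has dozens of records, each characterized by a delivery timestamp, an age, and a label.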
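To make the structure described above concrete, here is a minimal sketch that labels each line of a small sample by type. The patterns are assumptions inferred from the sample (an identifier starting with "Section", a chapter number followed by a dot, an "m:ss" timestamp, an age ending in "ago"), the trailing "Line NNNN" column is left out, and titles cannot be told apart from record labels without extra state.

```shell
# Hypothetical sample: the first lines of the example, without the "Line NNNN" column.
sample='Section 2
Setup The Java IDE
9.
How To Install The Java Development Kit (JDK) For Ubuntu Linux
0:12
2 months ago'

classified=$(printf '%s\n' "$sample" | awk '
  $1 == "Section"     { print "section-id: " $0; next }   # e.g. "Section 2"
  /^[0-9]+\.$/        { print "chapter-id: " $0; next }   # e.g. "9."
  /^[0-9]+:[0-9]+$/   { print "timestamp: " $0; next }    # e.g. "0:12"
  $NF == "ago"        { print "age: " $0; next }          # e.g. "2 months ago"
                      { print "text: " $0 }               # a title or a record label
')
printf '%s\n' "$classified"
```

Each rule ends with a lowercase `next`, so a line that has been classified is never tested against the later rules.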

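My script uses a sequence of six rules, each of which tries to identify the type of the current line. It should print the relevant lines while skipping the delivery timestamps and ages, and associate the section and chapter identifiers with their one-line titles.

The awk code is as follows: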
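The intended flow could be sketched roughly as follows. This is an illustration under assumptions, not the script from this question: the variable names `prefix` and `want_title` are made up, the sample omits the "Line NNNN" column, and the line right after an identifier is assumed to be its title.

```shell
sample='Section 2
Setup The Java IDE
9.
How To Install The Java Development Kit (JDK) For Ubuntu Linux
0:12
2 months ago
Installation of the JDK on Linux
0:49
2 months ago'

outline=$(printf '%s\n' "$sample" | awk '
  # An identifier line: remember it and wait for its title on the next line.
  $1 == "Section" || /^[0-9]+\.$/      { prefix = $0; want_title = 1; next }
  # Delivery timestamps and ages are skipped outright.
  /^[0-9]+:[0-9]+$/ || $NF == "ago"    { next }
  # The line right after an identifier is its title: print them together.
  want_title                           { print prefix " " $0; want_title = 0; next }
  # Anything else is treated as a record label.
                                       { print "    " $0 }
')
printf '%s\n' "$outline"
```

Every rule that fully handles a line ends with a lowercase `next`, so at most one rule fires per input line.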

BEGIN { FoundNewSection = 0
    FoundChapterLN = 0
    TimeStampLN = 0
    PrefixString = "000"
      }

  FoundNewSection==1 { TimeStampLN = 0
           FoundNewSection = 0
              PrefixString = ""
     print "\n   Print Line (" $NF "), " PrefixString "    New Section Printed, no NEXT called"
             }

  FoundChapterLN==1  { TimeStampLN = 0
            FoundChapterLN = 0
              PrefixString = ""
      print "\n   Print Line (" $NF ") " PrefixString "   New Chapter Printed, no NEXT called"
             }

  $1=="Section"      { TimeStampLN = 0
           FoundNewSection = 1
              PrefixString = $1
      print "No Print Line (" $NF ") new Section detected, NEXT called"
      GETLINE
      NEXT
             }

  $1 ~ /^([0-9])+\./ { FoundChapterLN = 1
             PrefixString = $1
      print "No Print Line (" $NF ") new Chapther detected, NEXT called"
      GETLINE
      NEXT
             }

  $3=="ago" || $4=="ago" || /^([0-9])+:([0-9])+/ { TimeStampLN = 1
      print "No Print Line (" $NF ") TimeStamp Line detected, NEXT called"
      GETLINE
      NEXT
                         }


  FoundNewSection == 0 && FoundChapterLN == 0 {
      print "   Print Line (" $NF  ") is normal Line"
                      }

END { print "END : " NR }

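Once the script has identified the type of a line (the tests at lines 19, 27, and 34 of the script), it should end processing of the current line and move on to the next one. I thought the "GETLINE; NEXT" pair of commands would do that, but it seems I was wrong.

After a first test, I found that despite the "GETLINE; NEXT" instructions, awk does not actually skip to the next line. Instead, it keeps working on the same line until it reaches the end of the script. This gives me the following output: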
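One thing worth checking here: awk is case-sensitive and its keywords are lowercase. `getline` and `next` are statements, whereas `GETLINE` and `NEXT` are just names of uninitialized variables, and a bare variable name is an expression statement with no effect. The sketch below compares the two spellings on a made-up three-line input; it illustrates that behavior and is not a drop-in fix for the script.

```shell
input='skip me
keep me
skip me too'

# Uppercase NEXT is an ordinary (empty) variable, so the second rule still runs.
upper=$(printf '%s\n' "$input" | awk '
  /^skip/ { NEXT }
          { print "seen: " $0 }
')

# Lowercase next ends processing of the current record immediately.
lower=$(printf '%s\n' "$input" | awk '
  /^skip/ { next }
          { print "seen: " $0 }
')

printf 'with NEXT:\n%s\n\nwith next:\n%s\n' "$upper" "$lower"
```

With the uppercase spelling all three lines reach the second rule; with the lowercase one only "keep me" does.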

<Blank LIne>
No Print Line (0001) new Section detected, NEXT called
<Blank LIne>
   Print Line (0002),     New Section Printed, no NEXT called
   Print Line (0002) is normal Line
No Print Line (0003) new Chapther detected, NEXT called
<Blank LIne>
   Print Line (0004)    New Chapter Printed, no NEXT called
   Print Line (0004) is normal Line
No Print Line (0005) TimeStamp Line detected, NEXT called
   Print Line (0005) is normal Line
No Print Line (0006) TimeStamp Line detected, NEXT called
   Print Line (0006) is normal Line
   Print Line (0007) is normal Line
No Print Line (0008) TimeStamp Line detected, NEXT called
   Print Line (0008) is normal Line
No Print Line (0009) TimeStamp Line detected, NEXT called
   Print Line (0009) is normal Line
   Print Line (0010) is normal Line
No Print Line (0011) TimeStamp Line detected, NEXT called
   Print Line (0011) is normal Line
No Print Line (0012) TimeStamp Line detected, NEXT called
   Print Line (0012) is normal Line
   Print Line (0013) is normal Line
No Print Line (0014) TimeStamp Line detected, NEXT called
   Print Line (0014) is normal Line
No Print Line (0015) TimeStamp Line detected, NEXT called
   Print Line (0015) is normal Line
   Print Line (0016) is normal Line
No Print Line (0017) TimeStamp Line detected, NEXT called
   Print Line (0017) is normal Line
No Print Line (0018) TimeStamp Line detected, NEXT called
   Print Line (0018) is normal Line

instead of this:

<Blank LIne>
No Print Line (0001) new Section detected, NEXT called
<Blank LIne>
   Print Line (0002),     New Section Printed, no NEXT called
No Print Line (0003) new Chapther detected, NEXT called
<Blank LIne>
   Print Line (0004)    New Chapter Printed, no NEXT called
No Print Line (0005) TimeStamp Line detected, NEXT called
No Print Line (0006) TimeStamp Line detected, NEXT called
   Print Line (0007) is normal Line
No Print Line (0008) TimeStamp Line detected, NEXT called
No Print Line (0009) TimeStamp Line detected, NEXT called
   Print Line (0010) is normal Line
No Print Line (0011) TimeStamp Line detected, NEXT called
No Print Line (0012) TimeStamp Line detected, NEXT called
   Print Line (0013) is normal Line
No Print Line (0014) TimeStamp Line detected, NEXT called
No Print Line (0015) TimeStamp Line detected, NEXT called
   Print Line (0016) is normal Line
No Print Line (0017) TimeStamp Line detected, NEXT called
No Print Line (0018) TimeStamp Line detected, NEXT called

In fact, despite the "GETLINE; NEXT", awk still runs the rest of the awk code against the contents of the current line.

Am I using the "GETLINE; NEXT" commands incorrectly?
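A related point, offered as a sketch and not a definitive answer: even when spelled in lowercase, `getline; next` is probably not the pair wanted here. A plain `getline` reads the following record into `$0`, and `next` then discards it, so two input lines are consumed. `next` alone is enough to stop processing the current line. The input below is hypothetical, built from labels in the sample.

```shell
records='Installation of the JDK on Linux
2 months ago
APT-GET update - first run
2 months ago
Adding the WEBUP8TEAM repository'

# getline loads the record after each age line, and next then drops it as well.
with_getline=$(printf '%s\n' "$records" | awk '
  $NF == "ago" { getline; next }
               { print }
')

# next alone skips only the age line itself.
next_only=$(printf '%s\n' "$records" | awk '
  $NF == "ago" { next }
               { print }
')

printf 'getline; next:\n%s\n\nnext only:\n%s\n' "$with_getline" "$next_only"
```

In the first run the labels that follow an age line are lost; in the second run all three labels are printed.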

0 answers:

No answers