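I am having trouble processing a file with awk: I cannot get the parser to step to the next line correctly. The GETLINE instruction does not seem to work the way I expect in my awk run.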
The input file has 7 kinds of lines.
Here is a sample:
Section 2 Line 0001
Setup The Java IDE Line 0002
9. Line 0003
How To Install The Java Development Kit (JDK) For Ubuntu Linux Line 0004
0:12 Line 0005
2 months ago Line 0006
Installation of the JDK on Linux Line 0007
0:49 Line 0008
2 months ago Line 0009
APT-GET update - first run Line 0010
1:12 Line 0011
2 months ago Line 0012
Adding the WEBUP8TEAM repository Line 0013
1:37 Line 0014
2 months ago Line 0015
APT-GET update - second run Line 0016
2:01 Line 0017
2 months ago Line 0018
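The input file contains several sections, and each section contains several chapters. Each chapter in turn has dozens of records, each made up of a label, a delivery timestamp, and an age.
My script uses a sequence of six steps. Each step tries to identify the type of the current line and print only the lines I want, which means skipping the delivery timestamps and ages and associating the section and chapter identifiers with their single-line titles.
The awk code is this: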
BEGIN {
    FoundNewSection = 0
    FoundChapterLN = 0
    TimeStampLN = 0
    PrefixString = "000"
}
FoundNewSection==1 {
    TimeStampLN = 0
    FoundNewSection = 0
    PrefixString = ""
    print "\n Print Line (" $NF "), " PrefixString " New Section Printed, no NEXT called"
}
FoundChapterLN==1 {
    TimeStampLN = 0
    FoundChapterLN = 0
    PrefixString = ""
    print "\n Print Line (" $NF ") " PrefixString " New Chapter Printed, no NEXT called"
}
$1=="Section" {
    TimeStampLN = 0
    FoundNewSection = 1
    PrefixString = $1
    print "No Print Line (" $NF ") new Section detected, NEXT called"
    GETLINE
    NEXT
}
$1 ~ /^([0-9])+\./ {
    FoundChapterLN = 1
    PrefixString = $1
    print "No Print Line (" $NF ") new Chapther detected, NEXT called"
    GETLINE
    NEXT
}
$3=="ago" || $4=="ago" || /^([0-9])+:([0-9])+/ {
    TimeStampLN = 1
    print "No Print Line (" $NF ") TimeStamp Line detected, NEXT called"
    GETLINE
    NEXT
}
FoundNewSection == 0 && FoundChapterLN == 0 {
    print " Print Line (" $NF ") is normal Line"
}
END { print "END : " NR }
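When the script recognizes the type of a line (the Section, chapter-number, and timestamp tests, at lines 19, 27 and 34 of my script file), it should end processing of the current line and move on to the next one. I thought the "GETLINE; NEXT" pair of commands would do that, but it seems I was wrong.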
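After a first test, I found that despite the "GETLINE; NEXT" instructions, awk does not actually skip to the next line. It keeps working on the same line until it reaches the end of the script. This gives me the following output: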
<Blank LIne>
No Print Line (0001) new Section detected, NEXT called
<Blank LIne>
Print Line (0002), New Section Printed, no NEXT called
Print Line (0002) is normal Line
No Print Line (0003) new Chapther detected, NEXT called
<Blank LIne>
Print Line (0004) New Chapter Printed, no NEXT called
Print Line (0004) is normal Line
No Print Line (0005) TimeStamp Line detected, NEXT called
Print Line (0005) is normal Line
No Print Line (0006) TimeStamp Line detected, NEXT called
Print Line (0006) is normal Line
Print Line (0007) is normal Line
No Print Line (0008) TimeStamp Line detected, NEXT called
Print Line (0008) is normal Line
No Print Line (0009) TimeStamp Line detected, NEXT called
Print Line (0009) is normal Line
Print Line (0010) is normal Line
No Print Line (0011) TimeStamp Line detected, NEXT called
Print Line (0011) is normal Line
No Print Line (0012) TimeStamp Line detected, NEXT called
Print Line (0012) is normal Line
Print Line (0013) is normal Line
No Print Line (0014) TimeStamp Line detected, NEXT called
Print Line (0014) is normal Line
No Print Line (0015) TimeStamp Line detected, NEXT called
Print Line (0015) is normal Line
Print Line (0016) is normal Line
No Print Line (0017) TimeStamp Line detected, NEXT called
Print Line (0017) is normal Line
No Print Line (0018) TimeStamp Line detected, NEXT called
Print Line (0018) is normal Line
instead of this:
<Blank LIne>
No Print Line (0001) new Section detected, NEXT called
<Blank LIne>
Print Line (0002), New Section Printed, no NEXT called
No Print Line (0003) new Chapther detected, NEXT called
<Blank LIne>
Print Line (0004) New Chapter Printed, no NEXT called
No Print Line (0005) TimeStamp Line detected, NEXT called
No Print Line (0006) TimeStamp Line detected, NEXT called
Print Line (0007) is normal Line
No Print Line (0008) TimeStamp Line detected, NEXT called
No Print Line (0009) TimeStamp Line detected, NEXT called
Print Line (0010) is normal Line
No Print Line (0011) TimeStamp Line detected, NEXT called
No Print Line (0012) TimeStamp Line detected, NEXT called
Print Line (0013) is normal Line
No Print Line (0014) TimeStamp Line detected, NEXT called
No Print Line (0015) TimeStamp Line detected, NEXT called
Print Line (0016) is normal Line
No Print Line (0017) TimeStamp Line detected, NEXT called
No Print Line (0018) TimeStamp Line detected, NEXT called
In effect, despite the "GETLINE; NEXT", awk keeps the content of the current line and runs the rest of the awk code against that same line.
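The behavior can be isolated with a reduced case. This is only a minimal sketch, assuming a POSIX shell and a standard awk: the function name repro is made up for illustration, the input is lines 0007 to 0009 of the sample above, and only the timestamp rule and a catch-all rule are kept from my script.

```shell
# Reduced reproduction (hypothetical helper name: repro).
# Feeds three sample lines to awk: one label line and two timestamp/age lines.
repro() {
    printf '%s\n' \
        'Installation of the JDK on Linux Line 0007' \
        '0:49 Line 0008' \
        '2 months ago Line 0009' |
    awk '
    # Same timestamp/age test as in the full script, spelled exactly as there.
    $3=="ago" || $4=="ago" || /^([0-9])+:([0-9])+/ {
        print "No Print Line (" $NF ") TimeStamp Line detected, NEXT called"
        GETLINE
        NEXT
    }
    # Catch-all rule: should only be reached for lines not matched above.
    { print " Print Line (" $NF ") is normal Line" }
    '
}

repro
```

With this reduced script, the "is normal Line" message is still printed for all three input lines, including 0008 and 0009, which is the same symptom as in the full output.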
What am I doing wrong with the "GETLINE; NEXT" commands?