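I have a text file that lists any problems found. Each entry always starts with URL and ends with Result followed by the error code, if there is one. What I want to do is go through the txt file, grab every block of text with Error: 404 Not Found, and write all of those blocks to a separate text file. I found this: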
awk '/URL/,/404 Not Found/' text.txt > only404.txt
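The problem is that it matches URL and then keeps going until it reaches 404 Not Found, so in the case below it also includes the Valid: 200 OK block. What I really want is to search for 404 Not Found and then work backwards until it reaches URL; that would work. Any ideas?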
URL //fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Parent URL http://example.com, line 12, col 1
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Check time 1.863 seconds
Warning Access denied by robots.txt, skipping content checks.
Result Valid: 200 OK

URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
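The "find 404 Not Found, then work backwards to URL" idea can be done literally: reverse the file, start printing at a 404 Result line, stop after the block's URL line, then reverse the selection back. This is a minimal sketch, assuming every block's first line begins with URL (so "Parent URL" and "Real URL" lines do not end it) and that GNU tac is available; on BSD/macOS, "tail -r" should do the same job. The file names are the ones from the question.

```shell
# Sample input copied from the question.
cat > text.txt <<'EOF'
URL //fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Parent URL http://example.com, line 12, col 1
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Check time 1.863 seconds
Warning Access denied by robots.txt, skipping content checks.
Result Valid: 200 OK
URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
EOF

# Reversed, each Result line comes before the rest of its block:
# turn printing on at a 404 line, print, and turn it off again after the
# line that starts with URL. The second tac restores the original order.
tac text.txt |
  awk '/404 Not Found/ { p = 1 } p { print } /^URL/ { p = 0 }' |
  tac > only404.txt

cat only404.txt
```

Unlike a range pattern, this never starts a selection at a URL line that belongs to a 200 OK block, because selection only begins at a 404 line.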
Answer 0 (score: 3)
This might work for you:
awk -v RS="" '/404 Not Found/' yourFile
Test: is this what you want?
kent$ cat t
URL //fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Parent URL http://example.com, line 12, col 1
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Check time 1.863 seconds
Warning Access denied by robots.txt, skipping content checks.
Result Valid: 200 OK

URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
kent$ awk -v RS="" '/404 Not Found/' t
URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
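With RS set to the empty string, awk treats each blank-line-separated paragraph as one record, so the /404 Not Found/ pattern selects whole blocks. That only works if entries are separated by at least one blank line; without them the whole file becomes a single record. A sketch that writes straight to only404.txt, assuming blank-line separators; setting ORS to two newlines is an addition so that multiple matching blocks stay separated in the output, and the third sample entry (/missing.css) is hypothetical.

```shell
# Sample input with blank lines between entries; the /missing.css entry
# is made up to show two matching blocks.
cat > text.txt <<'EOF'
URL //fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Result Valid: 200 OK

URL `/image.png'
Real URL http://example.com/image.png
Result Error: 404 Not Found

URL `/missing.css'
Real URL http://example.com/missing.css
Result Error: 404 Not Found
EOF

# RS="" switches awk to paragraph mode (one record per block);
# ORS="\n\n" puts a blank line after each printed block.
awk -v RS="" -v ORS="\n\n" '/404 Not Found/' text.txt > only404.txt

cat only404.txt
```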
Answer 1 (score: 1)
This might work for you:
sed '/^\s*URL/,/^\s*Result/{/^\s*URL/{h;d};H;/Error: 404/{g;b}};d' file
URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
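The sed one-liner above keeps each block in the hold space: a URL line replaces it (h), every other line is appended (H), and on a 404 line the saved block is copied back (g) and printed by branching to the end (b). Below is the same logic spread out with comments, as a sketch. It swaps the GNU-only \s for the POSIX class [[:space:]] so it should also work with non-GNU seds, and reads the script from a file whose name (only404.sed) is hypothetical.

```shell
# Sample input copied from the question.
cat > file <<'EOF'
URL //fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Parent URL http://example.com, line 12, col 1
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Check time 1.863 seconds
Warning Access denied by robots.txt, skipping content checks.
Result Valid: 200 OK
URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
EOF

cat > only404.sed <<'EOF'
# Only work on lines from a URL line through the next Result line.
/^[[:space:]]*URL/,/^[[:space:]]*Result/{
  # A URL line starts a new block: overwrite the hold space with it
  # and delete it from the pattern space (nothing is printed yet).
  /^[[:space:]]*URL/{
    h
    d
  }
  # Every other line of the block is appended to the hold space.
  H
  # On a 404 Result line, copy the saved block back into the pattern
  # space and branch to the end of the script so it is auto-printed.
  /Error: 404/{
    g
    b
  }
}
# Anything that did not branch (non-404 blocks, stray lines) is dropped.
d
EOF

sed -f only404.sed file > only404.txt
cat only404.txt
```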