A true reverse search in regex

Date: 2012-03-14 15:46:15

Tags: regex unix sed awk

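I have a text file that lists any possible problems. Each entry always starts with URL and ends with Result and an error code, if there is one. What I want to do is go through the txt file, grab every text block containing Error: 404 Not Found, and write all of them to a separate text file. I found this: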

  

awk '/URL/,/404 Not Found/' text.txt > only404.txt

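The problem is that it finds a URL and then keeps matching until it reaches 404 Not Found, so in the case below the output also includes the Valid: 200 OK block. What I really want is to search for 404 Not Found and then go backwards until it reaches URL. Then it would work. Any ideas?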

    URL        `//fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
    Parent URL http://example.com, line 12, col 1
    Real URL   http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
    Check time 1.863 seconds
    Warning    Access denied by robots.txt, skipping content checks.
    Result     Valid: 200 OK

    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found

2 answers:

Answer 0 (score: 3)

This may work for you:

 awk -v RS="" '/404 Not Found/' yourFile

Test: is this what you want?

kent$  cat t
    URL        `//fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
    Parent URL http://example.com, line 12, col 1
    Real URL   http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
    Check time 1.863 seconds
    Warning    Access denied by robots.txt, skipping content checks.
    Result     Valid: 200 OK

    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found

kent$  awk -v RS="" '/404 Not Found/' t
    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found
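
The one-liner above works because setting RS to the empty string puts awk in paragraph mode, where each run of lines separated by blank lines is read as a single record, so the pattern is tested against a whole block at once. The sketch below assumes the entries really are separated by blank lines, as in the sample. The file name sample.txt and the ORS setting are illustrative additions, not part of the answer; only404.txt is the output name from the question.

```shell
# Build a small sample log in the same layout as the question
# (sample.txt is a hypothetical file name used only for this sketch).
cat > sample.txt <<'EOF'
    URL        `/ok.css'
    Result     Valid: 200 OK

    URL        `/image.png'
    Real URL   http://example.com/image.png
    Result     Error: 404 Not Found

    URL        `/missing.js'
    Result     Error: 404 Not Found
EOF

# RS=""      : paragraph mode, one blank-line-separated block per record.
# ORS="\n\n" : write a blank line after each printed block so the
#              blocks stay visually separated in the output file.
awk -v RS="" -v ORS="\n\n" '/404 Not Found/' sample.txt > only404.txt
```

Without the ORS setting, matching blocks are printed back to back with no blank line between them, which is harmless when only one block matches but harder to read when several do.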

Answer 1 (score: 1)

This might work for you:

sed '/^\s*URL/,/^\s*Result/{/^\s*URL/{h;d};H;/Error: 404/{g;b}};d' file
    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found
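
The same buffering idea can also be sketched in awk, which may be easier to read than the sed hold-space commands: collect lines from each URL line onward, and print the collected block only when its Result line reports a 404. This is an illustrative variant rather than either answerer's method. It assumes every entry starts with a line whose first word is URL and ends with a Result line, and it does not need blank lines between entries. The file names sample2.txt and only404_buffered.txt are hypothetical.

```shell
# Sample input with no blank lines between entries
# (sample2.txt is a hypothetical file name used only for this sketch).
cat > sample2.txt <<'EOF'
    URL        `/ok.css'
    Parent URL http://example.com, line 12, col 1
    Result     Valid: 200 OK
    URL        `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Result     Error: 404 Not Found
EOF

awk '
  # A line whose first word is URL starts a new entry: reset the buffer.
  /^[[:space:]]*URL/ { buf = "" }

  # Append every line to the buffer for the current entry.
  { buf = buf $0 "\n" }

  # The Result line ends the entry: print the buffer only for 404 errors.
  /^[[:space:]]*Result/ {
    if (/Error: 404/) printf "%s", buf
    buf = ""
  }
' sample2.txt > only404_buffered.txt
```

Using the POSIX class [[:space:]] instead of \s keeps the patterns portable across awk implementations.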