A regular expression to grep a specific paragraph of a file

Time: 2012-06-13 06:33:50

Tags: regex shell unix scripting grep

Hi, I am working on a shell script. Suppose this is the data my shell script runs on:
      Ownership
               o Australian Owned
   ?
   Ads for Mining Engineers
   232 results for
mining engineers in All States
   filtered by Mining Engineers [x] category
     * [ ]
                    [34]get directions
       Category:
       [35]Mining Engineers
       [36]Arrow Electrical Services in Wollongong, NSW under Mining
       Engineers logo
            [37]email
            [38]send to mobile
            [39]info
            Compare (0)
     * [ ]
       . [40]Firefly International
       Designers & Manufacturers. Service, Repair & Hire.
       We are the provider of mining engineers in Mt Thorley, NSW.
       25 Thrift Cl, Mt Thorley NSW 2330
       ph: (02) 6574 6660
            [41]http://www.fireflyint.com.au
            [42]get directions
       Category:
       [43]Mining Engineers
       [44]Firefly International in Mt Thorley, NSW under Mining Engineers
       logo
            [45]email
            [46]send to mobile
            [47]info
            Compare (0)
     * [ ]
       [48]Materials Solutions
       Materials Research & Development, Slurry Rheology & Piping Design.
       We are a well established company servicing the mining industry &
       associated manufacturing industries in all areas.
       Thornlie WA 6108
       ph: (08) 6468 4118
            [49]www.materialssolutions.com.au
       Category:
       [50]Mining Engineers
       [51]Materials Solutions in Thornlie, WA under Mining Engineers logo
            [52]email
            [53]send to mobile
            [54]info
            Compare (0)
     * [ ]
       . [55]ATC Williams Pty Ltd
       Our services are available from concept to completion of the works.
       Today, as the rebranded ATC Williams, we continue to expand our
       operations across Australia and in locations around the world.
       Unit 1, 21 Teddington Rd, Burswood WA 6100
       ph: (08) 9355 1383
            [56]www.atcwilliams.com.au
            [57]get directions
       Category:
       [58]Mining Engineers
       [59]ATC Williams Pty Ltd in Burswood, WA under Mining Engineers
       logo
            [60]email
            [61]send to mobile
            [62]info
            Compare (0)

I need to grab the address blocks, which look like this:

 * [ ]
       . [55]ATC Williams Pty Ltd
       Our services are available from concept to completion of the works.
       Today, as the rebranded ATC Williams, we continue to expand our
       operations across Australia and in locations around the world.
       Unit 1, 21 Teddington Rd, Burswood WA 6100
       ph: (08) 9355 1383
            [56]www.atcwilliams.com.au

So how should I do this? I have been working on a regular expression like this:

^*(.?[\w\W?\s?]*)+(.com.au)$

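But it doesn't help. It matches the address when the input file contains only the one address I want, but it does not work when the data is given in bulk. Can someone help me?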

2 Answers:

Answer 0: (score: 1)

I see a few problems with your regular expression:

^*(.?[\w\W?\s?]*)+(.com.au)$
 ^ ^           ^ ^ ^   ^
 1 1           2 2 1   1
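  1. Special characters that need to be escaped

  2. Greedy quantifiers, which match everything up to the last ".com.au". Add a ? after the quantifier to make it ungreedy ==> it matches as little as possible (that is, up to the first ".com.au" found at the end of a line).

    ==> This is your main problem

  3. You have nested quantifiers *)+, which you don't need

  4. In your example there is a space between the "*" and the ".", so either match the space or remove the dot; the space will then be matched by your character class.

  5. There is also whitespace between the start of the line and the "*"

  6. So, try this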

        ^\s*\*([\w\W?\s?]*?)(\.com\.au)$
    

    here on Regexr
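
To see the greedy versus ungreedy difference from point 2 in action, here is a minimal sketch using Python's `re` module. The thread does not say which tool runs the pattern, so Python is only a stand-in for trying it out. `SAMPLE` is a shortened copy of two entries from the question, and the names `GREEDY` and `LAZY` are made up for this illustration. The `re.MULTILINE` flag is assumed so that `^` and `$` match at line boundaries.

```python
import re

# Shortened copy of two listing entries from the question (illustrative only).
SAMPLE = """\
     * [ ]
       . [40]Firefly International
       25 Thrift Cl, Mt Thorley NSW 2330
       ph: (02) 6574 6660
            [41]http://www.fireflyint.com.au
            [42]get directions
     * [ ]
       [48]Materials Solutions
       Thornlie WA 6108
       ph: (08) 6468 4118
            [49]www.materialssolutions.com.au
       Category:
"""

# Same pattern twice; the only difference is the ? after the * quantifier.
GREEDY = re.compile(r"^\s*\*([\w\W?\s?]*)(\.com\.au)$", re.MULTILINE)
LAZY = re.compile(r"^\s*\*([\w\W?\s?]*?)(\.com\.au)$", re.MULTILINE)

greedy_matches = [m.group(0) for m in GREEDY.finditer(SAMPLE)]
lazy_matches = [m.group(0) for m in LAZY.finditer(SAMPLE)]

# The greedy version swallows both entries in one match,
# the lazy version stops at the first line ending in .com.au.
print(len(greedy_matches), "greedy match(es)")
print(len(lazy_matches), "lazy match(es)")
for block in lazy_matches:
    print(block)
    print("-" * 40)
```

Note that the pattern spans several lines, so it only works when the tool sees the whole text at once. Plain grep tests one line at a time by default, which is worth keeping in mind when moving this into a shell script.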

Answer 1: (score: 0)

Try this

^\s*\*\s*\[ \][^\*]+?[.]com[.]au$

Explanation

^        # Assert position at the beginning of a line (at beginning of the string or after a line break character)
\s       # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   *        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\*       # Match the character “*” literally
\s       # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   *        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\[       # Match the character “[” literally
\        # Match the character “ ” literally
\]       # Match the character “]” literally
[^\*]    # Match any character that is NOT a * character
   +?       # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
[.]      # Match the character “.”
com      # Match the characters “com” literally
[.]      # Match the character “.”
au       # Match the characters “au” literally
$        # Assert position at the end of a line (at the end of the string or before a line break character)
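
As a rough way to try the pattern above, here is a small sketch in Python's `re` module (again only a stand-in, since the thread does not name the regex engine). `LISTING` is a shortened copy of the question's data, and `BLOCK` and `extract_blocks` are hypothetical names chosen for this example. `re.MULTILINE` is assumed so that `^` and `$` work per line.

```python
import re

# Shortened copy of the question's data: the first entry has no website line.
LISTING = """\
     * [ ]
                    [34]get directions
       Category:
       [35]Mining Engineers
     * [ ]
       . [55]ATC Williams Pty Ltd
       Unit 1, 21 Teddington Rd, Burswood WA 6100
       ph: (08) 9355 1383
            [56]www.atcwilliams.com.au
            [57]get directions
"""

# The pattern from this answer, unchanged.
BLOCK = re.compile(r"^\s*\*\s*\[ \][^\*]+?[.]com[.]au$", re.MULTILINE)


def extract_blocks(text):
    """Return every '* [ ]' entry that runs up to a line ending in .com.au."""
    return [m.group(0) for m in BLOCK.finditer(text)]


blocks = extract_blocks(LISTING)
for block in blocks:
    print(block)
    print("-" * 40)
```

Because `[^\*]` cannot cross the next `* [ ]` bullet, the first entry (which has no `.com.au` line) is skipped rather than merged with the entry after it. The flip side is that an entry whose description contains a literal `*` would not match either.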