Regex .NET: trying to capture groups with a repeating lookahead

Date: 2013-12-08 19:18:20

Tags: .net regex regex-lookarounds

Please note that I am using the .NET regex engine here.

Here is the string to parse:

<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 1);" onmouseout="ResidentialListings.degradeListing(this, 1);">

    <div id="Contact1" class="listingDetail">

        <span id="ContactName1" class="c411ListedName"><a href="/res/5068300124/P-DESCHESNES/184421926.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES  on 85 Red Pine Dr">P DESCHESNES</a></span>

        <span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>

        <span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>


        <a class="c411GetDirections c411NoPrint" id="ContactDirections1" href="/map/mapSearch.html?layers=dir&amp;from=85+Red+Pine+Dr+NB&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a>


    </div>
    <div class="c411HoverMarker c411NoPrint" style="display:none;">
        <a href="/res/5068300124/P-DESCHESNES/184421926.html" title="P DESCHESNES"><span>&nbsp;</span></a>
    </div>
</div>




<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 2, 0);" onmouseout="ResidentialListings.degradeListing(this, 2, 0);">

    <div id="Contact2" class="listingDetail">

        <span id="ContactName2" class="c411ListedName"><a href="/res/4189883202/P-Deschesnes/179906536.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P Deschesnes  on 6585 Rue des Orchid&eacute;es">P Deschesnes</a></span>

        <span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>

        <span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>


        <a class="c411GetDirections c411NoPrint" id="ContactDirections2" href="/map/mapSearch.html?layers=dir&amp;from=1000+Rue+des+Orchid%C3%A9esFictive+QC+G1X+3Z5&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a>


    </div>
    <div class="c411HoverMarker c411NoPrint" style="display:none;">
        <a href="/res/4189883202/P-Deschesnes/179906536.html" title="P Deschesnes"><span>&nbsp;</span></a>
    </div>
</div>




<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 3, 0);" onmouseout="ResidentialListings.degradeListing(this, 3, 0);">

    <div id="Contact3" class="listingDetail">

        <span id="ContactName3" class="c411ListedName"><a href="/res/4506702257/P-DESCHESNES/181606171.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES  on 1736 Rue Saint-Alexandre">P DESCHESNES</a></span>

        <span class="c411Phone" id="ContactPhone3">(450) 671-1111</span>

        <span class="c411ListingGeo"><span class="adr" id="ContactAddress3">1736 Rue Fictive Longueuil QC J1J 1T2</span></span>


        <a class="c411GetDirections c411NoPrint" id="ContactDirections3" href="/map/mapSearch.html?layers=dir&amp;from=1000+Rue+Saint-Fictive+Longueuil+QC+J1J+1T1&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a>


    </div>
    <div class="c411HoverMarker c411NoPrint" style="display:none;">
        <a href="/res/4506702257/P-DESCHESNES/181606171.html" title="P DESCHESNES"><span>&nbsp;</span></a>
    </div>
</div>

You can see the repeating pattern here. I want one match per contact (1, 2, 3), each with 3 groups: contact name, phone, and address.

For this example I should get 3 matches, each containing the name, phone, and address, but for some reason I only get the last phone and address.

My .NET regex so far:

(?si)(?(?=.*<div id="Contact[\d{1,2}]").*<span id="ContactName[\d{1,2}]\".*title=.*>(.*)</a>.*id="ContactPhone[\d{1,2}]">(.*)</span>.*id="ContactAddress[\d{1,2}]\">(.*)</span>)

Can you tell me what I am doing wrong?

1 Answer:

Answer 0 (score: 0)

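For very simple fragments of HTML, regular expressions can be useful. For more extensive content like your example, an HTML parser such as Html Agility Pack is probably the most robust solution.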
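
To illustrate the parser-based approach, here is a minimal sketch. It uses Python's standard `html.parser` module rather than Html Agility Pack, purely so that it runs without dependencies; it is not the Html Agility Pack API. The `ContactParser` class, the field names, and the `TEMPLATE`/`ROWS` sample (a trimmed-down stand-in for the markup in the question, with a made-up `href`) are assumptions for illustration.

```python
import re
from html.parser import HTMLParser

# Trimmed-down stand-in for the markup in the question (hypothetical href).
TEMPLATE = (
    '<div id="Contact{n}" class="listingDetail">\n'
    '  <span id="ContactName{n}" class="c411ListedName">'
    '<a href="/res/{n}.html" title="{name}">{name}</a></span>\n'
    '  <span class="c411Phone" id="ContactPhone{n}">{phone}</span>\n'
    '  <span class="c411ListingGeo"><span class="adr" id="ContactAddress{n}">'
    '{address}</span></span>\n'
    '</div>'
)
ROWS = [
    ("P DESCHESNES", "(506) 830-2224", "85 Fictive Dr NB"),
    ("P Deschesnes", "(418) 987-3202", "1000 Rue des Fictive QC G1X 3Z5"),
    ("P DESCHESNES", "(450) 671-1111", "1736 Rue Fictive Longueuil QC J1J 1T2"),
]
SAMPLE_HTML = "\n".join(
    TEMPLATE.format(n=n, name=name, phone=phone, address=address)
    for n, (name, phone, address) in enumerate(ROWS, start=1)
)

# Only ids such as ContactName1, ContactPhone2, ContactAddress3 are of interest.
FIELD_ID = re.compile(r"Contact(Name|Phone|Address)(\d+)$")


class ContactParser(HTMLParser):
    """Collects the text inside elements whose id matches FIELD_ID."""

    def __init__(self):
        super().__init__()
        self.contacts = {}   # contact number -> {"Name": ..., "Phone": ..., ...}
        self._field = None   # (number, field) currently being read, if any
        self._depth = 0      # open tags inside the current field (no void tags expected)

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            self._depth += 1
            return
        m = FIELD_ID.match(dict(attrs).get("id") or "")
        if m:
            self._field = (int(m.group(2)), m.group(1))
            self._depth = 1

    def handle_endtag(self, tag):
        if self._field is not None:
            self._depth -= 1
            if self._depth == 0:
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            number, field = self._field
            entry = self.contacts.setdefault(number, {})
            entry[field] = entry.get(field, "") + data


parser = ContactParser()
parser.feed(SAMPLE_HTML)
contacts = [parser.contacts[n] for n in sorted(parser.contacts)]
for contact in contacts:
    print(contact["Name"], contact["Phone"], contact["Address"], sep=" | ")
```

In .NET the same idea would apply with Html Agility Pack: load the document, select the elements by `id`, and read their text, instead of matching across the whole string.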

There are good reasons not to try parsing HTML with regular expressions: Using regular expressions to parse HTML: why not?
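
If a regular expression is still preferred for this particular page, two things in the pattern from the question are likely behind the single match. First, with single-line mode on, each greedy `.*` runs as far as it can, so the whole input is consumed by one match whose groups come from the last contact. Second, `[\d{1,2}]` is a character class that matches exactly one character (a digit, `{`, `,`, or `}`), not one or two digits. The sketch below compares a greedy pattern with a lazy one (`.*?`, `\d{1,2}`). It is written in Python only so that it is easy to run; it drops the lookahead conditional, and the `TEMPLATE`/`ROWS` sample is a trimmed-down, assumed stand-in for the markup in the question.

```python
import re

# Trimmed-down stand-in for the markup in the question (hypothetical href).
TEMPLATE = (
    '<div id="Contact{n}" class="listingDetail">\n'
    '  <span id="ContactName{n}" class="c411ListedName">'
    '<a href="/res/{n}.html" title="{name}">{name}</a></span>\n'
    '  <span class="c411Phone" id="ContactPhone{n}">{phone}</span>\n'
    '  <span class="c411ListingGeo"><span class="adr" id="ContactAddress{n}">'
    '{address}</span></span>\n'
    '</div>'
)
ROWS = [
    ("P DESCHESNES", "(506) 830-2224", "85 Fictive Dr NB"),
    ("P Deschesnes", "(418) 987-3202", "1000 Rue des Fictive QC G1X 3Z5"),
    ("P DESCHESNES", "(450) 671-1111", "1736 Rue Fictive Longueuil QC J1J 1T2"),
]
SAMPLE_HTML = "\n".join(
    TEMPLATE.format(n=n, name=name, phone=phone, address=address)
    for n, (name, phone, address) in enumerate(ROWS, start=1)
)

FLAGS = re.DOTALL | re.IGNORECASE  # same role as (?si) in the question

# Greedy quantifiers: each .* grabs as much text as possible.
GREEDY = re.compile(
    r'.*<span id="ContactName\d{1,2}".*title=.*>(.*)</a>'
    r'.*id="ContactPhone\d{1,2}">(.*)</span>'
    r'.*id="ContactAddress\d{1,2}">(.*)</span>',
    FLAGS,
)

# Lazy quantifiers: each .*? stops at the first place the rest can match.
LAZY = re.compile(
    r'<span id="ContactName\d{1,2}".*?title=.*?>(.*?)</a>'
    r'.*?id="ContactPhone\d{1,2}">(.*?)</span>'
    r'.*?id="ContactAddress\d{1,2}">(.*?)</span>',
    FLAGS,
)

greedy_matches = list(GREEDY.finditer(SAMPLE_HTML))
lazy_matches = LAZY.findall(SAMPLE_HTML)

print("greedy:", len(greedy_matches), "match(es)")
print("lazy:  ", len(lazy_matches), "match(es)")
for name, phone, address in lazy_matches:
    print(name, phone, address, sep=" | ")
```

The lazy pattern should carry over to .NET with `RegexOptions.Singleline` and `Regex.Matches`, though that has not been tested here, and it remains fragile if the markup changes.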