I want to extract multiple instances of alt text with a regular expression, but I'm not sure how

Asked: 2012-10-12 08:26:26

Tags: regex extract phrases

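I am using (?<=alt)[\w\s\,\/\(\)\.]* to extract the first alt text. That works well, but I want to extract all of the alt texts, not just the first one. I am using the regular expression in Visual Web Ripper.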

The code I am extracting from is:

<DIV id=ctl00_ContentRightColumn_CustomFunctionalityFieldControl1_ctl00_ctl00_woodFeatures class="woodFeaturesPanel woodFeaturesPanelSingle" sizcache="23614" sizset="0"><H2>Features:</H2>  <DIV sizcache="23614" sizset="0">  <UL sizcache="23614" sizset="0">  <LI sizcache="23386" sizset="0"><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif">  <LI sizcache="20558" sizset="0"><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif">  <LI sizcache="23614" sizset="0"><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF">  <LI sizcache="17694" sizset="0"><IMG alt="Is woodland creation site" src="/PublishingImages/icon_woodlandcreation.gif">  <LI sizcache="21680" sizset="0"><IMG alt="Mainly broadleaved woodland" src="/PublishingImages/icon_mainlybroadleaved.gif">  <LI sizcache="20704" sizset="0"><IMG alt="Mainly young woodland" src="/PublishingImages/icon_mainlyyoung.gif">  <LI>  <LI></LI></UL></DIV></DIV>

1 answer:

Answer 0 (score: 0)

It is hard to say without knowing which language you are using, but capture groups can grab what you need:

/alt=(\w\S*|"([^"]*)")/

With preg_match_all() in PHP, it gives the following result:

Array
(
    [0] => Array
        (
            [0] => alt="Information board at site"
            [1] => alt="Parking nearby"
            [2] => alt=Grassland
            [3] => alt="Is woodland creation site"
            [4] => alt="Mainly broadleaved woodland"
            [5] => alt="Mainly young woodland"
        )

    [1] => Array
        (
            [0] => "Information board at site"
            [1] => "Parking nearby"
            [2] => Grassland
            [3] => "Is woodland creation site"
            [4] => "Mainly broadleaved woodland"
            [5] => "Mainly young woodland"
        )

    [2] => Array
        (
            [0] => Information board at site
            [1] => Parking nearby
            [2] =>
            [3] => Is woodland creation site
            [4] => Mainly broadleaved woodland
            [5] => Mainly young woodland
        )

)

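The second capture group holds the contents of a double-quoted value. If it is empty (as for the unquoted alt=Grassland), fall back to the first capture group, which holds the whole value.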
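
For readers not using PHP, the same fallback logic might look roughly like this in Python. This is only a sketch: the function name extract_alt_texts and the shortened sample string are made up for illustration, and Visual Web Ripper's own regex handling may differ. Note that Python reports an unmatched group as None rather than as an empty string.

```python
import re

# Same pattern as above: group 1 is the whole value (quotes included, if any),
# group 2 is the text inside double quotes (None when the value is unquoted).
ALT_PATTERN = re.compile(r'alt=(\w\S*|"([^"]*)")')


def extract_alt_texts(html):
    """Return every alt value in document order, without surrounding quotes."""
    texts = []
    for match in ALT_PATTERN.finditer(html):
        quoted = match.group(2)
        # Prefer the quoted contents; fall back to group 1 for unquoted values.
        texts.append(quoted if quoted is not None else match.group(1))
    return texts


# Shortened, hypothetical excerpt of the markup from the question.
sample = (
    '<LI><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif">'
    '<LI><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif">'
    '<LI><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF">'
)

print(extract_alt_texts(sample))
```

Iterating over all matches (finditer here, preg_match_all in PHP) is what returns every alt text instead of only the first.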