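I am using this regular expression:
(?<=alt)[\w\s\,\/\(\)\.]*
to extract the first alt text. That works well, but I would like to extract all of the alt texts.
I am using the regular expression in Visual Web Ripper.
The HTML I am extracting from is: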
<DIV id=ctl00_ContentRightColumn_CustomFunctionalityFieldControl1_ctl00_ctl00_woodFeatures class="woodFeaturesPanel woodFeaturesPanelSingle" sizcache="23614" sizset="0"><H2>Features:</H2> <DIV sizcache="23614" sizset="0"> <UL sizcache="23614" sizset="0"> <LI sizcache="23386" sizset="0"><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif"> <LI sizcache="20558" sizset="0"><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif"> <LI sizcache="23614" sizset="0"><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF"> <LI sizcache="17694" sizset="0"><IMG alt="Is woodland creation site" src="/PublishingImages/icon_woodlandcreation.gif"> <LI sizcache="21680" sizset="0"><IMG alt="Mainly broadleaved woodland" src="/PublishingImages/icon_mainlybroadleaved.gif"> <LI sizcache="20704" sizset="0"><IMG alt="Mainly young woodland" src="/PublishingImages/icon_mainlyyoung.gif"> <LI> <LI></LI></UL></DIV></DIV>
Answer 0 (score: 0)
It is hard to say without knowing the language, but a pattern with capture groups can grab what you need:
/alt=(\w\S*|"([^"]*)")/
With PHP's preg_match_all(), it gives the following result:
Array
(
    [0] => Array
        (
            [0] => alt="Information board at site"
            [1] => alt="Parking nearby"
            [2] => alt=Grassland
            [3] => alt="Is woodland creation site"
            [4] => alt="Mainly broadleaved woodland"
            [5] => alt="Mainly young woodland"
        )
    [1] => Array
        (
            [0] => "Information board at site"
            [1] => "Parking nearby"
            [2] => Grassland
            [3] => "Is woodland creation site"
            [4] => "Mainly broadleaved woodland"
            [5] => "Mainly young woodland"
        )
    [2] => Array
        (
            [0] => Information board at site
            [1] => Parking nearby
            [2] =>
            [3] => Is woodland creation site
            [4] => Mainly broadleaved woodland
            [5] => Mainly young woodland
        )
)
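The second capture group holds the contents of double-quoted values; if it is empty (as for alt=Grassland), fall back to the first capture group, which holds the unquoted value.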
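The fallback between the two capture groups can be sketched as follows. Since the question does not name a language, this uses Python's re module purely as a stand-in. The function name extract_alt_texts and the shortened sample string are illustrative assumptions, not part of the original answer. Note that Python reports a group that did not take part in the match as None rather than as an empty string.

```python
import re

# Same pattern as in the answer: group 1 is the raw value (quoted or not),
# group 2 is the text between double quotes when the value is quoted.
ALT_PATTERN = re.compile(r'alt=(\w\S*|"([^"]*)")')


def extract_alt_texts(html):
    """Return every alt value found in html, without surrounding quotes.

    Hypothetical helper for illustration only.
    """
    texts = []
    for match in ALT_PATTERN.finditer(html):
        quoted = match.group(2)
        # Group 2 is None for unquoted values such as alt=Grassland,
        # so fall back to group 1 in that case.
        texts.append(quoted if quoted is not None else match.group(1))
    return texts


# Shortened sample based on the HTML in the question.
sample = (
    '<LI><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif"> '
    '<LI><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF"> '
    '<LI><IMG alt="Mainly young woodland" src="/PublishingImages/icon_mainlyyoung.gif">'
)

print(extract_alt_texts(sample))
# ['Information board at site', 'Grassland', 'Mainly young woodland']
```

Whether Visual Web Ripper exposes individual capture groups in the same way is not something this sketch assumes; it only shows the group-selection logic.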