I am trying to match multi-line HTML source code with a regular expression (using AutoIt). The HTML source to match:
<li class="mission">
<div>
<div class="missionTitle">
<h3>Eat a quarter-pounder with cheese</h3>
<div class="missionProgress">
<span>100%</span>
<div class="missionProgressBar" style="width: 100%;"></div>
</div>
</div>
<div class="missionDetails">
<ul class="missionRewards">
<li class="rewardCash">5,000–8,000</li>
<li class="rewardXP">XP +5</li>
</ul>
<div class="fightItems clearfix">
<h5><span>Prerequisites:</span></h5>
<div class="fightItemsWrap">
<div class="fightItem tooltip" title="Sunglasses" data-attack="Attack: 2" data-defence="Defence: 2">
<img src="/img/enhancement/3.jpg" alt="">
<span>× 1</span>
</div>
<div class="fightItem tooltip" title="Broad Shoulders" data-attack="Attack: 0" data-defence="Defence: 3">
<img src="/img/enhancement/1003.jpg" alt="">
<span>× 1</span>
</div>
<div class="fightItem tooltip" title="Irish Fond Anglia" data-attack="Attack: 4" data-defence="Defence: 8">
<img src="/img/enhancement/2004.jpg" alt="">
<span>× 1</span>
</div>
</div>
</div>
<form action="/quest/index/i/kdKJBrgjdGWKqtfDrHEkRM2duXVn1ntH/h/c0b2d58642cd862bfad47abf7110042e/t/1336917311" method="post">
<input type="hidden" id="id" name="id" value="17"/>
<button class="button buttonIcon btnEnergy"><em>5</em></button>
</form>
</div>
</div>
</li>
It occurs multiple times on a page (but the items inside <div class="fightItems clearfix">...</div> vary). I need to match <h3>Eat a quarter-pounder with cheese</h3>, <span>100%</span> and <input type="hidden" id="id" name="id" value="17"/>. Expected result (for every occurrence on the page):
$a[0] = "Eat a quarter-pounder with cheese"
$a[1] = "100%"
$a[2] = "17"
What I came up with:
(?U)(?:<div class="missionTitle">\s+<h3>(.*)</h3>\s+<div class="missionProgress">\s+<span>(.*)</span>)|(?:<form .*\s+.*<input\stype="hidden"\sid="id"\sname="id"\svalue="(\d+)"/>\s+.*\s+</form>)
But this leaves some array items empty. I also tried the (?s) flag, but then it captures only the first occurrence (and stops matching after that).
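The empty array items follow from the alternation: each match comes from either the left branch (groups 1 and 2) or the right branch (group 3), never both. Below is a minimal sketch of that behaviour in Python's `re` module, which is an assumption made here only to get something runnable; AutoIt's StringRegExp is PCRE-based and may differ in detail. Python has no `(?U)` flag, so the quantifiers are written as lazy (`*?`, `+?`) instead, and the HTML is a trimmed copy of the sample with a shortened form action.

```python
import re

# Trimmed copy of one mission block from the question (form action shortened).
html = """<li class="mission">
<div class="missionTitle">
<h3>Eat a quarter-pounder with cheese</h3>
<div class="missionProgress">
<span>100%</span>
</div>
</div>
<form action="/quest/index" method="post">
<input type="hidden" id="id" name="id" value="17"/>
<button class="button buttonIcon btnEnergy"><em>5</em></button>
</form>
</li>"""

# Port of the pattern above: (?U) is replaced by explicit lazy quantifiers.
pattern = re.compile(
    r'(?:<div class="missionTitle">\s+?<h3>(.*?)</h3>\s+?'
    r'<div class="missionProgress">\s+?<span>(.*?)</span>)'
    r'|(?:<form .*?\s+?.*?<input\stype="hidden"\sid="id"\sname="id"'
    r'\svalue="(\d+?)"/>\s+?.*?\s+?</form>)'
)

# One tuple per match; groups from the branch that did not take part are empty.
rows = pattern.findall(html)
for row in rows:
    print(row)
```

One mission therefore produces two matches, each with some groups empty, rather than a single match carrying all three values.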
Answer 0 (score: 1)
Because of the (?s) flag, I should not have used . to match the words or integers. The correct regular expression is:
(?U)(?s)<div class="missionTitle">\s+<h3>([\w\s]+)</h3>(?:.*)<div class="missionProgress">\s+<span>(\d+)%</span>(?:.*)<input.* value="(\d+)"/>
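A rough way to try this pattern outside AutoIt is a Python port, sketched below under a few assumptions: Python's `re` stands in for AutoIt's PCRE engine, `(?U)` is replaced by lazy quantifiers, `(?s)` becomes `re.DOTALL`, and the second mission ("Walk the dog") is made up for illustration. Note that `[\w\s]+` cannot match the hyphen in "quarter-pounder", so the sketch also shows a variant with `[^<]+` for the title.

```python
import re

# Two trimmed mission blocks; the second one is hypothetical test data.
html = """<li class="mission">
<div class="missionTitle">
<h3>Eat a quarter-pounder with cheese</h3>
<div class="missionProgress">
<span>100%</span>
</div>
</div>
<form action="/quest/index" method="post">
<input type="hidden" id="id" name="id" value="17"/>
</form>
</li>
<li class="mission">
<div class="missionTitle">
<h3>Walk the dog</h3>
<div class="missionProgress">
<span>40%</span>
</div>
</div>
<form action="/quest/index" method="post">
<input type="hidden" id="id" name="id" value="18"/>
</form>
</li>"""

TAIL = (r'</h3>.*?<div class="missionProgress">\s+?<span>(\d+?)%</span>'
        r'.*?<input.*? value="(\d+?)"/>')
HEAD = r'<div class="missionTitle">\s+?<h3>'

# Port of the pattern as posted: the title may only contain \w and \s.
strict = re.compile(HEAD + r'([\w\s]+?)' + TAIL, re.DOTALL)
# Variant: the title is anything up to the next "<", so hyphens are allowed.
relaxed = re.compile(HEAD + r'([^<]+?)' + TAIL, re.DOTALL)

strict_rows = strict.findall(html)    # skips the hyphenated title
relaxed_rows = relaxed.findall(html)  # one tuple per mission
print(strict_rows)
print(relaxed_rows)
```

The percentage is captured without the `%` sign here, because the sign sits outside the group.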
Answer 1 (score: 0)
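Regular expression to match multi-line HTML source code:
\R matches newline sequences, (?>\r\n|\n|\r); . does not (unless (?s) is set).
\s matches whitespace characters.
Some combination is usually required (for example \R\s*?).
An attribute value can be captured with a negated character class (for example attribute="([^"]*?)").
Example (contains double quotes; treat these as described in Documentation - FAQ - double quotes):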
(?s)<div class="missionTitle">.*?<h3>(.*?)</h3>.*?<div class="missionProgress">.*?<span>([^<]*?)</span>.*?<input type="hidden" id="id" name="id" value="([^"]*?)"/>
Whether regular expressions should be used on HTML at all (beyond simple listings like this one) is a different question (done that, T-shirt).
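The example pattern from this answer uses only syntax that Python's `re` also accepts, so it can be tried unchanged there. The sketch below does that; using Python instead of AutoIt is an assumption made for the sake of a runnable example, and the second mission in the sample is made up.

```python
import re

# Two trimmed mission blocks; the second one is hypothetical test data.
html = """<li class="mission">
<div class="missionTitle">
<h3>Eat a quarter-pounder with cheese</h3>
<div class="missionProgress">
<span>100%</span>
</div>
</div>
<form action="/quest/index" method="post">
<input type="hidden" id="id" name="id" value="17"/>
</form>
</li>
<li class="mission">
<div class="missionTitle">
<h3>Walk the dog</h3>
<div class="missionProgress">
<span>40%</span>
</div>
</div>
<form action="/quest/index" method="post">
<input type="hidden" id="id" name="id" value="18"/>
</form>
</li>"""

# The example pattern, copied verbatim; (?s) lets "." cross line breaks.
pattern = re.compile(
    r'(?s)<div class="missionTitle">.*?<h3>(.*?)</h3>.*?'
    r'<div class="missionProgress">.*?<span>([^<]*?)</span>.*?'
    r'<input type="hidden" id="id" name="id" value="([^"]*?)"/>'
)

# finditer yields one match per mission; groups() gives (title, progress, id).
missions = [m.groups() for m in pattern.finditer(html)]
for title, progress, mission_id in missions:
    print(title, progress, mission_id)
```

Because every quantifier is lazy, each match stops at the first `<input ... />` after its own title, which keeps consecutive missions from being merged into one match.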