PHP regex: getting the text between multiple span tags

Time: 2010-12-29 13:32:18

Tags: php regex tags match

I don't speak English well, so I apologize for any mistakes.

On a website I have a div box containing game information:

<span class="noteline">Developer:</span> 
<span class="subline">Gameloft</span> 
<span class="noteline">Genre:</span> 
<span class="subline">Racing/Arcade</span> 
<span class="noteline">Release year:</span> 
<span class="subline">2010</span> 

I need to get the text between <span class="noteline"> and the closing </span> tag.

preg_match("/\<span\sclass=\"subline\"\>(.*)<\/span\>/imsU", $source, $matches);

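The solution above works, but it only captures the first "subline", the one with the text "Gameloft".

I also need the sublines containing the text Racing/Arcade and 2010.

Maybe something like this (this doesn't work):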

for developer = preg_match("/*(\<span\sclass=\"subline\"\>){1}*(.*)*(<\/span\>){1}*/imsU", $source, $matches);
for genre = preg_match("/*(\<span\sclass=\"subline\"\>){2}*(.*)*(<\/span\>){2}*/imsU", $source, $matches);
and so on...

Anyway, thanks for your help.

3 answers:

Answer 0: (score: 1)

An alternative to regexp is to use phpQuery or QueryPath, which reduces the task to:

foreach ( qp($source)->find("span.subline") as $span ) {
    print $span->text();
}

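Answer 1: (score: 1)

Regular expressions are a poor fit for parsing HTML. They are hard to get right, and they always break on edge cases.

I don't know whether there is a simpler way, but this should work with the markup you describe: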

<?php

$fragment = '<span class="noteline">Developer:</span>
<span class="subline">Gameloft</span>
<span class="noteline">Genre:</span>
<span class="subline">Racing/Arcade</span>
<span class="noteline">Release year:</span>
<span class="subline">2010</span>';

libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($fragment);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(FALSE);

foreach($xml->xpath("//span[@class='subline']") as $item){
    echo (string)$item . PHP_EOL;
}

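This assumes the attribute is exactly class="subline", so it will fail when the element has multiple classes. (I'm new to XPath; improvements are welcome.)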
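One common way around the multiple-class limitation is to pad the class attribute with spaces and test for " subline " as a whole word. The following is only a sketch under assumptions: the "extra" class and the shortened fragment are made up for the demo, and it uses DOMXPath directly instead of SimpleXML.

```php
<?php
// Hypothetical fragment: the "extra" class is invented to show the multi-class case.
$fragment = '<span class="noteline">Developer:</span>
<span class="subline extra">Gameloft</span>
<span class="subline">Racing/Arcade</span>';

// Suppress libxml warnings while parsing the bare fragment.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($fragment);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);

// normalize-space() collapses whitespace in @class; concat() pads it with
// spaces so that ' subline ' only matches a complete class name.
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' subline ')]";

$texts = [];
foreach ($xpath->query($query) as $node) {
    $texts[] = $node->textContent;
}

print_r($texts);
```

With this query a class such as "sublinefoo" is not matched, because the surrounding spaces are part of the test.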

Answer 2: (score: 0)

Try this:

preg_match_all("/<span class=\"subline\".*span>/", $html, $matches);

preg_match_all("/<span class=\"noteline\".*span>/", $html, $matches);

I tried the code above like this:

<?php 

$html = '<span class="noteline">Developer:</span> 
<span class="subline">Gameloft</span> 
<span class="noteline">Genre:</span> 
<span class="subline">Racing/Arcade</span> 
<span class="noteline">Release year:</span> 
<span class="subline">2010</span>';

preg_match_all("/<span class=\"subline\".*span>/", $html, $matches1);

preg_match_all("/<span class=\"noteline\".*span>/", $html, $matches2);

print_r($matches1);
echo "<br>";
print_r($matches2);

?>

The output I get (as rendered in the browser, which hides the span tags that are part of each match) is:

Array ( [0] => Array ( [0] => Gameloft [1] => Racing/Arcade [2] => 2010 ) )
Array ( [0] => Array ( [0] => Developer: [1] => Genre: [2] => Release year: ) )
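Note that these patterns have no capture group, so each match in $matches[0] is the whole element, tags included. If only the inner text is wanted, a capture group can be combined with preg_match_all. This is a rough sketch rather than a robust HTML parser; it assumes the markup looks exactly like the sample and reuses the lazy-matching idea (the U modifier) from the question.

```php
<?php
$html = '<span class="noteline">Developer:</span>
<span class="subline">Gameloft</span>
<span class="noteline">Genre:</span>
<span class="subline">Racing/Arcade</span>
<span class="noteline">Release year:</span>
<span class="subline">2010</span>';

// i = case-insensitive, s = "." also matches newlines, U = quantifiers are lazy,
// so (.*) stops at the first closing </span> instead of the last one.
preg_match_all('/<span\s+class="subline"\s*>(.*)<\/span>/isU', $html, $values);
preg_match_all('/<span\s+class="noteline"\s*>(.*)<\/span>/isU', $html, $labels);

// Index 0 holds the full matches (with tags); index 1 holds the captured text.
print_r($values[1]);
print_r($labels[1]);
```

Unlike preg_match, which stops after the first hit, preg_match_all collects every occurrence, which is why all three sublines come back.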