I have a series of HTML pages from which I want to extract certain product information. The HTML is structured as follows:
<h1 style="margin-top: 20px;">Productinformatie</h1>
<div class="group">
<div class="columns2">
<table width="100%" cellpadding="4" cellspacing="0" border="0" class="product_info_table stripe">
<tr style="background-color: #3c75a6; color: #fff; font-weight: bold;">
<td colspan="2" style="background-color: #3c75a6; border-bottom: 2px solid #9dbeda;">Design</td>
</tr>
<tr class="normal">
<td width="250" valign="top"><b>Kleur van het product</b></td>
<td><div style="max-height: 40px; overflow: hidden;">Zwart, Zilver</div></td>
</tr>
.............
<tr class="normal">
<td width="250" valign="top"><b>Hoogte (achterzijde)</b></td>
<td><div style="max-height: 40px; overflow: hidden;">3 cm</div></td>
</tr>
</table>
</div>
</div>
<div class="group" style="overflow-x: auto; overflow-y: hidden; height: 140px; white-space: nowrap;" id="image_scroll">
I use the line below but get no result; I need to know how to handle line breaks (BR) in preg_match_all.
//Omschrijving <h1 style="margin-top: 20px;">Productinformatie</h1> <div class="group"> <div class="columns2"> </table> </div> </div>
// preg_match_all('/\<h1 style\=\"margin-top\: 20px\;\"\>Productinformatie\<\/h1\>(.*?)\<ul style\=\"list\-style\-type\: none\;\"\>/s', $html, $matchomschrijving);
preg_match_all('/\<h1 style\=\"margin-top\: 20px\;\"\>Productinformatie\<\/h1\>(.*)?\<\/table\>.*?\<\/div\>?\<\/div\>/s', $html, $matchomschrijving);
// $tempomschrijvinghtml = str_replace('"',"'",$matchomschrijving[1][0]);
$tempomschrijvinghtml = MinifyHTML($matchomschrijving[1][0]);
// $tempomschrijving = '<table>';
$tempomschrijving .= $tempomschrijvinghtml;
$tempomschrijving .= '</table></div></div>';
echo 'Omschrijving: ' . $tempomschrijving . '<br>';
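Thanks.
Answer 0 (score: 0)
To search, extract, and edit HTML, use PHP's built-in DOMxxx classes and work with the HTML structure itself. With the XPath language you can efficiently target the parts of the DOM tree you need. For example: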
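$dom = new DOMDocument;
// Collect libxml warnings about malformed HTML instead of emitting them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
// First table in div.columns2, inside a div.group that follows the "Productinformatie" heading.
$nodeList = $xp->query('//h1[.="Productinformatie"]/following-sibling::div[@class="group"]/div[@class="columns2"]/table[1]');
// item(0) is null when nothing matched, and saveHTML(null) would dump the whole document.
if ($nodeList->length > 0) {
    echo $dom->saveHTML($nodeList->item(0));
}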
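If the goal is the individual product properties rather than the raw table markup, the same approach can be taken one step further. The following is only a sketch: the sample markup is a trimmed copy of the HTML in the question, the $specs array is a name chosen here for illustration, and it assumes every property row is a tr with class "normal" containing exactly two cells.

```php
<?php
// Sample markup trimmed from the question; real pages would be loaded from disk or over HTTP.
$html = <<<'HTML'
<h1 style="margin-top: 20px;">Productinformatie</h1>
<div class="group">
<div class="columns2">
<table class="product_info_table stripe">
<tr><td colspan="2">Design</td></tr>
<tr class="normal">
<td><b>Kleur van het product</b></td>
<td><div>Zwart, Zilver</div></td>
</tr>
<tr class="normal">
<td><b>Hoogte (achterzijde)</b></td>
<td><div>3 cm</div></td>
</tr>
</table>
</div>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($dom);

// Assumption: property rows are <tr class="normal"> with a label cell and a value cell.
$specs = [];
$rows = $xp->query('//h1[.="Productinformatie"]/following-sibling::div[@class="group"]//tr[@class="normal"]');
foreach ($rows as $row) {
    $cells = $xp->query('td', $row);   // td children relative to the current row
    if ($cells->length === 2) {
        $label = trim($cells->item(0)->textContent);
        $specs[$label] = trim($cells->item(1)->textContent);
    }
}

print_r($specs);
```

Because textContent returns only the text inside each cell, the nested b and div wrappers do not need to be stripped by hand, and line breaks in the source HTML do not affect the match.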