Question

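I'm having some trouble parsing data from another website. I can get the first piece out of it, but when I try to extract the rest from that first cut, things stop working the way they should. Here is the code: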

$html = file_get_contents("http://www.avto.net/_DEALER/results.asp?broker=12430&star=&izpis=1&oglasrubrika=7&oblika=0&subKAT=0&model="); 

 $pattern = '/<div class=\"contentwrapper\">(.*?)<\/div>/s'; 

preg_match($pattern, $html, $data); 
$form = '/<form.*?>(.*?)<\/form>/s'; 
preg_match($form, $data[1], $cut); 

$pattern2 ='/<table width="730" cellspacing="0" cellpadding="0" border="0">(.*?)<\/table>/s'; 

preg_match_all($pattern2, $cut[1], $tabele); 

echo "<pre>"; 
print_r($cut[0]); 
echo "</pre>"; 

echo "<br />"; 
echo "<br />"; 

echo "<pre>"; 
print_r($tabele); 
echo "</pre>";

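I need the contentwrapper class, but I have to clean it up so that only the tables with the car listings are shown, without the extra text or page numbers. The first preg_match works fine, but when I try to get all of those tables with (.*?), the result is empty. Any hints are welcome. I also tried "Simple HTML DOM Parser" and its file_get_html() function, but I need it done this way: I only need the list of items from the first page (not all 30 pages...) so that I can present them on my own page.

Any help or hints are appreciated.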

Answer 1

First of all, don't use regular expressions to parse HTML.

Last but not least, parse it with DOM and XPath.

Example:

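<?php

// Placeholder: replace with the HTML you fetched, e.g. via file_get_contents().
$html_text = "your html code goes here...";

$d = new DOMDocument();
// Real-world HTML is rarely valid; @ suppresses the parser warnings.
@$d->loadHTML($html_text);
$xpath = new DOMXPath($d);

// Select every <table> element in the document.
$result = $xpath->query("//table");

foreach ($result as $table)
{
    // textContent is the table's text with all markup stripped.
    echo $table->textContent;
}

?>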
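
To narrow this down to what the question asks for, the XPath expression can be restricted to the tables inside the contentwrapper div. The following is only a sketch: the sample markup is hypothetical (I am guessing at the page structure from the patterns in the question), and the `width="730"` filter assumes that attribute really does distinguish the listing tables. As a side note, one likely reason the regex approach returns nothing is that the non-greedy `(.*?)<\/div>` stops at the first `</div>` it meets, so any nested div cuts the match short before the form.

```php
<?php
// Hypothetical sample markup standing in for the fetched page; the real
// structure is an assumption based on the patterns in the question.
$html = <<<PAGE
<html><body>
<div class="contentwrapper">
  <div class="pager">Page 1 of 30</div>
  <form action="results.asp">
    <table width="730" cellspacing="0" cellpadding="0" border="0">
      <tr><td>Car A</td></tr>
    </table>
    <table width="730" cellspacing="0" cellpadding="0" border="0">
      <tr><td>Car B</td></tr>
    </table>
    <table width="100"><tr><td>Unrelated</td></tr></table>
  </form>
</div>
</body></html>
PAGE;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Tables with width="730" anywhere inside the contentwrapper div.
$tables = $xpath->query('//div[@class="contentwrapper"]//table[@width="730"]');

$fragments = [];
foreach ($tables as $table) {
    // saveHTML($node) serializes just that node, markup included.
    $fragments[] = $doc->saveHTML($table);
}

echo implode("\n", $fragments);
```

For the real page, you would replace the sample string with the result of file_get_contents() on your URL. Using saveHTML() on each node keeps the table markup, so the fragments can be echoed straight into your own page, whereas textContent would give you only the text.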
