Please help me check this code. I think there is a problem with my regular expression, but I don't know how to fix it:
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$content = get_data('http://ibongda.vn/lich-thi-dau-bong-da.hs');
$regex = '/<div id="zone-schedule-group-by-season">(.*)<\/div>/';
preg_match($regex, $content, $matches);
$table = $matches[1];
print_r($table);
Answer 0 (score: 2)
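I would recommend not using a regular expression for this; the DOM is the better tool for the task.
The problem with your regular expression is that it runs into a newline sequence. By default the dot does not match newlines, so (.*) cannot cross a line break to reach the closing </div>; the engine keeps backtracking, runs out of alternatives, and the match fails. Backtracking is what a regular expression engine does during matching when an attempt fails: it gives back characters and retries. You need the s (dotall) modifier to force the dot to match newlines as well.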
$regex = '~<div id="zone-schedule-group-by-season">(.*?)</div>~s';
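As a minimal sketch of the difference, the snippet below runs the original pattern and the corrected one against a small, made-up HTML string (the markup is an assumption for illustration, not the real page). It also checks the return value of preg_match before reading the captured group, which avoids an undefined-index notice when nothing matches.

```php
<?php
// Hypothetical sample markup; the real page will differ.
$content = "<div id=\"zone-schedule-group-by-season\">\n"
         . "<table><tr><td>Match 1</td></tr></table>\n"
         . "</div>";

// Without the s modifier the dot cannot cross the newline, so nothing matches.
$without = preg_match('/<div id="zone-schedule-group-by-season">(.*)<\/div>/', $content, $m1);

// With the s modifier the dot also matches newlines.
$with = preg_match('~<div id="zone-schedule-group-by-season">(.*?)</div>~s', $content, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(1)

// Only read the capture group when a match was actually found.
if ($with === 1) {
    echo trim($m2[1]), PHP_EOL;
}
```

Note that the lazy (.*?) stops at the first </div> it meets, so a nested <div> inside the container would cut the capture short. That is one reason a DOM parser is the safer choice here.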
Answer 1 (score: 1)
I would also advise against using a regular expression to parse this. You can use an HTML parser instead: DOMDocument, and in particular XPath.
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$content = get_data('http://ibongda.vn/lich-thi-dau-bong-da.hs');
$dom = new DOMDocument();
libxml_use_internal_errors(true); // handle errors yourself
$dom->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$table_rows = $xpath->query('//div[@id="zone-schedule-group-by-season"]/table/tbody/tr[@class!="bg-gd" and @class!="table-title"]'); // these are the rows of that table
foreach ($table_rows as $rows) { // loop each tr
    foreach ($rows->childNodes as $td) { // loop each child node (td)
        if (trim($td->nodeValue) != '') { // don't show empty td
            echo trim($td->nodeValue) . '<br/>';
        }
    }
    echo '<hr/>';
}
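To try the XPath approach without hitting the network, here is a small sketch that runs the same query against an inline HTML sample. The markup, the "row" class, and the team names are all made up for illustration; the real page structure may differ, so treat the query as something to adjust rather than a guaranteed fit.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = <<<HTML
<div id="zone-schedule-group-by-season">
  <table>
    <tbody>
      <tr class="table-title"><td>Season</td></tr>
      <tr class="row"><td>Team A</td><td>Team B</td></tr>
      <tr class="bg-gd"><td>Group</td></tr>
      <tr class="row"><td>Team C</td><td></td><td>Team D</td></tr>
    </tbody>
  </table>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Same query as above: skip the title and group-header rows.
$rows = $xpath->query('//div[@id="zone-schedule-group-by-season"]/table/tbody/tr[@class!="bg-gd" and @class!="table-title"]');

$result = [];
foreach ($rows as $row) {
    $cells = [];
    // Query only the td children of this row, relative to the row node.
    foreach ($xpath->query('td', $row) as $td) {
        $text = trim($td->nodeValue);
        if ($text !== '') { // skip empty cells
            $cells[] = $text;
        }
    }
    $result[] = $cells;
}

print_r($result);
```

Collecting the cells into an array instead of echoing them makes the data easier to reuse. One caveat: in XPath, @class!="..." is false for a tr that has no class attribute at all, so such rows would be skipped by this query.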