Question

I'm trying to practice cURL, but it isn't going well. Please tell me what is wrong. Here is my code:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://xxxxxxx.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
curl_setopt($ch, CURLOPT_USERAGENT, "Google Bot");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

$downloaded_page = curl_exec($ch);
curl_close($ch);
preg_match_all('/<div\s* class =\"abc\">(.*)<\/div>/', $downloaded_page, $title); 
echo "<pre>";
print($title[1]);  
echo "</pre>";

The warning I get is: Notice: Array to string conversion

The HTML I want to parse looks like this:

<div class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>

Answer 1

Don't parse HTML with regex.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.lipsum.com/');
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
if ($html === false) { // curl_exec returns false on failure
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

$dom = new DOMDocument;
@$dom->loadHTML($html); // @ silences warnings about malformed real-world HTML
$xpath = new DOMXPath($dom);
# foreach ($xpath->query('//div') as $div) { // all divs in the HTML
foreach ($xpath->query('//div[contains(@class, "abc")]') as $div) { // all divs whose class attribute contains "abc"
    // $div->nodeValue holds the text content of the matched div
    echo $div->nodeValue, "\n";
}
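To try the DOM approach without a network request, the sketch below loads the HTML sample from the question from a string instead of fetching it with cURL. It also swaps in a stricter XPath test than contains(@class, "abc"), which would also match a class such as "abcd"; the concat/normalize-space idiom matches "abc" only as a whole class token. The function name extractAbcItems and the extra "abcd" div in the sample are hypothetical, added for illustration.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <ul> inside
// a <div> whose class list contains the exact token "abc".
function extractAbcItems(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Matches class="abc" and class="x abc y", but not class="abcd".
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " abc ")]/ul';

    $items = [];
    foreach ($xpath->query($query) as $ul) {
        $items[] = trim($ul->textContent);
    }
    return $items;
}

// The HTML from the question, plus a hypothetical "abcd" div that must be skipped.
$html = <<<HTML
<div class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>
<div class="abcd"><ul> ignored </ul></div>
HTML;

print_r(extractAbcItems($html));
```

In the real script, $html would come from curl_exec as in the code above.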

Answer 2

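preg_match_all returns an array of arrays, so $title[1] is itself an array, and passing it to print is what triggers "Array to string conversion".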
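A small sketch of the result shape, using a made-up one-line input: with the default PREG_PATTERN_ORDER flag, $title[0] holds the full matches and $title[1] holds what the first capture group matched in each of them. It uses a lazy (.*?) rather than the greedy (.*) shown below, because on a single line a greedy group would run on to the last </div> and swallow both divs into one match.

```php
<?php
// Made-up sample input: two divs on one line.
$page = '<div class="abc">first</div><div class="abc">second</div>';

// Lazy (.*?) so each match stops at the first closing </div>.
$count = preg_match_all('/<div\s+class="abc">(.*?)<\/div>/', $page, $title);

// $title[0] = full matches, $title[1] = first capture group of each match.
echo $count, "\n";                   // 2
print_r($title[1]);                  // prints the array structure instead of "Array"
echo implode(', ', $title[1]), "\n"; // first, second
```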

If your code is:

preg_match_all('/<div\s+class="abc">(.*)<\/div>/', $downloaded_page, $title);

then what you actually want to do is this:

echo "<pre>";
foreach ($title[1] as $realtitle) {
    echo $realtitle . "\n";
}
echo "</pre>";

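because it searches for every div with the class "abc", so there can be more than one match. I would also suggest making your regex more robust: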

preg_match_all('/<div[^>]+class="abc"[^>]*>(.*)<\/div>/', $downloaded_page, $title);
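One caveat worth checking against the HTML in the question: neither pattern above uses the s modifier, so "." does not match a newline, and a div whose content spans several lines produces no match at all. The sketch below compares the pattern above with a variant that adds s and a lazy (.*?); those two changes and the id="menu" attribute in the sample are my additions, not part of the original answer. Even with them, a regex cannot cope with a nested <div> inside the block, which is why Answer 1 recommends a DOM parser.

```php
<?php
// The question's HTML, with a hypothetical id attribute before class.
$downloaded_page = <<<HTML
<div id="menu" class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>
HTML;

// Pattern from the answer: "." stops at newlines, so nothing matches here.
$withoutS = preg_match_all('/<div[^>]+class="abc"[^>]*>(.*)<\/div>/', $downloaded_page, $m1);

// With s, "." also matches newlines; the lazy (.*?) stops at the first </div>.
$withS = preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $downloaded_page, $m2);

echo $withoutS, "\n"; // 0
echo $withS, "\n";    // 1
echo trim($m2[1][0]), "\n";
```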

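This will also match divs that have other attributes before or after class="abc".

BTW: DOMDocument is slow. I have found that plain regular expressions can sometimes be up to 40 times faster (depending on the size of your file). Keep it simple.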

Best, Nicolas

How to get div content with cURL and preg_match_all
