Hello, I have a website's home page that I am reading with cURL, and I need to get the number of pages the site has.
The information is in a div:
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">…</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>
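The value I need is 15. It could be any number depending on the site, but it is always in the same position.
How can I easily read this value and assign it to a variable in PHP?
Thanks,
Jonathan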
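Answer 0 (score: 2)
You can use PHP's DOM module. Read the page with DOMDocument::loadHTMLFile(), then create a DOMXPath object and query the document for all span elements that have the attribute class="page-numbers".
(Edit: oops, that's not what you want; see the second code snippet.)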
$html = '<html><head><title>:::</title></head><body>
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">…</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>
</body></html>';
$doc = new DOMDocument;
// the content is already in a string, so use loadHTML($content)
// instead of loadHTMLFile($url)
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//span[@class="page-numbers"]');
echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"';
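Edit: does this
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
(the second-to-last a element) always point to the last page, i.e. does this link contain the value you are looking for?
If so, you can use an XPath expression to select the second-to-last a element and, from there, its child span element.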
//div[@class="pager"] <- select each <div> where the attribute class equals "pager"
//div[@class="pager"]/a <- select each <a> that is a direct child of the pager div
//div[@class="pager"]/a[position()=last()-1] <- select the <a> that is second but last
//div[@class="pager"]/a[position()=last()-1]/span <- select the direct child <span> of that second but last <a> element in the pager <div>
(You might want to pick up a good XPath tutorial ;-))
$doc->loadhtml($html);
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
if ( 0 < $nodelist->length ) {
echo $nodelist->item(0)->nodeValue;
}
else {
echo 'not found';
}
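The query above can be wrapped in a small helper that returns the page count as an integer, which is what the question asks for. This is only a sketch: the function name getLastPage, the fallback to 1 when no pager is found, and the shortened sample markup are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the last page number shown in the pager,
// or 1 when no pager link is found (assumption: no pager means one page).
function getLastPage(string $html): int
{
    if ($html === '') {
        return 1;
    }
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // second-to-last <a> inside the pager div, then its child <span>
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ($nodelist !== false && $nodelist->length > 0) {
        return (int) trim($nodelist->item(0)->nodeValue);
    }
    return 1;
}

// Shortened version of the pager from the question
$html = '<div class="pager">'
    . '<span class="page-numbers current">1</span>'
    . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
    . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
    . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
    . '</div>';
$lastPage = getLastPage($html);
echo $lastPage, "\n";
```

The cast to int means the result can be used directly in arithmetic, such as estimating a total item count.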
Answer 1 (score: 0)
There is no direct function or simple way to do this. You need to build or use an existing HTML parser to do it.
Answer 2 (score: 0)
You can parse it with a regular expression. First find all occurrences of <span class="page-numbers">, then pick the last one:
// div html code should be in $div_html
preg_match_all('#<span class="page-numbers">(\d+)#', $div_html, $page_numbers);
print_r(end($page_numbers[1])); // prints 15
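The pattern matches only spans whose class attribute is exactly page-numbers and whose text starts with digits, so the current, dots and next spans are skipped. A self-contained sketch follows; the shortened sample markup and the use of max() instead of the last match are assumptions, the latter chosen so the result does not depend on the order of the links.

```php
<?php
// Shortened version of the pager from the question
$div_html = '<div class="pager">'
    . '<span class="page-numbers current">1</span>'
    . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
    . '<a href="/users?page=3"><span class="page-numbers">3</span></a>'
    . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
    . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
    . '</div>';

// $page_numbers[1] receives the captured digit strings
preg_match_all('#<span class="page-numbers">(\d+)#', $div_html, $page_numbers);

// Convert the captures to integers and keep the largest; null if nothing matched
$lastPage = $page_numbers[1] ? max(array_map('intval', $page_numbers[1])) : null;
echo $lastPage, "\n";
```

A regular expression like this depends on the exact attribute spelling and order, so it should break more easily than a DOM-based query if the markup changes.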
Answer 3 (score: 0)
This is something you may want to use XPath for. It requires loading the page as a DOM document object:
$domDoc = new DOMDocument();
$domDoc->loadHTMLFile("http://path/to/yourfile.html");
$xp = new DOMXPath($domDoc);
$nodes = $xp->query("//xpath/to/relevant/node");
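// query() returns a DOMNodeList; take the text of the first match, if any
$value = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;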
I haven't written any XPath in a while, so you will need to do some reading to work out that part, but it shouldn't be too hard.
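For the markup in the question, the placeholder query could be filled in roughly as follows. This is a sketch under assumptions: the HTML is already in a string (so loadHTML() is used rather than loadHTMLFile()), the function name maxPageNumber is hypothetical, and the class test uses the common contains(concat(...)) idiom so that spans carrying several classes also match. Non-numeric spans are filtered out in PHP.

```php
<?php
// Hypothetical helper: largest numeric page label in the pager, or null if none.
function maxPageNumber(string $html): ?int
{
    $domDoc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $domDoc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($domDoc);
    // every <span> in the pager whose class list contains the token "page-numbers"
    $nodes = $xp->query(
        '//div[@class="pager"]//span[contains(concat(" ", normalize-space(@class), " "), " page-numbers ")]'
    );

    $max = null;
    foreach ($nodes as $node) {
        $text = trim($node->nodeValue);
        // skip labels such as the dots and "next"
        if (ctype_digit($text) && ($max === null || (int) $text > $max)) {
            $max = (int) $text;
        }
    }
    return $max;
}

// Shortened version of the pager from the question
$html = '<div class="pager">'
    . '<span class="page-numbers current">1</span>'
    . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
    . '<span class="page-numbers dots">...</span>'
    . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
    . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
    . '</div>';
$value = maxPageNumber($html);
echo $value, "\n";
```

Because the span for the current page is included in the comparison, this should also give the right answer when the page being read is itself the last one, assuming the site marks it up the same way.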
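Answer 4 (score: 0)
Maybe
// $dom is a DOMDocument that already holds the page
$nodes = $dom->getElementsByTagName("span");
$maxPageNum = 0;
foreach ($nodes as $node)
{
    // exact class match, so the "current", "dots" and "next" spans are skipped
    if ($node->getAttribute("class") == "page-numbers" && (int) $node->nodeValue > $maxPageNum)
    {
        $maxPageNum = (int) $node->nodeValue;
    }
}
I don't know PHP well, so there may be a neater way to read a DOM node's class and inner text, but the idea is simply to loop over the span elements and keep the largest page number.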
Answer 5 (score: 0)
Just wanted to say a big thank you to Volkerk for the help; it worked really well. I had to make a few small changes and ended up with this:
function getusers($userurl)
{
    $perPage = 35; // users listed on each full page
    $doc = new DOMDocument();
    // @ suppresses the warnings libxml raises on real-world, non-well-formed HTML
    @$doc->loadHTML(file_get_contents($userurl));
    $xpath = new DOMXPath($doc);
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ( 0 < $nodelist->length ) {
        $lastpage = (int) $nodelist->item(0)->nodeValue;
        // every page before the last one is full
        $users = ($lastpage - 1) * $perPage;
        // load the last page and count the users actually listed on it
        $doc = new DOMDocument();
        @$doc->loadHTML(file_get_contents($userurl.'?page='.$lastpage));
        $xpath = new DOMXPath($doc);
        $users = $users + $xpath->query('//div[@class="user-details"]')->length;
    }
    else {
        // no pager: all users are on this single page
        $users = $xpath->query('//div[@class="user-details"]')->length;
    }
    echo 'there are ', $users, ' users';
}