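I'm new to PHP. What I'm trying to do is get the pagination links. The page has pagination, and when we select a page, the link changes accordingly. How can I get the pagination URLs while staying on the main page, http://ahadith.co.uk/sahihmuslim.php?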
<?php
// Fetch the main page of the site
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://ahadith.co.uk/sahihmuslim.php");
// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);

// This regex extracts the links from the page above
$pattern = "/href=[']([^'][a-zA-Z]+.[a-zA-Z]+.[cid]+=[0-9]+)[']?/";
preg_match_all($pattern, $output, $matches, PREG_PATTERN_ORDER);

// This regex captures the text between <h2> and </h2> (the chapter titles)
$pattern_chapter = "/(?<=\<h2\>)(\s*.*\s*)(?=\<\/h2\>)/";

foreach ($matches[1] as $data) {
    // Download each link caught by the regex above and store its HTML in $homepage
    $homepage = file_get_contents('http://ahadith.co.uk/' . $data);
    // Fetch the chapters from the data stored in $homepage
    preg_match_all($pattern_chapter, $homepage, $matches_chapter, PREG_PATTERN_ORDER);
    foreach ($matches_chapter[1] as $chapters) {
        print_r($chapters);
    }
}
?>
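Now I need to get the pagination links from the data stored in $homepage. In this case, for example, the pagination has 44 pages, and I want the links to all 44 of them. This is the regex that matches the links in the pagination:
http:\/\/([a-zA-Z]+.[a-zA-Z]+.[a-zA-Z]+.[a-zA-Z]+.[a-zA-Z]+.[cid]+=[0-9]&[a-zA-Z]+=[0-9]&[a-zA-Z]+=[0-9]+)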
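A minimal sketch of that step, assuming (hypothetically) that the page number is carried in a query parameter named `page`, e.g. `sahihmuslim.php?cid=1&amp;page=7`; the real parameter name on the site is not confirmed here, so adjust the pattern to the actual links. The function name `paginationUrls` is also just for illustration. Because pagination bars often show only a few numbers (1 2 3 ... 44), it takes the highest page number it finds and rebuilds the whole range from the first link, which only works if pages are numbered consecutively from 1.

```php
<?php
// Minimal sketch: list every pagination URL from already-downloaded HTML.
// HYPOTHETICAL: assumes the page number is a "page=<n>" query parameter,
// e.g. href="sahihmuslim.php?cid=1&amp;page=7". Adjust to the real links.

function paginationUrls(string $html, string $baseUrl): array
{
    // Group 1: the whole href value; group 2: the page number inside it.
    preg_match_all('/href=["\']([^"\']*(?:\?|&|&amp;)page=(\d+)[^"\']*)["\']/i', $html, $m);
    if (empty($m[1])) {
        return [];
    }

    // Take the highest page number seen and use the first link as a template.
    $template = html_entity_decode($m[1][0]); // turn "&amp;" into "&"
    $maxPage  = max(array_map('intval', $m[2]));

    $urls = [];
    for ($p = 1; $p <= $maxPage; $p++) {
        // Swap the page number in the template for $p.
        $path = preg_replace('/([?&]page=)\d+/', '${1}' . $p, $template);
        // Keep absolute links as they are; prefix relative ones with the base URL.
        $urls[] = preg_match('#^https?://#i', $path)
            ? $path
            : rtrim($baseUrl, '/') . '/' . ltrim($path, '/');
    }
    return $urls;
}

// Usage with a small hand-written sample instead of the live site.
$sample = '<a href="sahihmuslim.php?cid=1&amp;page=2">2</a> '
        . '<a href="sahihmuslim.php?cid=1&amp;page=3">3</a> '
        . '<a href="sahihmuslim.php?cid=1&amp;page=44">44</a>';

foreach (paginationUrls($sample, 'http://ahadith.co.uk/') as $url) {
    echo $url, "\n";
}
```

Inside the question's loop this would be called as `paginationUrls($homepage, 'http://ahadith.co.uk/')`.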
I've searched in a lot of places but couldn't find anything relevant. Could someone please help me?
Answer 0 (score: 0)
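Use "HtmlPageDom". It is a third-party PHP library, built on Symfony's DomCrawler, that makes it easy to manipulate HTML documents through the DOM, so you can extract data from a page with selectors instead of regular expressions.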
https://github.com/wasinger/htmlpagedom/blob/master/README.md
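The same DOM-based idea can be sketched without any third-party dependency, using PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) rather than HtmlPageDom itself. The function name `extractWithDom` and the `'page='` filter string below are hypothetical; the filter should be replaced with whatever actually identifies the pagination links on the real site. It reads the `<h2>` chapter titles and the matching link URLs, which replaces both regexes from the question.

```php
<?php
// Minimal sketch with PHP's built-in DOM extension (NOT the HtmlPageDom API):
// collect <h2> texts and the hrefs of links whose URL contains $hrefContains.

function extractWithDom(string $html, string $hrefContains): array
{
    if ($html === '') {
        return ['chapters' => [], 'links' => []]; // loadHTML rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $chapters = [];
    foreach ($xpath->query('//h2') as $h2) {
        $chapters[] = trim($h2->textContent);
    }

    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $href = $a->getAttribute('href'); // entities such as &amp; are already decoded
        if (strpos($href, $hrefContains) !== false) {
            $links[] = $href;
        }
    }

    // Pagination bars often repeat a URL (e.g. "Next" and a page number).
    return ['chapters' => $chapters, 'links' => array_values(array_unique($links))];
}

// Usage with a small hand-written sample instead of the live site.
$html = '<h2> Chapter 1: Faith </h2>'
      . '<a href="sahihmuslim.php?cid=1&amp;page=2">2</a>'
      . '<a href="sahihmuslim.php?cid=1&amp;page=3">3</a>'
      . '<a href="sahihmuslim.php?cid=1&amp;page=2">Next</a>'
      . '<a href="index.php">Home</a>';

$result = extractWithDom($html, 'page=');
print_r($result);
```

Note that `loadHTML` assumes ISO-8859-1 when the markup declares no charset, so pages containing Arabic or other non-ASCII text may need an explicit encoding declaration before parsing.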