Question

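Long story short, my client is locked out of their server because of a dispute, and they need all of their club photos so I can build a new site. I have to download them by URL; they are served through PHP output handling, which provides different sizes to reduce server load.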

There are over 3,000 of them, and I'm not going to waste time doing this one at a time.

So I decided to write a quick and [very] dirty PHP script that uses DOMDocument to scrape the pages and look for links to images, across each album and then across the album sub-pages.

Everything works fine, except for one particular part of the script, where it looks on the album pages for:

(1) Links to images,

<div class='imagethumb'>
    <a href="/gallery/index.php?album=blowout1&image=blahblah.jpg" title="Blahblah">
        <img src="/gallery/index.php?album=blowout1&image=blahblah_thumb.jpg" />
    </a>
</div>

(2) Links to subsequent pages, i.e.

<li>
    <a href="/gallery/index.php?album=beginning&amp;page=2" title="Page 2">2</a>
</li>

(3) Links to the album's "last page", or "..."

<li>
    <a href="/gallery/index.php?album=recognition&page=9" title="Page 9">...</a>
</li>

Here is the relevant part of the script:

//$url is an argument in the function wrapping this script
//look on albums for links
foreach ($album_links as $a_url) {
    $album_html = file_get_contents($a_url['url']);
    $album = new DOMDocument;
    $album->loadHTML($album_html);
    $i_links = $album->getElementsByTagName('a');
    $album_title = $album->getElementsByTagName('title')->item(0)->textContent;
    //to keep track of the number of sub-page links found, exclude page 1
    $num_page_lnks = 1;
    //search through all links on the page, look for:
    foreach ($i_links as $link) {
        //Links contained in div with class='imagethumb'
        if ($link->parentNode->getAttribute('class') == 'imagethumb' ) {
            array_push($image_links, ["album" => str_replace(" | ", "", $album_title), "title" => $link->getAttribute('title'), "url" => "http://" . parse_url($url, PHP_URL_HOST) . $link->getAttribute('href') . "&p=*full-image"]);
        }
        //links contained in li with no class, has a page number in the title, and is not a "..." link
        elseif ($link->parentNode->getAttribute('class') == '' && preg_match('/Page\040\d*/', $link->getAttribute('title')) && $link->textContent != "...") {
            //add to the number of sub page links found
            $num_page_lnks++;
            array_push($image_page_links, "http://" . parse_url($url, PHP_URL_HOST) . $link->getAttribute('href'));
        }
        //links containing the text "..." (link to last album page, if more than 7 pages)
        elseif($link->textContent == "...") {
            //Parse the url into parts
            $url_parse=[];
            parse_str($link->getAttribute('href'), $url_parse);
            //Last Page links appear when greater than 7 pages, so start at 8 ($num_page_links + 1)
            for ($count = ($num_page_lnks + 1); $count < ($url_parse['page'] + 1); $count++) {
                array_push($image_page_links, "http://" . parse_url($url, PHP_URL_HOST) . preg_replace("/[^\=]\d+$/", $count, $link->getAttribute('href')));
            }
        }
    }
    unset($album);
    unset($album_html);
    unset($i_links);
}

If the script finds a sub-page link, it increments $num_page_lnks, so that when it finds the "..." link it knows which page to start from when creating the in-between page links.

This is what is returned:

{
    "0": "http://club.website.com/gallery/index.php?album=beginning&page=2",
    "1": "http://club.website.com/gallery/index.php?album=beginning&page=3",
    "2": "http://club.website.com/gallery/index.php?album=history&page=2",
    "3": "http://club.website.com/gallery/index.php?album=history&page=3",
    "4": "http://club.website.com/gallery/index.php?album=history&page=4",
    "5": "http://club.website.com/gallery/index.php?album=history&page=5",
    "6": "http://club.website.com/gallery/index.php?album=history&page=6",
    "7": "http://club.website.com/gallery/index.php?album=history&page=7",
    "8": "http://club.website.com/gallery/index.php?album=memorial&page=2",
    "9": "http://club.website.com/gallery/index.php?album=memorial&page=3",
    "10": "http://club.website.com/gallery/index.php?album=memorial&page=4",
    "11": "http://club.website.com/gallery/index.php?album=memorial&page=5",
    "12": "http://club.website.com/gallery/index.php?album=memorial&page=6",
    "13": "http://club.website.com/gallery/index.php?album=memorial&page=7",
    "14": "http://club.website.com/gallery/index.php?album=memorial&page=9",
    "15": "http://club.website.com/gallery/index.php?album=memorial&page=9",
    "16": "http://club.website.com/gallery/index.php?album=members&page=2",
    "17": "http://club.website.com/gallery/index.php?album=members&page=3",
    "18": "http://club.website.com/gallery/index.php?album=members&page=4",
    "19": "http://club.website.com/gallery/index.php?album=members&page=5",
    "20": "http://club.website.com/gallery/index.php?album=members&page=6",
    "21": "http://club.website.com/gallery/index.php?album=members&page=7",
    "22": "http://club.website.com/gallery/index.php?album=members&page=8",
    "23": "http://club.website.com/gallery/index.php?album=members&page=9",
    "24": "http://club.website.com/gallery/index.php?album=members&page=10",
    "25": "http://club.website.com/gallery/index.php?album=members&page=11",
    "26": "http://club.website.com/gallery/index.php?album=toy_run&page=2",
    "27": "http://club.website.com/gallery/index.php?album=toy_run&page=3",
    "28": "http://club.website.com/gallery/index.php?album=toy_run&page=4",
    "29": "http://club.website.com/gallery/index.php?album=toy_run&page=5",
    "30": "http://club.website.com/gallery/index.php?album=toy_run&page=6",
    "31": "http://club.website.com/gallery/index.php?album=toy_run&page=7",
    "32": "http://club.website.com/gallery/index.php?album=toy_run&page=8",
    "33": "http://club.website.com/gallery/index.php?album=recognition&page=2",
    "34": "http://club.website.com/gallery/index.php?album=recognition&page=3",
    "35": "http://club.website.com/gallery/index.php?album=recognition&page=4",
    "36": "http://club.website.com/gallery/index.php?album=recognition&page=5",
    "37": "http://club.website.com/gallery/index.php?album=recognition&page=6",
    "38": "http://club.website.com/gallery/index.php?album=recognition&page=7",
    "39": "http://club.website.com/gallery/index.php?album=recognition&page=9",
    "40": "http://club.website.com/gallery/index.php?album=recognition&page=9",
    "41": "http://club.website.com/gallery/index.php?album=blowout1&page=2",
    "42": "http://club.website.com/gallery/index.php?album=blowout1&page=3",
    "43": "http://club.website.com/gallery/index.php?album=blowout1&page=4",
    "44": "http://club.website.com/gallery/index.php?album=blowout1&page=5",
    "45": "http://club.website.com/gallery/index.php?album=blowout1&page=6",
    "46": "http://club.website.com/gallery/index.php?album=blowout1&page=7",
    "47": "http://club.website.com/gallery/index.php?album=blowout1&page=8",
    "48": "http://club.website.com/gallery/index.php?album=blowout1&page=9",
    "49": "http://club.website.com/gallery/index.php?album=blowout1&page=10"
}

The object does contain the right number of sub-pages, but here is the problem:

- When there are 7 or fewer album pages (6 sub-pages), the script works well.
- When there are 8 album pages (7 sub-pages), the script works fine.
- When there are 9 album pages (8 sub-pages: [1] current page, [2] [3] [4] [5] [6] [7], [...] last page (9)), the script doubles up: the page=9 link appears twice and page=8 is missing.
- When there are 10 or more album pages, there is no problem.

I can't figure out what I'm doing wrong.

EDIT:

Here is the source HTML code of $i_links:

Answer 1

The problem is in the last nested loop:

//Last Page links appear when greater than 7 pages, so start at 8 ($num_page_links + 1)
for ($count = ($num_page_lnks + 1); $count < ($url_parse['page'] + 1); $count++) {
     array_push($image_page_links,  "http://" . parse_url($url, PHP_URL_HOST) . preg_replace("/[^\=]\d+$/", $count, $link->getAttribute('href')));
 }

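By the time you reach the 7th sub-link (the one whose text content is "..."), the $num_page_lnks variable has the value 7 and $url_parse['page'] has the value 9. So there will be two iterations, in which the $count variable is assigned 8 and then 9.
But ... the links stay the same:

"http://club.website.com/gallery/index.php?album=recognition&page=9"
"http://club.website.com/gallery/index.php?album=recognition&page=9"

because your regular expression does not make the expected replacement:

var_dump(preg_replace("/[^\=]\d+$/",8,"/gallery/index.php?album=recognition&page=9"));
// will output: string(43) "/gallery/index.php?album=recognition&page=9"

The pattern requires a character other than = followed by at least one digit at the end of the string, so a single digit directly after = never matches. With a two-digit page number such as 10, the first digit satisfies [^\=] and the whole number is replaced, which is why albums with 10 or more pages work.

Change the regular expression so that it matches only the trailing page number, or consider other logic.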
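As a minimal sketch of one possible replacement pattern (not necessarily the one the answerer had in mind), a lookbehind can anchor the match to the digits that follow page= without consuming the = sign. The helper name replace_page_number and the hard-coded values below are assumptions for illustration; they mirror the 9-page "recognition" album from the question.

```php
<?php
// Hypothetical helper (not part of the original script): swap only the
// trailing page number in an href such as
// "/gallery/index.php?album=recognition&page=9".
function replace_page_number(string $href, int $page): string
{
    // (?<=page=) is a lookbehind: the digits must directly follow "page=",
    // but "page=" itself is not part of the match, so a single-digit
    // page number is replaced just like a multi-digit one.
    return preg_replace('/(?<=page=)\d+$/', (string) $page, $href);
}

$href = '/gallery/index.php?album=recognition&page=9';
$num_page_lnks = 7; // pages 2..7 were already collected as ordinary links
$last_page = 9;     // page number carried by the "..." link

$links = [];
for ($count = $num_page_lnks + 1; $count <= $last_page; $count++) {
    $links[] = replace_page_number($href, $count);
}

print_r($links); // the two entries end in page=8 and page=9
```

Because the lookbehind does not depend on how many digits the page number has, the same call should behave identically for albums with 9 pages and for albums with 10 or more.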

Having trouble with preg_replace() in a foreach loop
