I have a script that takes a some.txt file, reads the links in it, and reports whether a backlink to my site exists on each one. The problem is that it is very slow, and I would like to speed it up. Is there any way to make it faster?
<?php
ini_set('max_execution_time', 3000);
$source = file_get_contents("your-backlinks.txt");
$needle = "http://www.submitage.com"; // the backlink URL to search each page for
$found = array();
$notfound = array();
$new = explode("\n", $source);
foreach ($new as $check) {
    $a = file_get_contents(trim($check));
    // strpos() returns 0 (falsy) when the needle sits at the very start of
    // the page, so compare against false explicitly.
    if (strpos($a, $needle) !== false) {
        $found[] = $check;
    } else {
        $notfound[] = $check;
    }
}
echo "Matches that were found: \n " . implode("\n", $found) . "\n";
echo "Matches that were not found \n" . implode("\n", $notfound);
?>
Answer 0 (score: 2)
Your biggest bottleneck is that you are performing the HTTP requests sequentially rather than in parallel. curl can perform several requests in parallel. Here is an example from the documentation, adapted to use a loop and actually collect the results. I cannot promise it is correct, only that I have followed the documentation correctly:
$mh = curl_multi_init();
$handles = array();
foreach ($new as $check) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, trim($check));
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Required for curl_multi_getcontent() below; without it the response
    // body is printed straight to output instead of being returned.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $ch);
    $handles[$check] = $ch;
}
// verbatim from the demo
$active = null;
// execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// end of verbatim code
foreach ($handles as $check => $ch) {
    $a = curl_multi_getcontent($ch);
    // Same containment check as in the original script.
    if (strpos($a, $needle) !== false) {
        $found[] = $check;
    } else {
        $notfound[] = $check;
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
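One caveat that is not in the demo: the code above opens a connection to every URL at once, which can exhaust sockets or trigger rate limiting when your-backlinks.txt is long. A hedged sketch of one mitigation, processing the list in fixed-size chunks; the batch size of 10 is an arbitrary assumption, and check_batch() is my own hypothetical wrapper around the curl_multi steps above, not part of the original answer:

// Hypothetical helper: fetch a batch of URLs in parallel via curl_multi
// and return an array of url => response body.
function check_batch(array $urls) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, trim($url));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30); // don't hang on dead hosts
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    $active = null;
    do {
        curl_multi_exec($mh, $active);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($active > 0);
    $bodies = array();
    foreach ($handles as $url => $ch) {
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $bodies;
}

// Check 10 URLs at a time instead of all of them at once.
foreach (array_chunk($new, 10) as $batch) {
    foreach (check_batch($batch) as $check => $a) {
        if (strpos($a, $needle) !== false) {
            $found[] = $check;
        } else {
            $notfound[] = $check;
        }
    }
}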
Answer 1 (score: 0)
Apart from some pseudo-multithreading solution, you will not be able to squeeze more speed out of this operation by optimizing the PHP itself.
However, you could build a queue system that lets you run the checks as a background task. Instead of checking the URLs as you iterate over them, add them to a queue. Then write a cron script that grabs the unchecked URLs from the queue one by one, checks whether they contain a reference to your domain, and saves the result, as sketched below.
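A minimal sketch of that idea, assuming a SQLite file queue.db with a hypothetical queue table (url TEXT, status TEXT, found INTEGER); the schema, file name, and status values are my own assumptions, not something the answer specifies:

<?php
// cron-check.php -- run periodically, e.g. */5 * * * * php /path/to/cron-check.php
// A separate script would INSERT the URLs from your-backlinks.txt with status 'pending'.
$db = new PDO('sqlite:' . __DIR__ . '/queue.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$needle = "http://www.submitage.com";

// Grab one unchecked URL per run (or loop here for a small batch).
$row = $db->query("SELECT url FROM queue WHERE status = 'pending' LIMIT 1")->fetch();
if ($row === false) {
    exit; // nothing left to check
}
$url = $row['url'];

$body = @file_get_contents($url);
$found = ($body !== false && strpos($body, $needle) !== false) ? 1 : 0;

$stmt = $db->prepare("UPDATE queue SET status = 'checked', found = ? WHERE url = ?");
$stmt->execute(array($found, $url));
?>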