I have a script that takes a some.txt file, reads the links in it, and reports whether a backlink to my site exists on each one. The problem is that it is very slow, and I would like to speed it up. Is there any way to make it faster?
<?php
ini_set('max_execution_time', 3000);
$source = file_get_contents("your-backlinks.txt");
$needle = "http://www.submitage.com"; // the backlink URL to search each page for
$found = array();
$notfound = array();
$new = explode("\n", $source);
foreach ($new as $check) {
    $a = file_get_contents(trim($check));
    // strpos() returns 0 (falsy) when the needle sits at the very start of
    // the page, so compare against false explicitly.
    if (strpos($a, $needle) !== false) {
        $found[] = $check;
    } else {
        $notfound[] = $check;
    }
}
echo "Matches that were found: \n " . implode("\n", $found) . "\n";
echo "Matches that were not found \n" . implode("\n", $notfound);
?>
Answer 0 (score: 2)
Your biggest bottleneck is that you are performing the HTTP requests sequentially rather than in parallel. curl can perform several requests in parallel. Here is an example from the documentation, adapted to use a loop and actually collect the results. I cannot promise it is correct, only that I have followed the documentation correctly:
$mh = curl_multi_init();
$handles = array();
foreach ($new as $check) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, trim($check));
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Required for curl_multi_getcontent() below; without it the response
    // body is printed straight to output instead of being returned.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $ch);
    $handles[$check] = $ch;
}
// verbatim from the demo
$active = null;
// execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// end of verbatim code
foreach ($handles as $check => $ch) {
    $a = curl_multi_getcontent($ch);
    // Same containment check as in the original script.
    if (strpos($a, $needle) !== false) {
        $found[] = $check;
    } else {
        $notfound[] = $check;
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
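One caveat that is not in the demo: the code above opens a connection to every URL at once, which can exhaust sockets or trigger rate limiting when your-backlinks.txt is long. A hedged sketch of one mitigation, processing the list in fixed-size chunks; the batch size of 10 is an arbitrary assumption, and check_batch() is my own hypothetical wrapper around the curl_multi steps above, not part of the original answer:

// Hypothetical helper: fetch a batch of URLs in parallel via curl_multi
// and return an array of url => response body.
function check_batch(array $urls) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, trim($url));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30); // don't hang on dead hosts
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    $active = null;
    do {
        curl_multi_exec($mh, $active);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($active > 0);
    $bodies = array();
    foreach ($handles as $url => $ch) {
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $bodies;
}

// Check 10 URLs at a time instead of all of them at once.
foreach (array_chunk($new, 10) as $batch) {
    foreach (check_batch($batch) as $check => $a) {
        if (strpos($a, $needle) !== false) {
            $found[] = $check;
        } else {
            $notfound[] = $check;
        }
    }
}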
Answer 1 (score: 0)
Apart from some pseudo-multithreading solution, you will not be able to squeeze more speed out of this operation by optimizing the PHP itself.
However, you could build a queue system that lets you run the checks as a background task. Instead of checking the URLs as you iterate over them, add them to a queue. Then write a cron script that grabs the unchecked URLs from the queue one by one, checks whether they contain a reference to your domain, and saves the result, as sketched below.
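A minimal sketch of that idea, assuming a SQLite file queue.db with a hypothetical queue table (url TEXT, status TEXT, found INTEGER); the schema, file name, and status values are my own assumptions, not something the answer specifies:

<?php
// cron-check.php -- run periodically, e.g. */5 * * * * php /path/to/cron-check.php
// A separate script would INSERT the URLs from your-backlinks.txt with status 'pending'.
$db = new PDO('sqlite:' . __DIR__ . '/queue.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$needle = "http://www.submitage.com";

// Grab one unchecked URL per run (or loop here for a small batch).
$row = $db->query("SELECT url FROM queue WHERE status = 'pending' LIMIT 1")->fetch();
if ($row === false) {
    exit; // nothing left to check
}
$url = $row['url'];

$body = @file_get_contents($url);
$found = ($body !== false && strpos($body, $needle) !== false) ? 1 : 0;

$stmt = $db->prepare("UPDATE queue SET status = 'checked', found = ? WHERE url = ?");
$stmt->execute(array($found, $url));
?>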