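I'm looping through a list of domains and checking whether each one is online or offline. At the moment I use a foreach loop and file_get_contents() for this. It seems to work, but I think it's a bit slow, and I'm not sure whether this is an appropriate approach or whether there is a better way to solve the problem.
Example test: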
<?php
$domains = [
"stackoverflow.com", // 49 KB
"google.com", // 68 KB
"facebook.com", // 112 KB
"zyxwv.com", // 0
"youtube.com", // 38 KB
"imdb.com", // 56 KB
"zyxwv1234.com", // 0
"mozilla.org", // 152 KB
"amazon.com", // 100 KB
"github.com", // 80 KB > total = 655 KB
];
ini_set("max_execution_time", count($domains) * 10); // 10 seconds for each domain
$states = [];
foreach ($domains as $domain) {
$sw_dom_start = microtime(1);
$sw_dom_elapsed = null;
try {
$contents = @file_get_contents("http://{$domain}");
$sw_dom_stop = microtime(1);
$sw_dom_elapsed = $sw_dom_stop - $sw_dom_start;
if ($contents) {
$states[] = [$domain, "online", $sw_dom_elapsed];
} else {
$states[] = [$domain, "offline", $sw_dom_elapsed];
}
} catch (Exception $e) {
$states[] = [$domain, "offline", $sw_dom_elapsed];
}
}
$durations = array_reduce($states, function ($sum, $state) { $sum += $state[2]; return $sum; });
var_dump($durations);
/*
recorded durations : 22.7, 37.5, 43.6, 34.8, 20.4
example output:
Array
(
[0] => Array
(
[0] => stackoverflow.com
[1] => online
[2] => 0.90218901634216
)
[1] => Array
(
[0] => google.com
[1] => online
[2] => 0.51400780677795
)
[2] => Array
(
[0] => facebook.com
[1] => online
[2] => 1.2972490787506
)
[3] => Array
(
[0] => zyxwv.com
[1] => offline
[2] => 11.007841110229
)
[4] => Array
(
[0] => youtube.com
[1] => online
[2] => 2.3354029655457
)
[5] => Array
(
[0] => imdb.com
[1] => online
[2] => 1.1368417739868
)
[6] => Array
(
[0] => zyxwv1234.com
[1] => offline
[2] => 0.10531902313232
)
[7] => Array
(
[0] => mozilla.org
[1] => online
[2] => 8.8756558895111
)
[8] => Array
(
[0] => amazon.com
[1] => online
[2] => 2.3273060321808
)
[9] => Array
(
[0] => github.com
[1] => online
[2] => 1.3067789077759
)
)
float(29.808591604233) */
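Sometimes I get inconsistent results. An offline/fake domain can take up to 10 seconds to process, while an online site can take up to 8 seconds. I'm not sure whether this is down to my code or to the target domains.
I noted the sizes of the index pages, and they are not heavy: the 10 pages add up to 655 KB, which should be fetched within a few seconds. So if size isn't what affects performance, what is? Is file_get_contents() an expensive call?
I will be checking 150+ domains periodically (maybe a few times a day). The current approach seems like a poor fit for this task. A full loop would take about 10 minutes, and that's assuming it doesn't fail. How should I proceed?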
Answer 0 (score: 0)
Here is an example from https://secure.php.net/manual/en/book.curl.php
foreach ($domains as $domain) {
http_response($domain,'200',3); // returns true if the response takes less than 3 seconds and the response code is 200
}
function http_response($url, $status = null, $wait = 3)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE); // HEAD request: skip the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// let cURL enforce the deadline itself: a forked child that only sleeps
// cannot interrupt a blocking curl_exec() in the parent
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $wait); // limit for the connect phase
curl_setopt($ch, CURLOPT_TIMEOUT, $wait); // limit for the whole request
$head = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if(!$head)
{
// timed out, DNS failure, connection refused, or empty response
return FALSE;
}
if($status === null)
{
// no specific status requested: any non-error response counts
return $httpCode < 400;
}
// loose comparison so that '200' matches the integer 200
return $status == $httpCode;
}
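Answer 1 (score: 0)
file_get_contents() and get_headers() didn't really solve my problem. I think that's because the connection has no timeout set, so it just waits and waits for each domain.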
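For reference, file_get_contents() does observe a timeout: the default_socket_timeout ini setting, which is 60 seconds unless changed. A shorter limit can be attached per request through a stream context. The following is only a minimal sketch; the helper names fetch_timeout_context() and fgc_is_online() are made up for illustration, and the timeout may not cover a slow DNS lookup.

```php
<?php
// Hypothetical helper: build a stream context with a per-request timeout.
function fetch_timeout_context(float $seconds)
{
    return stream_context_create([
        "http" => [
            "method"        => "HEAD",   // ask for headers only, skip the body
            "timeout"       => $seconds, // seconds before the request gives up
            "ignore_errors" => true,     // a 4xx/5xx reply still counts as a reply
        ],
    ]);
}

// Hypothetical helper: true if the host sent any HTTP reply within the timeout.
function fgc_is_online(string $domain, float $seconds = 3.0): bool
{
    $context = fetch_timeout_context($seconds);
    // A HEAD request yields an empty string on success, so compare against false.
    $result = @file_get_contents("http://{$domain}", false, $context);
    return $result !== false;
}
```

Something like fgc_is_online("stackoverflow.com", 3.0) could then replace the bare file_get_contents() call in the loop from the question.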
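I looked into cURL and got it working. I believe this is my first time using cURL.
I set a 3-second timeout for the connection. If a domain doesn't respond within 3 seconds, it is treated as offline. While that may be right for domains that really are offline/down, sometimes even online domains fail to respond within 3 seconds. Since I can't tell at connection time whether a domain is really down or just slow to respond, I have to do a second check.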
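One possible way to tell "really down" from "just slow" is to look at the cURL error number after the request instead of only at the returned headers. This is a rough sketch rather than part of my setup; classify_curl_errno() and probe_domain() are hypothetical names, and the mapping of error codes to states is my own assumption about what is useful here.

```php
<?php
// Hypothetical helper: map a cURL error number to a coarse state.
function classify_curl_errno(int $errno): string
{
    switch ($errno) {
        case CURLE_OK:
            return "online";
        case CURLE_OPERATION_TIMEDOUT:   // 28: connect or transfer took too long
            return "timeout";
        case CURLE_COULDNT_RESOLVE_HOST: // 6: DNS lookup failed
        case CURLE_COULDNT_CONNECT:      // 7: connection refused or unreachable
            return "offline";
        default:
            return "error";
    }
}

// Hypothetical helper: send a HEAD request and report the classified state.
function probe_domain(string $domain, int $timeout = 3): array
{
    $ch = curl_init("http://{$domain}");
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);
    return ["domain" => $domain, "state" => classify_curl_errno($errno), "errno" => $errno];
}
```

With this, only the domains reported as "timeout" would need the slower second check; a failed DNS lookup or a refused connection is unlikely to change a few seconds later.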
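I check the "offline" domains again to make sure they really are offline, and this setup seems to work. The only difference in the second pass is the timeout, which is 5 seconds this time. Checking 10 domains takes about 10 seconds, sometimes even less.
Example test: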
<?php
$sw_start = microtime(1);
$domains = [
"stackoverflow.com",
"google.com",
"facebook.com",
"zyxwv.com",
"youtube.com",
"imdb.com",
"zyxwv1234.com",
"mozilla.org",
"amazon.com",
"github.com",
];
ini_set("max_execution_time", count($domains) * 10);
function curl_is_online($url, $timeout = 3) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, TRUE);
$head = curl_exec($ch);
curl_close($ch);
if ($head) {
return TRUE;
}
return FALSE;
}
function check_domain($domain, $timeout = 3) {
$sw_start = microtime(1);
$sw_elapsed = null;
$result = [
"domain" => $domain,
"state" => "offline",
"latency" => $sw_elapsed,
];
try {
// pass the timeout through so the recheck can use a longer one
$is_online = curl_is_online("http://{$domain}", $timeout);
$sw_stop = microtime(1);
$sw_elapsed = $sw_stop - $sw_start;
if ($is_online) {
$result["state"] = "online";
}
$result["latency"] = $sw_elapsed;
} catch (Exception $e) {
$result["state"] = "offline";
$result["latency"] = $sw_elapsed;
}
return $result;
}
function check_domains($domains) {
$checks = [];
foreach ($domains as $domain) {
$checks[] = check_domain($domain);
}
return $checks;
}
function check_offlines(&$checks, $timeout = 5) {
foreach ($checks as &$check) {
if ($check["state"] == "offline") {
print_r($check);
$check = check_domain($check["domain"], $timeout);
}
}
}
$checks = check_domains($domains);
print_r($checks);
check_offlines($checks);
print_r($checks);
$sw_stop = microtime(1);
$sw_elapsed = $sw_stop - $sw_start;
echo "Total elapsed time: {$sw_elapsed} seconds.";
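The script above still checks the domains one after another, so the total time grows with the number of domains. For 150+ domains, one option I have not measured is cURL's multi interface, which runs the requests concurrently so that a pass should take roughly as long as the slowest domain rather than the sum of all of them. A minimal sketch, assuming the curl extension is loaded; check_domains_parallel() is a hypothetical name:

```php
<?php
// Hypothetical helper: check all domains concurrently.
// Returns an array of the form [domain => "online" | "offline"].
function check_domains_parallel(array $domains, int $timeout = 3): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach (array_unique($domains) as $domain) {
        $ch = curl_init("http://{$domain}");
        curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$domain] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $states = [];
    foreach ($handles as $domain => $ch) {
        // An HTTP code of 0 means no HTTP response arrived at all.
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $states[$domain] = $code > 0 ? "online" : "offline";
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $states;
}
```

Calling check_domains_parallel($domains) would stand in for check_domains($domains); the domains reported as "offline" could then be passed through it again with a longer timeout, as in the second pass above. For a very long list it may be worth splitting the domains into batches so that not every connection is opened at once.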