Looping through domains and checking whether they are down

Time: 2019-01-03 13:30:48

Tags: php loops file-get-contents

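I'm looping through a list of domains and checking whether each one is up or down. Currently I'm doing this with a foreach loop and file_get_contents(). It seems to work, but I think it's a bit slow, and I'm not sure whether this is the right approach or whether there is a better way to solve the problem.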

Example test:

<?php

$domains = [
    "stackoverflow.com", // 49 KB
    "google.com",        // 68 KB
    "facebook.com",      // 112 KB
    "zyxwv.com",         // 0
    "youtube.com",       // 38 KB
    "imdb.com",          // 56 KB
    "zyxwv1234.com",     // 0
    "mozilla.org",       // 152 KB
    "amazon.com",        // 100 KB
    "github.com",        // 80 KB  >  total = 655 KB
];

ini_set("max_execution_time", count($domains) * 10); // 10 seconds for each domain

$states = [];

foreach ($domains as $domain) {

    $sw_dom_start   = microtime(1);
    $sw_dom_elapsed = null;

    try {

        $contents = @file_get_contents("http://{$domain}");

        $sw_dom_stop = microtime(1);
        $sw_dom_elapsed = $sw_dom_stop - $sw_dom_start;

        if ($contents) {
            $states[] = [$domain, "online", $sw_dom_elapsed];
        } else {
            $states[] = [$domain, "offline", $sw_dom_elapsed];
        }

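    } catch (Exception $e) {
        // Not reached in practice: file_get_contents() signals failure by
        // returning false (its warning is suppressed by @), not by throwing.
        $states[] = [$domain, "offline", $sw_dom_elapsed];
    }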
}

// Sum the per-domain timings, starting from 0
$durations = array_reduce($states, function ($sum, $state) { return $sum + $state[2]; }, 0);
print_r($states);
var_dump($durations);

/*
recorded durations : 22.7, 37.5, 43.6, 34.8, 20.4

example output:

Array
(
    [0] => Array
        (
            [0] => stackoverflow.com
            [1] => online
            [2] => 0.90218901634216
        )

    [1] => Array
        (
            [0] => google.com
            [1] => online
            [2] => 0.51400780677795
        )

    [2] => Array
        (
            [0] => facebook.com
            [1] => online
            [2] => 1.2972490787506
        )

    [3] => Array
        (
            [0] => zyxwv.com
            [1] => offline
            [2] => 11.007841110229
        )

    [4] => Array
        (
            [0] => youtube.com
            [1] => online
            [2] => 2.3354029655457
        )

    [5] => Array
        (
            [0] => imdb.com
            [1] => online
            [2] => 1.1368417739868
        )

    [6] => Array
        (
            [0] => zyxwv1234.com
            [1] => offline
            [2] => 0.10531902313232
        )

    [7] => Array
        (
            [0] => mozilla.org
            [1] => online
            [2] => 8.8756558895111
        )

    [8] => Array
        (
            [0] => amazon.com
            [1] => online
            [2] => 2.3273060321808
        )

    [9] => Array
        (
            [0] => github.com
            [1] => online
            [2] => 1.3067789077759
        )

)
float(29.808591604233) */

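Sometimes I get inconsistent results. Offline/fake domains can take up to 10 seconds to process, while online sites can take up to 8 seconds. I'm not sure whether this comes from my code or from the target domains.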
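One plausible source of the spread is DNS. In the output above, zyxwv1234.com failed after about 0.1 s, which is consistent with a lookup that returns no record, while zyxwv.com took about 11 s, which is consistent with a name that resolves but whose server never answers before the timeout. The sketch below resolves each host first with PHP's `gethostbyname()`, which returns its input unchanged when resolution fails, so domains with no DNS record could be reported offline without any HTTP request. The helper name `has_dns` is hypothetical; it assumes plain host names (an IP literal also comes back unchanged) and only sees IPv4 (A) records.

```php
<?php

// Hypothetical helper: true if the host name resolves to an IPv4 address.
// gethostbyname() returns its argument unchanged when the lookup fails.
// Assumes plain host names; an IP literal would also come back unchanged,
// so it would be reported as "no DNS" by this check.
function has_dns(string $domain): bool
{
    return gethostbyname($domain) !== $domain;
}
```

For example, `array_filter($domains, 'has_dns')` would keep only the resolvable domains, and only those would need the slower HTTP check.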

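I looked at the size of the index pages, and they are not heavy: the 10 pages total 655 KB, which should be fetched within a few seconds. So if size isn't what affects performance, what is? Is file_get_contents() an expensive call?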
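Page size is unlikely to be the bottleneck; waiting is. The http:// wrapper behind file_get_contents() falls back to the `default_socket_timeout` ini setting (60 seconds by default) unless a stream context sets its own `timeout`. A minimal sketch, assuming a HEAD request is enough to decide whether a site is up and that any HTTP answer (even 4xx/5xx) counts as "up"; the function name `fgc_is_online` is hypothetical. Name resolution is not governed by this timeout.

```php
<?php

// Hypothetical helper: bound file_get_contents() with a stream context
// instead of relying on default_socket_timeout.
function fgc_is_online(string $url, float $timeout = 3.0): bool
{
    $context = stream_context_create([
        'http' => [                      // also used for https:// URLs
            'method'        => 'HEAD',   // headers only, no page body
            'timeout'       => $timeout, // seconds
            'ignore_errors' => true,     // a 4xx/5xx answer still counts as "up"
        ],
    ]);

    // A HEAD response has an empty body, so compare with false instead of
    // testing truthiness: "" means the server answered, false means it did not.
    return @file_get_contents($url, false, $context) !== false;
}
```

If error statuses should count as "down", dropping `ignore_errors` makes file_get_contents() return false for them.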

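I will be checking 150+ domains periodically (maybe several times a day), and the current approach seems like a poor fit for that. A full loop takes about 10 minutes, and that's when it doesn't fail. How should I proceed?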
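Because the checks are independent, a common way to cut the total time is to run them concurrently instead of one after another. PHP's `curl_multi_*` functions do this in a single process, so the total time tends toward the slowest single check rather than the sum of all checks. A sketch, assuming a HEAD request and a 5-second cap per URL are acceptable; the function name `check_all` is hypothetical, and duplicate URLs in the input would overwrite each other in the result.

```php
<?php

// Hypothetical helper: check many URLs concurrently with curl_multi.
// Returns [url => bool], true when an HTTP status below 400 came back.
function check_all(array $urls, int $timeout = 5): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true,     // HEAD-style request
            CURLOPT_RETURNTRANSFER => true,     // don't print the headers
            CURLOPT_FOLLOWLOCATION => true,     // follow http -> https redirects
            CURLOPT_CONNECTTIMEOUT => $timeout, // limit for connecting
            CURLOPT_TIMEOUT        => $timeout, // hard cap for the whole request
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        // Status code 0 means no HTTP response arrived (DNS failure, refused, timeout).
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $results[$url] = $code > 0 && $code < 400;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

For 150+ domains it may be gentler to add the handles in batches (say 20 to 50 at a time) rather than all at once; that batch size is a starting guess to tune, not a measured value.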

2 answers:

Answer 0 (score: 0)

Here is an example based on https://secure.php.net/manual/en/book.curl.php:
$results = [];
foreach ($domains as $domain) {
    // true if the response arrives within 3 seconds and the response code is 200
    $results[$domain] = http_response("http://{$domain}", 200, 3);
}
print_r($results);

function http_response($url, $status = null, $wait = 3)
{
    // cURL's own timeout options bound the wait to $wait seconds,
    // so no separate process is needed to enforce it.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);          // remove body (HEAD-style request)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);  // follow redirects such as http -> https
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $wait); // limit for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $wait);        // limit for the whole request
    $head = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (!$head) {
        return FALSE;           // no response within $wait seconds, or the connection failed
    }

    if ($status === null) {
        return $httpCode < 400; // any non-error status counts as up
    }

    return $status == $httpCode;
}

Answer 1 (score: 0)

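Neither file_get_contents() nor get_headers() really solved my problem. I think this is because the connection has no short timeout, so the script just keeps waiting on each domain.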

I looked into cURL and got it working. I believe this is the first time I've used cURL.

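I set a 3-second connection timeout: if a domain doesn't respond within 3 seconds, it is marked as offline. That may be correct for domains that really are offline/down, but sometimes even online domains fail to respond within 3 seconds. Since I can't tell at connection time whether a domain is really down or just slow to respond, I have to run a second check.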

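I re-check the "offline" domains to make sure they really are offline, and this setup seems to work. The only difference in the second pass is the timeout, which is 5 seconds this time. Checking 10 domains takes about 10 seconds, sometimes less.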
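On not being able to tell "really down" from "slow": cURL does report why a request failed, through `curl_errno()`. A sketch that separates a DNS failure, a refused connection, and a timeout, so that only timeouts would need the second, longer pass. The function name `classify_check` and its return labels are hypothetical; any HTTP answer, including 4xx/5xx, is treated as "online".

```php
<?php

// Hypothetical helper: classify a check by cURL's error code.
function classify_check(string $url, int $timeout = 3): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,     // HEAD-style request
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $ok    = curl_exec($ch) !== false;
    $errno = curl_errno($ch);
    curl_close($ch);

    if ($ok) {
        return 'online';
    }

    switch ($errno) {
        case CURLE_COULDNT_RESOLVE_HOST: // no DNS record: the domain itself is likely dead
            return 'no-dns';
        case CURLE_COULDNT_CONNECT:      // host resolved, but the connection was refused
            return 'refused';
        case CURLE_OPERATION_TIMEDOUT:   // slow or unreachable: a candidate for the retry
            return 'timeout';
        default:
            return 'error';
    }
}
```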

Example test:

<?php

$sw_start = microtime(1);

$domains = [
    "stackoverflow.com",
    "google.com",
    "facebook.com",
    "zyxwv.com",
    "youtube.com",
    "imdb.com",
    "zyxwv1234.com",
    "mozilla.org",
    "amazon.com",
    "github.com",
];

ini_set("max_execution_time", count($domains) * 10);

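function curl_is_online($url, $timeout = 3) {

    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_HEADER, TRUE);             // include the response headers in the output
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);             // HEAD-style request: don't download the page body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);     // return the result instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // give up if no connection within $timeout seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // also cap the whole request, so a server that accepts
                                                        // the connection but never answers can't stall the loop
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, TRUE);      // don't reuse a cached connection

    $head = curl_exec($ch);

    curl_close($ch);

    // curl_exec() returns false when no response was received
    return $head !== false;
}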

function check_domain($domain, $timeout = 3) {
    $sw_start = microtime(1);

    // Pass the timeout through, so the second pass really uses the longer value.
    // curl_is_online() reports failure through its return value rather than
    // by throwing, so no try/catch is needed here.
    $is_online = curl_is_online("http://{$domain}", $timeout);

    return [
        "domain"  => $domain,
        "state"   => $is_online ? "online" : "offline",
        "latency" => microtime(1) - $sw_start,
    ];
}

function check_domains($domains) {
    $checks = [];
    foreach ($domains as $domain) {
        $checks[] = check_domain($domain);
    }
    return $checks;
}

function check_offlines(&$checks, $timeout = 5) {
    foreach ($checks as &$check) {
        if ($check["state"] == "offline") {
            print_r($check);
            $check = check_domain($check["domain"], $timeout);
        }
    }
    unset($check); // break the reference left over from the by-reference foreach
}

$checks = check_domains($domains);
print_r($checks);

check_offlines($checks);
print_r($checks);

$sw_stop = microtime(1);
$sw_elapsed = $sw_stop - $sw_start;
echo "Total elapsed time: {$sw_elapsed} seconds.";