PHP get_headers() fails with Pinterest

Date: 2015-08-19 14:58:11

Tags: php pinterest get-headers

I'm currently developing a tool that integrates links to different social networks:

Facebook: https://www.facebook.com/jonathan.parentlevesque

Google plus: https://plus.google.com/+JonathanParentL%C3%A9vesque

Instagram: https://instagram.com/mariloubiz/

Pinterest: https://www.pinterest.com/jonathan_parl/

RSS: https://regex101.com

Twitter: https://twitter.com/arcadefire

Vimeo: https://vimeo.com/ondemand/crashtest/135301838

Youtube: https://www.youtube.com/user/Darkjo666

I'm using very basic regular expressions like this one:

/^https?:\/\/(?:[a-z]{2}|[w]{3})?\.pinterest.com\/[\S]{5,}$/i

on both the client side and the server side, to do minimal domain validation on each link.
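To see what this pattern actually accepts, here is a minimal PHP sketch that runs it through preg_match() against a few sample URLs. The helper name `looksLikePinterestUrl()` is hypothetical, and the sample URLs other than the Pinterest profile above are made up for illustration. Two things show up: because of `{5,}`, the bare `https://www.pinterest.com/` is rejected, and because the dot in `pinterest.com` is not escaped, it matches any character.

```php
<?php
// The pattern from the question, unchanged (dot before "com" is unescaped).
const PINTEREST_PATTERN = '/^https?:\/\/(?:[a-z]{2}|[w]{3})?\.pinterest.com\/[\S]{5,}$/i';

// Hypothetical helper: true when $url passes the minimal Pinterest domain check.
function looksLikePinterestUrl(string $url): bool
{
    return preg_match(PINTEREST_PATTERN, $url) === 1;
}

$samples = [
    'https://www.pinterest.com/jonathan_parl/', // profile URL from the question
    'https://fr.pinterest.com/jonathan_parl/',  // two-letter subdomain (made up)
    'https://www.pinterest.com/',               // path shorter than 5 characters
    'https://www.facebook.com/jonathan.parentlevesque',
];

foreach ($samples as $url) {
    echo $url, ' => ', looksLikePinterestUrl($url) ? 'match' : 'no match', PHP_EOL;
}
```

If stricter matching is wanted, escaping the dot (`pinterest\.com`) is the usual fix.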

Then I use this function to check that the page really exists (integrating social network links that don't work is useless):

public static function isUrlExists($url){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $headers = get_headers($url);

        if ($headers !== false and !empty($headers)){

            if (strpos($headers[0], '404') === false){

                $exists = true;
            }   
        }
    }

    return $exists;
}
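One detail worth knowing about this check: by default get_headers() follows redirects and returns the headers of every response in the chain, so `$headers[0]` is the status line of the first response (for example a 301), not necessarily the final one. Below is a minimal sketch, assuming you want the last status code in the array. The helper `finalStatusCode()` is hypothetical, and the sample array is made up so the example runs without a network call.

```php
<?php
// Hypothetical helper: returns the status code of the last "HTTP/..." line
// in an array shaped like get_headers() output, or null if there is none.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found"; other entries are "Name: value".
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) === 1) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Made-up example of a redirect followed by a 404, as get_headers() may return it.
$headers = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/missing',
    'HTTP/1.1 404 Not Found',
    'Content-Type: text/html',
];

var_dump(finalStatusCode($headers)); // int(404)
```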

Note: in this function, I use Diego Perini's regular expression to validate the URL before sending the request:

const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini
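The following self-contained sketch shows how the "https://" prefixing step and this pattern interact. The class `RegularExpression` here is only a stand-in that holds the same pattern (written in single quotes), and `normalizeUrl()` is a hypothetical replacement for the `StringManager::stringStartWith()` checks that uses strpos() instead. A bare word such as `woot` becomes `https://woot`. The pattern rejects it because the host has no dot-separated top-level domain.

```php
<?php
// Stand-in for the question's class; same Diego Perini pattern.
final class RegularExpression
{
    const URL = '%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu';
}

// Hypothetical equivalent of the prefixing step in isUrlExists():
// prepend "https://" unless the URL already starts with "http" or "ftp".
function normalizeUrl(string $url): string
{
    if (strpos($url, 'http') !== 0 && strpos($url, 'ftp') !== 0) {
        $url = 'https://' . $url;
    }
    return $url;
}

foreach (['woot', 'www.google.ca', 'https://www.pinterest.com/jonathan_parl/'] as $input) {
    $url = normalizeUrl($input);
    $valid = preg_match(RegularExpression::URL, $url) === 1;
    printf("%s -> %s: %s\n", $input, $url, $valid ? 'valid' : 'invalid');
}
```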

So far, none of the links I tested produced any error, but testing Pinterest produces a series of scary error messages:

get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(): Failed to enable crypto

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

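Does anyone know what I'm doing wrong here?

I mean, isn't Pinterest a popular social network with a valid certificate? (I don't use it personally; I just created a test account.)

Thanks for your help,

Jonathan Parent-Lévesque from Montreal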

1 answer:

Answer 0 (score: 2)

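As N.B. suggested in his comment, I tried creating a self-signed certificate for my development environment (XAMPP). That solution didn't work for me.

His other suggestion was to use cURL or Guzzle instead of get_headers(). Not only does it work but, according to this developer's tests:

http://php.net/manual/fr/function.get-headers.php#104723

it is also faster than get_headers().

For anyone interested, here is the code of the new function: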

/**
* Send an HTTP request to $url and check the status line of the headers sent back.
*
* @param string $url URL to which the request is sent.
* @param int[] $failCodeList List of status codes for which the page is considered invalid.
*
* @return bool
*/
public static function isUrlExists($url, array $failCodeList = array(404)){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $handle = curl_init($url);


        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

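        // Skips certificate verification: avoids "certificate verify failed", but the server is no longer authenticated.
        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);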

        curl_setopt($handle, CURLOPT_HEADER, true);

        curl_setopt($handle, CURLOPT_NOBODY, true);

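        // CURLOPT_USERAGENT expects a string (passing true sends "1"); this value is an arbitrary example.
        curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; UrlChecker/1.0)');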


        $headers = curl_exec($handle);

        curl_close($handle);


        if (empty($failCodeList) or !is_array($failCodeList)){

            $failCodeList = array(404); 
        }

        if (!empty($headers)){

            $exists = true;

            $headers = explode(PHP_EOL, $headers);

            foreach($failCodeList as $code){

                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){

                    $exists = false;

                    break;  
                }
            }
        }
    }

    return $exists;
}

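Let me explain the cURL options:

CURLOPT_RETURNTRANSFER: return the response as a string instead of printing the page to the screen.

CURLOPT_SSL_VERIFYPEER: cURL does not verify the server's certificate (this is what avoids the "certificate verify failed" error, at the cost of no longer authenticating the server).

CURLOPT_HEADER: include the headers in the returned string.

CURLOPT_NOBODY: do not include the body in the returned string (for HTTP, this sends a HEAD request).

CURLOPT_USERAGENT: some sites need a user agent to respond properly (e.g. https://plus.google.com).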

Additional note: I explode the header string and use $headers[0], so that only the status line (code and message, e.g. 200, 404, 405) is checked.
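HTTP header lines end with `\r\n`, while `PHP_EOL` is `\n` on Linux, so after `explode(PHP_EOL, ...)` each line can keep a trailing `\r`. The substring check still works, but parsing the numeric code explicitly is stricter. Below is a minimal sketch, assuming the input is the string returned by curl_exec() with CURLOPT_HEADER and CURLOPT_NOBODY set. The helper `statusCodeFromRawHeaders()` is hypothetical, and the sample string is made up.

```php
<?php
// Hypothetical helper: extracts the numeric status code from the first line
// of a raw header block such as "HTTP/1.1 404 Not Found\r\nServer: ...\r\n\r\n".
function statusCodeFromRawHeaders(string $raw): ?int
{
    // Split on CRLF, LF or CR so the result does not depend on PHP_EOL.
    $lines = preg_split('/\r\n|\n|\r/', $raw);
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m) !== 1) {
        return null;
    }
    return (int) $m[1];
}

$raw = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n";
var_dump(statusCodeFromRawHeaders($raw)); // int(404)
```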

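Additional note 2: sometimes checking only for code 404 is not enough (see the unit tests), so there is an optional $failCodeList parameter to supply the full list of codes to reject.

And of course, here are the unit tests that validate my code: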
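Once the status code is available as an integer, the fail-list test can compare whole codes instead of searching for substrings in the status line. Below is a minimal sketch, assuming the code has already been parsed. The helper `isRejectedStatus()` is hypothetical. It keeps the same default of `array(404)` and, like the original loop, ignores non-numeric entries.

```php
<?php
// Hypothetical helper: true when $statusCode appears in $failCodeList.
// Falls back to [404] for an empty list, as isUrlExists() does.
function isRejectedStatus(int $statusCode, array $failCodeList = [404]): bool
{
    if (empty($failCodeList)) {
        $failCodeList = [404];
    }
    // Drop non-numeric entries, cast the rest to int, then compare strictly.
    $codes = array_map('intval', array_filter($failCodeList, 'is_numeric'));
    return in_array($statusCode, $codes, true);
}

var_dump(isRejectedStatus(405));             // bool(false): only 404 is rejected by default
var_dump(isRejectedStatus(405, [404, 405])); // bool(true)
```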

public function testIsUrlExists(){

    // invalid
    $this->assertFalse(ToolManager::isUrlExists("woot"));
    $this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
    $this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
    $this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
    $this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
    $this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
    $this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));

    // valid
    $this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
    $this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
    $this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}

I hope this solution helps someone,

Jonathan Parent-Lévesque from Montreal