Question

I have tried several ways of downloading the page at $url = 'https://kat.cr/usearch/life%20of%20pi/'; with PHP. However, I always get back a page of seemingly encrypted characters.

I searched for possible solutions before posting and tried several of them, but I have not been able to get any of them to work.

Please see the methods I tried below and suggest a solution. I am looking for a PHP solution.

Method 1 - using file_get_contents - returns encrypted characters

<?php
//$contents = file_get_contents($url, $use_include_path, $context, $offset);

include('simple_html_dom.php');

$url = 'https://kat.cr/usearch/life%20of%20pi/';
$html = str_get_html(utf8_encode(file_get_contents($url)));

echo $html;


?>

Method 2 - using file_get_html - returns encrypted characters

<?php

include('simple_html_dom.php');

$url = 'https://kat.cr/usearch/life%20of%20pi/';

$encoded = htmlentities(utf8_encode(file_get_html($url)));
echo $encoded;

?>

Method 3 - using gzread - returns a blank page

<?php

include('simple_html_dom.php');

$url = 'https://kat.cr/usearch/life%20of%20pi/';

$fp = gzopen($url,'r');

$contents = '';

while($html = gzread($fp , 256000))
{
    $contents .= $html;
}

gzclose($fp);

?>

Method 4 - using gzinflate - returns a blank page

<?php

include('simple_html_dom.php');
//function gzdecode($data)
//{
//    return gzinflate(substr($data,10,-8));
//}

//$contents = file_get_contents($url, $use_include_path, $context, $offset);



$url = 'https://kat.cr/usearch/life%20of%20pi/';
$html = str_get_html(utf8_encode(file_get_contents($url)));

echo gzinflate(substr($html,10,-8));


?>

Method 5 - using fopen and fgets - returns encrypted characters

<?php
$url='https://kat.cr/usearch/life%20of%20pi/';
$handle = fopen($url, "r");

if ($handle)
{
    while (($line = fgets($handle)) !== false)
    {
        echo $line;
    }
}
else
{
    // error opening the file.
    echo "could not open the wikipedia URL!";
}
fclose($handle);
?>

Method 6 - adding ob_start at the start of the script - page does not load

<?php
ob_start("ob_gzhandler");

$url = 'https://kat.cr/usearch/life%20of%20pi/';
$handle = fopen($url, "r");

if ($handle)
{
    while (($line = fgets($handle)) !== false)
    {
        echo $line;
    }
}
else
{
    // error opening the file.
    echo "could not open the wikipedia URL!";
}
fclose($handle);
?>

Method 7 - using curl - returns a blank page

<?php    
$url = 'https://kat.cr/usearch/life%20of%20pi/';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
// NOTE: $cr is undefined (the handle is $ch), so this user agent is never applied.
curl_setopt($cr, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT,5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

// NOTE: str_get_html() comes from simple_html_dom.php, which this snippet never includes.
$html = str_get_html("$return");
echo $html;

?>

Method 8 - using R - returns encrypted characters

> thepage = readLines('https://kat.cr/usearch/life%20of%20pi/')
There were 29 warnings (use warnings() to see them)
> thepage[1:5]
[1] "\037‹\b"                                                                                                                                                                                                                                                                                                         
[2] "+SC®\037\035ÕpšÐ\032«F°{¼…àßá$\030±ª\022ù˜ú×Gµ."                                                                                                                                                                                                                                                                
[3] "\023\022&ÒÅdDjÈÉÎŽj\t¹Iê¬©\003ä\fp\024“ä(M<©U«ß×Ðy2\tÈÂæœ8ž\036â!9ª]ûd<¢QR*>öÝdpä’kß!\022?ÙG~è'>\016¤ØÁ\0019Re¥†\0264æ’Ø‰üQâÓ°Ô^—\016\tÂ¡‹\\:\016\003Š]4¤aLiˆ†8ìS\022Ão€'ðÿ\020a;¦Aš`‚<\032!/\"DF=\034'EåX^ÔˆÚ4‰KDCê‡.¹©¡ˆ\004Gµ4&8r\006EÍÄO\002r|šóóZðóú\026?\0274Š ½\030!\týâ;W8Ž‹k‡õ¬™¬ÉÀ\017¯2b1ÓA< \004„š€&J"
[4] "@ƒˆxGµz\035\032Jpâ;²C‡u\034\004’Ñôp«e^*Wz-Óz!ê\022\001èÌI\023ä;LÖ\v›õ‡¸Oâº‡¯Y!\031þ\024-mÍ·‡G#°›„¦Î@º¿ÉùÒò(ìó¶³f\177¤?}\017½<Cæ_eÎ\0276\t\035®ûÄœ\025À}rÌ\005òÃŸ$t}ï/IºM»µ*íÖšh\006\t#kåd³¡€âÈ¹E÷CÌG·!\017ý°èø‡x†ä\a|³&jÇ‡õìè>\016ú\t™aá¾ž[\017—z¹«K¸çeØ¿=/"                                                    
[5] "\035æ\034vÎ÷Gûx?Ú'ûÝý`ßßwö¯v‹bÿFç\177F\177\035±?ÿýß\177þupþ'ƒ\035ösT´°ûï¢<+(Òx°Ó‰\"<‘G\021M(ãEŽ\003pa2¸¬`\aGýtÈFíî.úÏîAQÙ?\032ÉNDpBÎ\002Â"
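
The first element of the R output above, "\037‹\b", corresponds to the bytes 0x1f 0x8b 0x08, which is the gzip magic number followed by the deflate method byte. So the "encrypted" characters may simply be a gzip-compressed response body. Below is a minimal PHP sketch under that assumption, and assuming the zlib extension is available. maybe_gunzip is a hypothetical helper name, and the sample string is a locally built stand-in for the real response, so no network access is needed:

```php
<?php
// Hypothetical helper (not from the original post): decode a body only if it
// starts with the gzip magic bytes 0x1f 0x8b; otherwise return it unchanged.
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        // gzdecode() returns false on corrupt input; fall back to the raw body.
        return $decoded === false ? $body : $decoded;
    }
    return $body;
}

// Stand-in for a compressed HTTP response body (assumption for illustration).
$sample = gzencode('<html><body>life of pi</body></html>');

echo bin2hex(substr($sample, 0, 3)), "\n"; // 1f8b08
echo maybe_gunzip($sample), "\n";          // the original HTML string
echo maybe_gunzip('plain text'), "\n";     // passed through untouched
```

If this is the cause, decoding would have to happen on the raw bytes: utf8_encode(), as used in Method 4, rewrites every byte above 0x7f and would corrupt a compressed stream before gzinflate sees it.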

Method 9 - using BeautifulSoup (Python) - returns encrypted characters

import urllib

htmltext = urllib.urlopen("https://kat.cr/usearch/life%20of%20pi/").read()
print htmltext

Method 10 - using wget in a Linux terminal - gets a page with encrypted characters

wget -O page https://kat.cr/usearch/Monsoon%20Mangoes%20malayalam/

Method 11 - pasting the URL manually into the service below - works

https://www.hurl.it/

Method 12 - pasting the URL manually into the service below - works

https://www.import.io/

Scraping a web page returns encrypted characters

0 answers: