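I have tried several methods to download the page below using PHP, but I always get back a page of what appear to be encrypted characters.
$url = 'https://kat.cr/usearch/life%20of%20pi/';
I searched for possible solutions before posting and tried several of them, but I have not been able to get any of them to work.
Please see the methods I tried below and suggest a solution. I am looking for a PHP solution.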
Method 1 - Using file_get_contents - returns encrypted characters
<?php
//$contents = file_get_contents($url, $use_include_path, $context, $offset);
include('simple_html_dom.php');
$url = 'https://kat.cr/usearch/life%20of%20pi/';
$html = str_get_html(utf8_encode(file_get_contents($url)));
echo $html;
?>
Method 2 - Using file_get_html - returns encrypted characters
<?php
include('simple_html_dom.php');
$url = 'https://kat.cr/usearch/life%20of%20pi/';
$encoded = htmlentities(utf8_encode(file_get_html($url)));
echo $encoded;
?>
Method 3 - Using gzread - returns a blank page
<?php
include('simple_html_dom.php');
$url = 'https://kat.cr/usearch/life%20of%20pi/';
$fp = gzopen($url,'r');
$contents = '';
while($html = gzread($fp , 256000))
{
$contents .= $html;
}
gzclose($fp);
?>
Method 4 - Using gzinflate - returns a blank page
<?php
include('simple_html_dom.php');
//function gzdecode($data)
//{
// return gzinflate(substr($data,10,-8));
//}
//$contents = file_get_contents($url, $use_include_path, $context, $offset);
$url = 'https://kat.cr/usearch/life%20of%20pi/';
$html = str_get_html(utf8_encode(file_get_contents($url)));
echo gzinflate(substr($html,10,-8));
?>
Method 5 - Using fopen and fgets - returns encrypted characters
<?php
$url='https://kat.cr/usearch/life%20of%20pi/';
$handle = fopen($url, "r");
if ($handle)
{
while (($line = fgets($handle)) !== false)
{
echo $line;
}
}
else
{
// error opening the file.
echo "could not open the wikipedia URL!";
}
fclose($handle);
?>
Method 6 - Adding ob_start at the beginning of the script - page does not load
<?php
ob_start("ob_gzhandler");
$url = 'https://kat.cr/usearch/life%20of%20pi/';
$handle = fopen($url, "r");
if ($handle)
{
while (($line = fgets($handle)) !== false)
{
echo $line;
}
}
else
{
// error opening the file.
echo "could not open the wikipedia URL!";
}
fclose($handle);
?>
Method 7 - Using curl - returns a blank page
<?php
$url = 'https://kat.cr/usearch/life%20of%20pi/';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($cr, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT,5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$html = str_get_html("$return");
echo $html;
?>
Method 8 - Using R - returns encrypted characters
> thepage = readLines('https://kat.cr/usearch/life%20of%20pi/')
There were 29 warnings (use warnings() to see them)
> thepage[1:5]
[1] "\037‹\b"
[2] "+SC®\037\035ÕpšÐ\032«F°{¼…àßá$\030±ª\022ù˜ú×Gµ."
[3] "\023\022&ÒÅdDjÈÉÎŽj\t¹Iꬩ\003ä\fp\024“ä(M<©U«ß×Ðy2\tÈÂæœ8ž\036â!9ª]ûd<¢QR*>öÝdpä’kß!\022?ÙG~è'>\016¤ØÁ\0019Re¥†\0264æ’؉üQâÓ°Ô^—\016\t¡‹\\:\016\003Š]4¤aLiˆ†8ìS\022Ão€'ðÿ\020a;¦Aš`‚<\032!/\"DF=\034'EåX^ÔˆÚ4‰KDCê‡.¹©¡ˆ\004Gµ4&8r\006EÍÄO\002r|šóóZðóú\026?\0274Š ½\030!\týâ;W8Ž‹k‡õ¬™¬ÉÀ\017¯2b1ÓA< \004„š€&J"
[4] "@ƒˆxGµz\035\032Jpâ;²C‡u\034\004’Ñôp«e^*Wz-Óz!ê\022\001èÌI\023ä;LÖ\v›õ‡¸O⺇¯Y!\031þ\024-mÍ·‡G#°›„¦Î@º¿ÉùÒò(ìó¶³f\177¤?}\017½<Cæ_eÎ\0276\t\035®ûÄœ\025À}rÌ\005òß$t}ï/IºM»µ*íÖšh\006\t#kåd³¡€âȹE÷CÌG·!\017ý°èø‡x†ä\a|³&jLJõìè>\016ú\t™aᾞ[\017—z¹«K¸çeØ¿=/"
[5] "\035æ\034vÎ÷Gûx?Ú'ûÝý`ßßwö¯v‹bÿFç\177F\177\035±?ÿýß\177þupþ'ƒ\035ösT´°ûï¢<+(Òx°Ó‰\"<‘G\021M(ãEŽ\003pa2¸¬`\aGýtÈFíî.úÏîAQÙ?\032ÉNDpBÎ\002Â"
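One observation on the output above: its first element, "\037‹\b", corresponds to the bytes 0x1f 0x8b 0x08, which is the gzip signature followed by the deflate method byte. That suggests the "encrypted" characters may simply be a gzip-compressed response body, although this has not been confirmed against the site itself. The following is a minimal sketch of checking for that signature and decoding it in PHP. The helper name maybe_gunzip is hypothetical, and the sample string is compressed locally, so nothing here touches the network.

```php
<?php
// Hypothetical helper (not part of any library): decode a body only if it
// starts with the gzip magic bytes 0x1f 0x8b; otherwise return it unchanged.
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body); // requires the zlib extension
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Local demonstration: compress a sample string with gzencode,
// then pass it through the helper to restore it.
$sample = '<html><body>life of pi</body></html>';
$compressed = gzencode($sample);
echo maybe_gunzip($compressed), "\n";
```

If the downloaded page really is gzip data, passing the raw string returned by file_get_contents through a check like this, before any utf8_encode or str_get_html call, would be the place to try it.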
Method 9 - Using BeautifulSoup (Python) - returns encrypted characters
import urllib
htmltext = urllib.urlopen("https://kat.cr/usearch/life%20of%20pi/").read()
print htmltext
Method 10 - Using wget in a Linux terminal - gets a page with encrypted characters
wget -O page https://kat.cr/usearch/Monsoon%20Mangoes%20malayalam/
Method 11 -
Tried manually by pasting the URL into the service below - works
Method 12 -
Tried manually by pasting the URL into the service below - works