Simple HTML DOM URL length error

Time: 2013-06-01 13:22:39

Tags: php simple-html-dom

<?php
include('../simple_html_dom.php');

$fname = "http://www.myurl.com";

$html = file_get_html($fname);

$divs = $html->find('h6');
foreach($divs as $element)
{
 $title = $element->find('a', 0)->plaintext;
 echo $title.'<br>';
}
echo '<br>';
?>

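I get this error:

"failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error ......."

My URL is very long, about 750 characters. If I fetch it with wget, it reports "File name too long".

How can I fix this? I need it to work with Simple HTML DOM.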

3 answers:

Answer 0 (score: 2)

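A 750-character URL is fine. The practical limit usually quoted is about 2,000 characters, which comes from older versions of Internet Explorer (2,083 characters).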
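To rule out length as the cause, a quick check along these lines may help. This is only a sketch: the 2,000-character threshold is the rule of thumb mentioned above, not a hard limit of PHP or HTTP, and `url_within_limit` is a hypothetical helper name, not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper: true when the URL fits the rule-of-thumb limit.
function url_within_limit(string $url, int $limit = 2000): bool
{
    // strlen() counts bytes, which is what is actually sent in the request line
    return strlen($url) <= $limit;
}

$fname = "http://www.myurl.com"; // placeholder URL from the question
echo strlen($fname) . ' bytes, '
    . (url_within_limit($fname) ? 'within' : 'over') . " the limit\n";
```

If the check passes, as it should for a 750-character URL, the 500 error is coming from how the server treats the request rather than from the URL length.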

You should try to emulate a web browser making the request. See this other question.

Edit: using cURL with your code

<?php

// include is not a function, don't use parens (also use require instead)
require '../simple_html_dom.php';

$fname = "http://www.myurl.com";

$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// don't want to pollute your output
//curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $fname);
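$result = curl_exec($ch);
if ($result === false) {
    // stop here rather than parsing an empty response
    die('cURL error: ' . curl_error($ch));
}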

$html = new simple_html_dom();
$html->load($result);

$divs = $html->find('h6');
foreach($divs as $element)
{
 $title = $element->find('a', 0)->plaintext;
 echo $title.'<br>';
}
echo '<br>';
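If cURL is not available, roughly the same effect may be possible with PHP's HTTP stream wrapper. The sketch below assumes that `file_get_html` accepts a stream context as its third argument, which is worth verifying against your copy of Simple HTML DOM. `browser_like_context_options` is a hypothetical helper, and the User-Agent string is only an example.

```php
<?php
// Hypothetical helper: stream-context options that mimic a browser request.
function browser_like_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            'header'        => "Accept: text/html\r\n",
            // return the body even on 4xx/5xx so the error page can be inspected
            'ignore_errors' => true,
            'timeout'       => 15,
        ],
    ];
}

$options = browser_like_context_options('Mozilla/5.0 (compatible; ExampleBot/1.0)');
$context = stream_context_create($options);

// Usage, assuming simple_html_dom.php has already been loaded with require:
// $html = file_get_html($fname, false, $context);
// if ($html === false) { die('Request failed or page could not be parsed'); }
```

With `ignore_errors` enabled, the request no longer aborts on the 500 response, so the body of the error page can be read to see what the server is complaining about.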

Answer 1 (score: 0)

The URL length is fine. The link may be broken or expired. I tried the link shown below, and the result looks fine:

<?php
include("simple_html_dom.php");

$fname = "http://www.youtubeonfire.com/?genre=0&language=0&next_token=rO0ABXNyACdjb20uYW1hem9uLnNkcy5RdWVyeVByb2Nlc3Nvci5Nb3JlVG9rZW7racXLnINNqwMA%0AC0kAFGluaXRpYWxDb25qdW5jdEluZGV4WgAOaXNQYWdlQm91bmRhcnlKAAxsYXN0RW50aXR5SURa%0AAApscnFFbmFibGVkSQAPcXVlcnlDb21wbGV4aXR5SgATcXVlcnlTdHJpbmdDaGVja3N1bUkACnVu%0AaW9uSW5kZXhaAA11c2VRdWVyeUluZGV4TAANY29uc2lzdGVudExTTnQAEkxqYXZhL2xhbmcvU3Ry%0AaW5nO0wAEmxhc3RBdHRyaWJ1dGVWYWx1ZXEAfgABTAAJc29ydE9yZGVydAAvTGNvbS9hbWF6b24v%0Ac2RzL1F1ZXJ5UHJvY2Vzc29yL1F1ZXJ5JFNvcnRPcmRlcjt4cAAAAAEAAAAAAAABds0AAAAAAQAA%0AAAC71ED7AAAAAAFwdAAQMDAwMDAwMDAwMDAwMjAxM35yAC1jb20uYW1hem9uLnNkcy5RdWVyeVBy%0Ab2Nlc3Nvci5RdWVyeSRTb3J0T3JkZXIAAAAAAAAAABIAAHhyAA5qYXZhLmxhbmcuRW51bQAAAAAA%0AAAAAEgAAeHB0AApERVNDRU5ESU5HeA%3D%3D&sort=2";

$html = file_get_html($fname);

$divs = $html->find("h6");
foreach($divs as $element) {
    $title = $element->find("a", 0)->plaintext;
    echo($title . "<br />");
}
echo("<br />");

Output:

Spider (2013)
500 MPH STORM 2013 HD
Van Diemans Land (Action,Adventure,20...
Good Agent is A Bad Agent (Full HQ En...
Employee of the Month (Full HQ Englis...
The Croods (2013)
GIRLFRIENDS - 2013
Boys Are Pigs-2013
The Patriot -2013
My Daughter&#x27;s Secret -2013
Dead on Arrival [2013]
Flght 2013XViD1
Samsung Galaxy S4 Presentation UNPACK...
Affinity 2013
Golden Globe Awards 2013: Full Show
Parker-2013
Hells&#x27; Kitchen-  New Action Movie 2013
ALIENS [2013]
7 Nights Of Darkness -2013
Hansel And Gretel 2013
The Collection (2012)
Mac And Devin Go To High School 2012
Red Dawn (2012)
Hijacked -2012
Bending The Rules -2012
Inside -2012
VAMPIRELAND-2012
Dead Mine -2012
Devil Seed-2012
Kill Em All -2012
One In The Chamber -2012
The Forger - 2012
Dark Desire -2012
A Common Man -2012 .
The Helpers -2012
Red Dawn- 2012 720p

So fix the problem with the URL and everything will work!

Answer 2 (score: 0)

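You say the URL works in your browser, while all of us here get a 500 error, just as your script does.

The site probably validates the token in the URL against the client IP address and possibly other request headers. So you need a way to obtain the tokenized URL from within your PHP script.

To do that, first download the home page from your PHP script, then find the URL of the "next" link on that page and use it in your script.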
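The second step can be sketched as follows. This is an illustration under assumptions, not the site's actual markup: it supposes that the home page contains a link whose `href` carries a `next_token=` parameter, like the URL quoted in the other answer. `find_next_url` is a hypothetical helper, `www.example.com` is a placeholder, and PHP's built-in DOM extension is used here so the snippet runs on its own.

```php
<?php
// Hypothetical helper: return the first link whose href carries a next_token
// parameter, resolved against $baseUrl when it is relative; null if none.
function find_next_url(string $html, string $baseUrl): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // suppress warnings from real-world, non-well-formed HTML
    @$doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//a[contains(@href, "next_token=")]');
    if ($links === false || $links->length === 0) {
        return null;
    }

    $href = $links->item(0)->getAttribute('href');
    // absolute links are used as-is; relative ones are appended to the base
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    return rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
}

// Stand-in for the home page body downloaded by the script
$sample = '<html><body><a href="/about">About</a>'
        . '<a href="/?genre=0&amp;next_token=abc123&amp;sort=2">Next</a></body></html>';
$next = find_next_url($sample, 'http://www.example.com');
echo $next, "\n";
```

In a real script, `$sample` would be the home page fetched by the same client (same IP, User-Agent and cookies) that then requests `$next`, since the token may be tied to those.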