Why does Simple HTML DOM Parser fail on some websites?

Asked: 2013-01-24 16:39:43

Tags: php html parsing dom

Edit: When I add error_reporting(-1); I get these error messages:

Notice: Trying to get property of non-object in /data/22/2/145/126/2960289/user/3282682/htdocs/add.php on line 106

Notice: Undefined offset: 0 in /data/22/2/145/126/2960289/user/3282682/htdocs/add.php on line 108

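I'm trying to use Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) to scrape websites. It works fine on localhost, but after I uploaded it to my host (Network Solutions), some URLs no longer return any results. This started when I switched from file_get_html to str_get_html, which I had to do because of the host. Do you know what's wrong? I'm sorry about how the script looks, but I'm a beginner... maybe you have some tips on how to condense it? The script looks like this: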

<?php
session_start(); 
include("connect.php");

if (isset($_POST['done'])) {

    function getimg($url) {         
        $headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';              
        $headers[] = 'Connection: Keep-Alive';         
        $headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';         
        $user_agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';         
        $process = curl_init($url);         
        curl_setopt($process, CURLOPT_HTTPHEADER, $headers);         
        curl_setopt($process, CURLOPT_HEADER, 0);         
        curl_setopt($process, CURLOPT_USERAGENT, $user_agent);         
        curl_setopt($process, CURLOPT_TIMEOUT, 30);         
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);         
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);         
        $return = curl_exec($process);         
        curl_close($process);         
        return $return;     
    } 

    $imgurl = $_POST['finalimg'];
    $random = substr(number_format(time() * rand(),0,'',''),0,30);
    $imagename= basename($random.".jpg");
    if(file_exists('./upload/'.$imagename)){continue;} 
    $image = getimg($imgurl); 
    file_put_contents('upload/'.$imagename,$image);

    $finalurl = $_POST['finalurl'];
    $description = $_POST['description'];
    $titlen = $_POST['title'];
    $pricen = $_POST['price'];

    $sql = "INSERT INTO samples(description, name, productUrl, imageUrl, price)
        VALUES('$description', '$titlen', '$finalurl', '$imagename', '$pricen')";
    mysql_query($sql);

        // Redirect onward
        header("Location: collection.php?id={$_SESSION['sess_id']}");
        exit;

}

include("head.php");
?>

<div class="content add-content">

    <div class="header">
        <h1>Add your picture</span></h1>
    </div>

    <?php

    if (isset($_POST['submit'])) {

    require('DOM/simple_html_dom.php');
    require('DOM/example/url_to_absolute.php');

    $url = $_POST['url'];

    $curl = curl_init(); 
    curl_setopt($curl, CURLOPT_URL, $url);  
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);  
    $str = curl_exec($curl);  
    curl_close($curl);  

    $html = str_get_html($str);

    foreach($html->find('img') as $element) {

        $linktoimg = url_to_absolute($url, $element->src);

        $ch = curl_init();
        curl_setopt ($ch, CURLOPT_URL, $linktoimg);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);


        $contents = curl_exec($ch);
        curl_close($ch);


        $new_image = ImageCreateFromString($contents);
        imagejpeg($new_image, "temp.jpg",100);


    $size = getimagesize("temp.jpg");


        // width and height


        $width = $size[0];
        $height = $size[1];

        echo $linktoimg . " Height: " . $height . " Width: " . $width . "<br />";

        if ($height >= 200) {

        $title1 = $html->find(".product_title");
        $title11 = $title1[0]->plaintext;
        $title2 = $html->find(".product-title");
        $title22 = $title2[0]->plaintext;
        $title3 = $html->find(".product_name");
        $title33 = $title3[0]->plaintext;
        $title4 = $html->find(".product_name");
        $title44 = $title4[0]->plaintext;
        $title5 = $html->find("h1");
        $title55 = $title5[0]->plaintext;

        if ($title55 != "") {
            $title = $title55;
        }

        if ($title44 != "") {
            $title = $title44;
        }

        if ($title33 != "") {
            $title = $title33;
        }

        if ($title22 != "") {
            $title = $title22;
        }

        if ($title11 != "") {
            $title = $title11;
        }

        $desc1 = $html->find("p .product_description");
        $desc11 = $desc1[0]->plaintext;
        $desc2 = $html->find("p .product-description");
        $desc22 = $desc2[0]->plaintext;
        $desc3 = $html->find("p .description");
        $desc33 = $desc3[0]->plaintext;
        $desc4 = $html->find(".product-description");
        $desc44 = $desc4[0]->plaintext;
        $desc5 = $html->find(".product_description");
        $desc55 = $desc5[0]->plaintext;
        $desc6 = $html->find(".description");
        $desc66 = $desc6[0]->plaintext;

        if ($desc66 != "") {
            $desc = $desc66;
        }

        if ($desc55 != "") {
            $desc = $desc55;
        }

        if ($desc44 != "") {
            $desc = $desc44;
        }

        if ($desc33 != "") {
            $desc = $desc33;
        }

        if ($desc22 != "") {
            $desc = $desc22;
        }

        if ($desc11 != "") {
            $desc = $desc11;
        }

        $price1 = $html->find(".product_price");
        $price11 = $price1[0]->plaintext;
        $price2 = $html->find(".product-price");
        $price22 = $price2[0]->plaintext;
        $price3 = $html->find(".price");
        $price33 = $price3[0]->plaintext;
        $price4 = $html->find("#product_price");
        $price44 = $price4[0]->plaintext;
        $price5 = $html->find("#product-price");
        $price55 = $price5[0]->plaintext;
        $price6 = $html->find("#price");
        $price66 = $price6[0]->plaintext;
        $price7 = $html->find(".product_price_details");
        $price77 = $price7[0]->plaintext;
        $price8 = $html->find(".price-red");
        $price88 = $price8[0]->plaintext;

        if ($price88 != "") {
            $price = $price88;
        }

        if ($price77 != "") {
            $price = $price77;
        }

        if ($price66 != "") {
            $price = $price66;
        }

        if ($price55 != "") {
            $price = $price55;
        }

        if ($price44 != "") {
            $price = $price44;
        }

        if ($price33 != "") {
            $price = $price33;
        }

        if ($price22 != "") {
            $price = $price22;
        }

        if ($price11 != "") {
            $price = $price11;
        }

            ?>

            <form action="add.php" method="post">
                <div class="add-wrapper">
                    <div style="font-weight: bold; font-size: 18px; margin-bottom: 20px; margin-top: -40px;">Is this a good picture?</div>
                    <div class="add-window">
                        <img src="timthumb.php?src=<?php echo $linktoimg; ?>&zc=3&h=432&w=500" style="border: 1px solid #000;" />
                    </div>

                </div>
                <input type="hidden" name="finalurl" value="<?php echo $_POST['url']; ?>">
                <input type="hidden" name="finalimg" value="<?php echo $linktoimg; ?>">
                <input type="hidden" name="title" value="<?php echo $title; ?>">
                <input type="hidden" name="description" value="<?php echo $desc; ?>">
                <input type="hidden" name="price" value="<?php echo $price; ?>">
                <input type="submit" value="" class="submit" name="done" style="margin-top: 40px;">
            </form>

            <?php
            echo $title . "<br /><br />";
            echo $desc . "<br /><br />";
            echo $price;
        break;

        }

    }

    } else {

    ?>

    <form action="add.php" method="post">

    <div class="add-url">
        <input type="text" name="url" class="biginput" placeholder="Paste the link...">
        <input type="submit" class="bigsubmit" value="" name="submit">
    </div>

    </form>

    <?php

    }

    ?>

</div>

<?php
include("feet.php");
?>

2 Answers:

Answer 0 (score: 1)

Don't accept this as an answer, because it isn't one.

You can shorten

$price1 = $html->find(".product_price");
$price11 = $price1[0]->plaintext;
$price2 = $html->find(".product-price");
$price22 = $price2[0]->plaintext;
$price3 = $html->find(".price");
$price33 = $price3[0]->plaintext;
$price4 = $html->find("#product_price");
$price44 = $price4[0]->plaintext;
$price5 = $html->find("#product-price");
$price55 = $price5[0]->plaintext;
$price6 = $html->find("#price");
$price66 = $price6[0]->plaintext;
$price7 = $html->find(".product_price_details");
$price77 = $price7[0]->plaintext;
$price8 = $html->find(".price-red");
$price88 = $price8[0]->plaintext;

if ($price88 != "") {
    $price = $price88;
}

if ($price77 != "") {
    $price = $price77;
}

if ($price66 != "") {
    $price = $price66;
}

if ($price55 != "") {
    $price = $price55;
}

if ($price44 != "") {
    $price = $price44;
}

if ($price33 != "") {
    $price = $price33;
}

if ($price22 != "") {
    $price = $price22;
}

if ($price11 != "") {
    $price = $price11;
}

to the following:

$selectors = array(
    ".product_price",
    ".product-price",
    ".price",
    "#product_price",
    "#product-price",
    "#price",
    ".product_price_details",
    ".price-red"
);

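foreach ($selectors as $selector) {
    // find($selector, 0) gives null when nothing matches, so check the
    // node before reading ->plaintext (avoids the "non-object" notice).
    $node = $html->find($selector, 0);
    if ($node !== null && trim($node->plaintext) !== '') {
        $price = $node->plaintext;
        break;
    }
}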

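A good thing to remember when coding: if your code looks the same across several lines, it can most likely be shortened (which saves a little performance and makes it easier to understand).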
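The same loop can be wrapped in a small helper and reused for the title and description lookups too. This is only a sketch: `firstText`, `FakeDom` and `FakeNode` are hypothetical names that do not come from the question or the library. The two stub classes stand in for the simple_html_dom objects so the snippet runs on its own, and they assume that `find($selector, 0)` gives back `null` when nothing matches.

```php
<?php
// Hypothetical stand-ins for simple_html_dom objects, so this sketch runs alone.
class FakeNode
{
    public $plaintext;

    public function __construct($plaintext)
    {
        $this->plaintext = $plaintext;
    }
}

class FakeDom
{
    private $nodes;

    // $nodes maps a selector string to the text of its first match.
    public function __construct(array $nodes)
    {
        $this->nodes = $nodes;
    }

    // Mimics find($selector, 0): a node object, or null when nothing matches.
    public function find($selector, $idx = null)
    {
        return isset($this->nodes[$selector]) ? new FakeNode($this->nodes[$selector]) : null;
    }
}

// Returns the trimmed text of the first selector that matches, or $default.
function firstText($dom, array $selectors, $default = '')
{
    foreach ($selectors as $selector) {
        $node = $dom->find($selector, 0);
        if ($node !== null && trim($node->plaintext) !== '') {
            return trim($node->plaintext);
        }
    }
    return $default;
}

// Selectors are listed from highest to lowest priority, as in the question.
$dom   = new FakeDom(array('.price' => ' 199 kr ', 'h1' => 'Blue shirt'));
$price = firstText($dom, array('.product_price', '.product-price', '.price'));
$title = firstText($dom, array('.product_title', '.product-title', 'h1'));
$desc  = firstText($dom, array('.product_description', '.description'), 'n/a');
```

With the real library you would pass the object returned by `str_get_html` instead of `FakeDom`. Because the helper always returns a string, `$title`, `$desc` and `$price` are defined even when no selector matches.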

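Answer 1 (score: 0)

Some hosting providers have a security tab where you set the allowed IPs for outgoing connections. This is mainly for security reasons, and it is also why some users cannot upgrade Joomla or check for Joomla updates: Joomla tries to connect to an external site to check for updates, but the server blocks the connection.

Here is an example of the allowed outgoing IP connections tab in the Hepsia control panel:

[Image: example of allowed outgoing IPs in Hepsia]

If you have something similar, you need to add the trusted URLs manually. Once they are added, the server should let you connect to them.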
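
Before digging through the control panel, it may be worth confirming that the request really fails on the host. The notices in the question are consistent with `str_get_html` being handed an empty or failed response: as far as I know, recent versions of the library return `false` for empty input, which leaves `$html->find(...)` with no object to work on. A rough sketch, where `fetchPage` and `explainFetch` are hypothetical helper names and the messages are only illustrative:

```php
<?php
// Turns the raw cURL outcome into a short diagnosis (null means "looks fine").
function explainFetch($body, $errno, $error, $status)
{
    if ($body === false) {
        // curl_exec() returns false on failure, e.g. a blocked or timed-out connection.
        return "cURL error $errno: $error";
    }
    if ($status >= 400) {
        return "HTTP status $status";
    }
    if (trim($body) === '') {
        return 'Empty response body';
    }
    return null;
}

// Fetches $url and returns array($body, $problem); $problem is null on success.
function fetchPage($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
    $body   = curl_exec($curl);
    $errno  = curl_errno($curl);
    $error  = curl_error($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    return array($body, explainFetch($body, $errno, $error, $status));
}
```

In add.php this could replace the bare `curl_exec` call, for example `list($str, $problem) = fetchPage($url);`, printing `$problem` and skipping `str_get_html($str)` whenever it is not `null`. A connection error that shows up only on the host and not on localhost would point toward the outgoing-connection restriction described above.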