Question

I am trying to create a try/catch loop for downloading HTML from another website:

foreach($intldes as $id) {
    $html = HtmlDomParser::file_get_html('https://nssdc.gsfc.nasa.gov/nmc/spacecraftDisplay.do?id='.$id); 
    foreach($html->find('#rightcontent') as $id);
    foreach($html->find('.urone p') as $element);
    foreach($html->find('.urtwo') as $launchdata); 
}

If the data exists, the following HTML is produced:

<p><strong>NSSDCA/COSPAR ID:</strong> 2009-038F</p>
<p>ANDE 2, the Atmospheric Neutral Density Experiment 2, is a pair of microsatellites (Castor and Pollux) launched from Cape Canaveral on STS 127 on 15 July 2009 at 22:03 UT and deployed from the payload bay of the shuttle on 30 July 2009 at 17:22 UT.</p>
<p><strong>Launch Date:</strong> 2009-07-15<br/><strong>Launch Vehicle:</strong> Shuttle<br/><strong>Launch Site:</strong> Cape Canaveral, United States<br/></p>

If the data does not exist, I get an "Undefined variable: element" error, which means the DOM parser could not find the HTML I want to display.

So I need something that skips pages that lack the required HTML or that return a NULL variable.

Basically, if the HTML I want, or the variable $element, does not exist, I want Guzzle to skip that page without loading it.

Edit

My full function:

    public function tester() {
    $intldes = DB::table('examples')->pluck('id');
    foreach ($intldes as $query) {
        $html = HtmlDomParser::file_get_html('https://example.com?id='.$query); 
        $elements = $html->find('.urone p', 0);
    if (is_array($elements)) {
        foreach($html->find('#rightcontent') as $rawid);
        foreach($html->find('.urone p') as $rawdescription);
        foreach($html->find('.urtwo') as $launchdata); 

        //-- Data Parser --//
        //Intldes
        $intldesgetter = strip_tags($rawid->first_child()->next_sibling()->next_sibling()); //Get Element and Remove Tags
        $intldesformat = substr($intldesgetter, ($pos = strpos($intldesgetter, ':')) !== false ? $pos + 3 : 0); //Remove Title
        $dbintldes = ltrim($intldesformat); //Remove Blank-space

        //Description
        $description = strip_tags($rawdescription);
        $dbdescription = ltrim($description);

        //Launch Data
        $launchdate = $launchdata->first_child()->next_sibling()->next_sibling()->next_sibling();
        $explode = explode("<br/>", $launchdate);
        $newArray = array_map(function($v){
            return trim(strip_tags($v));
        }, $explode);
        $dblaunchdate = substr($newArray[0], ($pos = strpos($newArray[0], ':')) !== false ? $pos + 3 : 0);
        $dblaunchvehicle = substr($newArray[1], ($pos = strpos($newArray[1], ':')) !== false ? $pos + 3 : 0);
        $dblaunchsite = substr($newArray[2], ($pos = strpos($newArray[2], ':')) !== false ? $pos + 3 : 0);

        //Data Saver
        DB::table('descriptions')->insert(
            ['intldes' => $dbintldes, 'description' => strip_tags($dbdescription), 'launch_date' => $dblaunchdate, 'launch_vehicle' => $dblaunchvehicle, 'launch_site' => $dblaunchsite]
        );
        echo "Success"; 
        } else {
            echo "$query does not exist";
            continue;
        };
    } 
}

Answer 1

I think the error in your code comes from this line:

foreach($html->find('.urone p') as $element);

In my experience, it is best to check that the HTML tag is available before iterating over it with a foreach loop.

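You can use is_object() or is_array() to solve the problem. Searching for a single element returns an object; searching for a set of elements returns an array of objects.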
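
To make the single-versus-set distinction concrete, here is a minimal sketch of a guard that accepts either shape. hasMatch is a made-up helper name, not part of any library, and it assumes the parser behaves like simple_html_dom: a lookup with an index gives back an object or null, while a lookup without an index gives back an array that may be empty.

```php
<?php
// Hypothetical helper, not part of any library: reports whether a
// find()-style result actually holds at least one element.
function hasMatch($result): bool
{
    // Set lookup: assumed to return an array, which may be empty.
    if (is_array($result)) {
        return count($result) > 0;
    }
    // Single lookup: assumed to return an object on success, null otherwise.
    return is_object($result);
}
```

Under those assumptions, something like if (!hasMatch($html->find('.urone p', 0))) { continue; } inside the outer loop would skip a page that lacks the paragraph.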

When searching for a set of elements, you can use:

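$elements = $html->find('.urone p');
// An empty array is still an array, so also check that something was found.
if (is_array($elements) && count($elements) > 0) {
    // at least one match: safe to continue
}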
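
Since the question asks for a try/catch that skips bad pages, the same check can be combined with exception handling. This is only a sketch under assumptions: collectDescriptions and the $fetch callable are hypothetical names, with $fetch standing in for "download the page and return the text of the wanted element, or null if the element is missing". It is not Guzzle's or the parser's actual API.

```php
<?php
// Hypothetical sketch: $fetch stands in for "download the page and return
// the text of the wanted element, or null when the element is missing".
function collectDescriptions(array $ids, callable $fetch): array
{
    $found = [];
    foreach ($ids as $id) {
        try {
            $text = $fetch($id);
        } catch (Exception $e) {
            // The download or parsing step threw: skip this id.
            continue;
        }
        if ($text === null) {
            // The page loaded but lacks the element: skip it as well.
            continue;
        }
        $found[$id] = trim($text);
    }
    return $found;
}
```

The point of the design is that both failure modes, an exception and a missing element, end in continue, so only pages that actually produced data reach the insert step.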

Guzzle HTTP - skip if an exception occurs or the page is not found

1 answer: