Web scraping table elements - Uncaught Error

Time: 2019-05-23 08:48:45

Tags: php web-scraping simple-html-dom

I am trying to scrape data from a website using Simple HTML DOM and PHP.

Website: http://portal.chmi.cz/aktualni-situace/aktualni-stav-pocasi/ceska-republika/stanice/profesionalni-stanice/tabulky/teplota

But my code does not work; it throws a fatal error. Can anyone help me?

Error:

  

Fatal error: Uncaught Error: Call to a member function find() on array in C:\xampp\htdocs\simple_dom\index.php:20 Stack trace: #0 {main} thrown in C:\xampp\htdocs\simple_dom\index.php on line 20

My code:

<?php
      include('simple_html_dom.php');

      $html = file_get_html('http://portal.chmi.cz/aktualni-situace/aktualni-stav-pocasi/ceska-republika/stanice/profesionalni-stanice/tabulky/teplota',false);

      $table = $html->find('table');
      $Data = array();

      foreach($table->find('tr[class=portlet-table-alternate]') as $row) {

          $rowData = array();

              foreach($row->find('td') as $cell) {

                  $rowData[] = $cell->innertext;
                  }

          $Data[] = $rowData;
      }
      print_r($Data);
?>

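1 answer:

Answer 0 (score: 0)

The error tells you exactly what the problem is. You are getting an array as the response from the file_get_html function. On the next line, when you call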

$table = $html->find('table');

you are trying to call the find method of an array, which is against the rules of PHP. You will need to find out why you are getting an array. Intuitively, I think you get an array in which one item (probably at index 0) contains what you are looking for, so you need to inspect your array. If it turns out to be an error, you will have taken a step toward understanding the nature of the problem. If the result has useful properties, you can use it. For further reference, see the implementation of file_get_html:

/**
 * All of the Defines for the classes below.
 * @author S.C. Chen <me578022@gmail.com>
 */
define('HDOM_TYPE_ELEMENT', 1);
define('HDOM_TYPE_COMMENT', 2);
define('HDOM_TYPE_TEXT', 3);
define('HDOM_TYPE_ENDTAG', 4);
define('HDOM_TYPE_ROOT', 5);
define('HDOM_TYPE_UNKNOWN', 6);
define('HDOM_QUOTE_DOUBLE', 0);
define('HDOM_QUOTE_SINGLE', 1);
define('HDOM_QUOTE_NO', 3);
define('HDOM_INFO_BEGIN', 0);
define('HDOM_INFO_END', 1);
define('HDOM_INFO_QUOTE', 2);
define('HDOM_INFO_SPACE', 3);
define('HDOM_INFO_TEXT', 4);
define('HDOM_INFO_INNER', 5);
define('HDOM_INFO_OUTER', 6);
define('HDOM_INFO_ENDSPACE',7);
define('DEFAULT_TARGET_CHARSET', 'UTF-8');
define('DEFAULT_BR_TEXT', "\r\n");
define('DEFAULT_SPAN_TEXT', " ");
define('MAX_FILE_SIZE', 600000);
// helper functions
// -----------------------------------------------------------------------------
// get html dom from file
// $maxlen is defined in the code as PHP_STREAM_COPY_ALL which is defined as -1.
function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    // We DO force the tags to be terminated.
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    // For sourceforge users: uncomment the next line and comment the retreive_url_contents line 2 lines down if it is not already done.
    $contents = file_get_contents($url, $use_include_path, $context, $offset);
    // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
    //$contents = retrieve_url_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
    {
        return false;
    }
    // The second parameter can force the selectors to all be lowercase.
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}

Edit

It turns out that the error occurs on the foreach line, where a method is called on an array. A concise and elegant way to handle this is to define a helper function.
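The helper from the original answer did not survive on this page, so what follows is only a minimal sketch of what such a function might look like, not the answerer's code. It relies on the fact that find() called without an index returns an array of nodes, so each table has to be visited in a loop before find() can be called on it. The name collectRows and its parameters are hypothetical, and the trim() call is an addition that the question's code does not have.

```php
<?php
// Hypothetical helper, not part of simple_html_dom and not the answerer's
// original code. It accepts the ARRAY returned by $html->find('table') and
// calls find() on each table node rather than on the array itself.
function collectRows(array $tables, string $rowSelector, string $cellSelector = 'td'): array
{
    $data = [];
    foreach ($tables as $table) {
        // Each $table is a single node, so find() is a valid method call here.
        foreach ($table->find($rowSelector) as $row) {
            $rowData = [];
            foreach ($row->find($cellSelector) as $cell) {
                // trim() is an assumption; drop it to keep the raw innertext.
                $rowData[] = trim($cell->innertext);
            }
            $data[] = $rowData;
        }
    }
    return $data;
}

// Usage sketch (assumes simple_html_dom.php is included and $url is set):
// $html = file_get_html($url);
// if ($html === false) { die('Could not load the page'); } // empty or too large
// print_r(collectRows($html->find('table'), 'tr[class=portlet-table-alternate]'));
```

If only the first table matters, an alternative is to pass an index, as in $html->find('table', 0), which returns a single node (or null when nothing matches) instead of an array.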