Reading parts of an HTML page with SimpleXML raises non-object warnings/notices; how do I fix this?

Asked: 2013-06-19 04:37:23

Tags: mysqli html-parsing simplexml php

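OK, so I have been working on this small script that scrapes a page from ted.com. Everything works the way I want it to (meaning I can print out all of the values I am interested in). The problem is that, for some reason, I get these warnings when I run the scraper, and I am not sure why the values that the notices/warnings complain about still print out correctly.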

 PHP Warning:  dom_import_simplexml() expects parameter 1 to be object, null given in /var/www/ted/import_ted.php on line 23
 PHP Notice:  Trying to get property of non-object in /var/www/ted/import_ted.php on line 23
 PHP Notice:  Undefined offset: 1 in /var/www/ted/import_ted.php on line 25
 PHP Notice:  Trying to get property of non-object in /var/www/ted/import_ted.php on line 27

Here is my actual PHP script (I have commented the lines that trigger the warnings and notices):

<?php
    $mysqli = mysqli_connect("localhost", "user", "password", "database");
    if (mysqli_connect_errno())
      {
      echo "Failed to connect to MySQL: " . mysqli_connect_error();
      }

 $html = file_get_contents('http://www.ted.com/talks/quick-list?sort=date&order=desc');
 $doc = new DOMDocument();
 $doc->loadHTML($html);
 $sxml = simplexml_import_dom($doc);
 $rows = $sxml->xpath('//tr');
 $description="not_available";
 $ted_link="none";
 $i=0;
 //$stmt = $mysqli->prepare("INSERT INTO `ted` VALUES( ?, ?, ?, ?, ?, ?, ?)");

 foreach($rows as $row) {
    $video = Array();
    $video['pub_date']=  $row->td[0];
    $video['event'] = $row->td[1];
    $sec_temp = explode(":" , dom_import_simplexml($row->td[2])->textContent );//line23
    $video['speaker'] = $sec_temp[0];
    $video['title'] = $sec_temp[1]; //line 25
    $video['duration'] = $row->td[3];
    $video['link'] = $row->td[4]->a[2]['href']; //line27
    print( "\n  line Number: " . $i . "title: " . $video['title']);
    print ("link: " .$video['link']);
   if($i != 0){
      //      $stmt->bind_param("sssssss", $video['event'], $video['speaker'], $video['title'], $description, $ted_link, $video['link'], $description, $video['pub_date'] );
 //        $stmt->execute();
   }
  $i++;
}

// $stmt->close();

?>

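OK, like I said, everything prints out what I expect, including $video['title'], which for some reason produces the undefined offset notice. The problem is that until I can make these variables "objects", I cannot bind them as parameters to the mysqli query, and I cannot figure out how to do that.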

Also, in case it is relevant, here is a snippet of one table row, in case that is the problem (I do not think it is):

<tr>
    <td>Jun 2013</td>
<td>TEDGlobal 2013</td>
<td><a href="/talks/manal_al_sharif_a_saudi_woman_who_dared_to_drive.html">Manalal-Sharif: A Saudi woman who dared to drive</a> </td>
<td>14:16</td>
<td><a href="http://download.ted.com/talks/ManalAlSharif_2013G-light.mp4?apikey=TEDDOWNLOAD">Low</a> | <a href="http://download.ted.com/talks/ManalAlSharif_2013G.mp4?apikey=TEDDOWNLOAD">Regular</a> | <a href="http://download.ted.com/talks/ManalAlSharif_2013G-480p.mp4?apikey=TEDDOWNLOAD">High</a></td>
</tr>

Also note that I tried calling settype($var, "object") before binding, with no luck either (although it returned true).

Anyway, I would really appreciate any help on how to get this working!

1 Answer:

Answer 0 (score: 1)

<?php 

 $html = file_get_contents('http://www.ted.com/talks/quick-list?sort=date&order=desc');
 $doc = new DOMDocument();
 $doc->loadHTML($html);
 $sxml = simplexml_import_dom($doc);
 $rows = $sxml->xpath('//tr');
 /* print_r($rows);
 die(); */


 $description="not_available";
 $ted_link="none";
 $i=0;
 //$stmt = $mysqli->prepare("INSERT INTO `ted` VALUES( ?, ?, ?, ?, ?, ?, ?)");

 foreach($rows as $row) {

    // the first row contains <th> cells, not <td> cells
    if(isset($row->th))
    {
        echo $row->th[1]->a;
        echo $row->th[2]->a;        
        echo $row->th[3]->a;        
        echo $row->th[4];       

    }else{

        $video['pub_date']=  $row->td[0];
        $video['event'] = $row->td[1];
        $sec_temp = explode(":" , $row->td[2]->a);//line23
        $video['speaker'] = $sec_temp[0];
        $video['title'] = $sec_temp[1]; //line 25
        $video['duration'] = $row->td[3];
        $video['link'] = $row->td[4]->a[2]['href']; //line27
        print( "\n  line Number: " . $i . "title: " . $video['title']);
        print ("link: " .$video['link']); 

        if($i != 0){
            //      $stmt->bind_param("sssssss", $video['event'], $video['speaker'], $video['title'], $description, $ted_link, $video['link'], $description, $video['pub_date'] );
            //        $stmt->execute();
        }
        $i++;
    }

}
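
For readers who want the reasoning behind the code above: the notices most likely come from the first `<tr>`, which holds `<th>` cells, so `$row->td[2]` is null for that row. The values also do not need to be turned into objects for mysqli. They already are `SimpleXMLElement` objects, and casting them with `(string)` gives plain strings that can be bound. The following is a minimal sketch under those assumptions, not necessarily the exact approach intended above. The sample markup, the example.com URLs, the speaker and title text, and the `$videos` array are made up for illustration so that the snippet runs without fetching the live page.

```php
<?php
// Hypothetical sample markup standing in for the live page: one header row
// (<th> cells) followed by one data row shaped like the row in the question.
$html = <<<'HTML'
<table>
<tr><th>Published</th><th>Event</th><th>Talk</th><th>Duration</th><th>Download</th></tr>
<tr>
<td>Jun 2013</td>
<td>TEDGlobal 2013</td>
<td><a href="/talks/example.html">Jane Doe: Roads: a short history</a> </td>
<td>14:16</td>
<td><a href="http://example.com/low.mp4">Low</a> | <a href="http://example.com/regular.mp4">Regular</a> | <a href="http://example.com/high.mp4">High</a></td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);

$videos = array();
foreach ($sxml->xpath('//tr') as $row) {
    // Skip rows without the expected cells (for example the <th> header row),
    // which is where $row->td[2] would otherwise be null.
    if (!isset($row->td[2]->a, $row->td[4]->a[2])) {
        continue;
    }

    // Cast the SimpleXMLElement to a string before splitting it.
    // A limit of 2 keeps any later colons inside the title.
    $parts = explode(':', (string) $row->td[2]->a, 2);

    $videos[] = array(
        'pub_date' => (string) $row->td[0],
        'event'    => (string) $row->td[1],
        'speaker'  => trim($parts[0]),
        'title'    => trim(isset($parts[1]) ? $parts[1] : ''),
        'duration' => (string) $row->td[3],
        'link'     => (string) $row->td[4]->a[2]['href'],
    );
}

print_r($videos);
```

Each entry in `$videos` now holds ordinary strings, which is what `mysqli_stmt::bind_param` expects for the `s` type. Keep in mind that the number of characters in the type string has to match both the number of bound variables and the number of `?` placeholders in the prepared statement.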