How to parse a remote web page and extract strings from a list of <span> fields

Asked: 2014-05-11 16:48:08

Tags: php arrays string

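I'm having trouble printing a list of strings from an array. I use the $test variable to print the list. When I try to print the strings from the array, I get a single letter for each entry instead of the full string, which is not what I want. I want to print the full string for each entry, using $count as the index.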

Here is the input:

SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSLLLLLLLLLLLLL
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
SSSSSSSSSSSSSSSSSSSSSSAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAADDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDLLLLLLLLLLLL
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

When I try this code:

   <?php
   ini_set('max_execution_time', 300);
   $errmsg_arr = array();
   $errflag = true;

   function getState($string)
   {
     $ex = explode(" ",$string);
     return $ex[1];
   }
   $baseUrl = file_get_contents('http://www.myownsite.com/get-listing.php'
   );

   $domdoc = new DOMDocument();
   $domdoc->strictErrorChecking = false;
   $domdoc->recover=true;
   @$domdoc->loadHTML($baseUrl);
   $links = $domdoc->getElementsByTagName('a');
   $i = 0;
   $count = 0;
   for ($i = 1; $i < 70; $i++)
   {
     $time_arr[] = $xpath->query("*/span[@id='time".$i."']");
     $programme_arr[] = $xpath->query("*/span[@id='title".$i."']");
   }

   $programme_title = array();
   foreach($programme_arr as $programme) 
   {
     $programme1 = $programme->item(0)->nodeValue;
     $programme_title[] = $programme1;
   }

   foreach($time_arr as $time)
   {
     //$test = implode(' ', $programme_title);
     //$tester[] = $test;
     //echo $tester;

     $test = implode(' ', $programme_title);
     echo $test[$count];
   }
 }
 ?>

Do you know how I can print the full string for each entry, rather than a single letter, when using the $test variable?

Edit: the output looks like this:

Sister Act Sister Act 2: Back in the Habit Mamma Mia!Forrest Gump(D,L,V,S) 
The Blind SideJoel OsteenJoyce Meyer: Enjoying Everyday LifeShaun T's 
Focus T25Summer Sexy With T25!Total Gym for $14.95Dr. Ordon's Secret!
Sleep Better!Steam And Spray The Dirt Away ... SHARK Style!Shaun T's Focus T25
Airbrushed BeautyJoseph PrinceLife Today With James Robison - Mark Driscoll 1
Joyce Meyer: Enjoying Everyday LifeShaun T's Focus T25That '70s Show - I Love Cake
That '70s Show - Sleepover That '70s Show - Eric Gets Suspended That '70s Show - 
Red's BirthdayStill Standing - Still Thankful700 Club InteractiveThe 700 ClubGil
more Girls - Haunted Leg8 Simple Rules -

and so on...

Edit: here is the updated code:

   $programme_title = array();
   foreach($programme_arr as $programme)
   {
     $programme1 = $programme->item(0)->nodeValue;
     $programme_title[] = $programme1;
   }


   foreach($time_arr as $time)
   {
     echo $programme_title[$count];
     //$test = implode(' ', $programme_title);
     //echo $test[$count];
   }

1 Answer:

Answer 0: (score: 1)

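You are calling

 $time_arr[] = $xpath->query("*/span[@id='time".$i."']");

before $xpath has been defined. You need to create $xpath as an XPath object for the parsed document first. Something like this: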

@$domdoc->loadHTML($baseUrl);

$xpath = new DOMXPath($domdoc);

Then you can query the document for data.
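To try this step in isolation, here is a minimal sketch that parses an inline HTML string instead of fetching the remote page. The markup in $html is an assumption about what the listing page looks like (spans with ids time1, title1, and so on), and the $listing array is a name introduced only for this illustration. It also skips any index whose span is missing, since item(0) returns null when a query matches nothing.

```php
<?php
// Hypothetical markup standing in for the remote listing page.
$html = '<html><body>'
      . '<span id="time1">1:30 PM</span><span id="title1">Sister Act 2: Back in the Habit</span>'
      . '<span id="time2">3:30 PM</span><span id="title2">Mamma Mia!</span>'
      . '</body></html>';

$domdoc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$domdoc->loadHTML($html);
libxml_clear_errors();

// The XPath object must exist before any query() call.
$xpath = new DOMXPath($domdoc);

$listing = array();
for ($i = 1; $i <= 2; $i++) {
    $time  = $xpath->query('//span[@id="time' . $i . '"]')->item(0);
    $title = $xpath->query('//span[@id="title' . $i . '"]')->item(0);
    if ($time === null || $title === null) {
        continue;                   // no span with this id, skip it
    }
    $listing[] = $time->nodeValue . ' ' . $title->nodeValue;
}

echo implode("\n", $listing), "\n";
```

Using libxml_use_internal_errors() here is one alternative to the @ operator for silencing warnings about malformed HTML; either approach should work for this purpose.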

The following now gives the expected result. I used the URL you provided.

 <?php
   ini_set('max_execution_time', 300);
   $errmsg_arr = array();
   $errflag = true;

   function getState($string)
   {
     $ex = explode(" ",$string);
     return $ex[1];
   }

   // read the remote file
   $baseUrl = file_get_contents('http://some-server.com/get-listing.php?channels=ABC%20FAMILY&id=101');

   // create the parser
   $domdoc = new DOMDocument();
   $domdoc->strictErrorChecking = false;
   $domdoc->recover=true;
   @$domdoc->loadHTML($baseUrl);
   $xpath = new DOMXpath($domdoc);

   //$links = $domdoc->getElementsByTagName('a');

   // collect the time and title node lists for the first four listings
   $time_arr = array();
   $programme_arr = array();
   for ($i = 1; $i < 5; $i++){
     $time_arr[] = $xpath->query("*/span[@id='time".$i."']");
     $programme_arr[] = $xpath->query('//span[@id="title'.$i.'"]');
   }

   $programme_title = array();
   foreach($programme_arr as $programme) {
     $programme1 = $programme->item(0)->nodeValue;
     $programme_title[] = $programme1;
   }

   // dump the collected titles for inspection

   echo "<pre>";

   print_r( $programme_title );

   echo "</pre>";

   $count = 0;
   foreach($time_arr as $time){
     //$test = implode(' ', $programme_title);
     //$tester[] = $test;
     //echo $tester;
     echo $time->item(0)->nodeValue." ";
     echo $programme_title[$count++]."<br>";
   }


 ?> 

It returns

Array
(
    [0] => Sister Act 2: Back in the Habit
    [1] => Mamma Mia!
    [2] => Forrest Gump(D,L,V,S)
    [3] => The Blind Side
)

1:30 PM Sister Act 2: Back in the Habit
3:30 PM Mamma Mia!
6:00 PM Forrest Gump(D,L,V,S)
9:00 PM The Blind Side
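
As for the single letters in the question: implode() returns one string, so $test[$count] reads a character offset from that string rather than an array entry. The sketch below illustrates the difference using a small hard-coded array; the titles and the $firstChar, $firstTitle and $lines names are made up for this example. It also shows $count being advanced on every pass, which the question's loops do not do.

```php
<?php
$programme_title = array('Sister Act', 'Mamma Mia!', 'The Blind Side');

// implode() returns a single string, so [0] is a character offset into it.
$test = implode(' ', $programme_title);
$firstChar = $test[0];

// Indexing the array itself returns a whole title.
$firstTitle = $programme_title[0];

echo $firstChar, "\n", $firstTitle, "\n";

// Advance the index on every pass; otherwise the same entry repeats.
$count = 0;
$lines = array();
foreach ($programme_title as $unused) {
    $lines[] = $programme_title[$count++];
}
echo implode("\n", $lines), "\n";
```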