Find all links on a webpage and automatically download each one as a separate HTML file using PHP

Asked: 2015-07-08 19:05:53

Tags: php html curl

Sorry for my English, I will try to explain as well as I can. I hope someone can help me... thank you in advance for any advice.

I am trying to create a PHP page that reads the source of a webpage at a given URL, finds all the links in it, and then automatically downloads each individual link (if it points to HTML) as a file on my computer (ideally without being asked where to save it...).

So far I have tried this code:

    <?php

    $srcUrl = 'http://www.example.com'; // file_get_contents() needs the scheme

    $html = file_get_contents($srcUrl);

    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    // grab all the links on the page
    $xpath = new DOMXPath($dom);

    // find every <a> tag in the body
    $hrefs = $xpath->evaluate("/html/body//a");

    $testo = '<table width="100%" border="1" cellspacing="2" cellpadding="2" summary="layout">
      <caption>
        List of links
      </caption>
      <tr>
        <th scope="col">&nbsp;</th>
      </tr>';

    // loop over the links: list each one and download it
    for ($i = 0; $i < $hrefs->length; $i++) {

        $href = $hrefs->item($i);
        $url  = $href->getAttribute('href');

        // skip empty anchors that are not real links
        if ($url != '#') {

            $testo .= '<tr>
            <td>' . $url . '</td>
            </tr>';

            // fetch the linked page (fails on relative URLs)
            $contents = file_get_contents($url);

            // a raw URL contains "/" and ":", so it cannot be used as a
            // filename as-is; reduce it to something fopen() can create
            $filename = preg_replace('/[^A-Za-z0-9._-]/', '_', $url) . '.html';

            // save the file on the server
            $fh = fopen($filename, "w");
            fwrite($fh, $contents);
            fclose($fh);

            // download automatically (better if without asking where...
            // maybe in the download folder) - this part doesn't work:
            // repeated header() calls overwrite each other, so at most
            // one download can ever be triggered per response
            header('Content-Disposition: attachment; filename=' . $filename);
            header('Content-Type: text/html');
        }
    }

    $testo .= '</table>';

    echo $testo;

    ?>
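One thing I suspect is that getAttribute('href') often returns relative links like "/page.html" or "page.html", which file_get_contents() cannot open on its own. A small helper along these lines might be needed to make every link absolute before fetching it (this is just my sketch: resolveHref() is a name I made up, and it only handles the common cases):

    <?php
    // hypothetical helper, not part of the code above: resolve a href
    // found in the page against the page's own URL
    function resolveHref($base, $href) {
        // already absolute ("http://..." or "https://...")
        if (preg_match('#^https?://#i', $href)) {
            return $href;
        }
        $parts = parse_url($base);
        $root  = $parts['scheme'] . '://' . $parts['host'];
        // root-relative link ("/page.html")
        if (substr($href, 0, 1) === '/') {
            return $root . $href;
        }
        // document-relative link ("page.html"): resolve against the base path
        $path = isset($parts['path']) ? $parts['path'] : '/';
        return $root . rtrim(dirname($path), '/') . '/' . $href;
    }

    // usage: $contents = file_get_contents(resolveHref($srcUrl, $url));
    ?>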

I even tried cURL:

    <?php

    $srcUrl = 'http://www.example.com'; // the scheme is required here too

    $html = file_get_contents($srcUrl);

    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    // grab all the links on the page
    $xpath = new DOMXPath($dom);

    // find every <a> tag in the body
    $hrefs = $xpath->evaluate("/html/body//a");

    $testo = '<table width="100%" border="1" cellspacing="2" cellpadding="2" summary="layout">
      <caption>
        List of links
      </caption>
      <tr>
        <th scope="col">&nbsp;</th>
      </tr>';

    // loop over the links: list each one and download it
    for ($i = 0; $i < $hrefs->length; $i++) {

        $href = $hrefs->item($i);
        $url  = $href->getAttribute('href');

        // skip empty anchors that are not real links
        if ($url != '#') {

            $testo .= '<tr>
            <td>' . $url . '</td>
            </tr>';

            // fetch the linked page, this time with cURL
            $ch = curl_init();
            $timeout = 5;
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
            $contents = curl_exec($ch);
            curl_close($ch);

            // same as above: reduce the URL to a writable filename
            $filename = preg_replace('/[^A-Za-z0-9._-]/', '_', $url) . '.html';

            // save the file on the server
            $fh = fopen($filename, "w");
            fwrite($fh, $contents);
            fclose($fh);

            // download automatically - same problem as before: only one
            // download can be triggered per response
            header('Content-Disposition: attachment; filename=' . $filename);
            header('Content-Type: text/html');
        }
    }

    $testo .= '</table>';

    echo $testo;

    ?>

But it doesn't work!
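From what I understand, the download part can never work with header() inside the loop: headers must be sent before any output, and one PHP response can trigger at most one browser download. A workaround I am considering (an untested sketch; it assumes the ZipArchive extension is enabled, and that $pages is an array of filename => contents that the loop above would collect instead of writing files and calling header()) is to bundle everything into one ZIP and send that as the single download:

    <?php
    // untested sketch: pack all fetched pages into one archive and send
    // that single file to the browser; $pages is assumed to map each
    // generated filename to the HTML fetched for it
    $zipName = tempnam(sys_get_temp_dir(), 'lnk') . '.zip';

    $zip = new ZipArchive();
    $zip->open($zipName, ZipArchive::CREATE);
    foreach ($pages as $filename => $contents) {
        $zip->addFromString($filename, $contents);
    }
    $zip->close();

    // these headers must be the very first output of the script
    header('Content-Type: application/zip');
    header('Content-Disposition: attachment; filename="links.zip"');
    header('Content-Length: ' . filesize($zipName));
    readfile($zipName);
    unlink($zipName);
    ?>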

0 Answers:

No answers yet.