Question

Possible duplicate:
Finding and Printing all Links within a DIV

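I am trying to build a mini crawler. When I specify a website, it calls file_get_contents() on it and then extracts the data I want. That part already works. Now I want to add code that lets it find any external links on the site it is on and fetch data from those pages as well.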

Basically, instead of me specifying each website, it should just follow the external links and fetch the data (if available).

Here is what I have so far.

Thanks in advance.

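    <?php

    // URL to crawl, passed as ?s=... in the query string
    $link = strip_tags($_GET['s'] ?? '');

    // Host name without a leading "www." (parse_url() may not return a host)
    $path_info = parse_url($link);
    $name = $path_info['host'] ?? '';
    $name = preg_replace('/^www\./', '', $name);

    $original_file = @file_get_contents($link);

    if ($original_file === false) {
        die("$link does not exist");
    }

    // "stuff" is a placeholder pattern; PCRE patterns need delimiters such as /.../
    // preg_match() returns 1 or 0; the matched text itself ends up in $m
    $data = preg_match('/stuff/', $original_file, $m);
    echo $data;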

Answer 1

Use an HTML DOM parser (file_get_html() below comes from the PHP Simple HTML DOM Parser library):

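// Load the Simple HTML DOM library (adjust the path to wherever you saved it)
include 'simple_html_dom.php';

// Create DOM from URL
$html = file_get_html('http://www.example.com/');

// Find all links
$allURLs = array();
foreach ($html->find('a') as $element) {
    $allURLs[] = $element->href;
}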

Now $allURLs contains every URL found on the page, and you can loop over it and call file_get_contents() on each link.
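The loop above collects every link, internal or external. As a rough sketch of how the external ones might be singled out, the code below swaps Simple HTML DOM for PHP's built-in DOMDocument so that it runs without extra libraries. The names extractExternalLinks(), normalizeHost() and $baseHost are made up for this illustration, and "external" is assumed to mean "an http(s) link whose host differs from the page's host, ignoring a leading www.".

```php
<?php
// Sketch only: function and parameter names are illustrative, not from the posts above.

// Treat "www.example.com" and "example.com" as the same site.
function normalizeHost(string $host): string
{
    $host = strtolower($host);
    return strpos($host, 'www.') === 0 ? substr($host, 4) : $host;
}

// Return the unique absolute http(s) links in $html whose host is not $baseHost.
function extractExternalLinks(string $html, string $baseHost): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $baseHost = normalizeHost($baseHost);
    $external = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        $host = parse_url($href, PHP_URL_HOST);
        // Relative links, mailto: links and the like have no host: skip them.
        if (!is_string($host) || $host === '') {
            continue;
        }
        // Only plain http/https URLs are kept (protocol-relative links are skipped).
        $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') {
            continue;
        }
        if (normalizeHost($host) !== $baseHost) {
            $external[$href] = true; // array keys de-duplicate the URLs
        }
    }
    return array_keys($external);
}
```

With the question's variables, this could be called as extractExternalLinks($original_file, $name), and each returned URL passed to file_get_contents() in a loop. A real crawler would probably also want a limit on how many pages it follows.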

Answer 2

If I were you, I would split this code into two parts.

Part one:

  Fetch the content and display its links.

Part two:

  This part runs when I specify which link I want to display.
  The chosen external link is passed back to the same file, recursively.

So basically your code would look like this:

     first part --> 1) get the data
                    2) parse the links
                    if (a link is chosen)
                    {
                        run the current file again with the selected link passed in
                    }
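One way the "run the current file again with the selected link" step might look in PHP is sketched below: every link found on the page is rendered as a link back to the same script, with the chosen URL carried in the s query parameter that the question's code already reads. The function name renderFollowLinks() and the $scriptName parameter are invented for this sketch.

```php
<?php
// Sketch only: renderFollowLinks() and $scriptName are illustrative names.

// Build an HTML list in which each URL links back to this script as ?s=<url>.
function renderFollowLinks(array $urls, string $scriptName): string
{
    $items = [];
    foreach ($urls as $url) {
        // rawurlencode() keeps the target URL intact inside the query string.
        $target = $scriptName . '?s=' . rawurlencode($url);
        // htmlspecialchars() stops the URL from breaking out of the markup.
        $items[] = sprintf(
            '<li><a href="%s">%s</a></li>',
            htmlspecialchars($target, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
        );
    }
    return "<ul>\n" . implode("\n", $items) . "\n</ul>";
}
```

On the first request the script would fetch the page given in $_GET['s'], collect its links, and echo renderFollowLinks($allURLs, basename(__FILE__)). Clicking any entry then requests the same file with the new URL, which repeats the cycle for that page.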

PHP: find external links and fetch data
