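Introduction
I am working on a project that scans websites for vulnerabilities and threats, so I need to write a spider that indexes every page of a site.
I am building the spider from a combination of two libraries: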
1) Symfony\Component\BrowserKit\Client //an abstract class
2) mmerian\phpcrawl\PHPCrawler //a concrete class with a method I need to override
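To use them I have to extend both: one is abstract, and the other has the method I need to override.
PHP does not allow multiple inheritance. Is there a way around this?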
Spider.php
<?php
namespace App\Core;

use PHPCrawler; // I need to inherit from this class
use PHPCrawlerDocumentInfo;
use Symfony\Component\BrowserKit\Client as BaseClient;
use Symfony\Component\DomCrawler\Crawler; // needed for the type hint in parseHTMLDocument()

class Spider extends BaseClient
{
    private $url;
    private $phpCrawler;

    public function __construct($url){
        parent::__construct();
        // I have instantiated the object instead of inheriting from it.
        $this->phpCrawler = new PHPCrawler;
        $this->url = $url;
    }

    public function setup(){
        $this->phpCrawler->setURL($this->url);
        $this->phpCrawler->addContentTypeReceiveRule("#text/html#");
        $this->phpCrawler->addURLFilterRule("#\.(jpg|jpeg|gif|png|css)$# i");
    }

    public function start(){
        $this->setup();
        echo 'Starting spider' . PHP_EOL;
        $this->phpCrawler->go();
        $report = $this->phpCrawler->getProcessReport();
        echo "Summary:". PHP_EOL;
        echo "Links followed: ".$report->links_followed . PHP_EOL;
        echo "Documents received: ".$report->files_received . PHP_EOL;
        echo "Bytes received: ".$report->bytes_received." bytes". PHP_EOL;
        echo "Process runtime: ".$report->process_runtime." sec" . PHP_EOL;
        if(!empty($this->phpCrawler->links_found)){
            echo 'not empty';
        }
    }

    // Intended override. This doesn't work: Spider does not inherit from
    // PHPCrawler, so the crawler never calls this method.
    public function handleDocumentInfo(PHPCrawlerDocumentInfo $pageInfo){
        $this->parseHTMLDocument($pageInfo->url, $pageInfo->content);
    }

    public function parseHTMLDocument($url, $content){
        $crawler = $this->request('GET', $url);
        $crawler->filter('a')->each(function (Crawler $node, $i){
            echo $node->attr('href');
        });
    }

    // Abstract in the parent class, so it has to be implemented here.
    public function doRequest($request){}
}
Answer 0 (score: 0)
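I found a solution to the problem.
I extended the abstract class (BrowserKit\Client) with a concrete class of my own, i.e. BaseClient extends Client. That makes it possible to instantiate BaseClient inside the Spider class instead of extending it. The Spider class is then free to extend PHPCrawler, so the overridden handleDocumentInfo method actually gets called.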
Class structure of the solution
Core/
- BaseClient //extends BrowserKit\Client
- Spider //extends PHPCrawler
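
The structure above can be sketched roughly as follows. To keep the example runnable without Composer, StubClient and StubCrawler are hypothetical, heavily simplified stand-ins for BrowserKit\Client and PHPCrawler; they are not the libraries' real APIs (the real handleDocumentInfo receives a PHPCrawlerDocumentInfo, for instance). Only the shape matters here: inherit from the class whose method you must override, and hold the other one as a property.

```php
<?php
// Hypothetical stand-in for Symfony\Component\BrowserKit\Client:
// an abstract class whose request() delegates to an abstract doRequest().
abstract class StubClient
{
    public function request(string $method, string $url): string
    {
        return $this->doRequest($method . ' ' . $url);
    }

    abstract protected function doRequest($request);
}

// Hypothetical stand-in for PHPCrawler: go() calls an overridable hook
// once per URL. The hook takes a plain URL string in this sketch.
class StubCrawler
{
    protected $urls = [];

    public function setURL(string $url): void
    {
        $this->urls[] = $url;
    }

    public function go(): void
    {
        foreach ($this->urls as $url) {
            $this->handleDocumentInfo($url);
        }
    }

    public function handleDocumentInfo(string $url)
    {
        // Default hook does nothing; subclasses override it.
    }
}

// Concrete client: exists only so the abstract class can be instantiated.
class BaseClient extends StubClient
{
    protected function doRequest($request)
    {
        // A real implementation would perform the HTTP request here.
        return 'response for ' . $request;
    }
}

// Spider inherits from the crawler and composes the client.
class Spider extends StubCrawler
{
    private $client;
    public $seen = [];

    public function __construct(string $url)
    {
        $this->client = new BaseClient();
        $this->setURL($url);
    }

    // Invoked by go(), because Spider now really is a crawler subclass.
    public function handleDocumentInfo(string $url)
    {
        $this->seen[] = $this->client->request('GET', $url);
    }
}

$spider = new Spider('http://example.com/');
$spider->go();
print_r($spider->seen);
```

With the real libraries the same split should apply: BaseClient implements doRequest, and Spider overrides handleDocumentInfo while calling the client through its private property.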