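I wrote a small crawler and would like to know how to properly assign its result to the instance that was created.
My constructor sets some basic properties and then calls the next method, which contains an if statement that may in turn run a foreach loop. When everything is done, I echo my result.
This works fine, but I don't want to echo the json_encode data. I would rather have the $crawler variable at the bottom contain the JSON-encoded data.
Here is my code: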
<?php

class Crawler {
    private $url;
    private $class;
    private $regex;
    private $htmlStack;
    private $pageNumber = 1;
    private $elementsArray;

    public function __construct($url, $class, $regex=null) {
        $this->url = $url;
        $this->class = $class;
        $this->regex = $regex;
        $this->curlGet($this->url);
    }

    private function curlGet($url) {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($curl, CURLOPT_URL, $url);
        $this->htmlStack .= curl_exec($curl);
        $response = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        $this->paginate($response);
    }

    private function paginate($response) {
        if($response === 200) {
            $this->pageNumber++;
            $url = $this->url . '?page=' . $this->pageNumber;
            $this->curlGet($url);
        } else {
            $this->CreateDomDocument();
        }
    }

    private function curlGetDeep($link) {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($curl, CURLOPT_URL, $link);
        $product = curl_exec($curl);
        $dom = new Domdocument();
        @$dom->loadHTML($product);
        $xpath = new DomXpath($dom);
        $descriptions = $xpath->query('//div[contains(@class, "description")]');
        foreach($descriptions as $description) {
            return $description->nodeValue;
        }
    }

    private function CreateDomDocument() {
        $dom = new Domdocument();
        @$dom->loadHTML($this->htmlStack);
        $xpath = new DomXpath($dom);
        $elements = $xpath->query('//article[contains(@class, "' . $this->class . '")]');
        foreach($elements as $element) {
            $title = $xpath->query('descendant::div[@class="title"]', $element);
            $title = $title->item(0)->nodeValue;
            $link = $xpath->query('descendant::a[@class="link-overlay"]', $element);
            $link = $link->item(0)->getAttribute('href');
            $link = 'https://www.gall.nl' . $link;
            $image = $xpath->query('descendant::div[@class="image"]/node()/node()', $element);
            $image = $image->item(1)->getAttribute('src');
            $description = $this->curlGetDeep($link);
            if($this->regex) {
                $title = preg_replace($this->regex, '', $title);
            }
            if(!preg_match('/\dX(\d+)?/', $title)) {
                $this->elementsArray[] = [
                    'title' => $title,
                    'link' => $link,
                    'image' => $image,
                    'description' => $description
                ];
            }
        }
        echo json_encode(['beers' => $this->elementsArray]);
    }
}

$crawler = new Crawler('https://www.gall.nl/shop/speciaal-bier/', 'product-block', '/\d+\,?\d*CL/i');
GitHub link for a fuller overview: https://github.com/stephan-v/crawler/blob/master/ArticleCrawler.php
I hope someone can help, because I'm confused about how to get this working properly.
Answer 0 (score: 3)
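You can't do this in the constructor: new Crawler(...) always evaluates to the object itself, so $crawler can never hold the JSON string directly. But you can assign the JSON to a class property and return it from another method. That is the only logical option.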
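To illustrate that suggestion, here is a minimal sketch. The class name JsonHolderDemo and the method name getJson() are hypothetical, and a hard-coded array stands in for the crawled data so the example runs without any network access. It only shows the pattern: the constructor stores the encoded result on the instance, and a separate method returns it.

```php
<?php
// Hypothetical stand-in for the Crawler class; no HTTP requests are made.
class JsonHolderDemo {
    // Holds the encoded result instead of echoing it.
    private $json;

    public function __construct(array $items) {
        // In the real crawler this would happen where the echo currently is.
        $this->json = json_encode(['beers' => $items]);
    }

    // Returns the JSON string that was stored during construction.
    public function getJson() {
        return $this->json;
    }
}

// `new` yields the instance, not the JSON string.
$demo = new JsonHolderDemo([['title' => 'Example beer']]);
var_dump($demo instanceof JsonHolderDemo); // bool(true)

// The JSON is fetched through the method instead.
echo $demo->getJson(), PHP_EOL; // {"beers":[{"title":"Example beer"}]}
```

With this shape, the calling code decides whether to echo, store, or further process the JSON.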
Answer 1 (score: 1)
I was too slow... man. So I'll just expand on ardabeyazoglu's answer with some code:
Change echo json_encode(['beers' => $this->elementsArray]);
to $this->json = json_encode(['beers' => $this->elementsArray]);
Then
$crawler = new Crawler(....);
var_dump($crawler->json);
You could add an accessor method, but a public property works just as well.
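The difference between the two options can be sketched as follows. Both class names (PublicJsonDemo, PrivateJsonDemo) and the getJson() accessor are hypothetical, and the data is hard-coded rather than crawled. Note that the property is declared explicitly in both classes; assigning to an undeclared property still works but is deprecated as of PHP 8.2.

```php
<?php
// Option 1 (hypothetical class): a declared public property.
class PublicJsonDemo {
    public $json; // readable and writable from outside the class

    public function __construct(array $items) {
        $this->json = json_encode(['beers' => $items]);
    }
}

// Option 2 (hypothetical class): a private property with an accessor.
class PrivateJsonDemo {
    private $json; // only reachable through getJson()

    public function __construct(array $items) {
        $this->json = json_encode(['beers' => $items]);
    }

    public function getJson() {
        return $this->json;
    }
}

$public = new PublicJsonDemo(['Example beer']);
echo $public->json, PHP_EOL;   // {"beers":["Example beer"]}
$public->json = 'overwritten'; // allowed: outside code can replace the value

$private = new PrivateJsonDemo(['Example beer']);
echo $private->getJson(), PHP_EOL; // {"beers":["Example beer"]}
```

The public property is shorter to write; the accessor keeps the stored JSON from being changed by calling code. Which one fits depends on how much the class needs to protect its state.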