Question

I'm new to OOP in PHP. I'd like to know how to structure this kind of application. The application scrapes about 100 different websites.

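I have a main class, "Scrap", that holds the methods common to all the websites, and in the "Scripts" folder I have classes that handle the specifics of each website I'm scraping. I have another folder named "Lib" that contains external libraries.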

Let me explain visually.

I have this file structure:

- Scrap.php
+ Scripts
               - Google.php
               - Yahoo.php
               - Stackoverflow.php
+ Lib
     + libScrap
               - LIB_parse.php
     + phpQuery
               - phpQuery.php
               - other files and folders...

Scrap.php contains the following:

<?php

// Includes
require(__DIR__ . '/Lib/libScrap/LIB_parse.php');
require(__DIR__ . '/Lib/phpQuery/phpQuery.php');

// Testing Scrap
$testing = new Scrap;
$testing->teste = $testing->getPage('http://www.yahoo.com','','off');
echo $testing->teste; 


class Scrap {

    public function __construct() {
        // do things!
    }

    /*
    * This method grabs the entire page(HTML) on given URL
    * Ex: $htmlgrab->teste = $htmlgrab->getPage('http://testing.com/ofertas/','','off');
    * Returns, the HTML of given URL
    */
    public function getPage($site, $proxy, $proxystatus) {
        $ch = curl_init();
        // Return the response body as a string instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        if ($proxystatus == 'on') {
            curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
            curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
            curl_setopt($ch, CURLOPT_PROXY, $proxy);
        }
        curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
        curl_setopt($ch, CURLOPT_URL, $site);
        $html = curl_exec($ch); // HTML string, or false on failure
        curl_close($ch);        // release the handle before returning
        return $html;
    }

    /*
    * 
    * 
    */
    public function getLinks() {
        // do things!
    }

    /*
    * This method grabs the page title.
    * Ex: <title>This is the page title</title>
    * Returns, "This is the page title"
    */
    public function getTitle() {
        // do things!
    }

}
?>

In the "Scripts" folder, I will have files like this:

<?php
require_once(__DIR__ . '/../Scrap.php');

class Yahoo extends Scrap {

    public function doSomething() {
        // do things!
    }

}
?>

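Final note: I need to call/instantiate all the classes created in the "Scripts" folder in order to scrape the websites. I'm not sure what the best way to instantiate about 100 classes is.

I'd appreciate any pointers on how to design this.

Best regards,

Sorry for my bad English.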

Answer 1

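If you want to include every file in the scripts folder, why wouldn't a simple loop be enough? I often do the same thing in projects that contain many scripts.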

$arr = glob ('scripts/*.php');
foreach ($arr as $script)
    include_once ($script);

Update

As for initializing each object, the best option is probably to create the object at the end of each class file, like this:

<?php
require_once(__DIR__ . '/../Scrap.php');

class Yahoo extends Scrap {

    public function doSomething() {
        // do things!
    }
}

$yahooObj = new Yahoo(); //This is the addition

?>

That way, after you call include_once('yahoo.php'), you also have the $yahooObj object available.

Ya dig?

Answer 2

Assuming each class has its own source file, you might consider "autoloading". In my own projects I use the spl_autoload_register() function for this rather than __autoload().
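
To illustrate the autoloading idea, here is a minimal sketch. It assumes each scraper class lives in a file named after the class inside the Scripts folder from the question; the $scriptsDir variable and the closure are illustrative, not part of the original answer. Note that __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0, so spl_autoload_register() is the option that still works on current versions.

```php
<?php
// Assumption: one class per file, e.g. Scripts/Yahoo.php defines class Yahoo.
$scriptsDir = __DIR__ . '/Scripts';

spl_autoload_register(function (string $class) use ($scriptsDir): void {
    // Accept only plain class names so a class name cannot escape the folder.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $class)) {
        return;
    }
    $file = $scriptsDir . '/' . $class . '.php';
    if (is_file($file)) {
        require_once $file;
    }
});

// A class file is loaded only when the class is first used, for example:
// $yahoo = new Yahoo();
```

With this in place the individual require calls for scraper classes are no longer needed; the files are pulled in on demand.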

Answer 3

I suggest you name your Scrap classes like this:

Scrap_Yahoo
Scrap_Google
...

Then do what Dutchie432 suggested:

$scraps = array();
foreach (glob('scripts/*.php') as $script) {
  $scrap = 'Scrap_' . pathinfo($script, PATHINFO_FILENAME);
  require_once($script);
  $scraps[] = new $scrap();
}

Then you can do whatever you want with this array/factory of scraps:

foreach ($scraps as $scrap) {
  $scrap->scrap();
}

Then you should define an abstract method scrap() in your Scrap class, and don't forget to make the class itself abstract:

// file: Scrap.php
abstract class Scrap {
  abstract public function scrap();
}

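Every class in the scripts/* directory will extend Scrap and define its own version of scrap().

You could go one step further and implement the Template Method design pattern.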
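
A rough sketch of what Template Method could look like here. All names (ScrapTemplate, ExampleSiteScrap, url(), fetch(), parse()) are hypothetical, and fetch() returns a stub string so the example runs without network access; a real version would call something like getPage() from the question.

```php
<?php
// Sketch only: class and method names are illustrative, not from the question.
abstract class ScrapTemplate
{
    // The template method: fixes the order of the steps for every site.
    final public function scrap(): array
    {
        $html = $this->fetch($this->url());
        return $this->parse($html);
    }

    // Site-specific steps that each subclass must supply.
    abstract protected function url(): string;
    abstract protected function parse(string $html): array;

    // Shared step; stubbed here instead of performing an HTTP request.
    protected function fetch(string $url): string
    {
        return '<title>Stub page for ' . htmlspecialchars($url) . '</title>';
    }
}

class ExampleSiteScrap extends ScrapTemplate
{
    protected function url(): string
    {
        return 'http://example.com/';
    }

    protected function parse(string $html): array
    {
        // Extract the <title> text from the fetched HTML.
        preg_match('/<title>(.*?)<\/title>/s', $html, $m);
        return ['title' => $m[1] ?? null];
    }
}

$result = (new ExampleSiteScrap())->scrap();
```

The base class owns the sequence (fetch, then parse), so each of the 100 site classes only fills in the parts that differ.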

Answer 4

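The pattern you are probably looking for is Strategy or Command.

As for setting up the scrapers, you have several options. You can hardcode the paths to the scrapers in the main Scrape class, load them from a config file, use autoloading or a class map, use a factory, or combine these. It's really up to you. What matters more is deciding which approach suits your application.

If you are already using autoloading, make sure it can find your scrapers. If you want to add another autoloader, do that. If you prefer a class map for added security and speed, use a class map, and so on. Discussing all the pros and cons is beyond the scope of this question. If you are interested, check out this blog post about autoloading benchmarks (written for ZF2 but generally applicable).

Since the scrapers are unlikely to depend on each other, I suggest looking into something like Gearman to run them asynchronously in separate processes instead of sequentially in the same script. Your main script would then only create the necessary workers with the appropriate settings and let them run in background processes. There are some examples in the manual, and here is another one from the same site as the benchmarking article.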
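
As one possible reading of the Strategy suggestion, the sketch below separates "how to fetch" from "what to extract for a given site". The SiteStrategy interface and the TitleStrategy and Scraper classes are hypothetical names, and the fetcher is a stub so the code runs without network access.

```php
<?php
// Sketch only: the interface and class names are hypothetical.
interface SiteStrategy
{
    public function url(): string;
    public function extract(string $html): array;
}

final class TitleStrategy implements SiteStrategy
{
    private $url;

    public function __construct(string $url)
    {
        $this->url = $url;
    }

    public function url(): string
    {
        return $this->url;
    }

    public function extract(string $html): array
    {
        preg_match('/<title>(.*?)<\/title>/s', $html, $m);
        return ['title' => $m[1] ?? null];
    }
}

final class Scraper
{
    private $fetch;

    // $fetch is any callable that turns a URL into HTML (e.g. a cURL wrapper).
    public function __construct(callable $fetch)
    {
        $this->fetch = $fetch;
    }

    public function run(SiteStrategy $site): array
    {
        $html = ($this->fetch)($site->url());
        return $site->extract($html);
    }
}

// Stub fetcher standing in for a real HTTP request.
$scraper = new Scraper(function (string $url): string {
    return '<title>Hello</title>';
});
$data = $scraper->run(new TitleStrategy('http://example.com/'));
```

Compared with inheritance from a single Scrap base class, this keeps the shared fetching code in one object and lets each site be a small, swappable strategy.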
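
The class-map option mentioned above might look roughly like the following. The map contents, the Scrap_* class names (borrowed from the naming suggested in another answer), and the createScraper() helper are assumptions for illustration; in practice the array could live in its own config file.

```php
<?php
// Hypothetical class map: class name => file that defines it.
$classMap = [
    'Scrap_Yahoo'  => __DIR__ . '/Scripts/Yahoo.php',
    'Scrap_Google' => __DIR__ . '/Scripts/Google.php',
];

// Small factory: only classes listed in the map can be loaded and created.
function createScraper(string $class, array $classMap)
{
    if (!isset($classMap[$class])) {
        throw new InvalidArgumentException("Unknown scraper: $class");
    }
    if (!class_exists($class, false)) {
        require_once $classMap[$class];
    }
    return new $class();
}

// Usage, assuming the mapped files exist:
// $yahoo = createScraper('Scrap_Yahoo', $classMap);
```

Because nothing outside the map can be instantiated, this is stricter than globbing a directory, at the cost of maintaining the list by hand.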

Designing an OOP application in PHP. How?
