How to pass args to the shouldVisit() method in crawler4j?

Question

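I want to pass arguments to the shouldVisit() method in crawler4j. I saw an example on the library's GitHub documentation page that uses the factory approach, but I could not follow it. Could someone provide an example of how to achieve this?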

Answer 1

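Variant 1: Inject additional parameters as constructor arguments

The method parameters of shouldVisit(...) are fixed, so any additional parameters have to be passed to each WebCrawler class as constructor arguments.

This means you can use a factory class to achieve the following:

MyWebCrawler.class, with two custom arguments (customArgument1 and customArgument2):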

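import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyWebCrawler extends WebCrawler {

    // Custom values supplied by the factory when the crawler is created
    private final String customArgument1;
    private final String customArgument2;

    public MyWebCrawler(String customArgument1, String customArgument2) {
        this.customArgument1 = customArgument1;
        this.customArgument2 = customArgument2;
    }

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        // The URL is lower-cased, so the custom arguments should be lower-case too
        String href = url.getURL().toLowerCase();
        return customArgument1.equals(href) || customArgument2.equals(href);
    }

    @Override
    public void visit(Page page) {
        // do something
    }
}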

To make this work, the factory should look like this:

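import edu.uci.ics.crawler4j.crawler.CrawlController;

public class MyCrawlerFactory implements CrawlController.WebCrawlerFactory<MyWebCrawler> {

    // Called by the controller once for every crawler instance it needs
    @Override
    public MyWebCrawler newInstance() throws Exception {
        return new MyWebCrawler("some argument", "some other argument");
    }
}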

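Every time a new instance of MyWebCrawler is created, you can pass your custom arguments.

To use the factory, start the crawling process from the CrawlController like this: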
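If the arguments should not be hard-coded inside the factory, one option is to give the factory its own constructor and let it forward the stored values to every crawler it creates. The following is a minimal sketch of that idea, not crawler4j code: CrawlerFactory, SketchCrawler, SketchCrawlerFactory and FactorySketchDemo are hypothetical stand-ins for CrawlController.WebCrawlerFactory, MyWebCrawler and MyCrawlerFactory, and the URL is passed as a plain String so the example runs without the library.

```java
// Stand-in types for illustration only; these are NOT the crawler4j classes.
interface CrawlerFactory<T> {
    T newInstance() throws Exception;
}

// Hypothetical crawler that receives its settings through the constructor.
class SketchCrawler {
    private final String allowedUrl1;
    private final String allowedUrl2;

    SketchCrawler(String allowedUrl1, String allowedUrl2) {
        this.allowedUrl1 = allowedUrl1;
        this.allowedUrl2 = allowedUrl2;
    }

    // Mirrors the shouldVisit(...) logic from the answer, using a plain String URL.
    boolean shouldVisit(String url) {
        String href = url.toLowerCase();
        return allowedUrl1.equals(href) || allowedUrl2.equals(href);
    }
}

// The factory stores the arguments once and hands them to every crawler it creates.
class SketchCrawlerFactory implements CrawlerFactory<SketchCrawler> {
    private final String allowedUrl1;
    private final String allowedUrl2;

    SketchCrawlerFactory(String allowedUrl1, String allowedUrl2) {
        this.allowedUrl1 = allowedUrl1;
        this.allowedUrl2 = allowedUrl2;
    }

    @Override
    public SketchCrawler newInstance() {
        return new SketchCrawler(allowedUrl1, allowedUrl2);
    }
}

class FactorySketchDemo {
    public static void main(String[] args) {
        SketchCrawlerFactory factory =
                new SketchCrawlerFactory("https://example.com/", "https://example.com/docs");

        // Each call yields a separate crawler carrying the same arguments.
        SketchCrawler first = factory.newInstance();
        SketchCrawler second = factory.newInstance();

        System.out.println(first != second);                                    // true
        System.out.println(first.shouldVisit("https://EXAMPLE.com/docs"));      // true
        System.out.println(second.shouldVisit("https://example.com/other"));    // false
    }
}
```

With the real library, the same shape would presumably apply: the factory takes the values in its constructor, and newInstance() passes them to new MyWebCrawler(...).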

controller.start(new MyCrawlerFactory(), numberOfCrawlers);

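A similar working example can be found at the official GitHub repository.

Variant 2: Using `CrawlController#getCustomData()` (deprecated)

You can use the customData object on the CrawlController to inject additional data into your web crawler objects. However, this approach is deprecated and may be removed in a future crawler4j release.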
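The shared-object idea behind this variant can be sketched as follows. This is an illustration under stated assumptions rather than crawler4j code: SketchController and CustomDataCrawler are hypothetical stand-ins, and the setCustomData(...) setter and the way the crawler gets hold of its controller are assumed for the sake of the example. Only getCustomData() is named in the answer above.

```java
// Stand-in types for illustration only; these are NOT the crawler4j classes.
class SketchController {
    private Object customData;

    // Assumed setter, paired with the getCustomData() accessor named in the answer.
    void setCustomData(Object customData) {
        this.customData = customData;
    }

    Object getCustomData() {
        return customData;
    }
}

// Hypothetical crawler that reads its settings from the shared controller.
class CustomDataCrawler {
    private final SketchController controller;

    CustomDataCrawler(SketchController controller) {
        this.controller = controller;
    }

    boolean shouldVisit(String url) {
        // The shared object is untyped, so every read needs a cast.
        String[] allowedUrls = (String[]) controller.getCustomData();
        String href = url.toLowerCase();
        for (String candidate : allowedUrls) {
            if (candidate.equals(href)) {
                return true;
            }
        }
        return false;
    }
}
```

The cast is one reason the constructor-based variant tends to be easier to maintain: the compiler checks the argument types there, whereas a wrong type here only surfaces at run time.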
