I need to call crawler4j from another class. Instead of using a main method in the Controller class, I used a simple method called setup:
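import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

class Controller {
    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";  // folder for crawler4j's intermediate data
            int numberOfCrawlers = 1;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);      // milliseconds between requests
            config.setMaxDepthOfCrawling(1);
            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
            controller.addSeed(seed);
            controller.setCustomData(seed);
            // MyCrawler is my own WebCrawler subclass
            controller.start(MyCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}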
I tried to call it from another class like this, but I get an error:
Controller c = new Controller();
c.setup(seed);
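Is it possible to run crawler4j without a main method in the Controller class? In short, I want to know how to integrate the crawler into an application that already has its own main method. Any help would be appreciated.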
Answer 0 (score: 0)
There should be no problem running the crawler this way. The code below has been tested and works as expected:
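import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {
    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";  // folder for crawler4j's intermediate data
            int numberOfCrawlers = 4;
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);      // milliseconds between requests
            config.setMaxDepthOfCrawling(2);
            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
            controller.addSeed(seed);
            controller.setCustomData(seed);
            // BasicCrawler is a WebCrawler subclass (e.g. the one from crawler4j's basic example)
            controller.start(BasicCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Only here for testing; any other class can call setup() the same way
    public static void main(String[] args) throws Exception {
        Controller crawler = new Controller();
        crawler.setup("http://www.ics.uci.edu/");
    }
}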
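The answer's main simply calls setup directly. In crawler4j, CrawlController.start(...) blocks until the crawl finishes, so an application that already has its own main and needs to keep doing other work could run that call on a worker thread. The following is a minimal sketch under that assumption: HostApp, crawl(...) and crawlInBackground(...) are hypothetical names, and crawl(...) only stands in for `new Controller().setup(seed)` so the example runs without crawler4j on the classpath.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HostApp {
    // Hypothetical stand-in for `new Controller().setup(seed)`.
    // The real call would block the calling thread for the whole crawl.
    static String crawl(String seed) {
        return "finished " + seed;
    }

    // Submits the (blocking) crawl to a worker thread so main() stays free.
    static Future<String> crawlInBackground(ExecutorService pool, String seed) {
        return pool.submit(() -> crawl(seed));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> pending = crawlInBackground(pool, "http://www.ics.uci.edu/");
            // ... the rest of the application's own main logic could run here ...
            System.out.println(pending.get()); // waits until the crawl has ended
        } finally {
            pool.shutdown();
        }
    }
}
```

If the application has nothing else to do while crawling, calling setup directly from main, as in the answer above, is simpler.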
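Answer 1 (score: 0)
Sorry, I forgot the `public` access modifier before the class name, and that caused the error. (Without it, Controller is package-private and cannot be used from a class in a different package.) Thanks for your answer.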