Unable to crawl using StormCrawler

Time: 2021-05-20 05:23:24

Tags: elasticsearch stormcrawler

The first time, we ran the commands below and the crawl started:

/app/elasticsearch/storm-crawler/storm/bin/storm jar /app/elasticsearch/storm-crawler/crawler/storm-crawler-1.13.jar org.apache.storm.flux.Flux /app/elasticsearch/storm-crawler/crawler/es-injector.flux --remote
/app/elasticsearch/storm-crawler/storm/bin/storm jar /app/elasticsearch/storm-crawler/crawler/storm-crawler-1.13.jar org.apache.storm.flux.Flux /app/elasticsearch/storm-crawler/crawler/es-crawler.flux --remote

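Last month we rebooted the whole Linux box. Since that reboot, the crawler service no longer crawls when we update the seed.txt file. After rebooting the Linux box again, I ran the commands above, but they failed with an error.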

If I run the commands below manually (the same commands without --remote), it does crawl:

/app/elasticsearch/storm-crawler/storm/bin/storm jar /app/elasticsearch/storm-crawler/crawler/storm-crawler-1.13.jar org.apache.storm.flux.Flux /app/elasticsearch/storm-crawler/crawler/es-injector.flux
/app/elasticsearch/storm-crawler/storm/bin/storm jar /app/elasticsearch/storm-crawler/crawler/storm-crawler-1.13.jar org.apache.storm.flux.Flux /app/elasticsearch/storm-crawler/crawler/es-crawler.flux

Can someone suggest what I should do to crawl the newly added entries in the seed file?

0 answers:

No answers yet.