Syntax error, insert "... VariableDeclaratorId" to complete FormalParameterList

Date: 2015-10-29 06:35:36

Tags: java web-crawler crawler4j

I'm having some trouble with this code:

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {

     String crawlStorageFolder = "/data/crawl/root";
     int numberOfCrawlers = 7;

     CrawlConfig config = new CrawlConfig();
     config.setCrawlStorageFolder(crawlStorageFolder);
     /*
      * Instantiate the controller for this crawl.
      */
     PageFetcher pageFetcher = new PageFetcher(config);
     RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
     RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
     CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

     /*
      * For each crawl, you need to add some seed urls. These are the first
      * URLs that are fetched and then the crawler starts following links
      * which are found in these pages
      */
     controller.addSeed("http://www.ics.uci.edu/~lopes/");
     controller.addSeed("http://www.ics.uci.edu/~welling/");
     controller.addSeed("http://www.ics.uci.edu/");
     /*
      * Start the crawl. This is a blocking operation, meaning that your code
      * will reach the line after this only when crawling is finished.
      */
     controller.start(MyCrawler.class, numberOfCrawlers);
 }

I get the following error:

    Syntax error, insert "... VariableDeclaratorId" to complete FormalParameterList

It is reported on the line config.setCrawlStorageFolder(crawlStorageFolder).

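2 Answers:

Answer 0 (score: 3)

You can't put arbitrary statements like these directly in a class body. Field declarations with initializers (String crawlStorageFolder = "/data/crawl/root";) are allowed there, but a statement such as config.setCrawlStorageFolder(crawlStorageFolder); must be inside a method, a constructor, or an initializer block. Because the statement sits where only declarations are allowed, the compiler apparently tries to parse it as a method declaration with the parameter list (crawlStorageFolder), which is why the message asks for a parameter name (VariableDeclaratorId).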
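To make the three legal places concrete, here is a minimal, dependency-free sketch. FakeConfig and Placement are hypothetical names; FakeConfig only stands in for crawler4j's CrawlConfig so that the example compiles without the library.

```java
// Hypothetical stand-in for crawler4j's CrawlConfig, so this sketch
// compiles without the library on the classpath.
class FakeConfig {
    private String storageFolder;

    void setStorageFolder(String folder) { this.storageFolder = folder; }

    String getStorageFolder() { return storageFolder; }
}

class Placement {
    // Field declarations with initializers ARE allowed directly in the class body.
    String crawlStorageFolder = "/data/crawl/root";
    FakeConfig config = new FakeConfig();

    // config.setStorageFolder(crawlStorageFolder);  // NOT allowed here: a bare statement

    // 1) An instance initializer block may contain statements.
    //    It runs for every new instance, before the constructor body.
    {
        config.setStorageFolder(crawlStorageFolder);
    }

    // 2) A constructor may contain statements.
    Placement(String override) {
        if (override != null) {
            config.setStorageFolder(override);
        }
    }

    // 3) Any method may contain statements.
    String storageFolder() {
        return config.getStorageFolder();
    }

    public static void main(String[] args) {
        System.out.println(new Placement(null).storageFolder());          // /data/crawl/root
        System.out.println(new Placement("/tmp/crawl").storageFolder());  // /tmp/crawl
    }
}
```

Field initializers and instance initializer blocks run in the order they appear, before the rest of the constructor body, so the initializer block can safely use crawlStorageFolder and config. For a one-off launcher like this crawler, putting the statements in main is usually the simplest choice.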

Answer 1 (score: 0)

Your code is in the class body. Put it inside a main method so that it can run:

    import edu.uci.ics.crawler4j.crawler.CrawlConfig;
    import edu.uci.ics.crawler4j.crawler.CrawlController;
    import edu.uci.ics.crawler4j.fetcher.PageFetcher;
    import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
    import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

    public class Controller {

        // main declares "throws Exception" because the CrawlController
        // constructor is declared to throw Exception.
        public static void main(String[] args) throws Exception {
            String crawlStorageFolder = "/data/crawl/root";
            int numberOfCrawlers = 7;

            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(crawlStorageFolder);

            /*
             * Instantiate the controller for this crawl.
             */
            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            /*
             * For each crawl, you need to add some seed urls. These are the first
             * URLs that are fetched and then the crawler starts following links
             * which are found in these pages
             */
            controller.addSeed("http://www.ics.uci.edu/~lopes/");
            controller.addSeed("http://www.ics.uci.edu/~welling/");
            controller.addSeed("http://www.ics.uci.edu/");

            /*
             * Start the crawl. This is a blocking operation, meaning that your code
             * will reach the line after this only when crawling is finished.
             * MyCrawler is your own subclass of edu.uci.ics.crawler4j.crawler.WebCrawler;
             * it must exist for this class to compile.
             */
            controller.start(MyCrawler.class, numberOfCrawlers);
        }
    }
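The comment above controller.start(...) says the call blocks: the line after it runs only once crawling has finished. The sketch below expresses that contract with java.util.concurrent; BlockingStartSketch and startBlocking are hypothetical names, and this illustrates the idea rather than how crawler4j actually implements start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a blocking start; NOT crawler4j's implementation.
class BlockingStartSketch {

    // Starts numberOfCrawlers workers and returns only after every one has finished.
    static int startBlocking(int numberOfCrawlers) throws InterruptedException, ExecutionException {
        AtomicInteger finished = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(numberOfCrawlers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < numberOfCrawlers; i++) {
                futures.add(pool.submit(() -> {
                    // A real crawler thread would fetch and parse pages here.
                    finished.incrementAndGet();
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for this worker; rethrows its failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return finished.get();
    }

    public static void main(String[] args) throws Exception {
        int done = startBlocking(7);
        // Reached only after all 7 workers have completed.
        System.out.println("finished workers: " + done); // finished workers: 7
    }
}
```

Waiting on each Future, rather than polling a flag, also surfaces a worker's exception in the caller.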