Question

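I want to crawl a website with CasperJS. The site's URL structure is https://www.siteIwanttocrawl.com/?date=2016-04-16; this URL shows everything published on the site on April 16, 2016.

I need to crawl the pages from April 1, 2015 to March 31, 2016. So I wrote the following CasperJS script, which starts at 2015-04-01, keeps adding one day, and continues crawling.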

    var casper = require('casper').create({
        verbose: true,
        logLevel: 'debug'
    });
    casper.userAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0');

    var curdate = new Date(2015, 3, 1);
    var endDate = new Date(2016, 2, 31);
    var startUrl = 'http://www.google.com/';

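    // Open the page for one date, then recurse to the next day.
    casper.spider = function(dateStr) {
        casper.open("http://www.siteIwanttocrawl.com/?date=" + dateStr).then(function() {

            // Inside a step callback, `this` refers to the casper instance.
            this.echo("\n#########\n" + dateStr + " " + this.getCurrentUrl() + "\n#########\n");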

            curdate.setDate(curdate.getDate() + 1);
            if (curdate <= endDate) {
                var month = curdate.getMonth() + 1;
                var date = curdate.getDate();
                if (month < 10) {
                    month = "0" + month;
                }
                if (date < 10) {
                    date = "0" + date;
                }

                dateStr = curdate.getFullYear() + "-" + month + "-" + date;
                casper.spider(dateStr);
            }

        });
    }

    // Start spidering
    casper.start(startUrl, function() {
        casper.spider('2015-04-01');
    });

    // Start the run
    casper.run();
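
The date-stepping part of the script can also be checked on its own, outside CasperJS. The following is a minimal sketch, not the script above: the helpers `pad2` and `dateStrings` are hypothetical names introduced for illustration, and the sketch uses UTC dates (an assumption) so that daylight-saving shifts cannot drop or repeat a day.

```javascript
// Hypothetical helper: left-pad a number to two digits.
function pad2(n) {
    return n < 10 ? '0' + n : String(n);
}

// Hypothetical helper: build every YYYY-MM-DD string from start to end, inclusive.
// Both arguments are Date objects; only their UTC calendar fields are used.
function dateStrings(start, end) {
    var out = [];
    // Copy so the caller's Date object is not mutated.
    var cur = new Date(start.getTime());
    while (cur <= end) {
        out.push(cur.getUTCFullYear() + '-' +
                 pad2(cur.getUTCMonth() + 1) + '-' +
                 pad2(cur.getUTCDate()));
        cur.setUTCDate(cur.getUTCDate() + 1);
    }
    return out;
}

// Months are zero-based: 3 is April, 2 is March.
var dates = dateStrings(new Date(Date.UTC(2015, 3, 1)),
                        new Date(Date.UTC(2016, 2, 31)));

// One URL per day, using the URL pattern from the question.
var urls = dates.map(function (d) {
    return 'http://www.siteIwanttocrawl.com/?date=' + d;
});
```

With the list built up front, the crawl no longer depends on mutating a shared `curdate` inside step callbacks; the `urls` array could then be handed to whatever loop opens the pages (for instance CasperJS's `eachThen`, assuming the installed version provides it).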

The output I get is:

    > casperjs th2.js
    [info] [phantom] Starting...
    [info] [phantom] Running suite: 2 steps
    [debug] [phantom] opening url: http://www.google.com/, HTTP GET
    [debug] [phantom] Navigation requested: url=http://www.google.com/, type=Other, willNavigate=true, isMainFrame=true
    [debug] [phantom] Navigation requested: url=http://www.google.co.in/?gfe_rd=cr&ei=ME0RV-XgDe-K8QfDpaGgBA, type=Other, willNavigate=true, isMainFrame=true
    [debug] [phantom] Navigation requested: url=https://www.google.co.in/?gfe_rd=cr&ei=ME0RV-XgDe-K8QfDpaGgBA&gws_rd=ssl, type=Other, willNavigate=true, isMainFrame=true
    [debug] [phantom] url changed to "https://www.google.co.in/?gfe_rd=cr&ei=ME0RV-XgDe-K8QfDpaGgBA&gws_rd=ssl"
    [debug] [phantom] Successfully injected Casper client-side utilities
    [info] [phantom] Step anonymous 2/2 https://www.google.co.in/?gfe_rd=cr&ei=ME0RV-XgDe-K8QfDpaGgBA&gws_rd=ssl (HTTP 200)
    [debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
    [debug] [phantom] url changed to "about:blank"

    > casperjs --version
    1.1.0-beta5

    > phantomjs --version
    2.1.1

CasperJS error: [debug] [phantom] url changed to "about:blank"

0 answers: