CasperJS - Loop over popup links on page A and return to page A after scraping all the popups?

Time: 2016-06-16 04:18:56

Tags: javascript web-scraping casperjs

My page A has 9 popup links, and I need to extract information from each popup. After that, I need to return to page A and continue working with it.

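I used waitForPopup and withPopup inside a for loop, which I believe is the right way to open and close each popup. Strangely, though, I get the information from the first popup repeated 9 times. I spent a whole night trying to get it to work and I am stuck. Please see the code below. Thanks.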

Update: I found a link that states the exact problem: Casperjs - Is ThenClick Or Any 'then Function' Synchronized Function?

  

My script has a while loop, and I found that the script jumps to the end of the while loop before thenClick runs.
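The behavior described here (the loop reaching its end before any click happens) can be modeled roughly as follows, assuming that then-style calls only queue a step and that the queue is drained later. The names steps, then, and run are hypothetical stand-ins for illustration, not the real CasperJS internals.

```javascript
// Simplified stand-in for a step queue; NOT the real CasperJS implementation.
var steps = [];
var log = [];

// Hypothetical `then`: only records the step, it does not run it.
function then(fn) {
    steps.push(fn);
}

// Hypothetical `run`: executes the queued steps in order.
function run() {
    for (var k = 0; k < steps.length; k++) {
        steps[k]();
    }
}

var n = 0;
while (n < 3) {
    log.push('loop body ' + n);   // runs immediately, during the loop
    then(function () {
        log.push('step');         // runs only once run() is called
    });
    n++;
}
log.push('loop finished');

run();
console.log(log.join('\n'));
```

In this model every "loop body" line and "loop finished" are logged before the first "step", which mirrors the order seen in the question.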

For your reference:

1> The code is part of the var suite_1 = function() { ...my code... }

2> The regex /fulltext/ here is based on the popup links 
(only the Gid value varies), like url=http://www.***.com/fulltext_form.aspx&Gid=788...

3> I also have some debug info.

[debug] [phantom] Mouse event 'mousedown' on selector: xpath selector: (//a[@class="main-ljwenzi"])[9]
[debug] [phantom] Mouse event 'mouseup' on selector: xpath selector: (//a[@class="main-ljwenzi"])[9]
[debug] [phantom] Mouse event 'click' on selector: xpath selector: (//a[@class="main-ljwenzi"])[9]
[debug] [phantom] Navigation requested: url=http://www.***.com/fulltext_form.aspx&Gid=788, type=LinkClicked, willNavigate=true, isMainFrame=false

...... and ......

[info] [phantom] Step anonymous 119/122 http://www.***.com/fulltext_form.aspx&Gid=252923 (HTTP 200)

The program is expected to open the link with Gid=788, but after some steps it still opens Gid=252923, which is the first popup.

Code:

this.then(function() {

    var count = this.getElementsInfo('a.main').length;
    this.echo(count + ' fulltext links found:', 'INFO');

    for (var i = 1; i < count + 1; i++) {

        // According to the output, the program runs this.capture 9 times
        // before running the following lines of code 9 times.

        this.capture('before the click - ' + i + '.png');

        this.thenClick(x('(//a[@class="main"])[' + i + ']'));

        this.waitForPopup(/fulltext/, function() {
            this.echo('Popup opened', 'INFO');
            this.echo(this.getTitle(), 'INFO');
        }, null, 10000);

        this.withPopup(/fulltext/, function() {

            this.echo(this.getTitle(), 'INFO');
            this.waitForSelector('#tbl_content_main', function() {
                // do something
            });
        });

        this.then(function() {
            this.echo('Back to the main page' + this.getTitle(), 'INFO_BAR');
        });
    }
});

1 answer:

Answer 0 (score: 0)

I made a simple example to test your case:

HTML (popup.html):

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<div>
    <button class="popup" id="casperjs" onclick="javascript:window.open('http://casperjs.org/')">casperjs</button>
    <button class="popup" id="phantomjs" onclick="javascript:window.open('http://phantomjs.org/')">phantomjs</button>
    <button class="popup" id="nodejs" onclick="javascript:window.open('https://nodejs.org/')">nodejs</button>
</div>
</body>
</html>

Code:

var casper = require('casper').create();

casper.start('http://localhost:63342/popup.html')

casper.then(function () {
    var count = casper.getElementsInfo('.popup').length
    var counter = 1
    for (var i = 0; i < count; i++) {
        casper.then(function () {
            var selector = 'body > div > button:nth-child(' + counter + ')';
            var text = casper.getElementInfo(selector).text
            casper.click(selector)
            casper.waitForPopup(text, undefined, undefined, 20000)
            casper.withPopup(text, function () {
                casper.echo(casper.getTitle())
            })
            counter ++
        })
    }
})

casper.run()

Output:

$ casperjs popup.js 
CasperJS, a navigation scripting and testing utility for PhantomJS and SlimerJS
PhantomJS | PhantomJS
Node.js

It works as expected.

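So your code has two problems, plus a note on thenClick:

  1. The whole body of the for loop should be wrapped in a then.
  2. A separate counter, not i, should be used to build the selector (echo the value of i inside the step and you will see that it is always count; in my case, that's 3).
  3. thenClick is not the cause. You can replace click with thenClick in my sample code and everything stays the same. If you look at the implementation of thenClick, you will see the reasoning behind this statement.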

    Try this:

    this.then(function() {
    
        var count = this.getElementsInfo('a.main').length;
        this.echo(count + ' fulltext links found:', 'INFO');
        var counter = 1 // +
        for (var i = 1; i < count + 1; i++) {
            this.then(function() { //+
                this.capture('before the click - ' + counter + '.png');//edit
    
                this.thenClick(x('(//a[@class="main"])[' + counter + ']'));//edit
    
                this.waitForPopup(/fulltext/, function() {
                    this.echo('Popup opened', 'INFO');
                    this.echo(this.getTitle(), 'INFO');
                }, null, 10000);
    
                this.withPopup(/fulltext/, function() {
    
                    this.echo(this.getTitle(), 'INFO');
                    this.waitForSelector('#tbl_content_main', function() {
                        // do something
                    });
                });
    
                this.then(function() {
                    this.echo('Back to the main page' + this.getTitle(), 'INFO_BAR');
                });
                counter ++ //+
            })//+
        }
    });
    

    If you are curious why a separate counter is needed, you can read my answer on another SO post.
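The counter point can be illustrated without CasperJS. The sketch below uses a hypothetical array named queue in place of the step queue. It assumes only that the callbacks run after the loop has ended, which is when they read the var loop variable i.

```javascript
// Hypothetical queue standing in for deferred steps; not CasperJS code.
var queue = [];
var seenI = [];
var seenCounter = [];

var count = 3;
var counter = 1;
for (var i = 0; i < count; i++) {
    queue.push(function () {
        seenI.push(i);             // `i` is shared by all callbacks; the loop is already over
        seenCounter.push(counter); // `counter` advances as each callback runs
        counter++;
    });
}

// Run the deferred callbacks after the loop, as a step queue would.
queue.forEach(function (step) { step(); });

console.log(seenI);       // [ 3, 3, 3 ]
console.log(seenCounter); // [ 1, 2, 3 ]
```

Another common option is to pass the index through a function call (for example Array.prototype.forEach or an immediately invoked function), so that each callback gets its own copy of the value.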