Question

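I'm trying to write a basic web crawler. A stack keeps track of all the URLs still to be visited.

Until the stack is empty, I want to get the list of all hrefs used on each page. I tried using arguments.callee, but it fails with:

RangeError: Maximum call stack size exceeded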

JavaScript:

"checkStack": function(test) {
    //check if the stack is empty
    if (!stack.isEmpty()) {
        var newAddress = stack.pop();
        console.log("trying to navigate to: ", newAddress);
        return test.remote.get(newAddress)
            .setFindTimeout(240000)
            //.sleep(4000)
            .findAllByTagName("a")
            .getAttribute("href")
            .then(function(hrefs) {
                console.log("got hrefs: " + hrefs.length);
                assert.isArray(hrefs, 'Links not an array');
                checkAddressValidity(hrefs, 0);
            })
            .then(function() {
                //call checkStack recursively
                checkStack(test);
            }.bind(test));

    }
},
...

Answer 1

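The easy way to perform recursion in a Command chain (or any Promise chain, really!) is to hold the stack in a closure, then call the function that does the work recursively as a Promise callback until the stack is exhausted. Once the stack is empty, next returns undefined instead of another promise, which signals the end of the recursion: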

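checkStack: function (test) {
  var remote = test.remote;
  // URLs still to visit; kept in the closure so every call to next() shares it
  var stack = [];

  function next() {
    var newAddress = stack.pop();
    // an empty stack yields undefined, so next() returns undefined and the chain ends
    if (newAddress) {
      return remote.get(newAddress)
        .findAllByTagName('a')
        .getAttribute('href')
        .then(function (hrefs) {
          // do whatever
        })
        // visit the next address once this page has been processed
        .then(next);
    }
  }

  return next();
}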
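
To try the pattern without a browser session, here is a minimal sketch that uses plain Promises only. Everything in it is made up for illustration: `site` is a tiny in-memory link graph, `fetchHrefs` is a hypothetical stand-in for the `remote.get(...).findAllByTagName('a').getAttribute('href')` chain, and `crawl` plays the role of `checkStack`. The `seen` set is an addition of mine so that pages linking back to each other do not loop forever.

```javascript
// Hypothetical in-memory "site": page address -> hrefs found on that page.
var site = {
  '/': ['/a', '/b'],
  '/a': ['/b', '/c'],
  '/b': ['/'],
  '/c': []
};

// Hypothetical stand-in for the remote.get(...).getAttribute('href') chain.
function fetchHrefs(address) {
  return Promise.resolve(site[address] || []);
}

function crawl(startAddress) {
  var stack = [startAddress];         // URLs still to visit, held in the closure
  var seen = new Set([startAddress]); // assumption: skip addresses already queued
  var visited = [];                   // order in which pages were processed

  function next() {
    var address = stack.pop();
    if (address === undefined) {
      return;                         // stack exhausted: returning undefined ends the chain
    }
    visited.push(address);
    return fetchHrefs(address)
      .then(function (hrefs) {
        hrefs.forEach(function (href) {
          if (!seen.has(href)) {
            seen.add(href);
            stack.push(href);
          }
        });
      })
      .then(next);                    // recurse as a promise callback
  }

  return Promise.resolve()
    .then(next)
    .then(function () { return visited; });
}

crawl('/').then(function (visited) {
  console.log(visited);
});
```

With this sample graph the pages are visited in the order /, /b, /a, /c. Because each call to next runs as a promise callback rather than as a direct nested call, the JavaScript call stack does not grow with the number of pages.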

How do I call this function recursively?
