
Question

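I'm new to Node.js. I'm trying to download some HTML from a website so that I can parse it and show some debugging information. I got this working with the http module (see this post), but when I print each chunk like this: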

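var http = require("http");

// options (host, path, ...) is defined elsewhere
var req = http.request(options, function(res) {
    res.setEncoding("utf8");
    res.on("data", function (chunk) {
        console.log(chunk);
    });
});
req.end(); // http.request does not send anything until end() is called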

I don't get the HTML that is loaded dynamically via ajax. All I see is:

<div class="container">
  ::before
      <div class="row">
        ::before
....
</div>

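Is there another module that can help me achieve this?

Thanks!

Update

I'd like to share my success with you (thanks to @oKonyk).

1. npm install phantomjs
2. Create a script
3. Use the same code that @oKonyk suggested

Note that if you run the script locally, you need to set the following option: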

options = { 'web-security': 'no' };
phantom.create({parameters: options}, function() {});

Answer 1

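To capture dynamically built pages, you have to render them in a browser. There are several ways to do that from node.js.

I'd suggest phantomjs, which is a so-called headless browser.

As a proof of concept, you can install it globally with npm install phantomjs -g. Then create a test script "google.js" with the following content: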

var page = require('webpage').create();
console.log('The default user agent is ' + page.settings.userAgent);
page.settings.userAgent = 'SpecialAgent';
page.open('http://www.google.org', function(status) {
  if (status !== 'success') {
    console.log('Unable to access network');
  } else {
    var html = page.evaluate(function() {
      return document.getElementsByTagName('html')[0].innerHTML;
    });
    console.log(html);
  }
  phantom.exit();
});

Then run it with phantomjs google.js

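You will get the whole DOM of the page (at least everything inside the <html> tag), which is different from the raw response you get with the http module.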
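To see why the raw response differs, here is a minimal sketch that runs in plain Node.js. It assumes a made-up page (PAGE) and a throwaway local server, neither of which comes from the question; fetchRaw is a hypothetical helper name. The page fills its container from a script, and the http module simply returns the markup as served, without ever running that script.

```javascript
var http = require("http");

// Hypothetical page: the container stays empty until the script runs.
var PAGE =
  '<div class="container"></div>' +
  '<script>document.querySelector(".container").textContent = ' +
  '"loaded later";</script>';

// Serve PAGE from a throwaway local server, fetch it with http.get and
// pass the raw response body to the callback.
function fetchRaw(callback) {
  var server = http.createServer(function (req, res) {
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(PAGE);
  });

  // Port 0 lets the OS pick any free port.
  server.listen(0, "127.0.0.1", function () {
    var requestOptions = {
      host: "127.0.0.1",
      port: server.address().port,
      path: "/",
      agent: false // one-off connection, so the server can close right away
    };

    http.get(requestOptions, function (res) {
      var body = "";
      res.setEncoding("utf8");
      // Each "data" event carries only a piece of the response.
      res.on("data", function (chunk) { body += chunk; });
      res.on("end", function () {
        server.close();
        callback(null, body);
      });
    }).on("error", function (err) {
      server.close();
      callback(err);
    });
  });
}

fetchRaw(function (err, body) {
  if (err) throw err;
  // The script tag arrives as plain text; nothing has executed it,
  // so the container div is still empty in this output.
  console.log(body);
});
```

A headless browser such as phantomjs would execute the script first, so the serialized DOM would show the container with its text filled in.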

Later you can use phantom from within your node project (more info here).
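If you only need the output of a script like google.js inside a node program, one possible shortcut is to launch the phantomjs executable as a child process. Note that this is not the phantom bridge module mentioned above; it is a sketch that uses only Node's built-in child_process module, and it assumes the phantomjs binary is on your PATH. The function name runPhantomScript is made up for illustration.

```javascript
var execFileSync = require("child_process").execFileSync;

// Run a PhantomJS script and return everything it printed to stdout.
// `binary` is optional and defaults to "phantomjs" (assumed to be on PATH).
function runPhantomScript(scriptPath, binary) {
  return execFileSync(binary || "phantomjs", [scriptPath], {
    encoding: "utf8",            // return a string instead of a Buffer
    maxBuffer: 10 * 1024 * 1024  // rendered pages can be large
  });
}
```

Calling runPhantomScript("google.js") would then return the text the script logs, which for google.js includes the user agent line followed by the HTML. The call is synchronous and blocks the event loop until phantomjs exits, which is fine for a quick script but not for a server.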

Loading dynamic HTML with Node.js
