Question

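I'm writing a small web scraper. I started with NodeJS + Request + Cheerio to scrape the site, but then realized that Cheerio only works on static pages, not on pages that build their DOM with JavaScript.

So I switched to ScraperJS, because it offers both static and dynamic content scrapers. I installed all the dependencies as described here: https://github.com/ruipgil/scraperjs

But my code still doesn't work, even though it is the sample code from their GitHub repository. Platform: Windows 7, run from CMD with: node file_name.js

Code: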

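var scraperjs = require('scraperjs');

// Debug marker to confirm the script starts at all.
console.log("a2");

// DynamicScraper renders the page through PhantomJS before scraping it.
scraperjs.DynamicScraper.create('https://news.ycombinator.com/')
    .scrape(function($) {
        // Collect the text of every link inside a ".title" element.
        return $(".title a").map(function() {
            return $(this).text();
        }).get();
    })
    .then(function(news) {
        // Print the scraped headlines.
        console.log(news);
    });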

Answer 1

According to this issue, the problem comes from PhantomJS, which has to be downgraded for DynamicScraper to work.

You need to either downgrade PhantomJS to version 1.9.8 or use NodeJS 4.8.
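
As a rough illustration of the NodeJS side of that advice, the script could check which Node version it is running under before creating the DynamicScraper. This is only a sketch: `parseMajor` and `isNode4` are hypothetical helpers, not part of scraperjs, and the check assumes that any 4.x release behaves like the 4.8 mentioned above.

```javascript
// Hypothetical helpers, not part of scraperjs.

// process.version looks like "v4.8.0"; return the major number (4 here).
function parseMajor(version) {
    var match = /^v?(\d+)\./.exec(version);
    return match ? parseInt(match[1], 10) : NaN;
}

// Assumption: any 4.x release is treated like the NodeJS 4.8 from the answer.
function isNode4(version) {
    return parseMajor(version) === 4;
}

if (!isNode4(process.version)) {
    console.warn('Running Node ' + process.version +
        '; DynamicScraper may need NodeJS 4.8 or PhantomJS 1.9.8.');
}
```

The check only prints a warning; it does not verify the installed PhantomJS version, which still has to be checked separately.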
