Question

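So I have an array of URLs.

I want to pull the HTML from each one (I'm using the restler Node.js library).

Then I want to select some data from it and manipulate it with jQuery-style selectors (I'm using the cheerio Node.js library).

The code I have works, but it duplicates the extracted data once for every URL. I'm doing this in Node, but I suspect it's a general JavaScript issue that I don't understand well.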

url.forEach(function(ugh){
    rest.get(ugh).on('complete', function(data) {
        $ = cheerio.load(data);
        prices.push($(".priceclass").text());
        //i only want this code to happen once per item in url array
        //but it happens url.length times per item
        //probably because i don't get events or async very well
    });
});

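So if the 'url' array has 3 items, the 'prices' array with the data I want ends up with 9 items, which I don't want.

Edit:

I added a counter to verify that the 'complete' callback runs array-length times for each array item.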

x=0;
url.forEach(function(ugh){
    rest.get(ugh).on('complete', function(data) {
        var $ = cheerio.load(data);
        prices.push($(".priceclass").text());
        console.log(x=x+1);
    });
});

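The console outputs 1 2 3 4 5 6 7 8 9

I'm thinking I may be going about this the wrong way. I've been trying to push some values onto an array and then do something with that array outside the callback.

Anyway, it seems clear that more than one restler event listener won't work together at all.

Maybe rephrasing the question will help: how do I scrape a large number of URLs and then act on that data?

Currently looking into the request & async libraries, via code from the defunct node.io library.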

Answer 1

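To answer the rephrased question, scramjet is a good fit if you use ES6+ and Node, which I assume you do:

How do I scrape a large number of URLs and then act on that data?

Install the packages: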

npm install scramjet node-fetch --save

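Scramjet works with streams: it reads your list of URLs and turns it into a stream that you can work with as easily as with an array. node-fetch is a simple Node module that follows the standard Fetch Web API.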
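As a rough sketch of what a single Fetch-style request looks like, the helper below downloads one URL and returns its body as text. `resp.ok`, `resp.status` and `resp.text()` belong to the standard Fetch `Response` interface; `fetchText`, `fetchImpl` and `stubFetch` are hypothetical names made up for this illustration, and the stub only exists so the snippet runs offline.

```javascript
// Hypothetical helper: download one URL and return its body as text.
// `fetchImpl` is any Fetch-compatible function, e.g. require("node-fetch").
async function fetchText(url, fetchImpl) {
    const resp = await fetchImpl(url);
    if (!resp.ok) {
        // fetch only rejects on network failure, so check the HTTP status here
        throw new Error("Request to " + url + " failed with status " + resp.status);
    }
    return resp.text(); // resolves with the response body as a string
}

// Offline stand-in that mimics the small part of a Response used above.
function stubFetch(url) {
    const found = url.endsWith("/ok");
    return Promise.resolve({
        ok: found,
        status: found ? 200 : 404,
        text: () => Promise.resolve("<span class=\"priceclass\">9.99</span>"),
    });
}

// Usage: pass the real fetch in production code, the stub when experimenting.
fetchText("http://example.test/ok", stubFetch)
    .then((html) => console.log(html));
```

Passing the fetch function in as a parameter is only a convenience for trying the helper without network access; with node-fetch installed you would hand it the real `fetch` instead.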

A simple example that also reads the URLs from a file, assuming you store one per line:

const { StringStream } = require("scramjet");
const fs = require("fs");
const fetch = require("node-fetch");     // require() works with node-fetch v2; v3 is ESM-only

fs.createReadStream(process.argv[2])     // open the file for reading
    .pipe(new StringStream())            // redirect it to a scramjet stream
    .split("\n")                         // split line by line
    .filter((url) => url.trim() !== "")  // skip blank lines, e.g. after a trailing newline
    .map((url) => fetch(url.trim()))     // request each URL
    .map((resp) => resp.json())          // parse the body as JSON (use resp.text() for HTML)
    .toArray()                           // accumulate the data into an Array
    .then(
        (data) => doYourStuff(data),     // do the calculations
        (err) => showErrorMessage(err)   // report the error that was caught
    );

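Because of the way scramjet works, you don't need to worry about error handling (all errors are caught automatically) or about managing simultaneous requests. If you can process the file URL by URL, this can also be memory and resource efficient: it won't read the whole list and try to fetch every item at once, but it will still do some of the work in parallel.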
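The idea of fetching a few items in parallel without starting everything at once can be sketched without any library. This is only an illustration of the technique, not how scramjet implements it; `mapWithLimit`, its `limit` parameter and `fakeFetch` are hypothetical names introduced for the example.

```javascript
// Run an async worker over `items` with at most `limit` calls in flight.
// Results keep the input order. Hypothetical helper, not part of scramjet.
async function mapWithLimit(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0; // index of the next item to claim

    // Each runner claims items one at a time until none are left.
    async function runner() {
        while (next < items.length) {
            const index = next++;
            results[index] = await worker(items[index], index);
        }
    }

    const runnerCount = Math.max(1, Math.min(limit, items.length));
    const runners = Array.from({ length: runnerCount }, () => runner());
    await Promise.all(runners); // rejects as soon as any worker throws
    return results;
}

// Stand-in for a real HTTP request so the sketch runs offline.
function fakeFetch(url) {
    return new Promise((resolve) => {
        setTimeout(() => resolve("body of " + url), 10);
    });
}

// Usage: act on the data only after every URL has been processed.
const urls = ["http://a.test", "http://b.test", "http://c.test"];
mapWithLimit(urls, 2, fakeFetch)
    .then((bodies) => console.log(bodies.length + " pages fetched"))
    .catch((err) => console.error("a request failed:", err.message));
```

Because the results array is only handed over once every runner has finished, the code that acts on the data runs exactly once, after all requests have completed.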

More examples and the full API description are in the scramjet docs.
