Question

I'm developing a Node.js web scraper application (code below) and I'm trying to write it in a functional style. See below:

const Promise = require('bluebird');
const fetch = require('node-fetch');
const cheerio = require('cheerio');

const scrapeUri = uri => fetch(uri); // how should I pass the uri along from here?
const fetchURIs = URIs => Promise.all(URIs.map(scrapeUri));
const getBodies = pages => Promise.all(pages.map(page => page.text()));
const toSource = source => cheerio.load(source);
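const shouldScrape = ($) => {
  // the robots content may list several directives, e.g. "noindex, nofollow"
  const directives = ($('meta[name="robots"]').attr('content') || '')
    .toLowerCase()
    .split(',')
    .map(directive => directive.trim());
  return !directives.some(directive => directive === 'noindex' || directive === 'nofollow');
};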

const objectifyContent = ($) => { // to be accessed here
  return {
    meta: {
      index_timestamp: new Date(),
      title: $('title').html(),
      // TODO: this will totally fail in some instances, need to pass uri from initial instance
      uri: $('link[rel="canonical"]').attr('href'),
      description: $('meta[name="description"]').attr('content'),
    },
  };
};

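In objectifyContent, what is a pure way to access the uri originally passed to scrapeUri, instead of getting the page's URL from the canonical link? I know I could set a variable and let it be inherited through scope, but I'd like to know whether there is a cleaner, more functional way to do this in Node.js.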

The caller would look like this:

fetchURIs(myUris).then(values => getBodies(values).then(sources => res.send(sources.map(toSource).filter(shouldScrape).map(objectifyContent))));

Answer 1

Modify scrapeUri so that it passes the URI along through the promise, and update the handlers accordingly:

const scrapeUri = uri => fetch(uri).then(
  webpage => [uri, webpage]
)
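A minimal sketch of what "update the handlers accordingly" could look like: every later stage destructures the [uri, value] pair and returns a new pair, so the URI reaches objectifyContent without any shared variable. fakeFetch and parseTitle are hypothetical stand-ins (not part of the original code) for node-fetch and cheerio so the example runs offline, and the shouldScrape used here is a simplified placeholder for the robots check.

```javascript
// Sketch of the answer's approach: each stage receives and returns a
// [uri, value] pair, so objectifyContent can read the URI that was actually
// requested instead of the canonical link.
// ASSUMPTION: fakeFetch and parseTitle are hypothetical stand-ins for
// node-fetch and cheerio, so this runs without network access or packages.

const fakeFetch = uri =>
  Promise.resolve({ text: () => Promise.resolve(`<title>Page at ${uri}</title>`) });

const parseTitle = html => {
  const match = /<title>(.*?)<\/title>/.exec(html);
  return match ? match[1] : null;
};

// As in the answer: resolve to [uri, webpage] instead of just webpage.
const scrapeUri = uri => fakeFetch(uri).then(webpage => [uri, webpage]);
const fetchURIs = uris => Promise.all(uris.map(scrapeUri));

// Read each body but keep the uri next to it.
const getBodies = pairs =>
  Promise.all(pairs.map(([uri, page]) => page.text().then(body => [uri, body])));

// Stand-in for toSource: parse the body, still paired with its uri.
const toSource = ([uri, body]) => [uri, { title: parseTitle(body) }];

// Placeholder filter: it only looks at the parsed half of the pair.
const shouldScrape = ([, source]) => source.title !== null;

const objectifyContent = ([uri, source]) => ({
  meta: {
    index_timestamp: new Date(),
    title: source.title,
    uri, // the URI that was requested, handed down from scrapeUri
  },
});

const results = fetchURIs(['https://a.example/', 'https://b.example/'])
  .then(getBodies)
  .then(pairs => pairs.map(toSource).filter(shouldScrape).map(objectifyContent));

results.then(objects => console.log(objects.map(o => o.meta.uri)));
```

With the real libraries, toSource would become ([uri, body]) => [uri, cheerio.load(body)], and the question's robots check could be reused as ([, $]) => shouldScrape($). Passing a plain object such as { uri, $ } instead of an array works the same way and gives the fields names.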

What is the best way to pass data through a chain of functions?
