Question

  async traverse(url) {
    const ts = new TournamentScraper()
    const ms = new MatchScraper()
    const results = []
    const tournaments = await ts.run(url)
    for(let href of tournaments.map(t => t.href)){
      let matches = await ms.run(href)
      let pages = ms.getPages()
      let seasons = ms.getSeasons()
      //console.log(pages)
      //console.log(seasons)
      results.push(matches)
      for(let href of pages) {
        //console.log(href)
        matches = await ms.run(href)
        //console.log(matches)
        results.push(matches)
      }
    }

    return results
  }

TournamentScraper returns an array of objects, each of which typically looks like this:

{name: 'Foo', href: 'www.example.org/tournaments/foo/'}

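The link points to the first page of the tournament's most recent season. That page contains links to the other seasons and a paginator, if there is one.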

MatchScraper's run returns some data and sets the instance's dom property. getPages() and getSeasons() read this property, and each returns an array of links.
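To make that contract concrete, here is a minimal stand-in with the same shape. FakeMatchScraper, the fake dom object, and the links it produces are all assumptions for illustration; the real MatchScraper fetches and parses HTML.

```javascript
// Hypothetical stand-in for MatchScraper (illustration only).
class FakeMatchScraper {
  constructor() {
    this.dom = null; // set by run(), read by getPages() and getSeasons()
  }

  async run(href) {
    // Pretend to download and parse the page behind href.
    this.dom = {
      pages: [`${href}page/2/`],
      seasons: [`${href}2017/`],
    };
    return [{ match: 'A vs B', source: href }];
  }

  getPages() {
    if (!this.dom) throw new Error('call run() first');
    return this.dom.pages;
  }

  getSeasons() {
    if (!this.dom) throw new Error('call run() first');
    return this.dom.seasons;
  }
}
```

Calling getPages() or getSeasons() before run() has resolved has nothing to read from, which is why the run() call has to be awaited first.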

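The problem is that results contains only the first batch of matches. I can see the matches from page 2 in the console log, but they are not in the results array when traverse returns.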

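I found this rule, which advises against using await inside a for loop. The trouble is that I have to await ms.run(href), because it sets dom, and getPages() and getSeasons() need dom to be set before they can extract the required links.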

Answer 1

I think this should work. It uses Promise.all instead of for loops:

async function getMatches(href) {
  // One scraper per tournament: run() sets dom on the instance, so a scraper
  // shared across concurrent calls could have its dom overwritten before
  // getPages() reads it.
  const ms = new MatchScraper();
  const matches = await ms.run(href);
  const pages = ms.getPages();

  // Fetch the remaining pages concurrently, each with its own scraper.
  const pageResults = await Promise.all(
    pages.map(page => new MatchScraper().run(page))
  );

  return [matches, ...pageResults];
}

async function traverse(url) {
  const ts = new TournamentScraper();
  const tournaments = await ts.run(url);
  const perTournament = await Promise.all(tournaments.map(t => getMatches(t.href)));
  // Flatten one level: one entry per scraped page, across all tournaments.
  return perTournament.flat();
}
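
One caveat when running the scrapers concurrently: since run() stores dom on the instance, a single MatchScraper shared between concurrent calls may end up reading another call's DOM. Whether that actually happens depends on how run() is implemented. The sketch below uses a hypothetical StubScraper (its delays, dom shape, and links are assumptions, not the real class) that stores dom first and resolves later, which is enough to show the effect.

```javascript
// Hypothetical stub: stores the "DOM" immediately, then resolves after a delay
// that stands in for parsing time.
class StubScraper {
  async run(href, parseMs) {
    this.dom = { pages: [`${href}?page=2`] };
    await new Promise(resolve => setTimeout(resolve, parseMs));
    return `matches from ${href}`;
  }

  getPages() {
    return this.dom.pages;
  }
}

// Shared instance: both calls write to the same dom property.
async function pagesWithSharedInstance() {
  const shared = new StubScraper();
  const load = async (href, parseMs) => {
    await shared.run(href, parseMs);
    return shared.getPages();
  };
  return Promise.all([load('a/', 30), load('b/', 5)]);
}

// One instance per call: each getPages() reads the dom set by its own run().
async function pagesWithOwnInstances() {
  const load = async (href, parseMs) => {
    const own = new StubScraper();
    await own.run(href, parseMs);
    return own.getPages();
  };
  return Promise.all([load('a/', 30), load('b/', 5)]);
}
```

With this stub, pagesWithSharedInstance() reports the pages of 'b/' for both entries, because the second run() overwrote dom before the first call read it, while pagesWithOwnInstances() returns the right pages for each. If the scrapers cannot be instantiated per call, keeping the sequential await inside the loop is the safer choice.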
