Puppeteer table

Time: 2019-02-26 10:56:27

Tags: javascript node.js web-scraping puppeteer

I have the following Puppeteer script, which works correctly. It extracts all the information from the table.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto("https://www.mismarcadores.com/baloncesto/espana/liga-endesa/partidos/");

  // Text of every .time cell in the table.
  const time = await page.evaluate(() => {
    const cells = Array.from(document.querySelectorAll('table tr .time'));
    return cells.map(cell => cell.textContent);
  });

  // Text of every home-team cell.
  const teamHome = await page.evaluate(() => {
    const cells = Array.from(document.querySelectorAll('table tr .team-home'));
    return cells.map(cell => cell.textContent);
  });

  // Text of every away-team cell.
  const teamAway = await page.evaluate(() => {
    const cells = Array.from(document.querySelectorAll('table tr .team-away'));
    return cells.map(cell => cell.textContent);
  });

  // Print the three parallel arrays row by row.
  for (let i = 0; i < time.length; i++) {
    console.log(time[i]);
    console.log(teamHome[i]);
    console.log(teamAway[i]);
  }

  await browser.close();
})();

Now I am trying to write it in a better way, and I have the following code.

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
    await page.goto("https://www.mismarcadores.com/baloncesto/espana/liga-endesa/partidos/");
    console.log("started evaluating");
    var data = await page.evaluate(() => {
      Array.from(
        document.querySelectorAll('table tr')
      ).map(row => {
        return {
          time: row.querySelector(".time"),
          teamHome: row.querySelector(".team-home"),
          teamAway: row.querySelector(".team-away")
        };
      });
    });
  console.log(data);
})();

When I try to run the second script, I get undefined.

The goal is for the second script to produce the same result as the first.

Can anyone help me?

1 answer:

Answer 0: (score: 1)

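The callback passed to page.evaluate in your second script has no return statement, so it resolves to undefined. Beyond that, you need to make the tr selector more specific (for example, by adding the .stage-scheduled class) and return the .textContent property rather than the element itself, because DOM elements do not survive serialization back to Node.js. Try this: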

    var data = await page.evaluate(() => {
      return Array.from(
        document.querySelectorAll('table tr.stage-scheduled')
      ).map(row => {
        return {
          time: row.querySelector(".time").textContent,
          teamHome: row.querySelector(".team-home").textContent,
          teamAway: row.querySelector(".team-away").textContent,
        };
      });
    });
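
The snippet above throws if a matched row lacks one of the three cells, because querySelector returns null in that case. A minimal sketch of a more defensive variant follows. It assumes the same selectors as above; the names rowsToMatches and scrapeMatches are hypothetical, chosen only for illustration. It relies on page.$$eval, which runs document.querySelectorAll for the given selector inside the page and passes the matches, as an array, to the supplied function.

```javascript
// Hypothetical helper. Puppeteer serializes this function and runs it inside
// the page, so it must be self-contained and must not reference outer variables.
function rowsToMatches(rows) {
  // Return the trimmed text of the first match, or null if the cell is missing.
  const text = (row, selector) => {
    const cell = row.querySelector(selector);
    return cell ? cell.textContent.trim() : null;
  };
  return rows.map(row => ({
    time: text(row, '.time'),
    teamHome: text(row, '.team-home'),
    teamAway: text(row, '.team-away'),
  }));
}

// Hypothetical wrapper: `page` is assumed to be a Puppeteer Page that has
// already navigated to the fixtures URL.
async function scrapeMatches(page) {
  return page.$$eval('table tr.stage-scheduled', rowsToMatches);
}
```

Inside the original async function this could be called as `const data = await scrapeMatches(page);`. Whether trimming the text or returning null for missing cells is appropriate depends on the page markup, which is not verified here.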