Puppeteer: How to evaluate XPath with a context node?

Date: 2020-11-09 13:39:23

Tags: node.js xpath puppeteer

From the doc: [image: documentation excerpt]

So I tried this code:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('http://personalitycore.com/a.html');
    let p = (await page.$x('/html/body/p'))[0]
    console.log("Var[p] Class: " + p.constructor.name)
    console.log("Var[p] Tag: " + await p.evaluate(e => e.tagName, p))
    let spans = await p.$x('/*')
    for (var i = 0; i < spans.length; i++) {
        console.log("Var[spans] Tag: " + await spans[i].evaluate(e => e.tagName, spans[i]))
        console.log("Var[spans] Text: " + await spans[i].evaluate(e => e.textContent, spans[i]))
    }
    await browser.close();
})();

The HTML of http://personalitycore.com/a.html is:

<head>
</head>
<body>
<p>
text_node1
<span>span_node1</span>
text_node2
<span>span_node2</span>
</p>
</body>

Result:

/usr/local/bin/node example.js
Var[p] Class: ElementHandle
Var[p] Tag: P
Var[spans] Tag: HTML
Var[spans] Text: 

text_node1
span_node1
text_node2
span_node2

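I'm confused. According to the doc, p is an ElementHandle, and evaluating the XPath /* against it should yield [TextNode, Span, TextNode, Span].

But it returns the whole page, with the tag HTML.

So, my questions:

  1. Is there a mistake in my code that keeps me from getting the expected result?
  2. How do I evaluate an XPath with a context node? In my example, I want to evaluate /* on the p tag.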

1 Answer:

Answer 0: (score: 1)

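You just need to add the context node notation (a dot) to the XPath: './*'. A leading '/' makes the path absolute, so '/*' means "all child elements of the document", i.e. the html element, regardless of which handle you call $x on. Note that '*' matches element nodes only, which is why the result contains the two spans but not the text nodes; the XPath './node()' selects text nodes as well.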

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();

const html = `
  <!doctype html>
  <html>
    <head>
    </head>
    <body>
      <p>
        text_node1
        <span>span_node1</span>
        text_node2
        <span>span_node2</span>
      </p>
    </body>
  </html>`;

try {
  const page = await browser.newPage();
  await page.setContent(html); // load the inline markup defined above

  const [p] = await page.$x('/html/body/p');
  console.log("Var[p] Class: " + p.constructor.name);
  console.log("Var[p] Tag: " + await p.evaluate(e => e.tagName, p));

  const spans = await p.$x('./*');
  for (let i = 0; i < spans.length; i++) {
      console.log("Var[spans] Tag: " + await spans[i].evaluate(e => e.tagName, spans[i]));
      console.log("Var[spans] Text: " + await spans[i].evaluate(e => e.textContent, spans[i]));
  }
} catch(err) { console.error(err); } finally { await browser.close(); }

Output:

Var[p] Class: ElementHandle
Var[p] Tag: P
Var[spans] Tag: SPAN
Var[spans] Text: span_node1
Var[spans] Tag: SPAN
Var[spans] Text: span_node2
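
The question expected text nodes in the result too. Since '*' only matches elements, one way to get them is to evaluate './node()' against p. The following is a minimal sketch, not the answer author's code. It assumes Puppeteer is installed and that the page from the question is reachable. toRelativeXPath, describeMatches and main are hypothetical names introduced here for illustration. The XPath is evaluated inside the page with the standard DOM document.evaluate(), so the sketch does not depend on $x.

```javascript
// Hypothetical helper (not part of Puppeteer): anchor a simple location path
// to the context node by prefixing a dot. Paths that do not start with '/'
// are returned unchanged. Expressions such as '(//span)[1]' are not handled.
function toRelativeXPath(expression) {
  return expression.startsWith('/') ? '.' + expression : expression;
}

// Runs inside the page: evaluates `expression` against `contextNode` with the
// standard DOM document.evaluate() and returns plain, serializable objects.
function describeMatches(contextNode, expression) {
  const result = contextNode.ownerDocument.evaluate(
    expression,
    contextNode,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  const matches = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    const node = result.snapshotItem(i);
    matches.push({ name: node.nodeName, text: node.textContent.trim() });
  }
  return matches;
}

// Assumption: Puppeteer is installed; it is required lazily so the helpers
// above can be loaded without it.
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('http://personalitycore.com/a.html');
    const p = await page.$('p');
    // '/node()' becomes './node()': every child node of p, text included.
    const matches = await p.evaluate(describeMatches, toRelativeXPath('/node()'));
    for (const { name, text } of matches) {
      console.log(`${name}: ${text}`);
    }
  } finally {
    await browser.close();
  }
}
```

Calling main().catch(console.error) should list the text nodes (reported as #text) interleaved with the two SPAN elements, possibly including a whitespace-only text node after the last span. Recent Puppeteer releases no longer ship $x; there, a selector with the xpath/ prefix, such as p.$$('xpath/./*'), is the documented replacement, so check the API reference for the version you have installed.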