Question

I have a simple but malformed HTML page. Here it is, with all of its mistakes:

<HTML>
<head>
  <title>Official game sheet</title>
</head>
<body class="sheet">
</BODY>
</HTML>

I am trying to apply the XPath //title to a document parsed from this HTML.

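const parse5 = require('parse5');
const xmlser = require('xmlserializer');
const dom = require('xmldom').DOMParser;
const xpath = require('xpath');

// xmlString holds the malformed HTML shown above.
const document = parse5.parse(xmlString);
const xhtml = xmlser.serializeToString(document);
const doc = new dom().parseFromString(xhtml);
const select = xpath.useNamespaces({
  "x": "http://www.w3.org/1999/xhtml"
});
const nodes = select("//title", doc);
console.log(nodes);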

I tried the solution from here, but it failed: the returned node list is empty.

Here you can see the problem.

Answer 1

Here you go, @neptune: you don't need parse5 or xmlser, only xpath and xmldom.

var xpath = require('xpath');
var dom = require('xmldom').DOMParser;
var xmlString = `
<HTML>
<head>
  <title>Official game sheet</title>
  <custom>Here we are</custom>
<body class="sheet">
</BODY>
</HTML>`;

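// parse5 and xmlser are not needed; xmldom parses the string directly.
//const document = parse5.parse(xmlString);
//const xhtml = xmlser.serializeToString(document);
const doc = new dom().parseFromString(xmlString);
// Parsed this way, the elements carry no XHTML namespace, so the unprefixed name test matches.
const nodes = xpath.select("//custom", doc);
//console.log(document);

// Print the element name with its text content, then the serialized node.
console.log(nodes[0].localName + ": " + nodes[0].firstChild.data);
console.log("Node: " + nodes[0].toString());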

Answer 2

Change these lines to get the title:

const nodes = select("//x:title//text()", doc);
console.log(nodes[0].data)
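
Why the x: prefix matters can be sketched with a toy model of how an XPath 1.0 name test is matched. This is only an illustration, not the xpath package's actual implementation: the helper matchesNameTest and the plain objects standing in for DOM elements are hypothetical, and "no namespace" is modeled as null. The assumption is that parse5 plus xmlser emit XHTML, which puts every element in the http://www.w3.org/1999/xhtml namespace, while feeding the raw string to xmldom leaves the elements without one.

```javascript
// Toy model of XPath 1.0 name-test matching (illustrative only).
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: does `element` match a name test such as
// "title" or "x:title", given a prefix-to-namespace map?
function matchesNameTest(element, nameTest, prefixes = {}) {
  const parts = nameTest.split(":");
  const prefix = parts.length === 2 ? parts[0] : null;
  const localName = parts.length === 2 ? parts[1] : parts[0];

  // An unprefixed test only matches elements in no namespace;
  // a prefixed test matches the namespace bound to that prefix.
  let expectedNs = null;
  if (prefix !== null) {
    expectedNs = prefixes[prefix];
    if (expectedNs === undefined) {
      throw new Error(`Unbound namespace prefix: ${prefix}`);
    }
  }
  return element.localName === localName &&
         element.namespaceURI === expectedNs;
}

// Stand-in for <title> after the parse5 + xmlser round trip.
const xhtmlTitle = { localName: "title", namespaceURI: XHTML_NS };
// Stand-in for <title> when xmldom parses the raw string directly.
const plainTitle = { localName: "title", namespaceURI: null };
const prefixes = { x: XHTML_NS };

console.log(matchesNameTest(xhtmlTitle, "title", prefixes));   // false
console.log(matchesNameTest(xhtmlTitle, "x:title", prefixes)); // true
console.log(matchesNameTest(plainTitle, "title", prefixes));   // true
```

Under this model, //title finds nothing in the XHTML document, //x:title finds the element once x is bound through useNamespaces, and the unprefixed query from Answer 1 works because no namespace was introduced in the first place.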

Node.js XPath selector on XHTML not working
