I am trying to remove the <!DOCTYPE html> and <?xml ...> declarations from an HTML document parsed by cheerio.js. Is this possible?
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head></head>
<body>
<div>text</div>
</body>
</html>
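Answer 0 (score: 1)
You can simply extract the contents of the html element. Because .html() returns only the inner HTML, all you need to do afterwards is add the <html> tags back.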
const cheerio = require('cheerio');
const html = `
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head></head>
<body>
<div>text</div>
</body>
</html>
`;
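// Parse the document; the XML declaration and DOCTYPE sit outside the <html> element.
const $ = cheerio.load(html);
// .html() returns the inner HTML of <html>, so neither declaration appears in the output.
console.log($('html').html());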
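
If the goal is only to drop the two declarations and keep the rest of the markup byte-for-byte, one alternative is to strip them from the string without parsing it at all. The sketch below is not a cheerio feature: `stripProlog` is a hypothetical helper, and it assumes the DOCTYPE has no internal subset (no `[...]` block containing `>`).

```javascript
// Hypothetical helper (not part of cheerio): removes a leading XML declaration
// and DOCTYPE from a markup string using plain string replacement.
// Assumption: the DOCTYPE contains no '>' before its closing bracket.
function stripProlog(markup) {
  return markup
    .replace(/<\?xml[^>]*\?>\s*/i, '')   // drop the first <?xml ... ?>
    .replace(/<!DOCTYPE[^>]*>\s*/i, '')  // drop the first <!DOCTYPE ...>
    .trim();
}

const input = `
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head></head>
<body>
<div>text</div>
</body>
</html>
`;

const output = stripProlog(input);
console.log(output);
```

This keeps the <html> wrapper, so nothing has to be re-added. If you would rather stay inside cheerio, its documentation also describes a static `$.html(selector)` form that renders the outer HTML of the selection; it is worth checking against the version you have installed.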