How to remove <!DOCTYPE html> and <?xml ...> from a document using cheerio.js

Date: 2019-03-20 15:44:56

Tags: javascript node.js web-scraping cheerio

I'm trying to remove <!DOCTYPE html> and <?xml ...> from an HTML document parsed by cheerio.js. Is that possible?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html>
  <head></head>
  <body>
    <div>text</div>
  </body>
</html>

1 answer:

Answer 0 (score: 1)

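You can simply extract the contents of the <html> element, which leaves out both the XML declaration and the DOCTYPE. All you need to do afterwards is add the <html> tags back.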

const cheerio = require('cheerio');

const html = `
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html>
  <head></head>
  <body>
    <div>text</div>
  </body>
</html>
`;
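const $ = cheerio.load(html);
// .html() returns only the inner HTML of <html>: no XML declaration, no DOCTYPE,
// and no <html> tags either, so wrap the result in <html> tags again
console.log(`<html>${$('html').html()}</html>`);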
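If you control the raw markup, another option (not part of the original answer) is to strip the XML declaration and the DOCTYPE from the string before handing it to cheerio.load(), so they never become nodes in the parsed tree. The sketch below uses a hypothetical helper name, `stripProlog`, and assumes both declarations sit at the very start of the input and that the DOCTYPE has no internal subset (a `[...]` block, which could itself contain `>`).

```javascript
// Hypothetical helper (not a cheerio API): removes a leading XML declaration
// and a leading DOCTYPE from raw markup before it is passed to cheerio.load().
// Assumes the DOCTYPE has no internal subset containing '>'.
function stripProlog(markup) {
  return markup
    // leading <?xml ...?> declaration plus surrounding whitespace
    .replace(/^\s*<\?xml[^>]*\?>\s*/i, '')
    // leading <!DOCTYPE ...> declaration plus trailing whitespace (case-insensitive)
    .replace(/^\s*<!DOCTYPE[^>]*>\s*/i, '');
}
```

With that helper, `cheerio.load(stripProlog(html))` parses the sample document from the question without either declaration. A regular expression over raw markup is a pragmatic shortcut rather than a parser, so it only suits input whose prolog has this simple shape.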