Question

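I have a very large XML file, obtained by exporting all the data from Tally. I'm trying to pull elements out of it with cheerio, the way I would when web scraping, but I'm running into trouble with formatting or something similar. Reading the file with fs.readFileSync() works fine and console.log shows the complete XML, but when I write the result out with fs.writeFileSync, it looks like this: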

My scraping code, which outputs an empty file:

const cheerio = require('cheerio');
const fs = require('fs');

var xml = fs.readFileSync('Master.xml', 'utf8');

const htmlC = cheerio.load(xml);
var list = [];
list = htmlC('ENVELOPE').find('BODY>TALLYMESSAGE>STOCKITEM>LANGUAGENAME.LIST>NAME.LIST>NAME').each(function (index, element) {
    list.push(htmlC(element).attr('data-prefix'));
});
console.log(list);
fs.writeFileSync("data.html", list, () => {});
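
Side note on the final line: fs.writeFileSync is synchronous, so its third parameter is an options value (such as an encoding) rather than a callback, and the data is expected to be a string or buffer rather than an array. A minimal sketch of the write step alone, assuming a hypothetical names array and a temp-file path that are not part of the original code:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for the values collected by the scraper.
const names = ['Widget A', 'Widget B'];

// Hypothetical output location; a temp directory keeps the sketch free of side effects.
const outFile = path.join(os.tmpdir(), 'data.txt');

// Join the array into a single string before writing.
// The call blocks until the file is written, so no callback is passed.
fs.writeFileSync(outFile, names.join('\n'), 'utf8');

// Read the file back to check what actually landed on disk.
console.log(fs.readFileSync(outFile, 'utf8'));
```

One item per line is only one possible layout; JSON.stringify(names) would work just as well if the file is meant to be read back by another program.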

Answer 1

You could check that Cheerio isn't decoding all the HTML entities. Change:

const htmlC = cheerio.load(xml);

to:

const htmlC = cheerio.load(xml, { decodeEntities: false });
