Cheerio: How to get an array of the text inside a tag

Posted: 2019-01-24 10:31:32

Tags: javascript node.js cheerio

HTML source:

<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td>

I want to get the text inside the td tag as an array, like this:

["2644-3/4", "QPSK", "301 - 4864"]

What is the best way to do this?

Thanks!

2 answers:

Answer 0 (score: 0)

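Your HTML can't be parsed as it is, so I think the only way to solve this is to fix it (by wrapping the td in table/tr elements) and then pick out the values with a regular expression: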

const cheerio = require('cheerio');

// The fixed HTML. The td is wrapped in table/tr elements
// Ideally there should be a </font> tag too but Cheerio seems to ignore that 
const html = '<table><tr><td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td></tr></table>';
const $ = cheerio.load(html);

// Grab the cell
const $td = $('td');

// (\d{4}-\d\/\d) - captures the first value, e.g. "2644-3/4"
// ([A-Z]{4}) - captures the second value, e.g. "QPSK"
// (?:.*) - non-capturing group that skips whatever lies in between
// (\d{3} - \d{4}) - captures the last value, e.g. "301 - 4864"
const re = /(\d{4}-\d\/\d)([A-Z]{4})(?:.*)(\d{3} - \d{4})/;

// Match the text against the regex and remove the full match
const arr = $td.text().match(re).slice(1);

// Outputs `["2644-3/4","QPSK","301 - 4864"]`
console.log(arr);
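The regex works because $td.text() joins the cell's text nodes with no separator (the br elements contribute nothing) and decodes &nbsp; to U+00A0, so the string it actually sees is roughly '2644-3/4QPSK\u00a0\u00a0301 - 4864'. The sketch below runs the same pattern on that string directly, assuming this is what .text() returns for the cell, and guards against match() returning null when a cell has a different layout. extractFields is a hypothetical helper name, not part of Cheerio.

```javascript
// Same pattern as above; the groups are tied to this exact cell layout
// (4 digits, dash, fraction; 4 capital letters; "ddd - dddd").
const re = /(\d{4}-\d\/\d)([A-Z]{4})(?:.*)(\d{3} - \d{4})/;

// Hypothetical helper: returns the captured groups, or null when the text
// does not have the expected shape (String#match returns null in that case).
function extractFields(text) {
  const m = text.match(re);
  return m ? m.slice(1) : null;
}

// Assumed result of $td.text() for the cell in the question:
// text nodes concatenated, &nbsp; decoded to U+00A0.
const cellText = '2644-3/4QPSK\u00a0\u00a0301 - 4864';

console.log(extractFields(cellText)); // [ '2644-3/4', 'QPSK', '301 - 4864' ]
console.log(extractFields('no match here')); // null
```

The greedy (?:.*) backtracks until the final group can match, which is why "301 - 4864" is captured whole rather than a shorter tail.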

Answer 1 (score: 0)

Let's start with:

const cheerio = require('cheerio');

let td = '<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td>';

How about:

td.split('<br>').map(part => cheerio.load(part).text().trim())
// Array(3) ["2644-3/4", "QPSK", "301 - 4864"]