HTML source:
<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen"> 301</font> - 4864</td>
I want to get the text inside the td tag as an array, like this:
["2644-3/4", "QPSK", "301 - 4864"]
Which approach would be best?
Thanks!
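Answer 0 (score: 0)
Your HTML can't be parsed as it stands, so I think the only way to solve this is to fix it and then use a regular expression to pick out the information: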
const cheerio = require('cheerio');
// The fixed HTML. The td is wrapped in table/tr elements
// Ideally there should be a </font> tag too but Cheerio seems to ignore that
const html = '<table><tr><td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen"> 301</font> - 4864</td></tr></table>';
const $ = cheerio.load(html);
// Grab the cell
const $td = $('td');
// (\d{4}-\d\/\d) - matches first group
// ([A-Z]{4}) - matches the second group
// (?:.*) - non-capture group
// (\d{3} - \d{4}) - matches the final group
const re = /(\d{4}-\d\/\d)([A-Z]{4})(?:.*)(\d{3} - \d{4})/;
// Match the text against the regex and remove the full match
const arr = $td.text().match(re).slice(1);
// Outputs `["2644-3/4","QPSK","301 - 4864"]`
console.log(arr);
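One caveat with the snippet above: `match` returns `null` when the text does not fit the pattern, so the chained `.slice(1)` would throw. Here is a minimal sketch of a guarded version. `parseCellText` and `CELL_RE` are hypothetical names of my own, and the sample string is what `$td.text()` yields for the fixed HTML, so the sketch runs without Cheerio:

```javascript
// Hypothetical helper: applies the answer's regex to the cell text and
// returns null instead of throwing when the text does not match.
const CELL_RE = /(\d{4}-\d\/\d)([A-Z]{4})(?:.*)(\d{3} - \d{4})/;

function parseCellText(text) {
  const match = text.match(CELL_RE);
  // match[0] is the full match; the three capture groups follow it
  return match ? match.slice(1) : null;
}

// The text content of the fixed <td> (tags removed, no separators added)
console.log(parseCellText('2644-3/4QPSK 301 - 4864')); // ["2644-3/4", "QPSK", "301 - 4864"]
console.log(parseCellText('no data'));                 // null
```

With Cheerio loaded you would call it as `parseCellText($td.text())` and check for `null` before using the result.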
Answer 1 (score: 0)
Let's start with:
let td = '<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen"> 301</font> - 4864</td>'
How about:
td.split('<br>').map(part => cheerio.load(part).text().trim())
// Array(3) ["2644-3/4", "QPSK", "301 - 4864"]
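If you'd rather not pull in a parser for a cell this simple, the same split-on-`<br>` idea can be sketched with plain string methods. Note that this swaps Cheerio for a regex that strips tags, which is only reasonable for simple markup like the question's (no `>` inside attribute values, no HTML entities to decode). `cellTexts` is a hypothetical helper name:

```javascript
// Hypothetical helper, no dependencies: split the cell on <br>, strip the
// remaining tags with a regex, and trim each piece.
function cellTexts(tdHtml) {
  return tdHtml
    .split(/<br\s*\/?>/i)                               // also accepts <BR>, <br/>, <br />
    .map(part => part.replace(/<[^>]*>/g, '').trim())   // remove tags, then surrounding whitespace
    .filter(text => text.length > 0);                   // drop empty pieces, e.g. after a trailing <br>
}

const td = '<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen"> 301</font> - 4864</td>';
console.log(cellTexts(td)); // ["2644-3/4", "QPSK", "301 - 4864"]
```

For anything messier than this one cell, parsing each part with Cheerio as shown above is the safer choice.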